Business Intelligence - Zhangxi Lin

ISQS 6339, Data Management & Business Intelligence
Introduction
Zhangxi Lin
Texas Tech University
ISQS 6339, Data Mgmt & BI
\\TechShare\coba\d\isqs3358
Outline
 Big Data
 Definitions of BI
 Categorizations of BI
 BI Trend
 BI tools
What is Business Intelligence?
 A simple definition: the applications and technologies that transform
business data into action
 Business intelligence (BI) is a business management term that refers to
the applications and technologies used to gather, provide access to, and
analyze data and information about a company's operations.
 Business intelligence systems can help companies gain more comprehensive
knowledge of the factors affecting their business and make better business
decisions.
 YouTube:
 What is BI? 2’
 Microsoft Business Intelligence Surface Demo 6’34”
Data, information, and knowledge
 Data – a collection of raw value elements or facts used for calculating,
reasoning, or measuring.
 Information – the result of collecting and organizing data in a way that
establishes relationships between data items, thereby providing context
and meaning.
 Knowledge – an understanding of information based on recognized
patterns, in a way that provides insight into that information.
Online Video
 What is business intelligence? 10’36”
 Retail and Big Data Revolution, 2’12”
 Big data, 7’12”
 Big data terms, 31’19”
Driving force - Big Data
 A collection of data sets so large and complex that it becomes
awkward to work with using on-hand database management tools.
 Difficulties include capture, storage, search, sharing, analysis, and
visualization.
 The trend toward larger data sets is due to the additional information
that can be derived from analyzing a single large set of related data, as
compared with separate smaller sets containing the same total amount of
data.
Copyright 2012
8/14/2012
ISQS7339, Fall 2012
Zettabyte (ZB)
 A quantity of information or information storage
capacity equal to $10^{21}$ bytes, or 1,000 exabytes.
 As of April 2012, no storage system had achieved one
zettabyte of information.
 The combined space of all computer hard drives in the world was estimated
at approximately 160 exabytes in 2006.
 Seagate reported selling 330 exabytes worth of hard drives during its 2011
fiscal year.
 As of 2009, the entire World Wide Web was estimated to contain close to
500 exabytes, or half a zettabyte.
 1,000,000,000,000,000,000,000 bytes = $1000^7$ bytes = $10^{21}$ bytes
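The unit arithmetic above can be checked with a few lines of code. This is a minimal sketch that assumes decimal (SI) prefixes, where each step up is a factor of 1,000, as the slide does; the helper names `to_bytes` and `convert` are hypothetical, chosen only for illustration.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a value in a decimal storage unit to bytes."""
    power = UNITS.index(unit)  # B=0, KB=1, ..., ZB=7
    return value * 1000 ** power

def convert(value, from_unit, to_unit):
    """Convert between two decimal storage units (hypothetical helper)."""
    return to_bytes(value, from_unit) / 1000 ** UNITS.index(to_unit)

print(to_bytes(1, "ZB") == 10 ** 21)  # True: 1 ZB = 1000**7 = 10**21 bytes
print(convert(500, "EB", "ZB"))       # 0.5  (Web estimate, 2009)
print(convert(160, "EB", "ZB"))       # 0.16 (all hard drives, 2006 estimate)
```

Binary prefixes (where 1 ZiB is 2^70 bytes) would give slightly different numbers; the slide's figures use the decimal definition.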
Data Scale
Market
 "Big data" has increased the demand for information management
specialists; major companies have spent more than $15 billion on
this.
 This industry is worth more than $100 billion and is growing at
almost 10% a year.
 There are 4.6 billion mobile-phone subscriptions worldwide, and between
1 billion and 2 billion people access the internet.
 The world's effective capacity to exchange information
through telecommunication networks was 281 petabytes in 1986,
471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007.
 It is predicted that the amount of traffic flowing over the internet will
reach 667 exabytes annually by 2013.
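To put the telecommunication figures in perspective, one can compute the compound annual growth rate (CAGR) between the reported years. The CAGR formula, (end / start)^(1 / years) - 1, is a standard calculation applied here for illustration, not something the slide itself states; the figures are the slide's, converted to petabytes.

```python
# Telecom capacity figures from the slide, in petabytes (1 EB = 1,000 PB).
capacity_pb = {1986: 281, 1993: 471, 2000: 2_200, 2007: 65_000}

def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1

# Growth rate for each interval between consecutive reported years.
years = sorted(capacity_pb)
for a, b in zip(years, years[1:]):
    rate = cagr(capacity_pb[a], capacity_pb[b], b - a)
    print(f"{a}-{b}: {rate:.1%} per year")

# Growth rate over the whole 1986-2007 period.
overall = cagr(capacity_pb[1986], capacity_pb[2007], 2007 - 1986)
print(f"1986-2007: {overall:.1%} per year")
```

Over the full period the rate works out to roughly 30% per year, which is why capacity grew by more than two orders of magnitude in about two decades.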
Approach - Cloud Computing
 Cloud computing is the use of computing resources (hardware and
software) that are delivered as a service over a network (typically the
Internet). The name comes from the cloud-shaped symbol used in system
diagrams as an abstraction for the complex infrastructure it contains.
Cloud computing entrusts remote services with a user's data, software,
and computation.
 Buzzwords: SaaS (Software as a Service), IaaS (Infrastructure as a
Service), PaaS (Platform as a Service)
Distributed business intelligence
 Deal with big data – the open & distributed approach
 LAMP
 Hadoop
 MapReduce
 HDFS
 NoSQL
 ZooKeeper
 Storm
Apache Hadoop
 An open-source software framework for storage and large scale
processing of data-sets on clusters of commodity hardware.
 The Apache Hadoop framework is composed of the following
modules:
 Hadoop Common - contains libraries and utilities needed by other Hadoop
modules
 Hadoop Distributed File System (HDFS) - a distributed file system that
stores data across the machines in the cluster.
 Hadoop YARN - a resource-management platform responsible for managing
compute resources in clusters and using them for scheduling of users'
applications.
 Hadoop MapReduce - a programming model for large scale data processing.
 Apache Hadoop's MapReduce and HDFS components were originally derived
from Google's MapReduce and Google File System (GFS) papers, respectively.
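The MapReduce programming model can be illustrated with the classic word-count example. This is a minimal, single-process sketch in plain Python that only mimics the map, shuffle, and reduce phases; it does not use Hadoop's actual API, and the function names are hypothetical. In a real cluster, mappers and reducers run in parallel on different nodes and the framework performs the shuffle.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(splits):
    # Each split would be processed by a separate mapper in Hadoop.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

splits = ["big data big insight", "data into action"]
print(word_count(splits))
# {'action': 1, 'big': 2, 'data': 2, 'insight': 1, 'into': 1}
```

The key design point is that mappers and reducers share no state: each mapper sees only its own split and each reducer sees only one key's values, which is what lets the work spread across commodity machines.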
A Multi-node Hadoop Cluster
Hadoop 2: Big data's big leap forward
 The new Hadoop is the Apache Software Foundation's attempt to create a
whole new general framework for the way big data can be stored,
mined, and processed.
 The biggest constraint on scale has been Hadoop's job handling. In
Hadoop 1, all jobs run as batch processes through a single daemon
called JobTracker, which creates a scalability and processing-speed
bottleneck.
 Hadoop 2 uses an entirely new job-processing framework built
using two daemons: ResourceManager, which governs all jobs in
the system, and NodeManager, which runs on each Hadoop node
and keeps the ResourceManager informed about what's happening
on that node.
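The division of labor between the two daemons can be sketched as a toy model: each NodeManager reports its free capacity, and the ResourceManager uses those reports to decide where work runs. This is a deliberately simplified illustration under assumed names (`NodeManager`, `ResourceManager`, `heartbeat`, `allocate`, and the "slots" capacity unit are all hypothetical); it is not the YARN API, and real YARN involves more components, for example a per-application ApplicationMaster.

```python
from dataclasses import dataclass, field

@dataclass
class NodeManager:
    """Toy per-node agent: tracks its own capacity and reports it."""
    name: str
    total_slots: int
    used_slots: int = 0

    def heartbeat(self):
        """Report this node's free capacity to the ResourceManager."""
        return self.name, self.total_slots - self.used_slots

@dataclass
class ResourceManager:
    """Toy cluster-wide scheduler that places jobs using heartbeat reports."""
    nodes: dict = field(default_factory=dict)
    placements: dict = field(default_factory=dict)

    def register(self, node):
        self.nodes[node.name] = node

    def allocate(self, job, slots):
        # Collect the latest free-capacity report from every NodeManager.
        reports = dict(n.heartbeat() for n in self.nodes.values())
        # Keep only nodes that can fit the request.
        candidates = [name for name, free in reports.items() if free >= slots]
        if not candidates:
            return None  # no node can run this request right now
        # Pick the node with the most free capacity.
        best = max(candidates, key=lambda name: reports[name])
        self.nodes[best].used_slots += slots
        self.placements[job] = best
        return best

rm = ResourceManager()
rm.register(NodeManager("node1", total_slots=4))
rm.register(NodeManager("node2", total_slots=8))
print(rm.allocate("job-a", 3))  # node2 (8 free vs 4)
print(rm.allocate("job-b", 3))  # node2 (5 free vs 4)
print(rm.allocate("job-c", 3))  # node1 (4 free vs 2)
print(rm.allocate("job-d", 3))  # None (1 and 2 free)
```

Even in this toy version, the scheduling decision is centralized while the state of each node is kept by an agent on that node, which is the separation the slide describes.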
MapReduce 2.0 – YARN (Yet Another Resource Negotiator)
The process of BI
 Data -> information -> knowledge -> actionable plans
 Data -> information: the process of determining what data is to be
collected and managed and in what context
 Information -> knowledge: The process involving the analytical
components, such as data warehousing, online analytical processing, data
quality, data profiling, business rule analysis, and data mining
 Knowledge -> actionable plans: the most important step in the BI
process
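The four stages can be illustrated with a tiny end-to-end example. This is a minimal sketch under stated assumptions: the sales records, the month-over-month "trend" pattern, and the decision rule are all hypothetical, chosen only to show how each stage builds on the previous one.

```python
from collections import defaultdict

# Data: raw transaction facts (hypothetical values for illustration).
raw_sales = [
    ("north", "2012-07", 120), ("north", "2012-08", 95),
    ("south", "2012-07", 80), ("south", "2012-08", 130),
]

def to_information(records):
    """Data -> information: organize raw facts by region and month."""
    by_region = defaultdict(dict)
    for region, month, amount in records:
        by_region[region][month] = amount
    return dict(by_region)

def to_knowledge(info):
    """Information -> knowledge: recognize a pattern (change from first to last month)."""
    trends = {}
    for region, months in info.items():
        ordered = [months[m] for m in sorted(months)]
        trends[region] = ordered[-1] - ordered[0]
    return trends

def to_action(trends, threshold=0):
    """Knowledge -> actionable plan: apply a simple hypothetical business rule."""
    return {region: ("investigate decline" if change < threshold else "maintain")
            for region, change in trends.items()}

info = to_information(raw_sales)
trends = to_knowledge(info)
print(trends)             # {'north': -25, 'south': 50}
print(to_action(trends))  # {'north': 'investigate decline', 'south': 'maintain'}
```

The last stage is the one that produces a decision; as the slide notes, the earlier stages only pay off if someone acts on that output.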
Actionable Knowledge
 An information asset retains its value only if the converted
knowledge is actionable.
 Methods are needed for extracting value from knowledge.
 This is not a technical issue but an organizational one: it requires
empowered individuals in the organization to take the action.
 Return on investment (ROI) must also be considered.
BI Problems
 Structured
 Detecting credit card fraud
 Setting loan parameters
 Market segmentation/mass customization
 Deciding marketing mix
 Customer churn
 Reducing employee turnover
 Improving quality/efficiency
 …
 Unstructured
 Data exploration
 Utilization of resources (stored knowledge) to maximum effectiveness
 …
BI Applications
 Customer Analytics
 Customer profiling
 Targeted marketing
 Personalization
 Collaborative filtering
 Customer satisfaction
 Customer lifetime value
 Customer loyalty
 Sales Channel Analytics
 Marketing
 Sales performance and pipeline
BI Applications (2)
 Supply Chain Analytics
 Supplier and vendor management
 Shipping
 Inventory control
 Distribution analysis
 Behavior Analysis
 Purchasing trends
 Web activity
 Fraud and abuse detection
 Customer attrition
 Social network analysis
The Evolution of Business Intelligence
 1st Generation – Traditional analytics (query and reporting)
 2nd Generation – Traditional generation (OLAP, data
warehousing)
 2.5th Generation – "New traditional" generation (in-memory OLAP,
search-based)
 3rd Generation - Advanced analytics
 Rules, predictive analytics and real-time data mining
 Stream analytics
Business Intelligence Classifications
 Stream Analytics*: real-time, continuous, sequential analysis
(ranging from basic to advanced analytics)
 3rd-Generation BI: Advanced Analytics/Optimization
 Rules
 Predictive Analytics
 Real-time and traditional Data Mining
 “New Traditional” Analytics
 “2.5-Gen” Analytics (In-Memory OLAP, Search-Based)
 Traditional Analytics (Legacy BI)
 1st Generation Analytics (Query & Reporting)
 2nd Generation Analytics (OLAP, Data Warehousing)
* In lieu of stream analytics, “embedded analytics,” although architecturally
different, could potentially play the same role.
Source: Bill O’Connell, IBM, Aug 2007
Business Intelligence Use Cases
 Stream Analytics*: focus on what is happening RIGHT NOW
 Real-time, continuous, sequential analysis (ranging from basic to
advanced analytics)
 Example target solutions: Fraud Detection / Risk, CRM Analytics,
Supply Chain Optimization, RFID / Spatial Data, Other High-Volume
Real-Time Threshold
 Advanced Analytics/Optimization: focus on what will happen
 Rules: analytic applications that apply statistical relationships
in the form of RULES
 Predictive Analytics
 Real-time and traditional Data Mining: data mining to determine why
something happened by unearthing relationships that the end-user may
not have known existed
 Focus on what did happen
 “New Traditional” Analytics: “2.5-Gen” Analytics (In-Memory OLAP,
Search-Based)
 Traditional Analytics: 1st Generation Analytics (Query & Reporting)
and 2nd Generation Analytics (OLAP, Data Warehousing); turning data
into information is limited by the relationships which the end-user
already knows to look for.
* In lieu of stream analytics, “embedded analytics,” although architecturally
different, could potentially play the same role.
Source: Bill O’Connell, IBM, Aug 2007
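Stream analytics processes events one at a time as they arrive rather than querying stored data afterward. As a minimal sketch of the idea, applied to the fraud-detection use case listed above, the monitor below keeps a fixed-size sliding window of recent values and flags any new value that exceeds a multiple of the window's average. The window-average rule, the class name `SlidingWindowMonitor`, and the transaction amounts are hypothetical choices for illustration, not the method from the IBM source.

```python
from collections import deque

class SlidingWindowMonitor:
    """Keep a fixed-size window over a stream and flag unusually large values."""

    def __init__(self, window_size, factor):
        self.window = deque(maxlen=window_size)  # oldest values drop out automatically
        self.factor = factor

    def observe(self, value):
        """Process one event; return True if it exceeds factor x the recent average."""
        alert = False
        # Only judge new values once the window holds enough history.
        if len(self.window) == self.window.maxlen:
            average = sum(self.window) / len(self.window)
            alert = value > self.factor * average
        self.window.append(value)
        return alert

# Hypothetical card transaction amounts arriving one at a time.
stream = [20, 25, 22, 30, 24, 400, 26]
monitor = SlidingWindowMonitor(window_size=4, factor=3.0)
alerts = [amount for amount in stream if monitor.observe(amount)]
print(alerts)  # [400]
```

Each event is handled in constant memory, which is what makes this style of analysis feasible for high-volume streams. Note that in this sketch a flagged value still enters the window and raises the average for the next few events; a production system would need a more careful policy.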
Data Center: The Headquarters of Big Data
Case: BaoCloud Data Center in Shanghai
The land for the data center in
Shanghai
Customizable Data Center
BaoCloud data center