DEVQ400-01 Developing OLAP Business Solutions with Analysis
Download
Report
Transcript DEVQ400-01 Developing OLAP Business Solutions with Analysis
Data Warehouse
DSS
Business Intelligence
[email protected]
BI Evolution
History
Legacy
MIS
Reports
Current
Decision
Support
Systems
Business
Intelligence
Source: META Group Inc.
2005+
Business
Performance
Management
•Hand coded
•Report writers
•OLAP
•Dashboard/mining
•Single
system data
•Joined
operating data
•DW
•Enterprise portals
•Summary metrics
•Statistical
metrics
•Predictive metrics
•Recommendations
•Extreme latency
•Extreme cost
•Extreme ‘infoglut’
•Extreme integration
Moving beyond one-way info delivery to true BPM
© Minder Chen, 2004-2010
Data Warehouse - 2
BI Questions
• What happened?
–
What were our total sales this month?
• What’s happening?
–
Are our sales going up or down, trend analysis
• Why?
–
Why have sales gone down?
• What will happen?
–
Forecasting & “What If” Analysis
• What do I want to happen?
–
Planning & Targets
Source: Bill Baker, Microsoft
© Minder Chen, 2004-2010
Data Warehouse - 3
BI
Business Intelligence (BI) is the process of gathering meaningful
information to answer questions and identify significant trends or
patterns, giving key stakeholders the ability to make better
business decisions.
“The key in business is to
know something that
nobody else knows.”
-- Aristotle Onassis
PHOTO: HULTON-DEUTSCH COLL
“To understand is to perceive patterns.”
— Sir Isaiah Berlin
"The manager asks how and when,
the leader asks what and why."
— “On Becoming a Leader” by Warren Bennis
© Minder Chen, 2004-2010
Data Warehouse - 4
BI Definition
• Business intelligence provides the ability to
transform data into usable, actionable
information for business purposes. BI
requires:
– Collections of quality data and metadata
important to the business
– The application of analytic tools, techniques,
and processes
– The knowledge and skills to use business
analysis to identify/create business information
– The organizational skills and motivation to
develop a BI program and apply the results
back into the business
© Minder Chen, 2004-2010
Data Warehouse - 5
Business Intelligence
Increasing potential
to support
business decisions (MIS)
Making
Decisions
Data Presentation
Visualization Techniques
End User
Business
Analyst
Data
Analyst
Data Mining
Information Discovery
Data Exploration
OLAP, MDA,
Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts
Data Sources
(Paper, Files, Information Providers, Database Systems, OLTP)
© Minder Chen, 2004-2010
DBA
Data Warehouse - 6
Inmon's Definition of Data Warehouse – Data View
• A warehouse is a
– subject-oriented,
– integrated,
– time-variant and
– non-volatile
collection of data in support of
management's decision making process.
Source: http://www.intranetjournal.com/features/datawarehousing.html
– Bill Inmon in 1990
© Minder Chen, 2004-2010
Data Warehouse - 7
Inmon's Definition Explain
• Subject-oriented: They are organized around major
subjects such as customer, supplier, product, and
sales. Data warehouses focus on modeling and
analysis to support planning and management
decisions v.s. operations and transaction processing.
• Integrated: Data warehouses involve an integration of
sources such as relational databases, flat files, and online transaction records. Processes such as data
cleansing and data scrubbing achieve data
consistency in naming conventions, encoding
structures, and attribute measures.
• Time-variant: Data contained in the warehouse provide
information from an historical perspective.
• Nonvolatile: Data contained in the warehouse are
physically separate from data present in the
operational environment.
© Minder Chen, 2004-2010
Data Warehouse - 8
The Data Warehouse Process
Data Marts
and cubes
Source
Systems
Clients
Data
Warehouse
1
Design the
Data Warehouse
© Minder Chen, 2004-2010
2
Populate
Data Warehouse
Query Tools
Reporting
Analysis
Data Mining
3
Create
OLAP Cubes
4
Query
Data
Data Warehouse - 9
BI Architecture
Source: http://www.rpi.edu/datawarehouse/docs/DW-Architecture.pdf
© Minder Chen, 2004-2010
Data Warehouse - 10
BI Infrastructure Components
© Minder Chen, 2004-2010
Data Warehouse - 11
Key Concepts in BI Development Lifecycle
Technology
Data
Application
© Minder Chen, 2004-2010
Data Warehouse - 12
Performance Dashboards for Information Delivery
© Minder Chen, 2004-2010
Data Warehouse - 13
OLTP Versus Business Intelligence: Who asks what?
OLTP Questions
• When did that order ship?
• How many units are in
inventory?
• Does this customer have
unpaid bills?
• Are any of customer X’s line
items on backorder?
© Minder Chen, 2004-2010
Analysis Questions
• What factors affect order
processing time?
• How did each product line (or
product) contribute to profit last
quarter?
• Which products have the lowest
Gross Margin?
• What is the value of items on
backorder, and is it trending up
or down
over time?
Data Warehouse - 14
Classification of Entity Types
© Minder Chen, 2004-2010
Data Warehouse - 15
Transaction Level Order Item Fact Table
© Minder Chen, 2004-2010
Data Warehouse - 16
OLTP Versus OLAP
OLTP Questions
• When did that order ship?
• How many units are in
inventory?
• Does this customer have
unpaid bills?
• Are any of customer X’s line
items on backorder?
© Minder Chen, 2004-2010
OLAP Questions
• What factors affect order
processing time?
• How did each product line (or
product) contribute to profit last
quarter?
• Which products have the lowest
Gross Margin?
• What is the value of items on
backorder, and is it trending up
or down over time?
Data Warehouse - 17
Requirements
© Minder Chen, 2004-2010
Data Warehouse - 18
Dimensional Design Process
Business
Requirements
• Select the business process to model
• Declare the grain of the business process/data
in the fact table
• Choose the dimensions that apply to each fact
table row
• Identify the numeric facts that will populate
each fact table row
Data
Realities
© Minder Chen, 2004-2010
Data Warehouse - 19
Star Schema
Source: Moody and Kortink, "From ER Models to Dimensional Models: Bridging the Gap between OLTP and
OLAP Design, Part I," Business Intelligence Journal, Summer 2003, pp. 7-24.
© Minder Chen, 2004-2010
Data Warehouse - 20
Identifying Measures and Dimensions
Performance Measures
for KPI
Measures
Performance Drivers
Dimensions
The attribute varies
continuously:
The attribute is perceived as
a constant or discrete value:
•Balance
•Unit Sold
•Cost
•Sales
•Description
•Location
•Color
•Size
© Minder Chen, 2004-2010
Data Warehouse - 21
A Dimensional Model for a Grocery Store Sales
Why?
© Minder Chen, 2004-2010
Data Warehouse - 22
Facts Table
Measurements of business events.
DateID
ProductID
Dimensions
CustomerID
Units
Dollars
Measures
The Fact Table contains keys and units of
measure
© Minder Chen, 2004-2010
Data Warehouse - 23
Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metric) of the
business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have key value that is typically a concatenated
key composed of the primary keys of the
dimensions
• Joined to dimension tables through foreign
keys that reference primary keys in the
dimension tables
© Minder Chen, 2004-2010
Data Warehouse - 24
Store Dimension
• It is not uncommon to represent multiple
hierarchies in a dimension table. Ideally, the
attribute names and values should be unique
across the multiple hierarchies.
© Minder Chen, 2004-2010
Data Warehouse - 25
Inside a Dimension Table
• Dimension table key: Uniquely identify each row. Use
surrogate key (integer).
• Table is wide: A table may have many attributes
(columns).
• Textual attributes. Descriptive attributes in string
format. No numerical values for calculation.
• Attributes not directly related: E.g., product color and
product package size. No transitive dependency.
• Not normalized (star schemar).
• Drilling down and rolling up along a dimension.
• One or more hierarchy within a dimension.
• Fewer number of records.
© Minder Chen, 2004-2010
Data Warehouse - 26
Product Dimension
• SKU: Stock Keeping Unit
• Hierarchy:
–
Department Category Subcategory Brand Product
© Minder Chen, 2004-2010
Data Warehouse - 27
Hierarchy
© Minder Chen, 2004-2010
Data Warehouse - 28
Levels and Members
Year
Quarter
Month
1999
Quarter 1
Jan
1999
Quarter 1
Feb
1999
Quarter 1
Mar
1999
Quarter 2
Apr
1999
Quarter 2
May
1999
Quarter 2
Jun
1999
Quarter 3
Jul
1999
Quarter 3
Aug
1999
Quarter 3
Sep
1999
Quarter 4
Oct
1999
Quarter 4
Nov
1999
Quarter 4
Dec
© Minder Chen, 2004-2010
Data Warehouse - 29
Operations in Multidimensional Data Model
• Aggregation (roll-up)
– dimension reduction: e.g., total sales by city
– summarization over aggregate hierarchy: e.g.,
total sales by city and year -> total sales by
region and by year
• Selection (slice) defines a subcube
– e.g., sales where city = Palo Alto and date =
1/15/96
• Navigation to detailed data (drill-down)
– e.g., (sales - expense) by city, top 3% of cities
by average income
• Visualization Operations (e.g., Pivot)
© Minder Chen, 2004-2010
Data Warehouse - 30
Drilling down in a
data mart is nothing
more than adding
row headers from
the dimension
tables. Drilling up is
removing row
headers.
We can drill down or
up on attributes from
more than one
explicit hierarchy and
with attributes that
are part of no
hierarchy.
© Minder Chen, 2004-2010
Data Warehouse - 31
Avoid Null Key in the Fact Table
• Include a row in the corresponding dimension table to identify that
the dimension is not applicable to the measurent.
11: No Promotion
Sales Fact Table
© Minder Chen, 2004-2010
Data Warehouse - 32
Querying the Retail Sales Schema
© Minder Chen, 2004-2010
Data Warehouse - 33
Dragging and dropping dimensional attributes and facts into a simple report
© Minder Chen, 2004-2010
Data Warehouse - 34
ETL
ETL = Extract, Transform, Load
• Moving data from production systems to DW
• Checking data integrity
• Assigning surrogate key values
• Collecting data from disparate systems
• Reorganizing data
© Minder Chen, 2004-2010
Data Warehouse - 35
Building The Warehouse
Transforming Data
© Minder Chen, 2004-2010
Data Warehouse - 36
Pivot Table in Excel
© Minder Chen, 2004-2010
Data Warehouse - 37
Use of Data Mining
•
•
•
•
•
Customer profiling
Market segmentation
Buying pattern affinities
Database marketing
Credit scoring and risk analysis
© Minder Chen, 2004-2010
Data Warehouse - 38
OLAP and Data Mining Address Different Types of Questions
While reporting and OLAP are informative about past facts, only data mining can help
you predict the future of your business.
OLAP
Data Mining
What was the response rate to our mailing?
What is the profile of people who are likely
to respond to future mailings?
How many units of our new product did we Which existing customers are likely to buy
sell to our existing customers?
our next new product?
Who were my 10 best customers last year?
Which 10 customers offer me the greatest
profit potential?
Which customers didn't renew their policies Which customers are likely to switch to the
last month?
competition in the next six months?
Which customers defaulted on their loans?
Is this customer likely to be a good credit
risk?
What were sales by region last quarter?
What are expected sales by region next
year?
What percentage of the parts we produced
yesterday are defective?
What can I do to improve throughput and
reduce scrap?
Source: http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=2367
© Minder Chen, 2004-2010
Data Warehouse - 39
UA’s Existing Data Warehouse
Balanced Scorecard
Financial
Perspective
Improve Stakeholder Value
Stakeholder Value
ROCE
Revenue Growth Strategy
Increase Customer
Value
Build the Franchise
New Revenue Services
Productivity Strategy
Improve Cost
Structure
Profitability
Customer Acquisition
Cost per Unit
Improve Asset
Utilization
Asset Utilization
Customer Retention •
Customer Profitability
Product Leadership
Customer Intimacy
Customer
Perspective
Customer Value Proposition
Product/Service Attributes
Price
Quality
Time
(Innovation
Processes)
Learning &
Growth
Perspective
© Minder Chen, 2004-2010
Relationship
Function
Service
Image
Relations
Brand
Customer Satisfaction
“Build the Franchise”
Internal
Perspective
Operational Excellence
“Increase
Customer Value”
“Achieve Operational
Excellence”
“Become a
Good Neighbor”
(Customer
Management
Processes)
(Operations &
Logistics Processes)
(Regulatory &
Environmental
Processes)
A Motivated and Prepared Workforce
Strategic
Competencies
Strategic
Technologies
Climate for
Action
Data Warehouse - 40