DEVQ400-01 Developing OLAP Business Solutions with Analysis

Download Report

Transcript DEVQ400-01 Developing OLAP Business Solutions with Analysis

Data Warehouse and
Business Intelligence
Dr. Minder Chen
[email protected]
Spring 2010
BI
Business Intelligence (BI) is the process of gathering meaningful
information to answer questions and identify significant trends or
patterns, giving key stakeholders the ability to make better
business decisions.
“The key in business is to
know something that
nobody else knows.”
-- Aristotle Onassis
PHOTO: HULTON-DEUTSCH COLL
“To understand is to perceive patterns.”
— Sir Isaiah Berlin
"The manager asks how and when,
the leader asks what and why."
— “On Becoming a Leader” by Warren Bennis
© Minder Chen, 2004-2008
Data Warehouse - 2
Business Intelligence
Increasing potential
to support
business decisions (MIS)
Making
Decisions
Manager/executive
Data Presentation
Visualization Techniques
Business
Analyst
Data
Analyst
Data Mining
Information Discovery
Data Exploration
OLAP, MDA,
Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts
Data Sources
(Paper, Files, Information Providers, Database Systems, OLTP)
© Minder Chen, 2004-2008
DBA
Data Warehouse - 3
Inmon's Definition Explain
• Subject-oriented: They are organized around major
subjects such as customer, supplier, product, and
sales. Data warehouses focus on modeling and
analysis to support planning and management
decisions vs. operations and transaction processing.
• Integrated: Data warehouses involve an integration of
sources such as relational databases, flat files, and online transaction records. Processes such as data
cleansing and data scrubbing achieve data
consistency in naming conventions, encoding
structures, and attribute measures.
• Time-variant: Data contained in the warehouse provide
information from an historical perspective.
• Nonvolatile: Data contained in the warehouse are
physically separate from data present in the
operational environment.
© Minder Chen, 2004-2008
Data Warehouse - 4
The Data Warehouse Process
Data Marts
and cubes
Source
Systems
Clients
Data
Warehouse
1
Design the
Data Warehouse
© Minder Chen, 2004-2008
2
Populate
Data Warehouse
Query Tools
Reporting
Analysis
Data Mining
3
Create
OLAP Cubes
4
Query
Data
Data Warehouse - 5
Performance Dashboards for Information Delivery
© Minder Chen, 2004-2008
Data Warehouse - 6
OLTP Normalized Design
Warehouse
Ordering
Process
Chain
Retailer
Store
Retailer
Payments
Retailer
Returns
Product
POS
Process
Retail
Promo
Brand
GL
Account
Retail
Cust
Cash
Register
© Minder Chen, 2004-2008
Clerk
Data Warehouse - 7
OLTP Versus OLAP
OLTP Questions
• When did that order ship?
• How many units are in
inventory?
• Does this customer have
unpaid bills?
• Are any of customer X’s line
items on backorder?
© Minder Chen, 2004-2008
OLAP Questions
• What factors affect order
processing time?
• How did each product line (or
product) contribute to profit last
quarter?
• Which products have the lowest
Gross Margin?
• What is the value of items on
backorder, and is it trending up
or down over time?
Data Warehouse - 8
OLTP vs. OLAP
Source: http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf#search=%22OLTP%20vs.%20OLAP%22
© Minder Chen, 2004-2008
Data Warehouse - 9
Dimensional Design Process
Business
Requirements
• Select the business process to model
• Declare the grain of the business process/data
in the fact table
• Choose the dimensions that apply to each fact
table row
• Identify the numeric facts that will populate
each fact table row
Data
Realities
© Minder Chen, 2004-2008
Data Warehouse - 10
Select a business process to model
• Not business departments or business
functions
• Cross-functional business processes
• Business events
• Examples:
–
–
–
–
–
–
Raw materials purchasing
Order fulfillment process
Shipments
Invoicing
Inventory
General ledger
© Minder Chen, 2004-2008
Data Warehouse - 11
Requirements
© Minder Chen, 2004-2008
Data Warehouse - 12
Identifying Measures and Dimensions
Performance Measures
for KPI
Measures
Performance Drivers
Dimensions
The attribute varies
continuously:
The attribute is perceived as
a constant or discrete value:
•Balance
•Unit Sold
•Cost
•Sales
•Description
•Location
•Color
•Size
© Minder Chen, 2004-2008
Data Warehouse - 13
A Dimensional Model for a Grocery Store Sales
© Minder Chen, 2004-2008
Data Warehouse - 14
Product Dimension
• SKU: Stock Keeping Unit
• Hierarchy:
–
Department  Category  Subcategory  Brand  Product
© Minder Chen, 2004-2008
Data Warehouse - 15
Inside a Dimension Table
• Dimension table key: Uniquely identify each row. Use
surrogate key (integer).
• Table is wide: A table may have many attributes
(columns).
• Textual attributes. Descriptive attributes in string
format. No numerical values for calculation.
• Attributes not directly related: E.g., product color and
product package size. No transitive dependency.
• Not normalized (star schemar).
• Drilling down and rolling up along a dimension.
• One or more hierarchy within a dimension.
• Fewer number of records.
© Minder Chen, 2004-2008
Data Warehouse - 16
Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metric) of the
business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have key value that is typically a concatenated
key composed of the primary keys of the
dimensions
• Joined to dimension tables through foreign
keys that reference primary keys in the
dimension tables
© Minder Chen, 2004-2008
Data Warehouse - 17
Facts Table
Measurements of business events.
DateID
ProductID
Dimensions
CustomerID
Units
Dollars
Measures
The Fact Table contains keys and units of
measure
© Minder Chen, 2004-2008
Data Warehouse - 18
Hierarchy
© Minder Chen, 2004-2008
Data Warehouse - 19
Operations in Multidimensional Data Model
• Aggregation (roll-up)
– dimension reduction: e.g., total sales by city
– summarization over aggregate hierarchy: e.g., total
sales by city and year  total sales by region and
by year
• Selection (slice) defines a subcube
– e.g., sales where city = Palo Alto and date = 1/15/96
• Navigation to detailed data (drill-down)
– e.g., (sales - expense) by city, top 3% of cities by
average income
• Visualization Operations (e.g., Pivot)
© Minder Chen, 2004-2008
Data Warehouse - 20
A Visual Operation: Pivot (Rotate)
Juice
10
Cola
47
Milk
30
Cream 12
Product
3/1 3/2 3/3 3/4
Date
© Minder Chen, 2004-2008
Data Warehouse - 21
Date Dimension of the Retail Sales Model
© Minder Chen, 2004-2008
Data Warehouse - 22
Store Dimension
• It is not uncommon to represent multiple
hierarchies in a dimension table. Ideally, the
attribute names and values should be unique
across the multiple hierarchies.
© Minder Chen, 2004-2008
Data Warehouse - 23
ETL
ETL = Extract, Transform, Load
• Moving data from production systems to DW
• Checking data integrity
• Assigning surrogate key values
• Collecting data from disparate systems
• Reorganizing data
© Minder Chen, 2004-2008
Data Warehouse - 24
Pivot Table in Excel
© Minder Chen, 2004-2008
Data Warehouse - 25
OLAP and Data Mining Address Different Types of Questions
While reporting and OLAP are informative about past facts, only data mining can help
you predict the future of your business.
OLAP
Data Mining
What was the response rate to our mailing?
What is the profile of people who are likely
to respond to future mailings?
How many units of our new product did we Which existing customers are likely to buy
sell to our existing customers?
our next new product?
Who were my 10 best customers last year?
Which 10 customers offer me the greatest
profit potential?
Which customers didn't renew their policies Which customers are likely to switch to the
last month?
competition in the next six months?
Which customers defaulted on their loans?
Is this customer likely to be a good credit
risk?
What were sales by region last quarter?
What are expected sales by region next
year?
What percentage of the parts we produced
yesterday are defective?
What can I do to improve throughput and
reduce scrap?
Source: http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=2367
© Minder Chen, 2004-2008
Data Warehouse - 26
Use of Data Mining
•
•
•
•
•
Customer profiling
Market segmentation
Buying pattern affinities
Database marketing
Credit scoring and risk analysis
© Minder Chen, 2004-2008
Data Warehouse - 27
Associates
Which items are purchased in a retail store at the
same time?
© Minder Chen, 2004-2008
Data Warehouse - 28
Sequential Patterns
What is the likelihood that a customer will
buy a product next month, if he buys a related
item today?
© Minder Chen, 2004-2008
Data Warehouse - 29
Classifications
Determine customers’ buying patterns
and then find other customers with
similar attributes that may be targeted for
a marketing campaign.
© Minder Chen, 2004-2008
Data Warehouse - 30
Modeling
Use factors, such as location, number of
bedrooms, and square footage, to
Determine the market value of a property
© Minder Chen, 2004-2008
Data Warehouse - 31