Transcript PPT

Data Warehousing
Multidimensional Analysis
Joachim Hammer
Joachim Hammer
1
Data Warehousing Architecture
Information Sources
Data Warehouse
Server
OLAP Servers
MOLAP
Semistructured
Sources
Clients
Analysis
Data
Warehouse
extract
transform
load
refresh
etc.
Query/Reporting
serve
ROLAP
Operational
DB’s
Data Mining
Data Marts
Joachim Hammer
2
Multidimensional Modeling
• Support ad-hoc querying for business analyst
• Think in terms of spreadsheets
– View sales data by geography, time, or product
• Away from traditional ER models
– “One fact in one location”
– Entities are inter-related through a series of joins
Joachim Hammer
3
Multidimensional Data Model
• Database is a set of facts in multidimensional
space
• Measures
– Numerical data being tracked
– Stored in central fact table
– E.g., sales, inventory, expenditures
• Dimensions
– Business parameters
– E.g., time, geography, account data
Joachim Hammer
4
Example
• “Sales by product line manager over the past six
months”
Account Info
Key columns joining fact table
Numerical Measures
to dimension tables
Prod Code Time Code Acct Code Sales
Qty
Product Info
Time Info
...
Joachim Hammer
5
Multidimensional Data Model Cont’d
• Dimensions have attributes
• Organized into hierarchies
– E.g., Time dimension: days  weeks  quarters
– E.g., Product dimension: product  product line  brand
• Operators to navigate the hierarchies
–
–
–
–
“Roll-up”
Drill-down is the opposite of roll-up
Slice (defines a subcube)
Various visualization ops (e.g., pivot)
• Physical architecture of dimensional model is
described by “star” schema
Joachim Hammer
6
Star Schema
• Single fact table and a single table for each
dimension
• Dimension tables are denormalized
– E.g., dimension attributes may be stored multiple times
ProductCode ProductName ProductColor BrandCode
Normalized Representation
BrandCode BrandMgr
ProductCode ProductName ProductColor BrandCode BrandMgr
Denormalized
Joachim Hammer
Example:
100 separate products
5 brands
lots of redundancies
7
Advantages of Star Schema
• Reduces the number of physical joins
• Simplify the view of the data model
• Allows rel. easy maintenance
Joachim Hammer
8
Example
roll-up to region
NY
SF
roll-up to brand
Product
LA
Juice
Milk
Coke
Cream
Soap
Bread
10
34
56
32
12
56
roll-up to week
M T W Th F S S
Dimensions:
Time, Product, Geography
Attributes:
Product (upc, price, …)
Geography …
…
Hierarchies:
Product  Brand  …
Day  Week  Quarter
City  Region  Country
Time
56 units of bread sold in LA on M
Joachim Hammer
9
Example Cont’d
Geography
Time
Time Code
Quarter Code
Quarter Name
Week Code
Day Code
Day name
Account
Account Code
Key Account Code
Account Name
Account Type
Account Market
Joachim Hammer
Sales
Geography Code
Time Code
Account Code
Dollar Amount
Units
Geography Code
Region Code
Region Mgr
City Code
City Name
Product
Product Code
Product Name
Brand Mgr
Brand Code
Prod. Line Code
Prod. Line Name
Prod. Name
...
10