Multi-Dimensional Modelling

Download Report

Transcript Multi-Dimensional Modelling

Exploiting the DW data
• DW is a platform for creating a wide array of
reports
• It solves data feed problems, but does not lead
to specific decision support
• Need a model for organising data into
meaningful reports
• Need specific interfaces for users
Exploiting the DW Data
Static Reporting
Data Staging
Area
Source Systems
Scrutinising
Data Warehouse
Relational Database
on a dedicated Server
Extraction
Cleaning
Transformation
Loading
Multidimensional
Data Cubes
OLAP tools
De normalised, data
Discovering
Data Mining
…….
Multidimensional Models
Customer
Market
Product
Time
SALES
Product
Time
FINANCE
P/L_Line
The data is found at the intersection of
dimensions.
Representing multidimensional data
MOLAP Server
• The application layer
stores data in a
multidimensional structure
DSS client
• The presentation layer
provides the
MOLAP
multidimensional view
Engine
• Efficient storage and processing Application
layer
• Complexity hidden from the
user (but NOT from developer)
• Analysis using pre-aggregated
summaries and pre-calculated Warehouse
measures
ROLAP Server
• The warehouse stores
DSS client
atomic data.
• The application layer
ROLAP
generates SQL for the
engine
three- dimensional view.
Application
• The presentation layer Multiple layer
SQL
provides the
multidimensional view.
Warehouse
server
MOLAP
MDDB
Query
Periodic
load
Warehouse
Data
Server
user
ROLAP
Cache
Live
fetch
Query
Data
cache
Warehouse
Data
Server
user
Also Hybrid (HOLAP)
Choosing a Reporting
Architecture
•
•
•
•
•
•
•
Business needs
Good
Potential for growth
Query
interface
Performance
enterprise architecture
OK
Network architecture
Speed of access
Openness
MOLAP
ROLAP
Simple
Complex
Analysis
Modeling
• Warehouses differ from operational
structures:
– Analytical requirements
– Subject orientation
• Data must map to subject oriented
information:
– Identify business subjects
– Define relationships between subjects
– Name the attributes of each subject
• Modeling is iterative
• Modeling tools are available
Modeling the Data Warehouse
1
1. Defining the business
model
2. Creating the dimensional
model
2, 3
3. Modeling summaries
4. Creating the physical model
4
Physical model
Select a
business
process
Identifying Business Rules
Location
Geographic proximity
0 - 1 miles
1 - 5 miles
> 5 miles
Time
Month > Quarter > Year
Product
Type
Monitor
Status
PC
Server
15 inch
17 inch
19 inch
None
New
Rebuilt
Custom
Store
Store > District > Region
Creating the Dimensional Model
Identify fact tables
– Translate business measures into fact
tables
– Analyze source system information for
additional measures
– Identify base and derived measures
– Document additivity of measures
Identify dimension tables
Link fact tables to the dimension
tables
Create views for users
Dimension Tables
Dimension tables have the following
characteristics:
• Contain textual information that
represents the attributes of the business
• Contain relatively static data
• Are joined to a fact table through a
foreign key reference
Product
Channel
Facts
(units,
price)
Customer
Time
Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the
business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have key value that is typically a concatenated
key composed of the primary keys of the
dimensions
• Joined to dimension tables through foreign
keys that reference primary keys in the
dimension tables
Dimensional Model
(Star Schema)
Fact table
Product
Channel
Facts
(units,
price)
Customer
Time
Dimension tables
Star Schema Model
Product Table
Product_id
Product_desc
…
• Central fact table
• Radiating dimensions
• Denormalized model
Time Table
Day_id
Month_id
Period_id
Year_id
Store Table
Store_id
District_id
...
Sales Fact Table
Product_id
Store_id
Item_id
Day_id
Sales_dollars
Sales_units
...
Item Table
Item_id
Item_desc
...
Star Schema Model
•
•
•
•
•
•
Easy for users to understand
Fast response to simple queries
Simple metadata
Supported by many front end tools
Less robust to change
Does not support history
Using Summary Data
Phase 3: Modeling summaries
• Provides fast access to pre-computed
data
• Reduces use of I/O, CPU, and memory
• Is distilled from source systems and precalculated summaries
• Usually exists in summary fact tables
Designing Summary Tables
• Average
• Maximum
• Total
• Percentage
Units
Product A
Total
Product B
Total
Product C
Total
Sales(€)
Store
Summary Tables Example
SALES FACTS
Sales Region Month
10,000 North Jan 99
12,000 South Feb 99
11,000 North Jan 99
15,000 West Mar 99
18,000 South Feb 99
20,000 North Jan 99
10,000 East Jan 99
2,000 West Mar 99
SALES BY MONTH/REGION
Month Region Tot_Sales$
Jan 99 North 41,000
Jan 99 East 10,000
Feb 99 South 40,000
Mar 99 West 17,000
SALES BY MONTH
Month Tot_Sales
Jan 99 51,000
Feb 99 40,000
Mar 99 17,000