Data Warehouses and Data Marts

Download Report

Transcript Data Warehouses and Data Marts

Chapter 11
Logical Database Design
Author: Graeme C. Simsion and Graham C. Witt
Are data warehouses / marts just
operational databases?
Query
Tools
Query
Tools
Query
Tools
Data
Mart
Data
Mart
Data
Mart
Load
Program
Load
Program
Load
Program
Data Warehouse
Load
Program
Load
Program
Load
Program
Load
Program
Load
Program
Source
Data
Source
Data
Source
Data
Source
Data
External
Data
Copyright: ©2005 by Elsevier Inc. All rights reserved.
2
Differences Compared with
Operational Databases
• Different requirements
• Database technology may not be
relational
– Special multi-dimensional databases now
exist to service this area
• Many techniques of database design
can still apply
Copyright: ©2005 by Elsevier Inc. All rights reserved.
3
Characteristics
• Data integration
– With data from many operational databases
•
•
•
•
•
Loads data rather than performs updates
Less predictable database ‘hits’
Complex queries but a simple interface
May emphasize historical data
May summarize other data
Copyright: ©2005 by Elsevier Inc. All rights reserved.
4
Quality Criteria Revisited
•
•
•
•
•
•
•
•
Completeness
Non-redundancy
Enforcement of (business) rules
Data reusability
Stability and flexibility
Simplicity and elegance
Communication and effectiveness
Performance
Copyright: ©2005 by Elsevier Inc. All rights reserved.
5
Modelling / designing?
• Data warehouses feed data marts
– Need a separate approach
– Marts are more focused
– Warehouses are more general and must
handle all marts envisaged
• Consider each in turn
• (revisit 16.1 over page)
Copyright: ©2005 by Elsevier Inc. All rights reserved.
6
Query
Tools
Query
Tools
Query
Tools
Data
Mart
Data
Mart
Data
Mart
Load
Program
Load
Program
Load
Program
Data Warehouse
Load
Program
Load
Program
Load
Program
Load
Program
Load
Program
Source
Data
Source
Data
Source
Data
Source
Data
External
Data
Copyright: ©2005 by Elsevier Inc. All rights reserved.
7
Modeling for Data
Warehouses
1. May need an initial corporate model of the
business
2. Need to understand existing (operational)
data (bases)
3. Determine requirements of the warehouse
4. Determine sources and handling differences
5. Shaping data for data marts
Last two steps are more complex
Copyright: ©2005 by Elsevier Inc. All rights reserved.
8
Sources and Differences
when Designing
• Minimize number of source systems
• Carefully judge source data item quality
• Reconcile multiple sources
– Eg. Differences in timeframe and currency of item
• Handle compatibility of coding schemes for
data items
• Unpack overloaded attributes
– Eg. Address containing postcode where postcode
becomes an important part of a warehouse
Copyright: ©2005 by Elsevier Inc. All rights reserved.
9
Shaping Data for Data Marts
• Need to maximize flexibility
• Cater for common purposes between
marts and basic commonality (sorted out
when handing requirements for the
warehouse)
• If difficult to cater for both flexibility and
common purpose opt for flexibility
• The rule: Maximize Flexibility, Minimize
Anticipation
Copyright: ©2005 by Elsevier Inc. All rights reserved.
10
Modeling for Data Marts
• Modeling for general business people
– Little technical knowledge
– Need for special queries
• Much simpler than operational databases
– Facts vs. transaction handling and complex
business rules
• Users of data marts need to move easily
between marts
Copyright: ©2005 by Elsevier Inc. All rights reserved.
11
Basic Data Mart Architecture:
Star Schema
• Fact table (only one)
• Dimension tables
– To classify fact table into categories
Period
Customer
Accounting Month No
Quarter No
Year No
Sale
Accounting Month No *
Product ID *
Customer ID *
Location ID *
Quantity
Value
Product
Product ID
Product Type Code
Product Name
Customer ID
Customer Type Code
Region Code
State Code
Customer Name
Location
Location ID
Location Type Code
Region Code
State Code
Location Name
Copyright: ©2005 by Elsevier Inc. All rights reserved.
12
Alternative Architectures:
Snowflake
• One fact table
• Dimensions are hierarchical
– Collapse 1:many relationships through denormalization
Customer
Type
Period
Accounting Month No
Quarter No
Year No
Product
Type
Product Type ID
Product Type Name
Product
Product ID
Product Type ID
Product Name
Customer
Sale
Accounting Month No
Product ID
Customer ID
Location ID
Quantity
Value
Customer Type ID
Customer Type Name
Customer ID
Customer Type ID
Region ID
Customer Name
Location
Region
Region ID
State ID
Region Name
Location ID
Location Type ID
Region ID
Location Name
State
State ID
State Name
Location
Type
Location Type ID
Location Type Name
Copyright: ©2005 by Elsevier Inc. All rights reserved.
13
Snowflakes and Many-tomany Relationships
Cannot be handled without action:
1. Ignore less common cases
–
Salesperson
be
credited to
Product
be credited
with
be
classified by
classify
But include data in fact (eg. Include
#salespeople in fact table)
2. Use a repeating group in
dimension table
3. Treat sale-by-salesperson as the
fact table
Sale
Whatever you do, involve the
business users in decision making
about architecture
Copyright: ©2005 by Elsevier Inc. All rights reserved.
14
Time-dependent Data
• History and time are common in data marts
and you must be able to
– Handle different granularities of time
– Cater for overlapping periods
– Consider hierarchies of time periods
• Slowly changing dimensions are common
(eg. People may move customer categories
over time)
– Speed of dimension data change
– Speed of moving fact data from one dimension to
another
Copyright: ©2005 by Elsevier Inc. All rights reserved.
15
Dimension Change Example
Customer
Group
Customer
Purchase
• Customers can change group
• Solutions
– Two group foreign keys (now, and at time
of purchase / transaction)
– Ignore if change is slow and cost of
ignoring it is low
– Hold a history of each customer’s
membership of groups
Copyright: ©2005 by Elsevier Inc. All rights reserved.
16
Concluding Word
• Data warehousing and data marts are
complex
• Specific design challenges and
limitations exist
• Patterns are also useful here
– There are resources available
• Do further reading about the area if
you’re interested
Copyright: ©2005 by Elsevier Inc. All rights reserved.
17
Next lecture
• Ontology: What’s all the fuss?
– How ontology can and cannot help you?
– What role does underlying theory (such as
ontology) play in practical data modelling?
– If we have ontology what happens with
creativity?
Copyright: ©2005 by Elsevier Inc. All rights reserved.
18