Data Warehouse Architecture

Download Report

Transcript Data Warehouse Architecture

Data Warehouse Architecture
By
Slavko Stemberger
Copyright © Starsoft
Inc, 2000
1
Some Acronyms/Terms
• OLTP
– On-Line Transaction Processing (operational
system)
• OLAP
– On-line Analytical Processing
• ROLAP
– Relational OLAP
Copyright © Starsoft
Inc, 2000
2
Some Acronyms/Terms
• MOLAP
– Multidimensional OLAP
• Metadata
– Data about data (data dictionary)
• Source System
– An operational system that provides data for the
data warehouse
Copyright © Starsoft
Inc, 2000
3
Some Acronyms/Terms
• ETL
– Extract, Transform and Load
• Data Warehouse
– A queryable source of data
• Data Mart
– A logical subset of a data warehouse
• Data Staging Area
– An intermediate storage location used for ETL
Copyright © Starsoft
Inc, 2000
4
Data Structures/Databases
•
•
•
•
•
•
Flat Files
Hierarchical DB
Network DB
Relational DB
O-O DB
Dimensional DB
Copyright © Starsoft
Inc, 2000
5
Modeling Methods
• Entity-Relationship (E-R)
• Dimensional
• Object Oriented (O-O)
Copyright © Starsoft
Inc, 2000
6
Entity-Relationship Modeling
• Used for operational system
• Instantaneous snapshot of the business
• Removed data redundancy (eliminates
update anomalies)
• Shows detail relationships
• Complex network of entities can be difficult
for end-users to understand
Copyright © Starsoft
Inc, 2000
7
Dimensional Modeling
• Used in data warehouses
• Data duplication is allowed (in the
dimensions)
• Query based
• Easier for users to understand
– Not as much detail shows as in E-R
Copyright © Starsoft
Inc, 2000
8
Dimensional Models
• The “Cube”
• Star Schema
• Snowflake Schema
Copyright © Starsoft
Inc, 2000
9
The “Cube”
• Logical structure of ALL data
warehouses
• Can be implemented physically
in an RDB like Oracle
• Some view this as limited to
data marts
Copyright © Starsoft
Inc, 2000
10
Star Schema
• Easy to understand
• Flexible in type of questions that
can be asked
• Supports very large data
warehouses
• There is data redundancy (in the
dimensions)
Copyright © Starsoft
Inc, 2000
11
Snowflake Schema
• “Normalized” star schema
• More complex than the
star schema - harder to
understand and work with
• Solves some problems that
cannot be done with star
schema
Copyright © Starsoft
Inc, 2000
12
Dimension Tables
• A set of independent variables that affect an
observation
• Each variable has a set of known, relatively
small, set of values
• 4 - 20 dimensions per data warehouse/data
mart is the norm
Copyright © Starsoft
Inc, 2000
13
Dimension Tables (cont…)
• Columns are descriptive and usually textual
• Some numeric values are descriptive
– Numeric descriptive values should be suspect
of being facts e.g. standard product price may
be a fact because it can change and one can ask
“what was the average standard price of the
product over the last 12 months”
Copyright © Starsoft
Inc, 2000
14
Dimension Tables (cont…)
• Dimension keys should be meaningless
surrogate keys
• Time dimension keys may be/should be
assigned in the order of the dates in the fact
table - this allows physical partitioning
• In general avoid “smart” keys - they should
be meaningless
• Avoid production keys
Copyright © Starsoft
Inc, 2000
15
Dimension Tables - Granularity
• Definition: The level of detail of the data
• Keep the grain of the data as small as
possible (as detail as possible)
– This makes the warehouse more resistant to
change
– It is easier to add attributes to existing
dimensions
– superior results in data mining operations
Copyright © Starsoft
Inc, 2000
16
Dimension Tables - “Types”
• Time
• Degenerate
• “Junk”
• Other
Copyright © Starsoft
Inc, 2000
17
Dimension Tables - Time
• All data marts and warehouses have at least
one time dimension
• Must be consistent across all fact tables
• Create partial attributes year, month and day
and their concatenations (year + month,
year + month + day, year + week, …)
– Without the concatenations, it is difficult to ask
for time ranges
Copyright © Starsoft
Inc, 2000
18
Dimension Tables - Degenerate
• Dimensions with only one attribute
• Usually a control document id such as order
number, invoice number, etc
• No value in creating a physical table
• Put the id into the fact table
Copyright © Starsoft
Inc, 2000
19
Dimension Tables - “Junk”
• Given: Leftover flags and text attributes
• Possible Actions:
–
–
–
–
Put the these flags into the fact table
Make each one into a dimension
Drop them from the design
Create one dimension with all combinations of
these flags
Copyright © Starsoft
Inc, 2000
20
Fact Tables
• Dimension keys
• Degenerate dimension keys (if they exist)
• Facts
–
–
–
–
Additive
Semi-additive
Non-additive
None (factless tables)
Copyright © Starsoft
Inc, 2000
21
Facts - Additive
• These are measures of activity
• Can be added across all combination of
dimensions
• Examples: sales in dollars or units
Copyright © Starsoft
Inc, 2000
22
Facts - Semi-additive/nonadditive
• These are measures of intensity
• Some may be added across some
dimensions but not others
– e.g. Bank Balance
• Some may not be added at all
– e.g. Temperature
Copyright © Starsoft
Inc, 2000
23
Closing
• Other things to look at
–
–
–
–
–
–
Mutating dimensions
Hierarchical data (e.g. product structures)
Security
Data Loading
Cleansing
etc.
Copyright © Starsoft
Inc, 2000
24