DW Design - Computer Science
Data Warehousing
Design
Dr. Awad Khalil
Computer Science Department
AUC
Data Warehousing Concepts, by Dr. Khalil
Content
Designing a Data Warehouse Database
Dimensional Modeling
Star Schema
Snowflake Schema
Advantages of Dimensional Modeling
Methodology for Dimensional Modeling
Designing a Data Warehouse Database
Designing a data warehouse database is highly complex.
The database component of a data warehouse is described using a
technique called dimensionality modeling: “A logical design
technique that aims to present the data in a standard, intuitive
form that allows for high-performance access.”
Dimensionality modeling uses the concepts of Entity-Relationship
(ER) modeling with some important restrictions.
Every dimensional model (DM) is composed of one table with a
composite primary key, called the fact table, and a set of smaller
tables called dimension tables.
Every dimension table has a simple (non-composite) primary key
that corresponds exactly to one of the components of the
composite key in the fact table.
This characteristic ‘star-like’ structure is called a star schema or
star join.
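The structure described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the slides' diagram: the table names follow the Real Estate example, but the columns and sample rows are assumptions.

```python
import sqlite3

# Hypothetical, heavily simplified star schema; columns and data are
# illustrative assumptions, not the schema shown in the slides' diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each has a simple (single-column) primary key.
CREATE TABLE Time     (timeID     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE Branch   (branchID   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE Property (propertyID INTEGER PRIMARY KEY, type TEXT, city TEXT);

-- Fact table: composite primary key, one component per dimension.
CREATE TABLE PropertySale (
    timeID       INTEGER REFERENCES Time(timeID),
    branchID     INTEGER REFERENCES Branch(branchID),
    propertyID   INTEGER REFERENCES Property(propertyID),
    sellingPrice REAL,
    PRIMARY KEY (timeID, branchID, propertyID)
);
""")
conn.executemany("INSERT INTO Time VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)",
                 [(10, "Cairo", "North"), (20, "Aswan", "South")])
conn.executemany("INSERT INTO Property VALUES (?, ?, ?)",
                 [(100, "Flat", "Cairo"), (101, "House", "Cairo"), (102, "Flat", "Aswan")])
conn.executemany("INSERT INTO PropertySale VALUES (?, ?, ?, ?)",
                 [(1, 10, 100, 250000.0), (2, 10, 101, 400000.0), (2, 20, 102, 300000.0)])

# Star join: the fact table joins to a dimension on one key component.
sales_by_city = conn.execute("""
    SELECT b.city, SUM(f.sellingPrice)
    FROM PropertySale f
    JOIN Branch b ON b.branchID = f.branchID
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
print(sales_by_city)
```

Each dimension key appears once as a component of the fact table's composite key, which is what gives the schema its star shape.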
Star Schema
A logical structure that has a fact table containing factual data in the center,
surrounded by dimension tables containing reference data (which can be
denormalized).
The diagram shows a Star schema for property sales of a Real Estate database.
Other Schema Versions
Snowflake Schema
A variant of the star schema where dimension tables do not contain
denormalized data.
Starflake Schema
A hybrid structure that contains a mixture of star and snowflake schemas.
The diagram shows part of star schema for property sales of a Real Estate
database with a normalized version of the Branch dimension table.
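As a rough sketch of the difference, the Branch dimension can be held either as one denormalized table (star) or split into several normalized tables (snowflake). The City and Region tables, the column names, and the sample rows below are assumptions made for illustration; the slides' diagram may normalize Branch differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star version: one denormalized Branch dimension table.
CREATE TABLE BranchStar (branchID INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Snowflake version (hypothetical): city and region moved to their own tables.
CREATE TABLE Region (regionID INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE City   (cityID INTEGER PRIMARY KEY, city TEXT,
                     regionID INTEGER REFERENCES Region(regionID));
CREATE TABLE BranchSnow (branchID INTEGER PRIMARY KEY,
                         cityID INTEGER REFERENCES City(cityID));
""")
conn.executemany("INSERT INTO BranchStar VALUES (?, ?, ?)",
                 [(10, "Cairo", "North"), (11, "Cairo", "North"), (20, "Aswan", "South")])
conn.executemany("INSERT INTO Region VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO City VALUES (?, ?, ?)", [(1, "Cairo", 1), (2, "Aswan", 2)])
conn.executemany("INSERT INTO BranchSnow VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])

# The star version answers the question from a single table.
star_rows = conn.execute(
    "SELECT branchID, city, region FROM BranchStar ORDER BY branchID").fetchall()

# The snowflake version needs two extra joins for the same answer,
# but stores each city and region name only once.
snow_rows = conn.execute("""
    SELECT b.branchID, c.city, r.region
    FROM BranchSnow b
    JOIN City c   ON c.cityID = b.cityID
    JOIN Region r ON r.regionID = c.regionID
    ORDER BY b.branchID
""").fetchall()
print(star_rows == snow_rows)
```

Both forms hold the same information; the trade-off is redundancy in the star form against additional joins in the snowflake form.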
Dimensional Model - Advantages
Efficiency – The consistency of the underlying database structure
allows more efficient access to the data by various tools including
report writers and query tools.
Ability to handle changing requirements – The star schema
can adapt to changes in the user’s requirements, as all dimensions
are equivalent in terms of providing access to the fact table.
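Extensibility – The dimensional model is extensible: for example,
new facts, new dimensions, and new dimension attributes can be
added, provided they are consistent with the grain of the existing
fact table.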
Ability to model common business situations – There are a
growing number of standard approaches for handling common
modeling situations in the business world.
Predictable query processing – Data warehouse applications
that drill down will simply be adding more dimension attributes
from within a single star schema.
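The drill-down behaviour in the last point can be sketched as follows. The helper total_sales_by, the column names, and the sample data are hypothetical; the point is only that drilling down changes the list of dimension attributes, not the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time (timeID INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE PropertySale (timeID INTEGER REFERENCES Time(timeID), sellingPrice REAL);
""")
conn.executemany("INSERT INTO Time VALUES (?, ?, ?)",
                 [(1, 2023, 1), (2, 2023, 2), (3, 2024, 1)])
conn.executemany("INSERT INTO PropertySale VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 200.0), (3, 400.0)])

def total_sales_by(attributes):
    """Sum the fact, grouped by the given Time dimension attributes.

    Attribute names are interpolated into the SQL text, so this sketch
    is only suitable for trusted, hard-coded names.
    """
    cols = ", ".join(f"t.{a}" for a in attributes)
    sql = (f"SELECT {cols}, SUM(f.sellingPrice) FROM PropertySale f "
           f"JOIN Time t ON t.timeID = f.timeID "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()

by_year = total_sales_by(["year"])                     # summary level
by_year_quarter = total_sales_by(["year", "quarter"])  # drill down: one more attribute
print(by_year)
print(by_year_quarter)
```

The join between the fact table and the dimension stays the same at every level, which is why the query processing is predictable.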
Database Design Methodology for Data
Warehouse
Nine-Step Methodology by Kimball (1996):
1- Choosing the process
2- Choosing the grain
3- Identifying and conforming the dimensions
4- Choosing the facts
5- Storing pre-calculations in the fact table
6- Rounding out the dimension tables
7- Choosing the duration of the database
8- Tracking slowly changing dimensions
9- Deciding the query priorities and the query modes
1- Choosing the process
The process (function) refers to the subject matter of a particular
data mart. The best choice for the first data mart tends to be the
one that is related to sales.
2- Choosing the grain
Choosing the grain means deciding exactly what a fact table record represents.
3- Identifying and Conforming the Dimensions
Dimensions set the context for asking questions about the facts in the fact table.
The diagram shows Star schema for property sales and property advertising
with Time, PropertyForSale, Branch, and Promotion as conformed (shared)
dimension tables.
4- Choosing the Facts
The grain of the fact table determines which facts can be used in the data mart.
All the facts must be expressed at the level implied by the grain.
The diagram shows how the Lease fact table shown in the previous diagram
could be corrected so that the fact table is appropriately structured.
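To see why every fact must match the grain, consider a hypothetical fact table whose grain is one row per lease per month. The attribute names and figures below are assumptions for illustration and are not the correction shown in the slides' diagram.

```python
# Hypothetical lease facts; chosen grain: one row per lease per month.
monthly_rows = [
    {"leaseID": 1, "month": "2024-01", "rent": 500.0, "deposit": 1000.0},
    {"leaseID": 1, "month": "2024-02", "rent": 500.0, "deposit": 1000.0},
    {"leaseID": 1, "month": "2024-03", "rent": 500.0, "deposit": 1000.0},
]

# 'rent' is expressed at the grain (one amount per lease per month),
# so summing it over the rows gives the right total.
total_rent = sum(r["rent"] for r in monthly_rows)

# 'deposit' is a per-lease amount repeated on every monthly row, so a
# plain sum counts it once per month and overstates it.
naive_deposit_total = sum(r["deposit"] for r in monthly_rows)

# Counting the deposit once per lease recovers the true figure.
true_deposit_total = sum({r["leaseID"]: r["deposit"] for r in monthly_rows}.values())

print(total_rent, naive_deposit_total, true_deposit_total)
```

A fact such as the deposit in this sketch belongs in a fact table whose grain is one row per lease, or must be restated at the monthly level before it is stored.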
5- Storing Pre-Calculations in the Fact Table
Once the facts have been selected, each should be re-examined to
determine whether there are opportunities to use pre-calculations.
A common example of the need to store pre-calculations occurs when the
facts comprise a profit and loss statement.
The diagram shows the fact table with the rentDuration, totalRent,
clientAllowance, staffCommission, and totalRevenue attributes. These
types of facts are useful because they are additive quantities, from
which we can derive valuable information.
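A minimal sketch of storing pre-calculated facts follows. The attribute names come from the slide, but monthlyRent and both formulas are assumptions made for illustration; the slides do not state how totalRent or totalRevenue are derived.

```python
from dataclasses import dataclass

@dataclass
class LeaseFact:
    rentDuration: int        # assumed to be in months
    monthlyRent: float       # hypothetical input, not listed on the slide
    clientAllowance: float
    staffCommission: float
    totalRent: float = 0.0     # pre-calculated at load time
    totalRevenue: float = 0.0  # pre-calculated at load time

def precalculate(fact):
    """Fill in the derived facts once, when the row is loaded.

    Both formulas are assumptions chosen for this example.
    """
    fact.totalRent = fact.rentDuration * fact.monthlyRent
    fact.totalRevenue = fact.totalRent - fact.clientAllowance - fact.staffCommission
    return fact

facts = [
    precalculate(LeaseFact(12, 500.0, 100.0, 300.0)),
    precalculate(LeaseFact(6, 400.0, 0.0, 120.0)),
]

# Because the stored facts are additive, a report only needs a plain sum.
overall_revenue = sum(f.totalRevenue for f in facts)
print(overall_revenue)
```

Storing the derived values means every query tool sees the same figures and no tool has to repeat the calculation.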
6- Rounding out the Dimension Tables
In this step, we return to the dimension tables and add as many text
descriptions to the dimensions as possible.
The text descriptions should be as intuitive and understandable to the
users as possible.
The usefulness of a data mart is determined by the scope and nature of
the attributes of the dimension tables.
7- Choosing the Duration of the Database
The duration measures how far back in time the fact table goes.
Very large fact tables raise at least two very significant
design issues:
First, it is often increasingly difficult to source
increasingly old data.
Second, it is mandatory that the old versions of the
important dimensions be used, not the most current
versions. This is known as the ‘slowly changing
dimension’ problem.
8- Tracking Slowly Changing Dimensions
The slowly changing dimension problem means, for example, that
the proper description of the old client and the old branch must be
used with the old transaction history.
Often, the data warehouse must assign a generalized key to these
important dimensions in order to distinguish multiple snapshots of
clients and branches over a period of time.
There are three basic types of slowly changing dimensions:
Type 1 – where a changed dimension attribute is overwritten;
Type 2 – where a changed dimension attribute causes a new
record to be created;
Type 3 – where a changed dimension attribute causes an
alternate attribute to be created so that both the old and the new
values of the attribute are simultaneously accessible in the same
dimension record.
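The three types can be sketched in plain Python on a small Branch dimension. The row layout is hypothetical: branchKey stands in for the generalized key, branchNo is the business identifier, and the isCurrent flag and the previous_ prefix are assumptions made for this example.

```python
def scd_type1(rows, branch_no, attr, new_value):
    """Type 1: overwrite the attribute; the old value is lost."""
    for row in rows:
        if row["branchNo"] == branch_no:
            row[attr] = new_value
    return rows

def scd_type2(rows, branch_no, attr, new_value):
    """Type 2: create a new record under a new generalized key."""
    current = [r for r in rows if r["branchNo"] == branch_no and r["isCurrent"]][0]
    current["isCurrent"] = False            # old snapshot stays for old facts
    new_row = dict(current, isCurrent=True)
    new_row[attr] = new_value
    new_row["branchKey"] = max(r["branchKey"] for r in rows) + 1
    rows.append(new_row)
    return rows

def scd_type3(rows, branch_no, attr, new_value):
    """Type 3: keep the old value in an alternate attribute on the same record."""
    for row in rows:
        if row["branchNo"] == branch_no:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return rows

def fresh():
    # One branch that moves from Cairo to Giza (made-up data).
    return [{"branchKey": 1, "branchNo": "B003", "city": "Cairo", "isCurrent": True}]

t1 = scd_type1(fresh(), "B003", "city", "Giza")
t2 = scd_type2(fresh(), "B003", "city", "Giza")
t3 = scd_type3(fresh(), "B003", "city", "Giza")
print(t1, t2, t3, sep="\n")
```

Only Type 2 lets old transactions keep pointing at the old description of the branch, which is why it relies on the generalized key rather than on branchNo.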
9- Deciding the Query Priorities and the Query
Modes
In this step we consider physical design issues.
The most critical physical design issues affecting the end-user’s
perception of the data mart are the physical sort order of the fact
table on disk and the presence of pre-stored summaries or
aggregations.
Beyond these issues there are a host of additional physical design
issues affecting administration, backup, indexing performance, and
security.
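A pre-stored summary can be sketched as a second table built once from the fact table. The table and column names are hypothetical, and physical sort order is not shown because it depends on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PropertySale (timeID INTEGER, branchID INTEGER, sellingPrice REAL)")
conn.executemany("INSERT INTO PropertySale VALUES (?, ?, ?)",
                 [(1, 10, 100.0), (1, 10, 50.0), (2, 10, 200.0), (2, 20, 400.0)])

# Pre-stored aggregation: total sales per branch, computed once at load time.
conn.execute("""
    CREATE TABLE SalesByBranch AS
    SELECT branchID, SUM(sellingPrice) AS totalSales, COUNT(*) AS saleCount
    FROM PropertySale
    GROUP BY branchID
""")

# Answering from the detailed fact table scans every sale...
from_facts = conn.execute(
    "SELECT branchID, SUM(sellingPrice) FROM PropertySale "
    "GROUP BY branchID ORDER BY branchID").fetchall()

# ...while the summary table holds one row per branch.
from_summary = conn.execute(
    "SELECT branchID, totalSales FROM SalesByBranch ORDER BY branchID").fetchall()

print(from_facts == from_summary)
```

The summary has to be refreshed whenever the fact table is loaded, so the gain in query speed is paid for in load time and storage.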
Example- Dimensional Model (Fact Constellation) for a Real Estate Data Warehouse
At the end of this methodology, we have a design for a data mart that
supports the requirements of a particular Real Estate business process
and also allows easy integration with other related data marts to
ultimately form the enterprise-wide data warehouse.
We integrate the star schemas for the business processes of the Real
Estate company using the conformed dimensions. For example, all the
fact tables share the Time and Branch dimensions.
A dimensional model that contains more than one fact table sharing one
or more conformed dimension tables is referred to as a fact
constellation.
Example- Fact and Dimension Tables for each Business Process
Business Process       Fact Table            Dimension Tables
Property Sales         PropertySale          Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Rentals       Lease                 Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Viewing       PropertyViewing       Time, Branch, Staff, PropertyForSale, PropertyForRent, ClientBuyer, ClientRenter
Property Advertising   Advert                Time, Branch, Staff, PropertyForSale, PropertyForRent, Promotion, Newspaper
Property Maintenance   PropertyMaintenance   Time, Branch, Staff, PropertyForRent
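As a small check on the table above, the dimensions shared by every fact table in the constellation can be computed directly. The dictionary copies the dimension lists as given in the table; the variable names are illustrative.

```python
from collections import Counter

# Dimension tables per fact table, copied from the table above.
fact_dimensions = {
    "PropertySale": {"Time", "Branch", "Staff", "PropertyForSale",
                     "Owner", "ClientBuyer", "Promotion"},
    "Lease": {"Time", "Branch", "Staff", "PropertyForSale",
              "Owner", "ClientBuyer", "Promotion"},
    "PropertyViewing": {"Time", "Branch", "Staff", "PropertyForSale",
                        "PropertyForRent", "ClientBuyer", "ClientRenter"},
    "Advert": {"Time", "Branch", "Staff", "PropertyForSale",
               "PropertyForRent", "Promotion", "Newspaper"},
    "PropertyMaintenance": {"Time", "Branch", "Staff", "PropertyForRent"},
}

# Dimensions used by every fact table in the constellation.
shared_by_all = set.intersection(*fact_dimensions.values())

# How many fact tables use each dimension.
usage = Counter(d for dims in fact_dimensions.values() for d in dims)

print(sorted(shared_by_all))
print(usage["Promotion"])
```

Any dimension used by more than one fact table has to be conformed, that is, defined identically wherever it appears, for the star schemas to be integrated.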
Thank you