Data Warehouse Logical Design

Download Report

Transcript Data Warehouse Logical Design

Logical Design of a Data Warehouse
Sayed Ahmed
SOFTWARE/WEB/MOBILE/DATABASE ARCHITECT, ENGINEER, AND DEVELOPER
TORONTO, CANADA
HTTP://SAYED.JUSTETC.NET
HTTP://WWW.JUSTETC.NET
OUR SERVICES

Free Training and Educational Services
 Training
and Education in Bangla:
 Bangla.SaLearningSchool.com
 Training
and Education in English:
 www.SaLearningSchool.com
 English.SaLearningSchool.com
 Ask
a question and get answers:
 Ask.JustEtc.net
TOPICS - KEYWORDS

Design a Data Warehouse
Star Schema
 Snow Flake Schema
 Dimension Tables
 Fact Tables
 Auditing
 Surrogate Keys
 Type 1, Type 2, Type 3, and Mixed solutions for slowly
changing dimension data ( SCD management)
 Pivoting for Analysis

 To
help with SSAS on data warehouse
TOPICS - KEYWORDS

Design a Data Warehouse

Additive measures
 Semi

additive measures
Hierarchies for dimensions
 Attributes
in dimensions
 Attributes in lookup tables

Long term data warehouse design
 Usually

Star Schema
Short term data warehouse design
 POC
 Usually
snowflake schema
TOPICS - KEYWORDS

Fact Tables







measures
foreign keys
and possibly an additional primary key
and lineage columns
granularity of fact tables
auditing and lineage needs
Measures can be



additive
non-additive
semi-additive
TOPICS - KEYWORDS

dimension
 keys
 names
 attributes
 member
properties
 translations
 and lineage
TOPICS - KEYWORDS

attributes
 natural

hierarchies
many-to-many fact table relationships
 you
can introduce an additional intermediate
dimension
CONCLUSION
Not much – right
 However, if you understand all the terms and can
implement all these concepts in your data
warehouse

That will be great
 Not necessarily you will need to use all of these
concepts; however, you may need to justify based on
the situation, will all or any of these will help?

 What

will help and what will not help
Check our sub sequent videos and tutorials
THANK YOU

Any Concerns?
 http://ask.justetc.net
 Or
comment below...
TOOLS AND SOFTWARE REQUIREMENTS

Download the Adventure Works databases



OLTP database (LOB database)
Data warehouse Database
From



http://msftdbprodsamples.codeplex.com/releases/view/55330
For this tutorial, you can just check our slides
Though the following tools will help


And probably check the details in the downloaded
databases esp. The AdventureWorksDW2012
You will need help from SQL Server and SQL Server MGMT
Studio Tools
REQUIRED TOOLS

Useful/Required SQL Server Components
 Database
Engine Services
 Documentation Components
 Management Tools - Basic
 Management Tools – Complete
 SQL Server Data Tools
DATA WAREHOUSE DESIGN – THE DETAILS

Data Warehouse Logical Design

Topics: Design and Implement a Data Warehouse
 Design
and implement dimensions.
 Design and implement fact tables
 Design Auditing


track the source and time for data coming into a DW through
auditing i.e lineage information
Why a Data Warehouse?

It is hard to
 generate
reports from OLTP/LOB/Transactional database
 To do Analysis on OLTP database data (some times)
 Get useful information/useful summarized and details data to
be used to take business decisions
DATA WAREHOUSE DESIGN – THE DETAILS

Why a Data Warehouse?
Data in OLTP are heavily normalized. The goal was to
keep one data only in one single place to reduce
redundancy and consistency of data
 You may end up with many tables 100s, 1000s
 To generate reports you may need to join many tables –
will be slow
 Historical data may not be there
 Data quality is also an issue
 For reporting or analyzing, you may need data from
multiple databases across many departments

WHY A DATA WAREHOUSE?

So you can create a Data Warehouse
By cleaning data
 With historical data
 Combining data from multiple sources
 Denormalizing data
 Using specific design geared towards Data Warehouse
design

 Some
or many consider DW design is less complex than
relational database design

Though it also has some complex areas to address... (by those
some or many)
SO WHAT DOES A DATA WAREHOUSE CONTAIN?

Usually two schemas are used for a DW
 Star
Schema-> looks like a star
 Snow Flake Schema

Another one called Dimensional Model
 Includes
both Star and Snow Flake in the same
Data Warehouse

Both Schemas has tables of two types
 Dimension
 Fact
Tables
Tables
SO WHAT DOES A DATA WAREHOUSE CONTAIN?

Fact Tables are in the center

A Fact table joins/combines all the data required for
this reporting or for the business aspect of this
reporting
 Usually
combines the primary keys of different tables that
contain data for this report/business aspect

Dimension tables are all the other tables that
contain actual data

Dimension tables are the tables that contain data
these can be the actual tables in the OLTP database without
any modification (Snow Flake)
 Or Dimension tables can be newly created by denormalizing
the existing OLTP databases (Star)

SO WHAT DOES A DATA WAREHOUSE CONTAIN?

So, you know now what are dimension tables and
what are fact tables
Fact tables contain primary keys of all related tables
(here they are foreign keys)
 Dimension tables contain data


Usually, it’s better that you keep your data
warehouse separate from your OLTP database
So bring all the tables (dimension) here
 Or denormalize them and bring them here in the new
database

SIMPLIFIED: WHAT ARE STAR AND SNOWFLAKE SCHEMAS

If you just create Fact tables and take all the
related tables from your OLTP/LOB databases
You get a Snow Flake Schema
 Here all Dimension tables are still normalized (as you
just took them from the actual database)
 This is easy –

 so
good for short-term, quick, and experimental Data
Warehouse

One note, your reporting and analysis services queries
(MDX, DMV) will be slow with Snow Flake Schemas
SO WHAT DOES A DATA WAREHOUSE CONTAIN?

Now, when you denormalize the dimension
tables
 You
get the start schema
 The Fact tables remain the same for example

Star Schema is kind of standard and used a lot
 Originally
was developed in 1980’s
EXAMPLES: WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA
Sales amount for internet sales by different countries and historical years
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA

issues that I did not mention before
 If
your OLTP database was well designed (?)
 It
may be hard to find the tables related to the reporting
 The table names and the column names can be tricky –
do not follow any conventions – do not have meaning
 So it can be hard to find data for the reporting
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA

Note: Reality:
 The
OLTP may not even be well designed (that makes
reporting hard sometimes) even the relationships as well as
normalization


– here we assumed that OLTP is perfect
In a long back project


I had to re-write/verify/check/change/optimize/had to deal with
(whatever you say) 100s (not really 100s, can be close to 100) of
queries for a reporting system
 Had to change the interface from one button for one report (easy
to get lost)
 Into a drop down list of reports
The relations among data were arbitrary – actually had only in the
mind of the designer – did not follow any standards – No ER – no
standard concepts-- So it was a hard job..
 Anyway..
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA

In such cases
Tools such as SQL Profiler might help
 you could create a test environment,

 try
to insert some data through an LOB application
 have SQL Profiler identify where the data was inserted

Another, issue with this particular example

No lookup for dates and years
 You

need to extract
The tables may not contain even historical data
 No
date field
 So no historical data
WHY REPORTING IN OLTP DATABASE IS NOT A GREAT IDEA

If sales data reside in multiple databases even by
multiple departments




How do you merge
Identify and match
Customer data can be in different database with no
common identification
Data quality can be low
Data missing
 Partial data
 Inconsistent data in multiple databases
 Data can be represented differenlt in different database



M or F for gender
1 or 0 for gender
STAR SCHEMA/FACT/DIMENSION/CUBE
TOTAL DW: MULTIPLE STAR SCHEMAS






You saw one Star Schema for Internet Sales
You can see another for Offline Sales
Another for Accounting
Your DW has many such Star Schemas
And these start schemas need to be connected/related
They will be connected when you use the same dimensions
for them

i.e. If two star schemas have the same dimension they can share
that dimension

Called: shared or conformed dimensions


For SSAS, you can use shared dimensions only
There is a concept of private dimension


Not a great idea in practical and real life applications
You cannot connect/compare/verify the data over the shared dimension
SHARED/CONFORMED DIMENSIONS
DENORMALIZED DIMDATE TABLE
SNOW FLAKES WILL BE MORE AND MORE NORMALIZED
Everything can be normalized
 Or the first level can be normalized others are
not

NORMALIZED PRODUCT DIMENSION
In the Star Schema, you could use these normalized product table to get snow flake
schema (partially.) Could use all normalized dimensions to get full snow flake
SNOW FLAKE
In Snow flake, you may see partial than full
snow flakes in reality
 Though, in reality, better to go for star schema

 Queries
will be faster
PARTIAL SNOW FLAKE
GRANULARITY

The number of Dimension Tables connected to
a fact table
 Dimension
of a star schema
 Cube = 3 dimension
 SSAS operates/analyzes on Cube
AUDITING AND LINEAGE
I will be very short on this
 In data warehouse, you may want some
auditing tables

 For
every update, you should audit
 who
made the update,
 when it was made,
 and how many rows were transferred
to each dimension and
 fact table

 in
your DW
AUDITING AND LINEAGE

You will need additional fields/columns in your
dimension and fact tables to track
 When,
and who, and from where the row data
was/were updated
 Your ETL process needs to be updated
 If you used SSIS for the ETL
 Modify
SSIS packages so that you can record these
information
THANK YOU

Any Concerns?
 http://ask.justetc.net
 Or
comment below...