Data Warehouse

Download Report

Transcript Data Warehouse

Data Warehouse
IMS5024 – presented by Eder Tsang
Data Warehouse

A data warehouse is a system consisting of processes
and databases used to provide the “data infrastructure”
for EIS and DSS

“… a subject-oriented, integrated, timevariant, and nonvolatile collection of data in support of management’s
decisions”
Inmon and Hackathorn (1994)
Data warehouse - subject oriented

The data warehouse is organised by “data subjects” that
are relevant to the organisation.
– Customer, claim, shipment, product

This may be contrasted with the process orientation of
many OLTP systems
Data warehouse - integrated

Data in the warehouse is structured based on a corporatewide model, spanning the functional boundaries of legacy
systems

This includes naming standards, units of measurement and
periodicity
Data warehouse - time variant

Data is the data warehouse is characterised by the timeseries nature of historical data

The data consists of a series of “snapshots” which are
time-stamped and record values at a moment in time

This supports trend analysis of the data
Data warehouse - non volatile

The data warehouse is not continuously updated (inserts,
eletes and changes) like data in an OLTP system

Data in a data warehouse is periodically up-loaded at a
scheduled time intervals (say daily)
Motivations for data warehousing

Demands on OLTP data bases for query processing
would be too great

Data warehousing is designed for efficient retrieval

Data in legacy systems is frequently inconsistent, of poor
quality and stored in different formats

Reduce costs in providing data for decision making
Motivations for data warehousing

Support for focus on complete business processes (BPR)

Support for new initiatives
– CRM, Balanced Scorecard

Industry sources quote ROI’s averaging 401% over 3
years

Remain competitive
Typical Data Warehouse Architecture
An Actual Data Warehouse
Data warehouse development

Requirements identification

Logical design, data modelling

Data extract, transform and load (ETL)

Warehouse architecture , technology and tools

Physical database design

Delivery systems

Operational policies
Designing a data warehouse –
data design
There are two main approaches to data modelling or data
warehouse design
– entity relationship modelling and normalisation
– dimensional modelling
The design of databases using a
traditional E-R approach

Entities and relationships

Normalisation 3NF
Entity relationship schema
Why do we normalise data?

Normalisation is a process for converting complex data
structures into simple, stable data structures

Normalisation protects integrity of database by avoiding
anomalies (update, delete, create)
• Normalised data models are:
• robust and stable
• have minimum redundancy
Dimensional Modeling (star schema)
Dimensional Modeling (star schema)
Components of dimensional model:
–
Fact Tables : contain measurements of business
eg. Sales, purchase order, shipment
–
Dimension Tables : store the descriptions of the dimensions
of the business
eg. Product, customer, vendor, store
Dimensional Modeling (star schema)

Each dimension table has a single primary key that
corresponds exactly to one of the components of the
multipart key in the fact table.

A fact table always expresses a many to many relationship
(the key is composed of foreign keys

The most useful facts in a fact table are numeric and
additive
( typically values are added up)
Snowflake schema

Snowflake schema –all the tables are normalised

Star schemas are preferable to snowflake – fewer joins for
information retrieval
Dimensional Modelling vs
E-R modelling

the purpose of dimensional modelling structure data for easy
and efficient analysis

E-R
modelling
creates
a
single
required
to
support
organisation’s Whereas

DM creates individual models for business/decision interest
eg. • model for sales info
• model for Inventory info
Entity relationship schema (3NF)
Corresponding Star schema
Corresponding snowflake schema
Dimensional Modelling vs
E-R modelling (Con’t)

OLTP and DW have different purpose: – operational vs
informational

Normalisation protects integrity of database by avoiding
anomalies (update, delete, create)

Data models for data warehouse do not have to be normalised
– In contrast, data in DW does not change often
– periodic additions of new data
DM vs. E-R modeling debate
(Kimball’s view)

OLTP systems are volatile – high rates of update transactions

In normalised models the goal is to reduce data redundancy
and prevent update anomalies

Data in a data warehouse does not need to be normalised
because it is periodically refreshed not updated by user
transactions