Data Warehousing
Download
Report
Transcript Data Warehousing
Chapter 9
DATA WAREHOUSING
Transparencies
© Pearson Education Limited 1995, 2005
Chapter 9 - Objectives
Legacy
System
How data warehousing evolved.
The main concepts and benefits associated with
data warehousing.
OLAP
Data mining
Legacy System
Systems
that were developed in the early years of
business processing
Rich source of historical data, but it’s difficult to
retrieve, because of non-standard features
This is why we need data warehouse
Problems with Legacy System
Access data from a legacy system may be difficult for
several reasons:
– Developed for a different hardware or software
platform
– Use a different data model
– Use a different DBMS
– Use a different data definitions
– Use a different data format
All these make difficulty in integration and sharing data
Data Definitions Problems
Homonyms – use different field names to store the same data in
the different database
Synonyms - use the same field names to store different data in
the different database
Domain integrity – domain for the same field may be different
Business rules – may be different in different database
Referential integrity – may be problems linking related records
from different databases
Concurrency control – when multiple users access a database that
design for single user
Data Warehouse Concepts
Technique of extracting and filtering data from
diverse database and use this data to build a new
database
Stores information extracted from historical,
operational and external databases
The primary purpose : to provide information for
management decision making
Database vs data warehouse
Activity
Database
Data warehosue
Function
Support business
operation
Support decision making
Data
Process oriented
Subject-oriented
Usage
Structured,
repetitive
Unstructured, repetitive
Processing
Data entry
End user initiated queries
Data Warehouse Architecture
Operational database / external database layer
Information access layer
Data access layer
Metadata layer
Process management layer
Application messaging layer
Physical layer
Data staging layer
Data Warehouse Implementation
Data – includes operational, historical and external data
Extraction and transformation – extract and transform data
in different table
Data warehouse storage – store the extracted and
transformed data in different table
Historical data – used for forecasting purposes
Reports, statistics, data analysis and presentation – output
from data warehouse to make a decision
Data Warehouse : Benefits and Risks
Benefits :
– Reduces reporting
cost
– Reduces data
consolidation and
integration cost
– Increase efficiency
and decision
making capabilities
Risks
– House the wrong data
– Expensive to build and
maintain
– Require organizational
changes
Online Analytical Processing
Support data modeling and multidimensional data
analysis
Share the characteristics :
– Provide user-friendly interface
– Use multidimensional data analysis technique
– Provide advanced database support
– Support client/server architecture
Online Analytical Processing
Can be classified :
– Relational Online Analytical Processing – use
RDBMS
– Multidimensional Online Analytical Processing
– extension of RDBMS
Data Mining
Data mining is a decision support tools that
enables a user to access directly large amount of
data and analyzes the data
Data mining is the set of activities used to find
new, hidden, or unexpected patterns in data
Data Mining Technique
Data mining process has four phases :
– Data preparation – main data sets to be used are
identified and cleaned
– Data analysis and classification – identify
common data characteristic or pattern
– Knowledge acquisition – develop a model
resemble target data
– Prediction – used to predict future behaviour
and forecast business outcomes
Data Mining Tools
Data mining tools today has this following
characteristics :
– Data preparation facilities
– Selection of data mining operations
– Product scalability and performance
– Facilities for visualization of results
END