Data Warehousing

Download Report

Transcript Data Warehousing

Chapter 9
DATA WAREHOUSING
Transparencies
© Pearson Education Limited 1995, 2005
Chapter 9 - Objectives
 Legacy
System
 How data warehousing evolved.
 The main concepts and benefits associated with
data warehousing.
 OLAP
 Data mining
Legacy System
 Systems
that were developed in the early years of
business processing
 Rich source of historical data, but it’s difficult to
retrieve, because of non-standard features
 This is why we need data warehouse
Problems with Legacy System

Access data from a legacy system may be difficult for
several reasons:
– Developed for a different hardware or software
platform
– Use a different data model
– Use a different DBMS
– Use a different data definitions
– Use a different data format

All these make difficulty in integration and sharing data
Data Definitions Problems






Homonyms – use different field names to store the same data in
the different database
Synonyms - use the same field names to store different data in
the different database
Domain integrity – domain for the same field may be different
Business rules – may be different in different database
Referential integrity – may be problems linking related records
from different databases
Concurrency control – when multiple users access a database that
design for single user
Data Warehouse Concepts
Technique of extracting and filtering data from
diverse database and use this data to build a new
database
 Stores information extracted from historical,
operational and external databases
 The primary purpose : to provide information for
management decision making

Database vs data warehouse
Activity
Database
Data warehosue
Function
Support business
operation
Support decision making
Data
Process oriented
Subject-oriented
Usage
Structured,
repetitive
Unstructured, repetitive
Processing
Data entry
End user initiated queries
Data Warehouse Architecture
Operational database / external database layer
 Information access layer
 Data access layer
 Metadata layer
 Process management layer
 Application messaging layer
 Physical layer
 Data staging layer

Data Warehouse Implementation





Data – includes operational, historical and external data
Extraction and transformation – extract and transform data
in different table
Data warehouse storage – store the extracted and
transformed data in different table
Historical data – used for forecasting purposes
Reports, statistics, data analysis and presentation – output
from data warehouse to make a decision
Data Warehouse : Benefits and Risks

Benefits :
– Reduces reporting
cost
– Reduces data
consolidation and
integration cost
– Increase efficiency
and decision
making capabilities

Risks
– House the wrong data
– Expensive to build and
maintain
– Require organizational
changes
Online Analytical Processing
Support data modeling and multidimensional data
analysis
 Share the characteristics :
– Provide user-friendly interface
– Use multidimensional data analysis technique
– Provide advanced database support
– Support client/server architecture

Online Analytical Processing

Can be classified :
– Relational Online Analytical Processing – use
RDBMS
– Multidimensional Online Analytical Processing
– extension of RDBMS
Data Mining
Data mining is a decision support tools that
enables a user to access directly large amount of
data and analyzes the data
 Data mining is the set of activities used to find
new, hidden, or unexpected patterns in data

Data Mining Technique

Data mining process has four phases :
– Data preparation – main data sets to be used are
identified and cleaned
– Data analysis and classification – identify
common data characteristic or pattern
– Knowledge acquisition – develop a model
resemble target data
– Prediction – used to predict future behaviour
and forecast business outcomes
Data Mining Tools

Data mining tools today has this following
characteristics :
– Data preparation facilities
– Selection of data mining operations
– Product scalability and performance
– Facilities for visualization of results
 END