DATA WAREHOUSING
Download
Report
Transcript DATA WAREHOUSING
DATA WAREHOUSING
Legacy System
Systems that were developed in the early
years of business processing
Rich source of historical data, but it’s
difficult to retrieve, because of nonstandard features
This is why we need data warehouse
Problems with Legacy System
Access data from a legacy system may be
difficult for several reasons:
Developed for a different hardware or software
platform
Use a different data model
Use a different DBMS
Use a different data definitions
Use a different data format
All these make difficulty in integration and
sharing data
Data Definitions Problems
Homonyms – use different field names to store the
same data in the different database
Synonyms - use the same field names to store
different data in the different database
Domain integrity – domain for the same field may
be different
Business rules – may be different in different
database
Referential integrity – may be problems linking
related records from different databases
Concurrency control – when multiple users access a
database that design for single user
Data Warehouse Concepts
Technique of extracting and filtering data
from diverse database and use this data
to build a new database
Stores information extracted from
historical, operational and external
databases
The primary purpose : to provide
information for management decision
making
Database vs data warehouse
Activity
Database
Data warehosue
Function
Support
business
operation
Support decision
making
Data
Process oriented Subject-oriented
Usage
Structured,
repetitive
Unstructured,
repetitive
Processing
Data entry
End user initiated
queries
Data Warehouse Architecture
Operational database / external database
layer
Information access layer
Data access layer
Metadata layer
Process management layer
Application messaging layer
Physical layer
Data staging layer
Data Warehouse Implementation
Data – includes operational, historical and
external data
Extraction and transformation – extract and
transform data in different table
Data warehouse storage – store the extracted
and transformed data in different table
Historical data – used for forecasting purposes
Reports, statistics, data analysis and presentation
– output from data warehouse to make a decision
Data Warehouse : Benefits and
Risks
Benefits :
Reduces reporting
cost
Reduces data
consolidation and
integration cost
Increase efficiency
and decision
making
capabilities
Risks
House the wrong data
Expensive to build and
maintain
Require organizational
changes
Online Analytical Processing
Support data modeling and
multidimensional data analysis
Share the characteristics :
Provide user-friendly interface
Use multidimensional data analysis technique
Provide advanced database support
Support client/server architecture
Online Analytical Processing
Can be classified :
Relational Online Analytical Processing – use
RDBMS
Multidimensional Online Analytical Processing –
extension of RDBMS
Data Mining
Data mining is a decision support tools
that enables a user to access directly large
amount of data and analyzes the data
Data mining is the set of activities used to
find new, hidden, or unexpected patterns
in data
Data Mining Technique
Data mining process has four phases :
Data preparation – main data sets to be used
are identified and cleaned
Data analysis and classification – identify
common data characteristic or pattern
Knowledge acquisition – develop a model
resemble target data
Prediction – used to predict future behaviour
and forecast business outcomes
Data Mining Tools
Data mining tools today has this following
characteristics :
Data preparation facilities
Selection of data mining operations
Product scalability and performance
Facilities for visualization of results
END