DATA WAREHOUSING

Download Report

Transcript DATA WAREHOUSING

DATA WAREHOUSING
Legacy System
Systems that were developed in the early
years of business processing
 Rich source of historical data, but it’s
difficult to retrieve, because of nonstandard features
 This is why we need data warehouse

Problems with Legacy System


Access data from a legacy system may be
difficult for several reasons:
 Developed for a different hardware or software
platform
 Use a different data model
 Use a different DBMS
 Use a different data definitions
 Use a different data format
All these make difficulty in integration and
sharing data
Data Definitions Problems






Homonyms – use different field names to store the
same data in the different database
Synonyms - use the same field names to store
different data in the different database
Domain integrity – domain for the same field may
be different
Business rules – may be different in different
database
Referential integrity – may be problems linking
related records from different databases
Concurrency control – when multiple users access a
database that design for single user
Data Warehouse Concepts
Technique of extracting and filtering data
from diverse database and use this data
to build a new database
 Stores information extracted from
historical, operational and external
databases
 The primary purpose : to provide
information for management decision
making

Database vs data warehouse
Activity
Database
Data warehosue
Function
Support
business
operation
Support decision
making
Data
Process oriented Subject-oriented
Usage
Structured,
repetitive
Unstructured,
repetitive
Processing
Data entry
End user initiated
queries
Data Warehouse Architecture
Operational database / external database
layer
 Information access layer
 Data access layer
 Metadata layer
 Process management layer
 Application messaging layer
 Physical layer
 Data staging layer

Data Warehouse Implementation





Data – includes operational, historical and
external data
Extraction and transformation – extract and
transform data in different table
Data warehouse storage – store the extracted
and transformed data in different table
Historical data – used for forecasting purposes
Reports, statistics, data analysis and presentation
– output from data warehouse to make a decision
Data Warehouse : Benefits and
Risks

Benefits :
 Reduces reporting
cost
 Reduces data
consolidation and
integration cost
 Increase efficiency
and decision
making
capabilities

Risks
 House the wrong data
 Expensive to build and
maintain
 Require organizational
changes
Online Analytical Processing
Support data modeling and
multidimensional data analysis
 Share the characteristics :





Provide user-friendly interface
Use multidimensional data analysis technique
Provide advanced database support
Support client/server architecture
Online Analytical Processing

Can be classified :


Relational Online Analytical Processing – use
RDBMS
Multidimensional Online Analytical Processing –
extension of RDBMS
Data Mining
Data mining is a decision support tools
that enables a user to access directly large
amount of data and analyzes the data
 Data mining is the set of activities used to
find new, hidden, or unexpected patterns
in data

Data Mining Technique

Data mining process has four phases :




Data preparation – main data sets to be used
are identified and cleaned
Data analysis and classification – identify
common data characteristic or pattern
Knowledge acquisition – develop a model
resemble target data
Prediction – used to predict future behaviour
and forecast business outcomes
Data Mining Tools

Data mining tools today has this following
characteristics :




Data preparation facilities
Selection of data mining operations
Product scalability and performance
Facilities for visualization of results

END