Introduction to Data Warehousing

Download Report

Transcript Introduction to Data Warehousing

Introduction to
Data Warehousing
From DBMS to Decision Support
• DBMSs widely used to maintain transactional
data
• Attempts to use of these data for analysis,
exploration, identification of trends etc. has led
to Decision Support Systems.
• Rapid Growth since mid 70’s
• DBMSs vendors have answered this trend by
adding new features to existing products
• Rarely enough
DBs for Decision Support
• Trend towards Data Warehousing
• Data Warehousing – consolidation of data
from several databases which are in turn
maintained by individual business units
along with historical and summary
information
Characteristics of TPSs
Characteristic
OLTP
Typical operation
Update
Level of analytical requirements
Low
Screens
Unchanging
Amount of data per transaction
Small
Data level
Detailed
Age of data
Current
Orientation
Records
TPS vs Decision Support
OLTP
Complex Analysis
Information to support
Historical information
day-to-day service
to analyze
Data stored at transaction
Data needs to be integrated
level
Database design: Normalized
Database design:
Denormalized, star schema
MIS and Decision Support
Ad hoc access
Production
platforms
Operational reports
•
•
•
•
Decision makers
MIS systems provided business data
Reports were developed on request
Reports provided little analysis capability
no personal ad hoc access to data
Analyzing Data from
Operational Systems
ERP
• Data structures are complex
• Systems are designed for high performance and
throughput
• Data is not meaningfully represented
• Data is dispersed
• TPS systems unsuitable for intensive queries
Production
platforms
Operational reports
Data Extract Processing
Operational systems
Extracts
Decision makers
• End user computing offloaded from
the operational environment
• User’s own data
Management Issues
Operational systems
Extracts
Extract explosion
•
•
•
•
Duplicated effort
Multiple technologies
Obsolete reports
No metadata
Decision makers
Data Quality Issues
•
•
•
•
•
•
•
•
•
No common time basis
Different calculation algorithms
Different levels of extraction
Different levels of granularity
Different data field names
Different data field meanings
Missing information
No data correction rules
No drill-down capability
From Extract to Warehouse DSS
Internal and
external systems
•
•
•
•
Data warehouse
Decision makers
Controlled
Reliable
Quality information
Single source of
data
Data Warehousing Architecture
External Data Sources
Visualisation
Extract Clean
Transform Load
Metadata
respository
Refresh
Operational Databases
Data Warehouse
Serves
OLAP
Data Mining
Business Motivators
• Provide superior services and
products
• Know the business
• New products
• Invest in customers
• Retain customers
• Invest in technology
• Reinvent to face new challenges
Centralised data warehouse
Corporat e
dat awarehous e
Mainframe
Corporat e
Financial
Market ing
Manufact uring
Dis tribut ion
Anal ys t
Server
Anal ys t
Anal ys t
Federated data warehouse
Main frame
Corp orat e
d at a
wareh o us e
Fin an ci al
An al ys t
Mark et in g
An al ys t
Manu fact urin g
An al ys t
Dis trib ut io n
An al ys t
Tiered data warehouse
Mainframe
Tier 3 (detail ed dat a)
Corporat e data warehous e
Tier 2 (summari zed data)
Tier 1 (hi ghly s ummarized data)
Local dat a mart
Workst at ion
Anal ys t
Data Warehouses Vs Data
Marts
Data
Warehouse
Data Mart
Property
Data Warehouse
Data Mart
Scope
Enterprise
Department
Subjects
Multiple
Single-subject
Data Source
Many
Few
Size (typical)
100 GB to > 1 TB
< 100 GB
Implementation time
Months to years
Months
End-user Access Tools
• High performance is achieved by pre-planning
the requirements for joins, summations, and
periodic reports by end-users.
• There are five main groups of access tools:
– Data reporting and query tools
– Application development tools
– Executive information system (EIS) tools
– Online analytical processing (OLAP) tools
– Data mining tools
Data Usage - $1000 questions
Verification
What is the average sale for
in-store and catalog
customers?
What is the average high
school GPA of students who
graduate from college
compared to those who do
not?
Discovery
What is the best predictor
of sales?
What are the best
predictors of college
graduation?
Need to complement RDBMS technology with a flexible,
multidimensional view of data
The Functionality of OLAP
• Rotate and drill down
• Create and examine calculated data
• Determine comparative or relative
differences.
• Perform exception and trend analysis.
• Perform advanced analytical functions
The star structure
Multidimensional Database
Customer
Store
Model
Store
Time
SALES
Time
FINANCE
Product
The data is found at the intersection of
dimensions.
Data Mining
Data mining functions
• Associations
– 85 percent of customers who buy a certain brand of wine also buy a
certain type of pasta
• Sequential patterns
– 32 percent of female customers who order a red jacket within six
months buy a gray skirt
• Classifying
– Frequent customers are those with incomes about $50,000 and
having two or more children
• Clustering
– Market segmentation
• Predicting
– predict the revenue value of a new customer based on that personal
demographic variables