Transcript Dwclass

DATA
WAREHOUSE
CONCEPTS
A Definition
 A Data Warehouse:
• Is a repository for collecting, standardizing, and
summarizing snapshots of transactional data contained in
an organization’s operations or production systems
• provides a historical perspective of information
• is most often, but not exclusively, used for decision support
applications and business information queries
• can be more than one database
• Is not a new concept
Another Definition
 Decision Support:
•
•
•
•
is a set of tools to easily access data
is becoming a critical business tool
is usually graphically oriented
is empowering end users with tools to access vital business
information
• is moving lots of data down to the end user workstation
• is a rapidly expanding area because of data warehousing
efforts and projects
Why a warehouse?
 For analysis and decision support, end users
require access to data captured and stored in an
organization’s operational or production
systems
 This data is stored in multiple formats, on
multiple platforms, in multiple data structures,
with multiple names, and probably created
using different business rules
Why do we want a central data store
Lif e
LTD
V oluntary
Benf its
Non-Medical
Individual
Disability
Financial
Account
Management
Underwriting
Customer Service
Sales/Marketing
Financial Analysis
A Good Reason for a Central Data Store
Interesting Statistics
 85% of the Fortune 1000 companies have, are
implementing, or are looking at, data warehouses (Meta
Group)
 90% of all information processing organizations will be
pursuing a data warehouse strategy in the next three
years (Meta Group)
 The Decision Support industry will be a $1 Billion
industry by 1997 (IDC & Forrester)
Data Warehouse Evolution - Stage 0
Application
End User
Reports
No end user access to
production files
Production Files
Application
Production Files
“What we print” is
“what you get”
End User
Reports
Data Warehouse Evolution - Stage 1
Applic ation
Produc tion Files
End Us er
Reports
Snaps hot File
Applic ation
End Us er
Reports
Produc tion Files
Snaps hot Files
End users denied direct
access to production
files
Snapshots or copies of
production files are
made available instead
Solution: Provide end
users access to
production systems
No Integration Between Systems
a
System developed in 1979
c
b
Purchased company
system
Purchased Package
New Application development
Rebuilt Application
Data Warehouse Evolution - Stage 2
Document
Document
Document
Document
Desktop computer
Document
Document
Document
Document
Document
A
Desktop computer
Document
Document
Document
b
Mainframe
Document
Document
Document
Document
Document
Document
c
4GL
Desktop computer
Server or Midreange
Data Characteristics
Type
Production
Warehouse
 Data Use
Operational
Detailed
Real time,
Latest value
Relatively brief
Dynamic
Application wide
Capture/update
Coded
Mgt Reporting
Summary
Multiple
generations
Forever
Static
Enterprise wide
Read only
Decoded
 Level of detail
 Currency
 Longevity
 Stability
 Scope of definition
 Data Operations
 Data values
Transforming the Logical Model
Brand Group
Brand
Product
Shipment
SKU
Day
Size
Order Item
Week
Month
Type
customer
Class
Calendar Year
Sales Rep
time
Route
market
District
Region
Key Differences - Part 1
 Key differences between “data jails”
(operational database) & warehouses
• Subject orientation - operational systems are applicationsegmented (i.e. banks = auto loan, demand deposit
accounting or mortgages). Subject areas for banks would be
customer and each financial product
• Level of integration - warehouses resolve years of
application inconsistency in encoding/decoding, data name
rationalization, etc
• Update volatility - record at a time updates in operational
database vs bulk loads in data warehouse
• Time variance norms include: 30-90 days of transactions for
operational system, 1-10 years for data warehouses
Key Differences - Part 2
Characteristic
Operational
Warehouse
Transaction volume
Response time
Updating
Time Period
Scope
Activities
High
Very fast
High volume
Current Period
Internal
Focused, clerical
operational
Predictable,
periodic
Low to huge
Reasonable
Very Low
Past to Future
External
Exploratory,
analytical, managerial
Can be Unpredictable,
Ad hoc
Queries
Types of Warehouse Configurations
 Enterprise
 Division
 Functional
• Financial
• Personnel
• Engineering/Product
 Departmental
 Special Project
What’s Really Involved?
Data Warehouse
Components
DB/2
V SA M
Management Reporting
Sales/Marketing
Customer Relations
Reserve Analysis
Risk Analysis
Mainframe
Applications
IMS
DB2/2
PC
Applications
???
Extract Programs
Data Cleansers/Scrubbers
Translators/Transformers
Timing Tools
Data Loading
File Transfer
Reserves
Rates
Customers
Combined
Data
Warehouse
Policies
External
Sources
Claims
Premiums
DB/6000
Midrange
DB/400
Decision
Support Tools
Typical Users of a Data Warehouse
 Decision Support Analysts, Business Analysts
• Marketing, Actuaries, Financial, Sales, Executive
 Grocery Store attitudes
• Going to the store, not knowing what they want
• Close proximity says give me “everything”
 Explorers
• Don’t know what they want
• Search on a random basis, non-repetitively
• Frequently finds nothing, but when they do, there are huge
rewards
 Farmers
• Know what they want
• Non random searches, finds frequent “flakes of gold”
• Finds small amounts of data
Advanced Warehouse Topics
 Metadata repositories
• Information about the data in the warehouse
• Like a library card catalog
• Data about when the information was created, what files
accessed, how much data
• Data about changes in business rules, processes
• Context versus Content
• “What does it mean?”
 Data Mining
• Drilling down into databases with tools to find specific
anomolies
 Online Analysis Processing (OLAP)
• Really means summary data