Transcript Lecture 1

Data Warehousing
The Gap
• “Drowning in data but starving for
information”
• Fragmented information systems
– “one-thing-at-a-time” approach
– Islands of information
– Uncoordinated and inconsistent
databases using different hw and sw
platforms
The Gap
• Most systems support operational
processing
– No focus on analytical tools for decision
making
– Operational Processing: captures, stores,
manipulates data to support daily
operations of the organization
– Informational processing: analysis of data
Information
• Kept in operational systems
– Turn wheels of organization
– 1 record at a time
– Same tasks
• Data warehouse
– Watch wheels of organization turn
– Thousands of records compressed into an answer
set
– Usage of DW changes kind of questions asked
Goals
• Make information easily accessible
• Present information consistently
• Adaptive and resilient to change
• Secure and protect information
• Foundation for improved decision
making
• Accepted by business users
Components
Data
Staging
Area
Operational
Source
Systems
Data
Presentation
Data Mart 1
EXTRACT
TRANSFORM
LOAD
Data Mart 2
Data
Access
Tools
Components
• Operational source systems
– Record/ capture day to day transactions
– Queries are restricted
– Little historical data
• Data staging area (ETL)
– Extract: copy source data
– Transform: cleansing, combining, etc…
– Load: transfer data into target data mart and
index
– May use relational db or flat files for storage
– Mostly sorting and sequential processing
Components
• Data presentation
– Data is organized, stored, and made
available for direct querying by users,
report writers, and other analytical
applications
– Data Marts using the dimensional model
– Detailed atomic data
Components
• Data access tools
– Capabilities that are provided to business users to
use the presentation layer for analytical decision
making
– Tools for querying
– May be an ad-hoc query tool or data mining
software
• Metadata
– Data about the data
– Used in guiding transformation
Components
• Data Marts
– Logical and physical subset of a data
warehouse’s presentation area
– Flexible set of data based on the most
atomic (granular) data possible to extract
from an operational source and
presented in a dimensional model that is
most resilient when faced with
unexpected queries
– Data from single business process
Components
Data
Staging
Area
Operational
Source
Systems
Data
Presentation
Data Mart 1
EXTRACT
TRANSFORM
LOAD
Data Mart 2
Data
Access
Tools
Data Warehouse
• Kimball Definition
– Conglomeration of an organization’s data
warehouse staging and presentation
areas where operational data is
specifically structured for query and
analysis performance and ease-of-use
Data Warehouse
The Data Warehouse According to Kimbal
Data
Staging
Area
Operational
Source
Systems
Data
Presentation
Data Mart 1
EXTRACT
TRANSFORM
LOAD
Data Mart 2
Data
Access
Tools
Data Warehouse
What business users perceive
as the data warehouse
Data
Staging
Area
Operational
Source
Systems
Data
Presentation
Data Mart 1
EXTRACT
TRANSFORM
LOAD
Data Mart 2
Data
Access
Tools
Pitfalls
• Tech over business
• Fail to see importance of business
sponsor
• Multi-year project vs. iterative
development
• Allocate too much energy to construct
normalized data structure and run out
of budget before building presentation
area
Pitfalls
• Backroom operational performance
and ease of development over frontroom query performance and ease of
use
• Assume static business requirements
• DW success = user acceptance