Data Warehousing
Download
Report
Transcript Data Warehousing
Data Warehousing
Tyler Helmle
Introduction
Definition
History
Processes and Terms
Management Techniques
Definition
Collection of Data
Subject Oriented
Integrated
Time-Variant
Non-Volatile
To Support Managements Decision Making
Process
Subject Oriented
Particular Subject
Instead of Companies Ongoing Operations
Integrated
Gathered from variety of Sources
Merged into a Whole
Time-Variant
Specific Time Period
Monthly
Weekly
Daily
Etc.
Non-Volatile
Stable
Never Remove Data
Consistent Picture of Business
Definition
Bill Inmon
Single Subject
1990
Data Mart
Possible to be Volatile
Size Contraints
Rolling Lengths
Definition
Ralph Kimball
“a copy of transaction data specifically structured
for query and analysis.”
Less insight
Less depth
Not Less Accurate
History
Concept Originated (Late 1980’s)
IBM Researchers
Barry Delvin and Paul Murphy
“Business Data Warehouse”
Operational Systems to Decision Support
History
Trends
Corporations had multiple Decision Support
Systems
Expensive
Redundant
Get to one efficient warehouse
Data Marts
History
Bill Inmon
“Father of the Data Warehouse”
Building of the Data Warehouse
1991
Taught
Data Warehouse
Design
Designing and Rolling Out Warehouse
Surajit Chaudhuri and Umeshwar Dayal
Design
Define the architecture, do capacity planning,
and select the storage servers, database and
OLAP servers, and tools
Integrate the servers, storage, and client
tools
Design the warehouse schema and views
Define the physical warehouse organization,
data placement, partitioning, and access
methods
Design (Cont.)
Connect the sources using gateways, ODBC
drivers, or other wrappers
Design and implement scripts for data
extraction, cleaning, transformation, load, and
refresh
Populate the repository with the schema and
view definitions, scripts, and other metadata
Design (Cont.)
Design and implement end-user applications
Roll out the warehouse and applications
Design (Cont.)
Database Designs
Multidimensional Approach
Normalized Approach
Data Warehouse
Raw Data
Business will Decide
Department?
Market Data?
Customer Data?
Data Granularity
Data Warehouse
Data Extracting
Extract from Source Systems
Database
Flat Files
Converts into format for transformation
Data Transforming
Data Validation
Difficult
Common Problems Occur
Data Transforming
Selection of Columns
Translation of Data
Make all data consistent
Derive New Calculated Value
Filtering
Sorting
Joining of Data
Data Transforming
Transposing
Splitting Columns
Data Transforming Tools
Data Migration
Data Scrubbing
Transformation of Data
Domain Specific Knowledge
Fixing or Eliminating Data
Data Auditing
Patterns
Inconsistencies
Data Load
Batch Processing
Visibility and Management
Administrator
Start, Cancel, Suspend, Resume Load
Large
Quick and Efficient
Data Load
Parallel Processing
Splitting Data Files to provide Parallel Access
Pipeline
Component
Simultaneous running of several components
Simultaneous running of multiple processes
All three types usually operate at same time
Data Refresh
When?
How?
Data Warehouse
Data Mart
Subset of Data
Major Data Subject
Increase Performance
Separate Security
Prove ROI before adding to Data Warehouse
Data Mart
Ease of Creation
Lower Cost than Data Warehouse
Data Mart
Cons
Limited Scalability
Duplication of Data
Data Inconsistency
Low ability to leverage enterprise sources of data
Data Warehouse
Metadata
“Data about Data”
Clarifies Data
Explains Data
Helps Navigate Through Data
Facilitates (Both Human and Computers)
Understanding
Usage
Management
Metadata
Different from Data
Sometimes can be both
Point of view
Types
Descriptive
Administrative
Structural
Technical
Use
Metadata
Content
Mutability
Describe Resource
Describe Content
Immutable
Mutable
Logical Function
Sub-symbolic
Symbolic
Logical
Metadata
Back End
Extract, Transform, Load
Front End
Label Screens
Create Reports
Data Warehouse
End Product
Analysis
Querying
Reporting
Data Mining
Data Mining
Requires Large Amount of Data
Determine Patterns in Data
Determine Relationships in Data
Analyze Results and Conclude
Standard Reporting
Weekly/Monthly/Yearly
Same Constant Data
Recent Information
Arrives at Expected Time on Expected
Medium
Configurable Reports
Select Parameters to Show on Report
Ex.
Time Period
Region
Support of Data Remains the Same
Ad Hoc Reporting
End Users Pick and Choose Data
Requires
Training
Knowledge of Data
Do Not Want Users Miss Using the Data
Dashboards / Scorecards
Like Standard Reports
Summarized
Easy to Read
Graphical
Data Warehouse
Data Warehouse Management
Data Warehouse Administrator
Visibility
Knowledge of what is happening
Loading, Extracting, Etc.
Data Warehouse Management
Responsibilities
What is data being used for?
Who is using the data?
What is the response time?
What kinds of activities are being submitted?
Data Warehouse Management
Data Management (Data Monitor)
Counts of Data
Analyzes Profiles of Data
Determines Threshold of Data
Makes DWA’s job easier
Data Warehouse Management
Security
Confidential?
Access Restriction
Encryption
Decryption
Pros
One Common Data Source
Ease of Reporting and Analysis
Inconsistencies Identified Right Away
Retrieve Data Without Slowing Down
Operational Systems
Show Actual Performance
Cons
Data Is Not Always Most Current
High Costs
Installation
Maintenance
Support
Hard To Stay Ahead
Summary
Data Warehousing can prove to be a success
Common Knowledge of a database
technology
As technology and research continues, the
possibilities are endless
Questions?