warehouse_chapter13
Download
Report
Transcript warehouse_chapter13
Transportation:
Refreshing Warehouse Data
Chapter 13
Developing a Refresh Strategy for Capturing Changed Data
Consider load window
Identify data volumes
Identify cycle
Know the technical infrastructure
Plan a staging area
Determine how to detect changes
Operational
databases
User Requirements and
Assistance
Users define the refresh cycle
IT balances requirements against technical
issues
Document all tasks and processes
Employ user skills
Operational
databases
Load Window
Time available for entire ETT process
Plan
Test
Prove
Monitor
Load Window
0
3am
6
User Access Period Load Window
9
12pm
3
6
9
12
Load Window
Plan and build processes according to a strategy.
Consider volumes of data.
Identify technical infrastructure.
Ensure currency of data.
Consider user access requirements first
High availability requirements may mean a small load
window
User Access Period
0
3am
6
9
12pm
3
6
9
12
Scheduling the Load Window
Requirements
Load cycle
Receive
data
File
1
File FTP
2
0
3
Control File
File Names
File types
Number of files
Number of loads
First-time load or refresh
Date of file
Data range
Records in file - counts
Totals - amounts
Control
4
process
Open
and
read
files
to
verify
and
analyze
3 am
Scheduling the Load Window
6
5
Verify,
analyze,
reapply
Load into
warehouse
File
1
File
2
3 am
8
7
Index
data
Create
summaries 9
Update
metadata
Parallel
load
6 am
9 am
Scheduling the Load Window
11
10
Back up
warehouse
Create
Views for
Specialized
tools
12
13
Users
Access Publish
Summary
data
User access
6 am
9 am
Capturing Changed Data for
Refresh
Capture new fact data
Capture changed dimension data
Determine method for capture of each
Methods:
- Wholesale data replacement
- Comparison of database instances
- Time stamping
- Database triggers
- Database log
Hybird techniques
Wholesale Data Replacement
Operational
databases
T1
T2
Expensive
Limited historical data, if any
Data mart implementations
Time period replacement
T3
Comparison of Database
Instance
Yesterday’s
Operational
database
Today’s
Operational
database
Database
comparison
Delta file holds
Changed data
Simple to perform, but expensive in
time and processing
Data file:
- Changes to operational data since
last refresh
- Used by various techniques
Time and Date Stamping
Operational
data
Delta file holds
Changed data
Fast scanning for records changed since
last extraction
Date Updated field
No detection of deleted data
Database Triggers
Operation
Server
(DBMS)
Trigger
Trigger
Trigger
Changed data intersected at the server
level
Extra I/O required
Maintenance overhead
Using a Database Log
Operational
data
Operational
Server
(DBMS)
Log
Log analysis
And
Data extraction
Contains before and after images
Requires system checkpoint
Common technique
Delta file holds
Changed data
Verdict
Consider each method on merit.
Consider a hybrid approach if one
approach is not suitable.
Consider current technical, existing
operational, and current application
issues.
Applying the Changes to Data
You have a choice of techniques:
Overwrite a record
Add a record
Add a field
Maintain history
Add version numbers
Overwriting a Record
Customer ID
John Doe
Single
Customer ID
John Doe
Married
Easy to implement
Loses all history
Not recommended
Adding a New Record
1 Customer Id John Doe Single
1 Customer Id John Doe Single
1A Customer Id John Doe Married
History is preserved; dimensions grow.
Time constraints are not required.
Generalized key is created.
Metadata tracks usage of keys.
Adding a Current Field
Customer Id
John Doe
Single
Customer Id
John Doe
Single
Married 01-JAN-96
Maintains some history
Loses intermediate values
Is enhanced by adding an Effective
Date field
Limitations of Methods for
Applying Changes
Complete history impossible
Dimensions may grow large
Maintenance overload
1234 Comer
1234 Comer
1 Main Street
200 First Ave
555-6789
222-3211
1234
Comer
1234-01 Comer
1 Main Street
200 First Ave
555-6789
222-3211
1234
Comer
1234-01 Comer
1 Main Street
200 First Ave
Effective Date
555-6789
01-Apr-93
222-3212
01-Jun-97
Maintaining History
HIST_CUST
Time
CUSTOMER
Sales
Product
One-to-many relationship
Always retain current record
Consistently able to refer to record
history
History Preserved
History enables realistic analysis.
History retains context of data.
History provides for realistic historical
analysis.
- Reflect business changes
- Maintain context between fact and
dimension data
- Retain sufficient data to relate old to new
Version Numbering
Avoid double counting
Facts hold version number
Customer
Time
Sales
Customer.CustId Version Customer Names
1234
1
Comer
1234
2
Comer
Customer.CustId Version Sales Facts
1234
1
11,000
1234
2
12,000
Product
Purging and Archiving Data
As data ages, its value depreciates.
Remove old data from the warehouse:
- Archive for later use
- Purge without copy
Techniques for Purging Data
TRUNCATE: Retains no rollback
DELETE: Retains redo and rollback
ALTER TABLE: Removes a partition
PL/SQL: Uses database triggers
Techniques for Archiving Data
Export to dump file from tables
Import to tables from dump file
ALTER TABLE EXCHANGE partitions
EXP
Database
IMP
.dmp
Verdict
Defined by business requirements
Must be managed
Final Tasks
Update metadata
- ETT
- User
Publish data
- Availability
- Changes
- Subject area basis
Use database roles to prevent and allow
access
Publishing Data
Control access using database roles
24-hour operation may be requested
Compromise between load and access
Consider
- Staggering updates
- Using temporary tables
- Using separate tables
ETT Tool Selection Criteria
Overlap with existing tools
Availability of meta model
Supported data sources
Ease of modification and maintenance
Required fine tuning of code
Ease of change control
Power of transformation logic
Level of modularization
Power of error, exception, resubmission features
Intuitive documentation
Performance of code
ETT Tool Selection Criteria
Activity scheduling and sophistication
Metadata generation
Learning curve
Flexibility
Supported operation systems
Cost
Transportation Tools
Information
Oracle
Platinum Technology
OpenBridge
SQL*Loader
Gateways
PL/SQL
Precompilers
InfoPump
Platinum Info Transport
Replication Server Utilities
Oracle
Symmetric and Heterogeneous
Replication
Gateways and Middleware
Brio Technology
Information Co.
Information Builders
Oracle
Platinum Technology
Prism
Software AG
DataPrism
OpenBridge
EDA/SQL
Gateways
InfoHub
Prism Manager
Entire Transaction
Propagator
Summary
This lesson discussed the following topics:
Capturing changed data
Applying the changes
Purging and archiving data
Publishing the data, controlling access,
and automating processes
Identifying tools for transporting data
into the warehouse