Transportation: Loading Warehouse Data
Download
Report
Transcript Transportation: Loading Warehouse Data
Transportation:
Loading Warehouse Data
Chapter 12
Transporting data into the
Warehouse
Loading moves the data into the warehouse
Loading can be time-consuming:
- Consider the load window.
- Schedule the task; automates all
processes.
Initial load moves large volumes
Subsequent refresh moves smaller volumes
Business determines the cycle
Extract Processing
Environment
Operational
database
T1
T2
T3
After each time interval, build a new
database
Run queries
Warehouse Processing
Environment
Operational
database
T1
T2
T3
Build a new database
After each time interval, add changes to
database
Archive or pure oldest data
Run queries
First-Time Load
Single event that populates the data with
historical data
Involves large volume of data
Employs distinct ETT tasks
Involves large amounts of processing after
load
Operational
database
T1
T2
T3
Refresh
Performed according to a business cycle
Simple task
Less data to load than first-time load
Less-complex ETT
Smaller amounts of postload processing
Operational
database
T1
T2
T3
Building the Transportation
Process
Specification
Techniques and tools
File transfer methods
The load window
Time window for other tasks
First-time and refresh cycle
Connectivity bandwidth
Building the Transportation
Process
Test the proposed techniques
Document proposed load
Gain agreement on the process
Monitor
Review
Revise
Granularity
Important design
and operational
issue
Space requirements
- Storage
- Backup
- Recovery
- Load
Low-level grain
- Expensive, high
level of processing,
more disk, detail
High-level grain
- Cheaper, less
processing, less disk,
little detail
Transportation Techniques
Tools
Utilities and 3GL
Gateways
Customized copy programs
Replication
FTP
Manual
Transportation Technique
Considerations
Tools are comprehensive but costly.
Data-movement utilities are fast and
powerful.
Gateways are not always the fastest method:
- Access other databases
- Supply dependent data marts
- Support a distributed environment
- Provide real-time access if needed
Using SQL*Loader to Load
Data
Input files
Control file
Fastest load mechanism
Bad files
Log
files
SQL*Loader
Direct path
Parallel and unrecoverable Discard files
Direct-load INSERT (Oracle8)
Direct-path load API (Oracle8i)
Direct-Path Load API in
Oracle8i
Allows ETT and other tools to load
Oracle databases efficiently
Permits load behavior to be customized
Gives direct-path load performance
Provided complete access to all directload functionality using OCI
More Transportation Technique
Considerations
Use customized programs as a last
resort
Replication is limited by data-transfer
rates
Postprocessing of Loaded Data
Transform
Extract
Create
Indexes
Transport
Generate
keys
Postprocessing
of loaded data
Summarize
Filter
Indexing Data
Before load: fast index reenablement
During load: adds time to load window
After load: adds time to load window
Index
Operational
databases
Staging
files
Warehouse
database
Unique Indexes
Disable constraints load
Enable constraints to create index
Disable Load Enable
constraints data constraints
Create Catch Reprocess
index errors
Creating Artificial Keys
Use generalized or derived keys
Maintain the uniqueness of a row
Use an administrative process to assign the
key
Concatenate operational key with number:
- Easy to maintain
- Cumbersome keys
- No clean value for retrieval
Creating Unique Keys for
Records
Assign a number from a list:
- No semantic meaning
- Extract operations must reference table
assign numbers
109908
Update metadata
Verdict
1
Creating Summary Tables
CTAS
pCTAS
Summary data
Warehouse
Data marts
Verifying Data Integrity
Load
File
1
File
2
Load data into intermediate file
Compare target flash totals with totals
before load
Counts
and
amounts
Flash
totals
Intermediate Warehouse
file
File
1
Load
File
2
Preserve,
inspect,
fix, then load
Steps for Verifying Data
Integrity
Standard Quality Assurance
Checks
Load status
Completion of the process
Completeness of the data
Data reconciliation
Violations
Reprocessing
Comparison of counts and amounts
Summary
This lesson discussed the following topics:
First-time load considerations
Techniques for transporting data
Tasks involved in the postload
processing stage