Transportation: Loading Warehouse Data

Download Report

Transcript Transportation: Loading Warehouse Data

Transportation:
Loading Warehouse Data
Chapter 12
Transporting data into the
Warehouse





Loading moves the data into the warehouse
Loading can be time-consuming:
- Consider the load window.
- Schedule the task; automates all
processes.
Initial load moves large volumes
Subsequent refresh moves smaller volumes
Business determines the cycle
Extract Processing
Environment
Operational
database
T1


T2
T3
After each time interval, build a new
database
Run queries
Warehouse Processing
Environment
Operational
database
T1




T2
T3
Build a new database
After each time interval, add changes to
database
Archive or pure oldest data
Run queries
First-Time Load




Single event that populates the data with
historical data
Involves large volume of data
Employs distinct ETT tasks
Involves large amounts of processing after
load
Operational
database
T1
T2
T3





Refresh
Performed according to a business cycle
Simple task
Less data to load than first-time load
Less-complex ETT
Smaller amounts of postload processing
Operational
database
T1
T2
T3
Building the Transportation
Process
Specification
 Techniques and tools
 File transfer methods
 The load window
 Time window for other tasks
 First-time and refresh cycle
 Connectivity bandwidth
Building the Transportation
Process






Test the proposed techniques
Document proposed load
Gain agreement on the process
Monitor
Review
Revise
Granularity


Important design
and operational
issue
Space requirements
- Storage
- Backup
- Recovery
- Load
Low-level grain
- Expensive, high
level of processing,
more disk, detail
High-level grain
- Cheaper, less
processing, less disk,
little detail
Transportation Techniques







Tools
Utilities and 3GL
Gateways
Customized copy programs
Replication
FTP
Manual
Transportation Technique
Considerations



Tools are comprehensive but costly.
Data-movement utilities are fast and
powerful.
Gateways are not always the fastest method:
- Access other databases
- Supply dependent data marts
- Support a distributed environment
- Provide real-time access if needed
Using SQL*Loader to Load
Data
Input files
Control file
Fastest load mechanism
Bad files
Log
files
SQL*Loader
 Direct path
 Parallel and unrecoverable Discard files
 Direct-load INSERT (Oracle8)
 Direct-path load API (Oracle8i)

Direct-Path Load API in
Oracle8i




Allows ETT and other tools to load
Oracle databases efficiently
Permits load behavior to be customized
Gives direct-path load performance
Provided complete access to all directload functionality using OCI
More Transportation Technique
Considerations


Use customized programs as a last
resort
Replication is limited by data-transfer
rates
Postprocessing of Loaded Data
Transform
Extract
Create
Indexes
Transport
Generate
keys
Postprocessing
of loaded data
Summarize
Filter
Indexing Data



Before load: fast index reenablement
During load: adds time to load window
After load: adds time to load window
Index
Operational
databases
Staging
files
Warehouse
database
Unique Indexes


Disable constraints load
Enable constraints to create index
Disable Load Enable
constraints data constraints
Create Catch Reprocess
index errors
Creating Artificial Keys




Use generalized or derived keys
Maintain the uniqueness of a row
Use an administrative process to assign the
key
Concatenate operational key with number:
- Easy to maintain
- Cumbersome keys
- No clean value for retrieval
Creating Unique Keys for
Records

Assign a number from a list:
- No semantic meaning
- Extract operations must reference table
assign numbers
109908
Update metadata
 Verdict
1
Creating Summary Tables


CTAS
pCTAS
Summary data
Warehouse
Data marts
Verifying Data Integrity


Load
File
1
File
2
Load data into intermediate file
Compare target flash totals with totals
before load
Counts
and
amounts
Flash
totals
Intermediate Warehouse
file
File
1
Load
File
2
Preserve,
inspect,
fix, then load
Steps for Verifying Data
Integrity
Standard Quality Assurance
Checks







Load status
Completion of the process
Completeness of the data
Data reconciliation
Violations
Reprocessing
Comparison of counts and amounts
Summary
This lesson discussed the following topics:
 First-time load considerations
 Techniques for transporting data
 Tasks involved in the postload
processing stage