Transcript Document

Extraction, Transformation,
and Loading (ETL)
Loading
Copyright © 2005, Oracle. All rights reserved.
Objectives
After completing this lesson, you should be able to
implement the following methods that are available for
loading data:
• SQL*Loader
• External tables
• OCI and direct-path APIs
• Data Pump
• Export/import
5-2
Copyright © 2005, Oracle. All rights reserved.
Data-Loading Mechanisms
You can use the following mechanisms for loading a
data warehouse:
• SQL*Loader
• External tables
• OCI and direct-path APIs
• Export/import
• Data Pump
5-3
Copyright © 2005, Oracle. All rights reserved.
Loading Mechanisms
•
•
•
•
5-4
SQL*Loader loads a formatted flat file into an
existing table.
It can perform basic transformations while
loading.
Direct-path loading may be used to decrease the
load time.
When you use this method, data in the flat file is
not accessible until the data is loaded.
Copyright © 2005, Oracle. All rights reserved.
SQL*Loader: Example
•
Control file used for loading the SALES table:
LOAD DATA INFILE sh_sales.dat APPEND INTO
TABLE sales
FIELDS TERMINATED BY "|"
(PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID,
PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD)
•
The fact table can be loaded with the following
command:
$ sqlldr sh/sh control=sh_sales.ctl \
direct=true
5-5
Copyright © 2005, Oracle. All rights reserved.
Loading Mechanisms
•
•
•
•
•
5-6
External tables are read-only tables where the data
is stored outside the database in flat files.
The data can be queried like a virtual table, using
any supported language inside the database.
No DML is allowed and no indexes can
be created.
The metadata for an external
table is created using a
CREATE TABLE statement.
An external table describes how the external data
should be presented to the database.
Copyright © 2005, Oracle. All rights reserved.
Applications of External Tables
External tables:
• Allow external data to be queried and joined
directly and in parallel without requiring it to be
loaded into the database
• Eliminate the need for staging the data within the
database for ETL in data warehousing applications
• Are useful in environments where an external
source has to be joined with database objects and
then transformed
• Are useful when the external data is large and not
queried frequently
• Complement SQL*Loader functionalities:
– Transparent parallelism
– Full SQL capabilities for direct-path insertion
5-7
Copyright © 2005, Oracle. All rights reserved.
Example of Defining External Tables
CREATE TABLE sales_delta_xt ( prod_id NUMBER(6),
cust_id NUMBER, time_id DATE,unit_cost, unit_price ...)
ORGANIZATION external (
-- External Table
TYPE oracle_loader
–- Access Driver
DEFAULT DIRECTORY data_dir –- Files Directory
ACCESS PARAMETERS
–- Similar to SQL*Loader
( RECORDS DELIMITED BY NEWLINE
CHARACTERSET US7ASCII
BADFILE log_dir:'sh_sales_%p.bad'
LOGFILE log_dir:'sh_sales_%p.log_xt'
FIELDS TERMINATED BY "|" LDRTRIM )
location ( 'sales_delta.dat',
data_dir2:'sales_delta2.dat' ))
PARALLEL 5 –- Independent from the number of files
REJECT LIMIT UNLIMITED;
5-9
Copyright © 2005, Oracle. All rights reserved.
Populating External Tables
with Data Pump
CREATE TABLE emp_ext
(first_name, last_name, department_name)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY ext_dir
LOCATION ('emp1.exp',)
)
AS
SELECT e.first_name,e.last_name,d.department_name
FROM
employees e, departments d
WHERE e.department_id = d.department_id AND
d.department_name in
('Marketing', 'Purchasing');
5-11
Copyright © 2005, Oracle. All rights reserved.
Privileges for External Tables
Access to the external tables for other users requires:
• SELECT on the table definition
• READ access to the directory containing the file
• WRITE access to the directory for the bad file and
log file
GRANT SELECT ON sh.sales_delta_xt TO oe;
GRANT READ ON DIRECTORY data_dir TO oe;
GRANT WRITE ON DIRECTORY log_dir TO oe;
5-12
Copyright © 2005, Oracle. All rights reserved.
Defining External Tables
Using SQL*Loader
After creating a control file, SQL*Loader can generate
a log file with the SQL commands to:
• Create the metadata for the external table
• Insert the data into the target table
• Drop the metadata for the external table
sqlldr sh/sh control=sales_dec00.ctl
EXTERNAL_TABLE=GENERATE_ONLY LOG=sales_dec00.sql
5-13
Copyright © 2005, Oracle. All rights reserved.
Data Dictionary Information
for External Tables
DBA_EXTERNAL_LOCATIONS
• OWNER
• TABLE_NAME
• LOCATION
• DIRECTORY_OWNER
• DIRECTORY_NAME
DBA_DIRECTORIES
• OWNER
• DIRECTORY_NAME
• DIRECTORY_PATH
5-14
DBA_EXTERNAL_TABLES
• OWNER
• NAME
• TYPE_OWNER
• TYPE_NAME
• DEFAULT_DIRECTORY_
OWNER
• DEFAULT_DIRECTORY_
NAME
• REJECT_LIMIT
Copyright © 2005, Oracle. All rights reserved.
Changing External Data Properties
•
Using the ALTER TABLE command, you can
change:
–
–
–
–
DEFAULT DIRECTORY
ACCESS PARAMETERS
LOCATION
REJECT_LIMIT
– Degree of parallelism
•
Useful in situations where external files are
changing:
ALTER TABLE sales_delta_xt LOCATION ('newfile1.dat')
5-16
Copyright © 2005, Oracle. All rights reserved.
Other Loading Methods
•
OCI and direct-path APIs:
– Allow transformation and loading at the same time
– Access an online source
– Do not require an intermediary step such as a flat
file
•
Export/import:
– Is good for small loads
– Allows for easy transfers between Oracle databases
on different operating systems
5-18
Copyright © 2005, Oracle. All rights reserved.
Summary
In this lesson, you should have learned how to
implement the following methods that are available for
loading data:
• SQL*Loader
• External tables
• OCI and direct-path APIs
• Data Pump
• Export/import
5-19
Copyright © 2005, Oracle. All rights reserved.
Practice 5: Overview
This practice covers the following topics:
• Loading data from a flat file using SQL*Loader
• Loading data from a flat file using external tables
5-20
Copyright © 2005, Oracle. All rights reserved.