Copying, Managing, and Transforming Data With DTS
Download
Report
Transcript Copying, Managing, and Transforming Data With DTS
Copying, Managing, and
Transforming Data
With DTS
Defining Bulk Insert Task Functionality
Quickly Loads Data from a File into SQL Server
Encapsulates the Transact-SQL Bulk Insert Statement
Supports Table or View Destinations in SQL Server
Loads Data with No Applied Transformations
Supports Format Files to Specify File Layout
Requires Sysadmin or Bulkadmin Fixed Server Roles
Membership
The Bulk Insert Task is One of Three Ways to Run SQL
Server Bulk Copy Operations
Sidebar: SQL Server Bulk Copy Operations
What Do Bulk Copy Operations Offer?
Allow Fast Loading of Data into SQL Server
Configure Data Load Batches
Allow You to Control Logging Operations
Ways to Access Bulk Copy Operations
1) Bcp Utility
2) Bulk Insert Task or T-SQL Bulk Insert Statement
3) Bulk Copy APIs for OLE DB, ODBC, DB-Library
Applications
Defining the Sales_stage Table Load
DTS
Tab Delimited File
Using the Bulk Insert Task to Load
Tab-delimited File Data into
Sales_stage
Loading Sales_stage with Data Bound
for Sales_fact
Polaris
Defining Execute SQL Task Functionality
Executing SQL Statements
Source database must understand SQL syntax
SQL statement determines task performance
Task supports single or multiple SQL statements
You can create queries in the DTS Query Designer
Running Parameterized Queries
Input parameters
Output parameters
Using Parameterized Queries
Understanding Global Variable Basics
User-defined storage locations
Information is shared across package steps
Using Parameters with Global Variables
Assign global variable values to query input parameters
Store query results to a global variable with output
parameters
Creating Dynamic Queries
SELECT *
FROM product_dim
WHERE product_name = ?
AND category_name = ?
Global Variables
Parameter
ProductName
Parameter 1
CategoryName
Parameter 2
?
Question Marks Represent Query Parameters
Parameter 1 The Parameter’s Position in the Query Determines
Its Name
Global Variables Provide Data to Input Parameters
Storing Query Results
Storing Row Values
SELECT begin_date,
end_date
FROM
financial_period
WHERE quarter = 1
Output Parameter
Global Variables
begin_date
BeginDate
end_date
EndDate
Storing Entire Rowsets
SELECT *
FROM product
Output Parameter
Entire Rowset
Global Variable
Product
Store Query Results in Global Variables
Defining the Time_dim Data Load
Time_dim_build
Stored Procedure
Input Parameters
- @p_start_date
- @p_end_date
DTS
Defining the DTS Data Pump
DTS Mechanism for Moving and Transforming Data
Allows for High-speed Batch Copying of Data
Contains Supplied Data Transformations
Can Also Define ActiveX Script Transformations
Provides An Extendable COM-based Architecture That
Allows for Custom Transformations (C++)
Permits the Application of Transformation Logic to
Specific Phases of a Data Pump Operation
Multi Phase Data Pump
Understanding How the Data Pump Processes Data
Source
OLE DB
ODBC
1.
Connects to the source
and destination
2.
Reads OLE DB metadata
about source and
destination columns
3.
Gathers data
transformation definitions
X Forms
Destination
ActiveX Script
Copy
Trim String
…
Custom
DTS Data Pump
In Out
OLE DB
ODBC
4.
Implements the
transformation
5.
Writes completed record
to the destination
Defining the Tasks That Transform Data
The Transform Data Task
Inserts
The Data Driven Query Task
Inserts
Updates
Deletes
The ParallelDataPumpTask
Processes hierarchical rowsets
Defining the Transform Data Task
Data Movement and Transformation Functionality
Copying data between heterogeneous data sources
Applying optional column level transformations
Extended Data Transfer Functionality
Supporting batch processing of data
Providing error-handling capabilities
Containing optimization settings for SQL Server
destinations
Selecting Transformation Types
Transformation
Description
ActiveX Script
Invokes user-defined ActiveX scripts.
Copy Column
Copies data from source to destination.
DateTime String
Converts a date to a new destination format.
Lowercase String
Converts a string to lowercase characters.
Uppercase String
Converts a string to uppercase characters.
Middle of String
Extracts a sub string of source data.
Trim String
Removes white space from a source string.
Read File
Copies contents of a file to a destination column.
File path is specified by a source column.
Write File
Copies contents of a source column to a file.
File path is specified by a second source column.
Defining Column Mappings
One-to-One Mappings
Symmetric Many-toMany Mappings
Asymmetric Mappings
Creating Efficient Column Mappings
Minimizing the Number of Column Mappings
Using Many-to-Many Mappings When Possible
Grouping Common Transformations Together
Loading Customer_dim
Northwind OLTP
SQL Server Database
Performance Settings
Enabling Fast Load
Using high-speed bulk copy processing
Accepting batches of transformed data
Only applies to SQL Server destinations
Using a Table Lock
Configuring Batch Size
Configuring Batch Size
Assembling Records into Groups
DTS commits records to database as a group
Insert batch size sets the number of records in the group
Understanding Default Behavior
Insert batch size is 0
DTS assigns one batch for all records
Setting the Insert Batch Size
Value between 0-9999
Setting value can improve performance
Defining SQL Solutions
You Can Use the Source Query of the Transform Data
Task to Implement Data Transformations
The Source SQL Statement Must Be Understood by the
Source Database
The Performance of the Source Query Depends on the
SQL Statement
You Can Use Parameters in the Source Query to Create
Dynamic Source SQL Statements
If You Use the Source Query to Manipulate Data, You
Can Use the Copy Column Transformation to Load Data
into the Destination
Applying SQL Solutions to Load Fact Tables
Using the Source Query to Join Staging Table Data to
Dimension Tables
Retrieving Primary Key Values to Store as Foreign Keys
on the Fact Table
Using a Copy Column Transformation in the Transform
Data Task
Configuring Fast Load for SQL Server Destinations
Loading the Fact Table
Dimension
Tables
Source Data
customer id
ALFI
ALFI
customer_dim
201 ALFI Alfreds
product id order date quantity_sales amount_sales
123
1/1/2000
400
10,789
Sales Fact Data
product_dim
cust_key
25 123 Chai
time_dim
134 1/1/2000
201
prod_key time_key
123
25
1/1/2000
134
quantity_sales amount_sales
400
10,789
Identifying Dimension Application Key Values in the
Fact Table Source Data
Retrieving Primary Keys from Each Dimension Table to
Assign Foreign Keys
Loading Sales_fact
DTS
Extracting Data from the Sales_stage Table
Assigning Foreign Keys by Retrieving Primary Keys
from the Product_dim, Customer_dim, and Time_dim
Dimensions
Best Practices - Performing Inserts
Bulk Insert Task
Accessing data in files
Loading data into SQL Server destinations
Copying data with no transformations
Transform Data Task
Accessing any source
Loading to any destination
Creating data transformations
Using input parameters in the source query
Applying custom logic to phases of the data pump
Best Practices - Performance Settings
Tuning the Transform Data Task
Fast load for SQL Server destinations
Batch size
Table lock
Tuning the Bulk Insert Task
Sort order for clustered indexes
Batch size
Table lock
Best Practices - Executing Flexible Queries
The Data Driven Query Task
Execute flexible queries on a row-by-row basis
Meet flexibility needs that outweigh performance needs
Perform non-insert queries
The Execute SQL Task
Execute SQL statements and extended SQL statements
Perform parameterized queries
Assign query outputs to global variables
Best Practices - Using Custom Tasks
Creating Reusable Functions and Utilities
Adding Functionality to DTS Package Designer
Implementing a Faster Alternative to ActiveX Script
Tasks
Best Practices - Creating Efficient Column Mappings
Minimizing the Number of Column Mappings
Using Many-to-Many Mappings When Possible
Grouping Common Transformations Together
Best Practices - The Right Transformation Type
Using Supplied Transformations When Possible
Minimizing ActiveX Script Transformations When
Performance Outweighs Flexibility
Using SQL Solutions with Copy Column
Transformations
Developing Custom Transformations as a Faster
Alternative to ActiveX Script Transformations