Copying, Managing, and Transforming Data With DTS

Download Report

Transcript Copying, Managing, and Transforming Data With DTS

Copying, Managing, and
Transforming Data
With DTS
Defining Bulk Insert Task Functionality

Quickly Loads Data from a File into SQL Server

Encapsulates the Transact-SQL Bulk Insert Statement

Supports Table or View Destinations in SQL Server

Loads Data with No Applied Transformations

Supports Format Files to Specify File Layout

Requires Sysadmin or Bulkadmin Fixed Server Roles
Membership
The Bulk Insert Task is One of Three Ways to Run SQL
Server Bulk Copy Operations
Sidebar: SQL Server Bulk Copy Operations


What Do Bulk Copy Operations Offer?

Allow Fast Loading of Data into SQL Server

Configure Data Load Batches

Allow You to Control Logging Operations
Ways to Access Bulk Copy Operations
1) Bcp Utility
2) Bulk Insert Task or T-SQL Bulk Insert Statement
3) Bulk Copy APIs for OLE DB, ODBC, DB-Library
Applications
Defining the Sales_stage Table Load
DTS
Tab Delimited File

Using the Bulk Insert Task to Load
Tab-delimited File Data into
Sales_stage

Loading Sales_stage with Data Bound
for Sales_fact
Polaris
Defining Execute SQL Task Functionality


Executing SQL Statements

Source database must understand SQL syntax

SQL statement determines task performance

Task supports single or multiple SQL statements

You can create queries in the DTS Query Designer
Running Parameterized Queries

Input parameters

Output parameters
Using Parameterized Queries


Understanding Global Variable Basics

User-defined storage locations

Information is shared across package steps
Using Parameters with Global Variables


Assign global variable values to query input parameters
Store query results to a global variable with output
parameters
Creating Dynamic Queries
SELECT *
FROM product_dim
WHERE product_name = ?
AND category_name = ?
Global Variables
Parameter
ProductName
Parameter 1
CategoryName
Parameter 2
?
Question Marks Represent Query Parameters
Parameter 1 The Parameter’s Position in the Query Determines
Its Name
Global Variables Provide Data to Input Parameters
Storing Query Results
Storing Row Values
SELECT begin_date,
end_date
FROM
financial_period
WHERE quarter = 1
Output Parameter
Global Variables
begin_date
BeginDate
end_date
EndDate
Storing Entire Rowsets
SELECT *
FROM product
Output Parameter
Entire Rowset
Global Variable
Product
Store Query Results in Global Variables
Defining the Time_dim Data Load
Time_dim_build
Stored Procedure
Input Parameters
- @p_start_date
- @p_end_date
DTS
Defining the DTS Data Pump

DTS Mechanism for Moving and Transforming Data

Allows for High-speed Batch Copying of Data

Contains Supplied Data Transformations

Can Also Define ActiveX Script Transformations

Provides An Extendable COM-based Architecture That
Allows for Custom Transformations (C++)

Permits the Application of Transformation Logic to
Specific Phases of a Data Pump Operation

Multi Phase Data Pump
Understanding How the Data Pump Processes Data
Source
OLE DB
ODBC
1.
Connects to the source
and destination
2.
Reads OLE DB metadata
about source and
destination columns
3.
Gathers data
transformation definitions
X Forms
Destination
ActiveX Script
Copy
Trim String
…
Custom
DTS Data Pump
In Out
OLE DB
ODBC
4.
Implements the
transformation
5.
Writes completed record
to the destination
Defining the Tasks That Transform Data

The Transform Data Task



Inserts
The Data Driven Query Task

Inserts

Updates

Deletes
The ParallelDataPumpTask

Processes hierarchical rowsets
Defining the Transform Data Task


Data Movement and Transformation Functionality

Copying data between heterogeneous data sources

Applying optional column level transformations
Extended Data Transfer Functionality

Supporting batch processing of data

Providing error-handling capabilities

Containing optimization settings for SQL Server
destinations
Selecting Transformation Types
Transformation
Description
ActiveX Script
Invokes user-defined ActiveX scripts.
Copy Column
Copies data from source to destination.
DateTime String
Converts a date to a new destination format.
Lowercase String
Converts a string to lowercase characters.
Uppercase String
Converts a string to uppercase characters.
Middle of String
Extracts a sub string of source data.
Trim String
Removes white space from a source string.
Read File
Copies contents of a file to a destination column.
File path is specified by a source column.
Write File
Copies contents of a source column to a file.
File path is specified by a second source column.
Defining Column Mappings

One-to-One Mappings

Symmetric Many-toMany Mappings

Asymmetric Mappings
Creating Efficient Column Mappings

Minimizing the Number of Column Mappings

Using Many-to-Many Mappings When Possible

Grouping Common Transformations Together
Loading Customer_dim
Northwind OLTP
SQL Server Database
Performance Settings

Enabling Fast Load

Using high-speed bulk copy processing

Accepting batches of transformed data

Only applies to SQL Server destinations

Using a Table Lock

Configuring Batch Size
Configuring Batch Size



Assembling Records into Groups

DTS commits records to database as a group

Insert batch size sets the number of records in the group
Understanding Default Behavior

Insert batch size is 0

DTS assigns one batch for all records
Setting the Insert Batch Size

Value between 0-9999

Setting value can improve performance
Defining SQL Solutions

You Can Use the Source Query of the Transform Data
Task to Implement Data Transformations

The Source SQL Statement Must Be Understood by the
Source Database

The Performance of the Source Query Depends on the
SQL Statement

You Can Use Parameters in the Source Query to Create
Dynamic Source SQL Statements

If You Use the Source Query to Manipulate Data, You
Can Use the Copy Column Transformation to Load Data
into the Destination
Applying SQL Solutions to Load Fact Tables

Using the Source Query to Join Staging Table Data to
Dimension Tables

Retrieving Primary Key Values to Store as Foreign Keys
on the Fact Table

Using a Copy Column Transformation in the Transform
Data Task

Configuring Fast Load for SQL Server Destinations
Loading the Fact Table
Dimension
Tables
Source Data
customer id
ALFI
ALFI
customer_dim
201 ALFI Alfreds
product id order date quantity_sales amount_sales
123
1/1/2000
400
10,789
Sales Fact Data
product_dim
cust_key
25 123 Chai
time_dim
134 1/1/2000
201
prod_key time_key
123
25
1/1/2000
134
quantity_sales amount_sales
400
10,789

Identifying Dimension Application Key Values in the
Fact Table Source Data

Retrieving Primary Keys from Each Dimension Table to
Assign Foreign Keys
Loading Sales_fact
DTS

Extracting Data from the Sales_stage Table

Assigning Foreign Keys by Retrieving Primary Keys
from the Product_dim, Customer_dim, and Time_dim
Dimensions
Best Practices - Performing Inserts


Bulk Insert Task

Accessing data in files

Loading data into SQL Server destinations

Copying data with no transformations
Transform Data Task

Accessing any source

Loading to any destination

Creating data transformations

Using input parameters in the source query

Applying custom logic to phases of the data pump
Best Practices - Performance Settings


Tuning the Transform Data Task

Fast load for SQL Server destinations

Batch size

Table lock
Tuning the Bulk Insert Task

Sort order for clustered indexes

Batch size

Table lock
Best Practices - Executing Flexible Queries


The Data Driven Query Task

Execute flexible queries on a row-by-row basis

Meet flexibility needs that outweigh performance needs

Perform non-insert queries
The Execute SQL Task

Execute SQL statements and extended SQL statements

Perform parameterized queries

Assign query outputs to global variables
Best Practices - Using Custom Tasks

Creating Reusable Functions and Utilities

Adding Functionality to DTS Package Designer

Implementing a Faster Alternative to ActiveX Script
Tasks
Best Practices - Creating Efficient Column Mappings

Minimizing the Number of Column Mappings

Using Many-to-Many Mappings When Possible

Grouping Common Transformations Together
Best Practices - The Right Transformation Type

Using Supplied Transformations When Possible

Minimizing ActiveX Script Transformations When
Performance Outweighs Flexibility

Using SQL Solutions with Copy Column
Transformations

Developing Custom Transformations as a Faster
Alternative to ActiveX Script Transformations