Integration Services

Download Report

Transcript Integration Services

What's NEW in SQL 2005 Integration Services
Matthew Stephen
SQL Server Specialist
http://blogs.technet.com/mat_stephen
Agenda
•
•
•
•
•
Overview
Business Intelligence Development Studio
Workflow Features
Dataflow Features
Other Features
Integration Services : A Short History
• DTS - SQL Server 7.0
– “Visual BCP” – a useful utility
• DTS - SQL Server 2000
– Easy (but slow) workflow & transform engine
– Customizable
• SSIS – SQL Server 2005
–
–
–
–
A completely new codebase
Enterprise class ETL
Exceptional BI integration – and more
Rich APIs and extensibility
New Features and Components in SQL IS
SQL Destination
Derived Column Transform
File Extractor Transform
Row Count Transform
XML Source Adapter
Data Reader Source Adapter
Shrink Database Task
Time Gen Source Adapter
Property Expressions
Dataflow Task
Merge Join Transform
Merge Transform
Sort Transform
Audit (Lineage) Transform
Union All Transform
SQL CE Destination Adapter
DTExec
Execution Stack Window
HTTP Connection Manager
For Loop Container
For Each Loop Container
Evaluate As Expression
DTUtil
Aggregate Transform
Lookup Transform
Enhanced Error Reporting
Execute DTS 2000 Package Task
Web Services Task
New Migration Wizard
Conditional Split Transform
SQL Mobile Connection Manager
Flat File Connection Manager
Enhanced Import/Export Wizard
Upgrade Analyzer for SSIS
ActiveX Script Task
Pivot Transform
Un-pivot Transform
Recordset Destination Adapter
OLE DB Source Adapter
OLE DB Destination Adapter
Flat File Source
Variable Namespaces
Data Conversion Transform
Excel Source Adapter
Excel Destination Adapter
OLEDB Reader Adapter
Visualizers
Integration Services Server
Component Enumeration Caching
Debugging Support in the Designer
Data Reader Destination Adapter
Debugging Breakpoints
IS Projects in VS
Watch Windows
Log Viewer Windows
Deployment Wizard
Transfer Logins Task
MSMQ Task
F1 Integrated Help
Copy/Paste in the Designer
ADO Connection Manager
ADO.NET Connection Manager
File Connection Manager
Analysis Services Processing Task
Row sampling Transform
WMI Event Watcher Task
Percent sampling Transform
Execute Process Task
Text Log Provider
Multicast Transform
WMI Query Task
FTP Connection Manager
Raw Source Adapter
MSOLAP90 Connection Manager
Multi-file Connection Manager
Multi-Flat File Connection Manager
ODBC Connection Manager
OLEDB Connection Manager
Easy Dataflow Adapter Extensibility
SMO Server Connection Manager
Transfer Errors Task
ForEach Item Enumerator
Runtime Execution Engine
Data Source Views
DTExecUI
Intellisense Help
Designer Grouping
Workflow Designer
Dataflow Designer
Event Handler Designer
Output Window
Error Window
Right Click Online Error Access
Tools Options pages
For Each ADO.NET Enumerator
Import/Export Wizard
Auto-layout algorithms
Raw Destination Adapter
Runtime Execution Engine
Configuration Wizard
Integration Services Roles
Package Encryption Options
Notify Operator Task
Package Signing
SAP .NET Provider
Managed Task Base Class
ADO.NET Script Task
File Inserter Transform
MSMQ Connection Manager
ADO.NET Script Component
For Each From Variable Enumerator
Easy Transform Extensibility
Transfer SQL Server Objects Task
Excel Connection Manager
SQL Server Log Provider
SQL Profiler Log Provider
Easy ForEach Enumerator Extensibility
Windows Event Log Provider
ForEach Nodelist Enumerator
SQL Server Configurations
XML Configurations
Registry Configurations
Environment Variable Configurations
Sequence Container
Event Handlers
ForEach File Enumerator
Slowly Changing Dimension Transform
Control Flow Designer
Send Mail Task
SQL Task
File System Task
XML Task
Native Object Model
SMTP Connection Manager
Bulk Insert Task
Analysis Services Execute DDL Task
Back Up Database Task
Check Database Integrity Task
Execute T-SQL Statement Task
Maintenance Cleanup Task
Rebuild Index Task
Reorganize Index Task
Workflow Checkpoint Restartability
Real time Designer Execution Monitoring
User defined transaction scope
FTP Task
Easy Task Extensibility
Transfer Stored Procedures Task
Easy Connection Manager Extensibility
Update Statistics Task
Easy Log Provider Extensibility
Integrated Development Environment
WMI Connection Manager
History Cleanup Task
.NET Compatible
Flat File Destination
Transfer Database Task
Data Convert Transform
Managed Object Model
XML Persisted
System Variables
New Events OnPreExecute etc.
For Each ADO Enumerator
Read Only Variables
OLEDB Command Transform
XML Log Provider
Dataflow Execution Engine
Package Explorer
before Integration Services
Alerts & escalation
Call centre data: semi structured
Text Mining
Staging
Staging
Legacy data: binary files
Hand
coding
Cleansing
&
ETL
Application database
ETL
Data mining
ETL
Staging
ETL
Warehouse
Reports
Mobile
data
•Integration and warehousing require separate, staged, operations.
•Preparation of data requires different, often incompatible, tools.
•Reporting and escalation is a slow process, delaying smart responses.
•Heavy data volumes make this scenario increasingly unworkable.
With Integration Services
Alerts & escalation
Call centre:
semi-structured data
Text mining
components
Data mining
components
Custom
source
Merges
Standard
sources
Data cleansing
components
Mobile
data
Warehouse
Legacy data: binary files
Application database
SQL Server Integration Services
Reports
•Integration and warehousing are a seamless, manageable, operation.
•Sourced, prepare and load data in a single, auditable process.
•Reporting and escalation can be parallelized with the warehouse load.
•Scales to handle heavy and complex data requirements.
How SSIS Works
•Data sources can be diverse, including custom or scripted adapters
•Transformation components shape and modify data in many ways.
•Data is routed by rules or error conditions for cleansing and conforming.
•Flows can be as complex as your business rules, but highly concurrent.
•And finally data can be loaded in parallel to many varied destinations.
Architecture
Standard transforms
Custom transforms
Data
Destination
Adapters
Data Source
Adapters
Package
XML
Package
Loops &
Sequences
Tasks
XML
Package
Event
Handlers
Wizards
DTS Designer
Command Line
Agenda
•
•
•
•
•
Overview
Business Intelligence Development Studio
Workflow Features
Dataflow Features
Other Features
Business Intelligence Development
Studio
– Centralized Project Management
•
•
•
•
Analysis Services
Reporting Services
Integration Services
.Net and Visual Studio
– Visual Studio Interface
• Integrated package debugging
• Edit, debug, fix loop in one UI
Business Intelligence Development
Studio
• Project
– Packages, Cubes, Reports
– Data Source Views
– Deployment
• Solution
–
–
–
–
Multiple Projects
Additional files in Misc Files
Your designer settings
Source Control with Visual Source
Safe
Work Flow vs. Data Flow
• Work Flow
– Process-oriented
– Do something to an
object
– Success, fail, or
completion result
– Environmental Inputs
– Distinct beginning and
end
• Data Flow
– Data-oriented
– Do something to data
– Produce and/or consume
rows of data
– Rows as inputs and outputs
– Parallel in nature
– Starts with first row, ends
with last row, for each row
Business Intelligence Development
Studio
• Package Editor
• Design Surfaces
– Work Flow, Data Flow, Event Handlers
– Properties Pane
• Variables, Connections UI
– Toolbox
– Pop-up editor windows • Output, Watch, Log, Error Windows
Agenda
•
•
•
•
•
Overview
BI Workbench
Workflow Features
Dataflow Features
Other Features
Work Flow Features
•
•
•
•
•
•
•
•
•
Tasks
Precedence Constraints
Containers
Looping
Events
Configurations
Logging
Property Expressions
Deployment
Work Flow: The Nature of Tasks
•
•
•
•
•
•
•
•
•
•
•
•
File System Task
Smallest Unit of Work
File Transfer Protocol Task
Execute Process Task
Individually successful or
Send Mail Task
unsuccessful
Execute SQL Task
WMI Task
Disabled or enabled
MSMQ Task
Database Management
Isolated : Unaware of other
Tasks
tasks
• Execute Package Task
• VB.Net Script Task, and
more!
Work Flow: Precedence Constraints
• Order the execution of • Constraint is met based on
– Success/Failure/Completion of
tasks
previous task
• Establish task sequence
– Evaluation result of an expression
– Establish concurrency
– Establish order
– Establish dependency
– A combination of the two
• May be Logically OR’ed or
AND’ed
Work Flow: Containers
• Arbitrary grouping of Executables
• Containers also use Precedence Constraints
• Containers execute concurrently
• The package is a container
– Loops, Sequences, Taskhosts and Event Handlers
are containers, too
• Containers may contain other containers
• Containers provide scope
–
–
–
–
Variables
Transactions
Execution
Logging
• Manage complexity
– Break packages into units
– Can be collapsed
– Move parts of a package as
one
– Disable container and all
children
Work Flow: Looping
• For Loop - Expressions
•
•
– Utilize container advantages
– ForEach Loop
– For Loop
• For Each Loop –
– ForEach Enumerators
– Variable Mappings
Dim pkg
Dim stp
set pkg = DTSGlobalVariables.Parent
set stp = pkg.Steps("DTSStep_DTSMyTask")
‘Do stuff here
‘Now loop
stp.ExecutionStatus =
DTSStepExecStat_Waiting
No more loop hacks
Functionality baked-in
•
•
Better interaction with environment
No coding necessary
Work Flow: Events
• Eventing Expanded
– Event Handlers
• In package event response
• Has information about event in System Variables
• Contains workflow
– Custom Events
• Tasks may register custom events
• May create event handlers for custom events
• Automatically logged
– Event Driven Architecture
• Tasks may be built to wait for events
• Graceful shutdown with “System::CancelEvent”
• Enables event driven paradigm eg. WMI Event Task
Work Flow: Configurations
• Enable “no-touch” package editing
• Support multiple types
–
–
–
–
SQL Server
XML
Environment Variables
Registry
• Configuration Wizard
• Indirect Configurations – Configuring
Configurations
• Use Package Paths for describing
location of configured objects
Work Flow: Logging
•
Rich array of log providers
– SQL Server
– XML
– Text
– Windows event viewer
– SQL Profiler
•
•
•
Integrated logging throughout
package
Log to separate providers
Write your own
Work Flow: Property Expressions
• Modifying Properties with Expressions
– All tasks, logproviders, connection
managers and
ForEach Enumerators and some dataflow components
•
•
•
•
Subtle and Surgical
Innumerable applications
May reference variables
Actuated at key edge events
–
–
–
–
Load/Save
Execute/Validate
AcquireConnection
GetEnumerator
Agenda
•
•
•
•
•
Overview
Business Intelligence Development Studio
Workflow Features
Dataflow Features
Other Features
Data Flow
•
•
•
•
High performance data movement
Components and Paths
Adapters
Transforms
Data Flow Features
• Components, each with zero or more inputs
and/or zero or more outputs
• Paths connect outputs to inputs
• Data Flow Task
– No rules broken, but bent a few
– Dedicated UI
– Own object model
Data Flow: Components and Paths
• Components
– Data Flow
Adapters
– Transforms
• Paths
– The route between
components
– Includes metadata about
columns
Data Flow: Adapters
• Reference connection
managers to access
data
• Move rows into buffers
• Convert native types to
dataflow task types
• Describe their output
• Have no inputs
• OLEDB Source and
Destination
• Flat File Source and
Destination
• XML Source
• SQL Server Destination
• SQL CE Destination
• Recordset Destination
• Raw Source and
Destination
Data Flow: Transforms
• May have multiple inputs and
outputs.
• Must have at least one input
• Must have at least one output
• Process rows by buffer
• May output more or less rows
than take on input
•
•
•
•
•
•
•
•
•
Aggregate
Conditional Split
Fuzzy Lookup
Lookup
Merge / Merge Join /
Multicast
Slowly Changing Dimension
Script Component
Sort
Custom Transforms
Agenda
•
•
•
•
•
Overview
Business Intelligence Development Studio
Workflow Features
Dataflow Features
Other Features
Other Features : Platform
• Easily Extensible
–
–
–
–
–
Custom tasks
Custom transforms
Custom Adapters
Custom log providers
Custom foreach
enumerators
– Custom data providers
– Develop in native or
managed code
– COM and Managed
Object models
• Built in Features
– Ships with SQL Server – “In the
box”
– Ships with multiple transforms
• Data Cleaning, Data Mining
• Aggregation, Sort, SCD
– Ships with tools
– Integrated with SQL Agent
– Integrated with Analysis Services
• Flexible
– May be used in a wide array of
applications, not just ETL
Work Flow: Deployment
•
•
Makes Moving Packages Simpler
Includes Miscellaneous files in projects
• Builds self installer deployment
bundle
•
•
Integrated with configurations
Built when project is built in designer
Other Features : Debugging
• Expanded Error, Warning and Information
Reporting
• Breakpoints – Script task and package
breakpoints integrated
• Watch windows
• Data Viewers in the data flow task
• Error, warning and Information view window
• Log event viewing in log window
• All events logged
Other Features : Security
• Encryption - Using encryption to secure packages
or parts of packages
• Sensitive Data Protection - Identifying and
protecting passwords and other sensitive data
• Roles - SQL Server roles to control access to
packages stored in SQL Server. Operator, Limited
User, and Admin.
• Digital Signing - Code signing certificate to ensure
a package hasn't changed
• Agent Integration - Agent subsystems to securely
store and execute packages
Other Features :
Checkpoint Restartability
•
•
•
•
•
Enables recovery from failed state
Saves work at the task level
Variable values saved
Saves to XML temporary file
Checkpoint file removed when package
restarts
• Runtime feature, not dataflow restartability
Other Features : DTExec, DTExecUI,
DTUtil
• DTExec replaces DTSRun
• DTExecUI replaces DTSRunUI and has new UI
• DTUtil new utility for moving packages etc.
Other Features : SSIS Server
• Integrates with SQL Server Management Studio
• Provides package file store
• Enumerates running packages
• Hierarchical
package storage on
file system, Server
Store and in SQL
Server
• Configure with
MsDtsSrvr.ini.xml in
90\DTS\binn
Other Features : Not so obvious
• Variables made safe with the Variable Dispenser
• Scriptable sources and destinations as well as
transformations
• Transaction flexibility and scoping
• Script now compiled for much faster execution
• Agent interaction more customized and
integrated
• Packages saved to XML
• Always uses encrypted connections with
SQL2K5
Help : Upgrade Advisor & Migration
Wizard
• Discovers problems
before installing
• Migrate packages from
DTS 2K only
Demos
Summary
• Integration Services has changed dramatically.
• A platform designed to support embedding and
extension
• Stock feature set enormous and comprehensive
• Exhibit A in Microsoft’s BI Strategy
• True enterprise ETL and Integration capabilities
• Long term support from Microsoft
• We’ve only just begun!
Resources
– Integration Services MSDN Home http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/defaul
t.aspx
– Articles –
– http://www.windowsitpro.com/Authors/AuthorID/1457/1457
.html
– Junkies Forum http://www.sqljunkies.com/Forums/ShowForum.aspx?Foru
mID=48
– MSDN Forum http://forums.microsoft.com/msdn/ShowForum.aspx?Foru
mID=80
– Community
http://SQLIS.com