SSIS for Absolute Beginners v3 August 2015 GLUGNET
Download
Report
Transcript SSIS for Absolute Beginners v3 August 2015 GLUGNET
August 20, 2015
SSIS for Absolute Beginners
by George SQUILLACE
“SQL” is in the name!
[email protected]
Speaker Background
•
•
•
•
•
22 years of service at New Horizons
96 testing events
MCT since 1997
SQLSaturday speaker
Microsoft Virtual Academy presenter
Husband, Dad
MCT
MCSE
SQL Server 2012
MCSA
MCSE: Data Platform
MCSE: Business Intelligence
Certified in every version of SQL Server from
2000 -> current
Certified in every version of Windows Server from
NT 4.0 -> Server 2008
Almost 30 years in I.T.!
For fun: (no particular order)
Reading
Bicycling (road /city / trail)
Travel
Music / Acoustic Guitar
Computers
Would love to learn how to fly
Would love to learn how to surf
Agenda
What is “ETL” ?
Why Is ETL Needed?
What is “SSIS”?
Tour of the Development Environment
Control Flow Basics
Data Flow Basics
Debugging & Error Handling Basics
Deployment Basics
What is “ETL”,
and Why Is It Needed?
What is “ETL”?
Extract, Transform, & Load
Connect to data sources for Extraction
Connect to data destinations for Loading
Optional: Transform/process data as required
ETL is very frequently required within an organization
Oh, and sometimes an E-L-T process is used instead
Why is “ETL” Needed?
Import and export data
Working with trading partners
Combine data from multiple sources
Homogeneous sources (all SQL Servers)
Heterogeneous sources
Make data available to applications other than the source
Make data available for people
Combining data from various sources for analysis
Create a data warehouse
Cleanse data
Standardize data
A DW is a foundation for a variety of analyses, such as….
interactive analysis with OLAP cubes,
Tabular and other data models
Reporting
What is “The Big Lie”?
What is SQL Server
Integration Services, or
“SSIS”?
What is SSIS?
ETL Execution Platform
Control Flow Engine
Data Flow Engine
Unit of Execution: “Package”
ETL Development
Environment
Extensible
Includes:
Wizards
Import/Export Data Wizard
Package Migration Wizard
Package Configuration Wizard
Package Deployment Wizard
Command Line Executables
DTExec
DTExecUI
DTUtil
How does SSIS relate to SQL
Server?
Why SSIS?
There are competitors…
Commercial
Pretty much every database engine platform has a “data integration” feature
Oracle
IBM
SAP
Informatica
Pentaho
Open source
Talend
CloverETL
SSIS is included in the SQL Server product (depending on the Edition)
Mature feature set (originally introduced in SQL Server 2005)
Enterprise class performance
“We loaded 1TB in 30 Minutes with SSIS, and So Can You”
Huge support community
Websites and help forums
Blogs
Books
BI-specific PASS chapters
Software tools
Development Environment
Visual Studio (SSDT for BI) Basics
Key Terms:
Solution
Project
Template
What is the SSIS
Development Environment?
Visual Studio Shell +
Business Intelligence
Development Templates
Named: SQL Server Data
Tools for BI (SSDT-BI)
Installation method
Licensing
SQL Server 2012 comes
with the Visual Studio
2010 Shell
Can upgrade to SSDT for
BI – VS 2012 or VS 2013
Ironically, there is no
SSDT (yet) that ships with
SQL Server 2014
Can download SSDT for BI
- 2013
Tour of the “Dev” Environment
Control Flow Basics
Control Flow Elements
Tasks
Containers
Precedence Constraints
Frequently Used
Control Flow Tasks
Data Profiling Task
“hunt for treasure and landmines”
Execute SQL Task
File System Task
Execute Process Task
Send Mail Task
Execute Package Task
Script Task
Data Flow Task
The “star of the opera”
Control Flow Containers
Why Containers?
Containing / organizing
Executable unit within the package
Can be enabled/disabled
Two of the three containers provide Looping
Transaction protection context
Think “all, or nothing”
Checkpoint context
Restart point
Three Kinds
Sequence Container
For Loop Container
For Each Loop Container
Precedence Constraints
Precedence Constraints define:
Workflow order
Workflow conditions
Downstream Task & Container Execution can be
based on:
Constraint evaluation
Success
Failure
Completion
Expression
Expression and Constraint
Expression or Constraint
Multiple Constraint evaluation
Logical AND
Logical OR
Data Flow Basics
Data Flow Basics
Various ways to accomplish ETL in SSIS
BULK INSERT Task
Execute SQL Task
bcp.exe
Command line utility for importing and exporting text
files
Most typically used ETL tool in SSIS is the
Data Flow Task
The “DFT” defines a “pipeline”
At least one source
At least one destination
Optional: one or more transformations
Data Flow Sources
Database Engines
OLE-DB Source
ADO .NET Source
ODBC
File sources
Flat file
Excel file
Raw file
XML file
Others
(pretty much anything)
Data Flow Destinations
Database Engines
OLE-DB destination
ADO .NET destination
File destinations
Flat file
Excel file
Raw file
Other destinations
ODBC destination
Recordset destination
…
Data Flow Transformations
Row transformations
Character map
Copy column
Derived column
Import column / Export column
Rowset transformations
Aggregate
Sort
Pivot and Unpivot
Row sampling and Precentage sampling
Split & join transformations
Conditional Split
Lookup
Merge
Merge Join
Multicast
Union All
Auditing transformations
Audit
Row count
BI transformations
Slowly changing dimension
Fuzzy grouping
Fuzzy lookup
Data mining query
Custom transformations
Script component
Control Flow & Data
Flow Demonstration
General Coding & Development Principles
“Always accept pain as early in the development cycle as
possible.”
~ Dave Rodabaugh, BI Developer
“If I have to do something once I’m probably going to have to
do it again.”
Abstraction / Encapsulation
“Write once; run many!”
Modularity / “Think reusable”
What are values that might be
different from Package execution
to execution?
Server / Instance name
Database name
Connection String
Table name
File System Path
File Name
File Extension
Query Parameter Values such as Comparison Dates
Query result set
Q: Where Can Dynamic /
Programmatic Actions Be
Implemented in SSIS?
A: In many places!
Control Flow
Precedence Constraints
ExecuteSQL Tasks (Reading and / or Writing Variables and the use of Parameterized
Queries)
Parameterized queries
Store the whole query string in a variable
Store the output of a query for subsequent processing
For Loop Container
ForEach Loop Container
Script Task code
Data Flow
Connection Manager Expressions
Derived Column Transformation Expressions
Conditional Split Transformation Expressions
Script Component code
Variable & Parameter Values
Package Configurations
Almost Any Property of Almost Every Task and Component
Variables –
A Building Block of Dynamic
Packages
What are Variables? / Why have Variables?
Two Variable Classifications in SSIS
System Variables
User Variables
Creation – HowTo
Variable Scope and “(un)changeability”
Available Data Types
Parameters –
Another Building Block of Dynamic
Packages
What are Parameters? / Why have Parameters?
Applies only to the Project Deployment Model of
development
Two Parameter classifications in SSIS
Project
Package
Details:
Creation – HowTo
Variable Scope and “(un)changeability”
Available Data Types
Expressions –
Creating Derived Property Values
Expressions can be used to derive almost any
property value and make the value dynamic
Can be derived from the combination of:
Column Values
System & User Variable Values
Operators
Variety of functions
Mathematical
String
NULL functions
Date & Time
Type casts
Debugging &
Error Handling Basics
Debugging & Error Handling
Logging
Control Flow debugging features
Breakpoints
Conditional
Also within Script code
Status Windows
Locals
Watch
Event Handler
“Scoped, Conditional, Control Flow”
Data Flow features
Data Viewers
Ignoring or redirecting failed rows
Deployment Basics
Deployment: Dev -> Prod
Starting with SQL Server 2012 there are now two
different deployment models
Project Deployment Model
Requires a special database created, the “SSIS Catalog”
Package Deployment Model
MSDB
File system
Package configurations
Each deployment model has very different
deployment requirements and methods
Using SSIS Programmatically
Building Packages Programmatically
Samples for creating SSIS packages programmatically
EzAPI – Alternative package creation API
Running and Managing Packages Programmatically
Biml
(Business Intelligence Markup Language)
Stairway to Biml Level 1: What is Biml?
Level
Level
Level
Level
2,
3,
4,
5,
Biml Basics
Building an Incremental Load Package
Using Biml as an SSIS Design Patterns Engine
Biml Language Elements
Biml is better even for simple packages
Generate multiple SSIS packages using BIML and
metadata
8 Practical BIML Tips
BimlScript
Extensibility:
Free & Commercial Tools
Codeplex
SSIS Community Tasks and Components (free & commercial)
Tools
i.e., SSIS Expression Editor & Tester
Connection Managers
Log Providers
Control Flow Tasks
ForEach Enumerators
Script Task examples
Script Component examples
Data Flow Sources, Transforms, & Destinations
Pragmatic Works – Task Factory
Abilineage, SSIS Pipeline components
Eldos - BizCrypto, like SFTP functionality for SSIS
CozyRoc - SSIS+, Control Flow & Data Flow components
Keelio - Dynamics GP SSIS Toolkit and XML Toolkit
Konesans - a number of components, some of which are free
Melissa Data
Commercial
References & Resources
Microsoft course 20463C, available through your local New
Horizons Computer Learning Center
Implementing a Data Warehouse with SQL Server (5 days)
www.e-Squillace.com
Books
(linked to my BI homepage)
Implementing a Data Warehouse with Microsoft SQL Server
2012 Training Kit (Exam 70-463)
SQL Server 2008 Integration Services Problem-Design-Solution
(Wrox)
Web & other links:
Integration Services Expression Reference
SSIS Expression Examples
Advanced Integration Services Expressions
Integration Services Data Types
CAST & CONVERT SSIS Data Types
A VB .NET Toolkit for Writing SSIS Script Tasks
SSIS Expression Cheat Sheet at PragmaticWorks
PragmaticWorks free one hour webinars
Thank you!