SSIS for Absolute Beginners v3 August 2015 GLUGNET

Download Report

Transcript SSIS for Absolute Beginners v3 August 2015 GLUGNET

August 20, 2015
SSIS for Absolute Beginners
by George SQUILLACE
“SQL” is in the name!
[email protected]
Speaker Background
•
•
•
•
•
22 years of service at New Horizons
96 testing events
MCT since 1997
SQLSaturday speaker
Microsoft Virtual Academy presenter
 Husband, Dad
 MCT
 MCSE
 SQL Server 2012
 MCSA
 MCSE: Data Platform
 MCSE: Business Intelligence
 Certified in every version of SQL Server from
2000 -> current
 Certified in every version of Windows Server from
NT 4.0 -> Server 2008
 Almost 30 years in I.T.!
For fun: (no particular order)
Reading
Bicycling (road /city / trail)
Travel
Music / Acoustic Guitar
Computers
Would love to learn how to fly
Would love to learn how to surf
Agenda
What is “ETL” ?
Why Is ETL Needed?
What is “SSIS”?
Tour of the Development Environment
Control Flow Basics
Data Flow Basics
Debugging & Error Handling Basics
Deployment Basics
What is “ETL”,
and Why Is It Needed?
What is “ETL”?
 Extract, Transform, & Load
 Connect to data sources for Extraction
 Connect to data destinations for Loading
 Optional: Transform/process data as required
 ETL is very frequently required within an organization
 Oh, and sometimes an E-L-T process is used instead 
Why is “ETL” Needed?
 Import and export data
 Working with trading partners
 Combine data from multiple sources
 Homogeneous sources (all SQL Servers)
 Heterogeneous sources




Make data available to applications other than the source
Make data available for people
Combining data from various sources for analysis
Create a data warehouse
Cleanse data
Standardize data
A DW is a foundation for a variety of analyses, such as….
interactive analysis with OLAP cubes,
Tabular and other data models
 Reporting




 What is “The Big Lie”?
What is SQL Server
Integration Services, or
“SSIS”?
What is SSIS?
 ETL Execution Platform
 Control Flow Engine
 Data Flow Engine
 Unit of Execution: “Package”
 ETL Development
Environment
 Extensible
 Includes:
 Wizards




Import/Export Data Wizard
Package Migration Wizard
Package Configuration Wizard
Package Deployment Wizard
 Command Line Executables
 DTExec
 DTExecUI
 DTUtil
 How does SSIS relate to SQL
Server?
Why SSIS?
 There are competitors…
 Commercial
 Pretty much every database engine platform has a “data integration” feature
 Oracle
 IBM
 SAP
 Informatica
 Pentaho
 Open source
 Talend
 CloverETL
 SSIS is included in the SQL Server product (depending on the Edition)
 Mature feature set (originally introduced in SQL Server 2005)
 Enterprise class performance
 “We loaded 1TB in 30 Minutes with SSIS, and So Can You”
 Huge support community





Websites and help forums
Blogs
Books
BI-specific PASS chapters
Software tools
Development Environment
Visual Studio (SSDT for BI) Basics
 Key Terms:
 Solution
 Project
 Template
What is the SSIS
Development Environment?
 Visual Studio Shell +
Business Intelligence
Development Templates
 Named: SQL Server Data
Tools for BI (SSDT-BI)
 Installation method
 Licensing
 SQL Server 2012 comes
with the Visual Studio
2010 Shell
 Can upgrade to SSDT for
BI – VS 2012 or VS 2013
 Ironically, there is no
SSDT (yet) that ships with
SQL Server 2014
 Can download SSDT for BI
- 2013
Tour of the “Dev” Environment
Control Flow Basics
Control Flow Elements
 Tasks
 Containers
 Precedence Constraints
Frequently Used
Control Flow Tasks
 Data Profiling Task
 “hunt for treasure and landmines”
 Execute SQL Task
 File System Task
 Execute Process Task
 Send Mail Task
 Execute Package Task
 Script Task
 Data Flow Task
 The “star of the opera”
Control Flow Containers
 Why Containers?





Containing / organizing
Executable unit within the package
Can be enabled/disabled
Two of the three containers provide Looping
Transaction protection context
 Think “all, or nothing”
 Checkpoint context
 Restart point
 Three Kinds
 Sequence Container
 For Loop Container
 For Each Loop Container
Precedence Constraints
 Precedence Constraints define:
 Workflow order
 Workflow conditions
 Downstream Task & Container Execution can be
based on:
 Constraint evaluation
 Success
 Failure
 Completion
 Expression
 Expression and Constraint
 Expression or Constraint
 Multiple Constraint evaluation
 Logical AND
 Logical OR
Data Flow Basics
Data Flow Basics
 Various ways to accomplish ETL in SSIS
 BULK INSERT Task
 Execute SQL Task
 bcp.exe
 Command line utility for importing and exporting text
files
 Most typically used ETL tool in SSIS is the
Data Flow Task
 The “DFT” defines a “pipeline”
 At least one source
 At least one destination
 Optional: one or more transformations
Data Flow Sources
Database Engines
 OLE-DB Source
 ADO .NET Source
 ODBC
File sources
 Flat file
 Excel file
 Raw file
 XML file
 Others
(pretty much anything)
Data Flow Destinations
Database Engines
 OLE-DB destination
 ADO .NET destination
File destinations
 Flat file
 Excel file
 Raw file
Other destinations
 ODBC destination
 Recordset destination
…
Data Flow Transformations
 Row transformations




Character map
Copy column
Derived column
Import column / Export column
 Rowset transformations




Aggregate
Sort
Pivot and Unpivot
Row sampling and Precentage sampling
 Split & join transformations






Conditional Split
Lookup
Merge
Merge Join
Multicast
Union All
 Auditing transformations
 Audit
 Row count
 BI transformations




Slowly changing dimension
Fuzzy grouping
Fuzzy lookup
Data mining query
 Custom transformations
 Script component
Control Flow & Data
Flow Demonstration
General Coding & Development Principles
 “Always accept pain as early in the development cycle as
possible.”
 ~ Dave Rodabaugh, BI Developer
 “If I have to do something once I’m probably going to have to
do it again.”
 Abstraction / Encapsulation
 “Write once; run many!”
 Modularity / “Think reusable”
What are values that might be
different from Package execution
to execution?
 Server / Instance name
 Database name
 Connection String
 Table name
 File System Path
 File Name
 File Extension
 Query Parameter Values such as Comparison Dates
 Query result set
Q: Where Can Dynamic /
Programmatic Actions Be
Implemented in SSIS?
A: In many places!
 Control Flow
 Precedence Constraints
 ExecuteSQL Tasks (Reading and / or Writing Variables and the use of Parameterized
Queries)
 Parameterized queries
 Store the whole query string in a variable
 Store the output of a query for subsequent processing
 For Loop Container
 ForEach Loop Container
 Script Task code
 Data Flow




Connection Manager Expressions
Derived Column Transformation Expressions
Conditional Split Transformation Expressions
Script Component code
 Variable & Parameter Values
 Package Configurations
 Almost Any Property of Almost Every Task and Component
Variables –
A Building Block of Dynamic
Packages
 What are Variables? / Why have Variables?
 Two Variable Classifications in SSIS
 System Variables
 User Variables
 Creation – HowTo
 Variable Scope and “(un)changeability”
 Available Data Types
Parameters –
Another Building Block of Dynamic
Packages
 What are Parameters? / Why have Parameters?
 Applies only to the Project Deployment Model of
development
 Two Parameter classifications in SSIS
 Project
 Package
 Details:
 Creation – HowTo
 Variable Scope and “(un)changeability”
 Available Data Types
Expressions –
Creating Derived Property Values
 Expressions can be used to derive almost any
property value and make the value dynamic
 Can be derived from the combination of:




Column Values
System & User Variable Values
Operators
Variety of functions





Mathematical
String
NULL functions
Date & Time
Type casts
Debugging &
Error Handling Basics
Debugging & Error Handling
 Logging
 Control Flow debugging features
 Breakpoints
 Conditional
 Also within Script code
 Status Windows
 Locals
 Watch
 Event Handler
 “Scoped, Conditional, Control Flow”
 Data Flow features
 Data Viewers
 Ignoring or redirecting failed rows
Deployment Basics
Deployment: Dev -> Prod
 Starting with SQL Server 2012 there are now two
different deployment models
 Project Deployment Model
 Requires a special database created, the “SSIS Catalog”
 Package Deployment Model
 MSDB
 File system
 Package configurations
 Each deployment model has very different
deployment requirements and methods
Using SSIS Programmatically
 Building Packages Programmatically
 Samples for creating SSIS packages programmatically
 EzAPI – Alternative package creation API
 Running and Managing Packages Programmatically
Biml
(Business Intelligence Markup Language)
 Stairway to Biml Level 1: What is Biml?




Level
Level
Level
Level
2,
3,
4,
5,
Biml Basics
Building an Incremental Load Package
Using Biml as an SSIS Design Patterns Engine
Biml Language Elements
 Biml is better even for simple packages
 Generate multiple SSIS packages using BIML and
metadata
 8 Practical BIML Tips
 BimlScript
Extensibility:
Free & Commercial Tools
 Codeplex
 SSIS Community Tasks and Components (free & commercial)
 Tools
 i.e., SSIS Expression Editor & Tester







Connection Managers
Log Providers
Control Flow Tasks
ForEach Enumerators
Script Task examples
Script Component examples
Data Flow Sources, Transforms, & Destinations







Pragmatic Works – Task Factory
Abilineage, SSIS Pipeline components
Eldos - BizCrypto, like SFTP functionality for SSIS
CozyRoc - SSIS+, Control Flow & Data Flow components
Keelio - Dynamics GP SSIS Toolkit and XML Toolkit
Konesans - a number of components, some of which are free
Melissa Data
 Commercial
References & Resources
 Microsoft course 20463C, available through your local New
Horizons Computer Learning Center
 Implementing a Data Warehouse with SQL Server (5 days)
 www.e-Squillace.com
 Books
(linked to my BI homepage)
 Implementing a Data Warehouse with Microsoft SQL Server
2012 Training Kit (Exam 70-463)
 SQL Server 2008 Integration Services Problem-Design-Solution
(Wrox)
 Web & other links:








Integration Services Expression Reference
SSIS Expression Examples
Advanced Integration Services Expressions
Integration Services Data Types
CAST & CONVERT SSIS Data Types
A VB .NET Toolkit for Writing SSIS Script Tasks
SSIS Expression Cheat Sheet at PragmaticWorks
PragmaticWorks free one hour webinars
Thank you!