Easy_ETL_with_Bimlx

Download Report

Transcript Easy_ETL_with_Bimlx

Easy ETL with
12.12.2015
12.12.2015
Thank you to our AWESOME sponsors!
12.12.2015
Andrzej Kukuła









20 years of professional experience in IT
Microsoft Certified Solutions Expert
Biml Expert
System Architect
C#, F# Developer by heart
Leader of local PLSSUG/PASS chapter
Long-time Contributor to pymssql project
Passionate about many aspects of IT world
[email protected]
12.12.2015
Marcin Szeliga






Data Philosopher
BI Expert and Consultant
Data Platform Architect
20 years of experience with SQL Server
Ph.D. Candidate at Politechnika Śląska
[email protected]
12.12.2015
Agenda









The ETL
SSIS and its problems
Introduction to Biml
BimlScript and its features
Code generation described
Interesting functionalities
Digressions
Demos
Summary
12.12.2015
The ETL
 ETL is a significant cost in DW project
 But it‘s not going to disappear tomorrow
 It is still the foundation of many big Enterprise
Data Warehouses
 Thus the goal: simplify ETL to lower cost
 SSIS is very good ETL solution
 but...
12.12.2015
SSIS and its problems
 Manual package creation using visual editor + drag & drop
 Slow, boring, error prone
 Not adaptable to requirements
(think about changing logic in 10/100/… packages)
 Not developer-friendly





Not generic
Version control is hardly possible
No easy API
No code templates/design patterns/DRY
No CI/CD
 Should we really spend
our time fighting this?
12.12.2015
Example Package
12.12.2015
Example DTSX
Is this reasonable
amount of code to
accomplish the task?…
12.12.2015
Enter
<Package Name="Load Person">
<Tasks>
<Dataflow Name="Copy to Stage">
<Transformations>
<OleDbSource Name="Person" ConnectionName="Source">
<TableInput TableName="AdventureWorks2012.Person.Person" />
</OleDbSource>
<DerivedColumns Name="Add LoadDateTime">
<Columns>
<Column Name="LoadDateTime" DataType="DateTime2" Scale="7">
@[System::ContainerStartTime]</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="StgPerson" ConnectionName="Stage">
<TableOutput TableName="Stage.AW.Person" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
12.12.2015
What’s Biml?
 Business Intelligence Markup Language
 A really easy XML-based language to describe BI assets









Connections, Tables, Views
SSIS Packages, SSIS Projects
Dimensions, Measure Groups, Cubes
and more…
Smart default values and behavior
Don’t bother with SSIS metadata (most of the time)
Easy to integrate with VC (TFS, Git, and more)
Easy to organize code into reusable libraries
Available for free in SSDT-BI with BIDSHelper
Demo!
12.12.2015
Example Biml elements
 Familiar and pretty easy to remember, aren’t they?
12.12.2015
Biml and BimlScript
 Biml = XML Domain Specific Language
 BimlScript = Biml + C# code nuggets
 called template
Demo!
12.12.2015
BimlScript
 We have full power of .NET at our disposal
 Execute code during compilation
 Read files
 Access remote data
 Even create database objects (e.g. semi-temporary tables)
 Reference external .NET assemblies
 Supports including and calling other Biml files
 Can reference external .NET assemblies
 Extensive .NET Biml API built-in
12.12.2015
BimlScript Expansion
 First, BimlScript is expanded to Biml
 Then Biml is compiled to DTSX
12.12.2015
Biml Compilation
 Biml is compiled to DTSX + DTProj
 The process is repeatable
 Can be directly opened in SSDT-BI (BIDS)
 Execution using DTExec / SSISDB / SSDT
No additional runtime or runtime license required
to execute package
 Compilation is just automation of SSIS
 The same rules apply
 Objects must exist in database (so that code generator can
create SSIS metadata and mappings)
12.12.2015
The process
• Biml + BimlScript source files
• C#/VB compiler
• BimlScript expansion to Biml
• Population of in-memory Object Model
• Single, in-memory, expanded,
compiled and merged Biml
Code generator
…
12.12.2015
RootNode




The .NET object which represents all assets and metadata
Populated during parsing of Biml files
Useful in BimlScript
Allows creation of very dynamic solution





RootNode.Connections
RootNode.Databases
RootNode.Tables
RootNode.Packages
and many many more
12.12.2015
Code Reuse
 Can be used anywhere in Biml file
 Expands file given in argument and return Biml string
 Can specify additional arguments
 Include (expand) specified Biml file (or .NET resource)
before current file
12.12.2015
Interesting features
 <Annotation></Annotation>
 Lightweight metadata, can be associated with any asset
 Documentation annotations
 Interesting use case: SQL extended properties
 ObjectTag
 Property of all BimlScript .NET objects
 Accessed programmatically from BimlScript
 Very convenient and extensible object storage
12.12.2015
Extending BimlScript object model
 Contains plain C# code (no templating)




Can interact with Biml object model
Can use extended methods
Can define new classes
We can subclass original Biml classes
to change/extend their behavior
12.12.2015
More features
 Transformers and „Frameworks”
 Ability to change Biml after it is already expanded
 Can „reverse engineer” DTSX packages into Biml
 It’s possible to write custom code generators
 Different target runtime than DTSX
12.12.2015
Source object types - digression
 Views in custom schema, e.g. BI.Person
 Isolation
 Doesn’t need to modify production objects
 Can be in different database than production on the same instance
(if absolutely no modification of production is allowed)
 Abstraction





Rename columns
Convert data, create new columns
Model data – e.g. join with other sources
Filter data
Add extended properties
 Security
 Dedicated ETL user can be granted access only to views in this
schema, doesn’t need access to underlying tables
12.12.2015
Separation of duties
 External metadata
 Business analysts don’t touch the packages
 Programmers don’t model the business
 Easy storage
 MDS – Excel client!
 Extended properties
 custom
12.12.2015
Complete demo
















Complete ETL solution
Storage of metadata (connections, mappings) outside of code
Dynamic import of database table definitions
Dynamic generation of stage tables
Annotations and SQL extended properties as a metadata source
Execute SQL during BimlScript expansion
LogicalDisplayFolder as a metadata source
Conditional logic
Type conversion on the fly
Nonstandard column mapping
Master package
Dynamic package project
Dynamic rename of objects during ETL
Package to create stage tables
Tiers
MSBuild
12.12.2015
Demo!
How to benefit it?
 BI Project Decision Makers
 Faster initial delivery
 Lower cost of change
 Immediate Biml/Mist ROI
 BI Architects
 Reusable Design Patterns with Biml/BimlScript
 One project compatible with SQL Server 2005-2014
 Easily manage large BI code base, tasks, and issues
using TFS
 Plan for BI Continuous Integration/Continuous Delivery
12.12.2015
How to benefit it?
 ETL developers
 Fast Biml learning curve
 Generate your DTSX faster with Biml instead of
drag’n’drop
 Embrace DRY in BI development
 Use proper version control to manage your sources
 BI consultants
 Increased productivity
 Build your Biml/BimlScript code library and reuse it in
different projects
 No runtime license costs for customers
12.12.2015
12.12.2015
Thank you to our AWESOME sponsors!
12.12.2015
Questions?
[email protected]
[email protected]
12.12.2015