ETL - GearBox

Download Report

Transcript ETL - GearBox

More ETL
ETL in a nutshell
• ETL is an abbreviation of the three
words Extract, Transform and Load.
• It is an ETL process to
– extract data, mostly from different types
of systems,
– transform it into a structure that's more
appropriate for reporting and analysis
– finally load it into the database and or
cube(s).
Logical Data Map
• Used to collect and document source
systems to be used for DW
• Should contain the following:
–
–
–
–
–
–
–
–
Target table name
Target column name
Table type
SCD type
Source db
Source table name
Source column name
Transformation
ETL in a nutshell
• Extract from source
• Transform the data
• Load the data into the
datawarehouse
Do not forget
• Data Profiling
• Data Quality Control
• Metadata Management
ETL Tools
1. Oracle Warehouse Builder (OWB)
2. Data Services
3. IBM Information Server (Datastage)
4. SAS Data Integration Studio
5. PowerCenter Informatica 9.0
6. Elixir Repertoire
7.2.2
7. Data Migrator
7.7
8. SQL Server Integration Services (SSIS)
9. Talend Open Studio & Integration Suite
10.DataFlow Manager
6.5
Oracle
SAP Business Objects
IBM
SAS Institute
Informatica
Elixir
Information Builders
Microsoft
Talend
Pitney Bowes
Business Insight
On ETL Tools
• All the ETL vendors say the same thing: “our biggest
competitor is custom code and stored procedures”.
There are still a large number of organizations
building data warehouses without using an ETL tool,
writing their own, mostly very complex, SQL
statements, often difficult to maintain.
• ETL tools can increase productivity by a factor of
three to five. Two of the most important features in
this case are that these tools are completely meta
data driven, and work with an easy to understand
graphical user interface (GUI).
On ETL Tools (WYSWYG)
• Tools with data driven, embedded
data driven approach. They can show
you, at any time you choose, how the
data will be transformed and loaded
into the data warehouse, without
actually loading it. At any time, the
user can press the ‘WYSIWYG’ button,
and the ETL tool shows immediately
what the results of an ETL flow would
be
The Future? EAI and ETL
• What are the similarities between
Enterprise Application Integration and
Extraction Transformation and Loading
tools? Some people (analysts) think
these technologies will converge to
form one product, Enterprise
Information Management. They both
move data, often use the same
metadata, so why shouldn't they
merge into one package?
ETL Tool Selection
• In many organisations there is no software
selection process at all. Sometimes there is a
corporate standard you need to comply to,
or you may have a database or BI tool in
place from a specific vendor, and you
decide to buy an ETL tool from the same
vendor, because you expect that these
will work better together.
• Often organisations find it difficult and labor
intensive to execute a software selection
process in order to gather all the necessary
data.
ETL Tool Selection: From
Company Perspective
• Create data integration strategy
• Define criteria in business terms
• Create a short-list
• Invite vendors for live demonstration
• Perform Proof of Concept
• Negotiate with Vendors
• Close the Deal
Selection Criteria
• Architecture
• ETL Functionality
• Ease-of-Use
• Reusability
• Debugging
• Real-time
• Connectivity
• General ETL tool characteristics
Selection Criteria
• Architecture
– Symmetric Multiprocssing? Grid? etc
• ETL Functionality
– Conditional Splitting? Pivoting? etc
• Ease-of-Use
– Screen design? Task compatibility? etc
• Reusability
– Decompisition? User-defined functions?
• Debugging
– Breakpoints? Validators? etc
• Real-time
– On demand data integration? etc
Long Test
• Date and Venue
• Coverage