PowerPoint - Clement Chang Web
Download
Report
Transcript PowerPoint - Clement Chang Web
Data Warehouse Tools and
Technologies - ETL
By:
Issarachevawat, Raynoo
Romieh, Christian
Wongkamolchun, Siri
Zhang, Ying
What Is ETL?
Extract -- the process of reading data from a outer database.
Transform -- the process of converting extracted data to a
form useable by the target database.
– Occurs by using rules or lookup tables or by combining the data with
other data.
Load -- the process of writing the data into the target database.
What does ETL do?
Extracts data from multiple data sources
Migrates data from one DB to another
Converts DB from one format or type to another.
Transforms the data to make it accessible to business
analysis
Forms data marts and data warehouses
Enables loading of multiple target databases
Performs at least three specific functions
– reads data from an input source ;
– passes the stream of information through either an ETL
engine- or code-based process to modify, enhance, or
eliminate data elements based on the instructions of the job;
– writes the resultant data set back out to a flat file, relational
table, etc.
What can ETL be used?
To acquire a temporary subset of data (like a
VIEW) for reports or other purposes.
a more permanent data set may be acquired for
other purposes such as: the population of a
data mart or data warehouse
ETL SYSTEM
Operational Data
OLAP End Users
Local Data Marts
OLAP End Users
Local Data Marts
OLAP End Users
Local Data Marts
OLAP End Users
Local Data Marts
ETL Engine
Outer Sources
Extract
Transform
Load
Filter
Different vendor
Different format
Data Warehouse
Data extracted from the data warehouse
provide faster processing
Issues that are key to an effective ETL tool
Scheduling and job dependencies: particularly relies
on graphical environment.
Session nesting: When developing an ETL session for
a particular part of the system, nesting eliminates
duplicate development.
Robust SQL support: Increases speed over using code
to read and write to a database.
Version management: enables quick roll back rather
than manually making code changes. In many cases,
the DB’s version control may not work on the ETL.
Key Issues … (Cont’d)
Debugging functionality: very useful for developer
support.
ETL should rely on underlying database security.
Transformation capabilities vs. cleansing capabilities:
seldom very strong in both.
Metadata support: must work with the overall
metadata strategy.
Current ETL Market Share
Total Market Share: $667 Million
ETL Evaluation
Throughout the following sections, each of the vendors and their ETL
products are evaluated, focusing on primary differences between such
products.
Ascential Software
Formed in July 2001
Focuses on improving, developing, and perfecting their ETL and
“back-end” tools
Do not have current plans of entering the BI tool market.
The Ascential DataStage product family
– highly scalable ETL solution
– uses end-to-end metadata management and data quality assurance
functions.
– can create and manage scalable, complex data integration for enterprise
applications such as CRM, ERP, SCM, BI/analytics, E-business and data
warehouses.
Cognos Corporation
Founded in 1969
Prefers that all components of the enterprise data
warehouse are Cognos Products
– DecisionStream easily integrates with Cognos BI tools, etc.
– has difficulty integrating with other vendor Products.
DecisionStream is powerful ETL software
– Allows users to extract and unite data from disparate sources
and deliver coordinated Business Intelligence across your
organization.
– includes advanced data merging, aggregation and
transformation capabilities: let users unite data from different
sources, and transform it into information using best-practices
dimensional design.
Informatica PowerConnect
An extension to Informatica PowerCenter, and PowerCenterRT data integration
software.
Eliminates the need for customers to manually code data extraction programs
for their enterprise applications.
Ensures that mission-critical operational data can be effectively used to inform
key business decisions across the enterprise.
Allows companies to directly source and integrate:
ERP
CRM
Real-time message queue
Mainframe
AS/400
Remote data
Metadata
with other enterprise data and deliver it to:
Data warehouses
Operational data stores
Business intelligence tools
Packaged analytic applications.
Conclusion
Issues analyzed:
–
–
–
–
–
development environments
version control
Securities
metadata exchanges standards
Cost
Cognos could not compete
– based on the relative youth
– limitations of ETL tools.
unable to show support for version or revision control
security provided by the underlying database, favors non-Cognos Products.
The ETL tools presented by Ascential and Informatica are comparable
in numerous ways
it would be best to select Informatica as an ETL vendor.
– more mature and stable as a company
– more comprehensive ETL at an efficient price.
Questions?
For Copies of the paper, Please email
Christian Romieh, [email protected]