Informatica Overview
Download
Report
Transcript Informatica Overview
Informatica Overview
Contents
•
•
•
•
•
Introduction
Clients
Server(s)
Source, Target, Repository
Connectivity
What is Informatica?
• Allows you to load data into a centralized
location, such as a datamart or data warehouse.
• ETL Tool
– Extract data from multiple sources
– Transform the data according to business logic
and needs
– Load the transformed data into file and
relational targets
Example
EMPLOYEE
EMP_ID
EMP_DETAILS
EMP_NAME
EMP_NAME
EMP_CITY
EMP_CITY
EMP_STATE
EMP_STATE
EMP_COUNT
RY
EMP_DATE_O
F_JOINING
EMP_COUNTRY
EMP_ID
Transform
Date of
Joining to
Yrs of
Service
YRS_OF_SRV
Data Warehousing
Developer
End User
Metadata
Operational Sources
Extract
Transform
Load
Data Warehouse
Informatica Architecture
Design Process
Client Tool Review
– Repository Manager
– Designer
– Server Manager
Informatica’s Architecture
Data Models
Designer
Repository Manager
Server Manager
Targets 1-n
Sources 1-n
PowerPlugs
Repository
Server
Informatica Design Process
2.
1.
3.
Source Def
5.
Mapping
Sessions
4.
Target Def
1. Create Repository
2. Import Source Definitions
3. Create Target Schema
4. Create Mappings
5. Load Data
Informatica Client
Repository Manager – Can view much of the metadata in
the Repository through the Repository Manager.
Designer – Create Source-to-Target mappings that contain
transformation instructions for the Informatica Server.
Server Manager – Create, schedule, and monitor sessions.
You create a session based on a transformation and
schedule it to run on the Informatica Server.
Informatica Client
Repository Manager
Metadata Repository
Information about the data mart system
Catalogs the repository
Directs the server
Contains record of user access
Can be shared
Can be searched and reported
Bridged through Metadata Exchange
Repository Manager
Navigator Window
Analysis Window
Dependency Window
Output Window
Folder Attributes
• FOLDER OWNER - user who serves as focal point for
folder permissions
• PERMISSIONS - rights to read, write, and/or execute
objects in a folder
• SHARED - property that allows you to make shortcuts to
objects in a folder
• SHORTCUT - a dynamic link to an object stored in a
shared folder
• VERSIONS - folder iterations that indicate development
stages
Informatica Client
Designer
Designer Workspace
Open Folder List
Navigator
Workspace
Workbook Tabs
Output Window
Status Bar
Designer Options
General
workspace options
reload objects on open
group source definitions
Tables
columns viewed
column size
object size
object colors
Format
workspace colors
import keys
automatic Source
Qualifier creation
Informatica Client
Server Manager
Server Manager
Navigator
Configure Window
Monitor Window
Output Window
Designer
•
•
•
•
•
Source Analyzer
Warehouse Designer
Transformation Developer
Mapplet Designer
Mapping Designer
Source Analyzer
• Identify the sources used to build the warehouse.
• Create repository definitions for these sources
Analyzing Sources
Relational – Oracle, Sybase, Informix, IBM DB2,
Microsoft SQL Server, and Teradata
File – Fixed and delimited flat file, COBOL file, and XML
Other – Microsoft Excel, Microsoft Access
Extended – PeopleSoft, SAP R/3, Sieble, and IBM
MQSeries (need to purchase additional products for these
sources)
Mainframe – Need to purchase additional products.
Warehouse Designer
• Create relational tables in Target database
• Edit target definitions
• Preview relational target data
Targets
• Relational – Oracle, Sybase, Sybase IQ, Informix, IBM DB2,
Microsoft SQL Server, and Teradata
• File – Fixed and delimited flat files and XML
• Extended – SAP BW, IBM MQ Series (need to purchase
additional products for these targets)
• Other - Microsoft Access
Mixing Sources and Targets
• You can combine data from different platforms and
source types.
Oracle
Sybase
Flat
File
Transformation Developer
• Generates ,modifies, passes data through ports
• 12 objects for transforming data
Transformations Types
• Source Qualifier represents all data queried from the
source
• Normalizer normalizes records from VSAM or
relational sources
• Expression performs simple calculations
• Filter serves as a conditional filter
• Aggregator performs aggregate calculations
• Rank limits records to top or bottom range
Transformations Types contd…
• Update Strategy allows for logic to insert, update,
delete, or reject data
• Lookup looks up values and passes to other
objects
• Stored Procedure calls a stored procedure and
captures return values
• External Procedure calls a procedure defined in a
shared library
• Sequence Generator generates unique ID values
• Joiner allows for heterogeneous joins
SourceQualifier Transformation
• Represents records that Informatica server reads
when it runs a session
• Automatically attached when a Source is added to a
mapping
Use a Source Qualifier to:
Filter Records when the Informatica Server reads
source data
Specify sorted ports
– Order by clause
Select only distinct values from a source
Create a custom query for the Informatica Server
to read source data
Expression Transformation
Calculate values in a single row
– Adjust employee salaries, concatenate first and
last names, convert string to number…
Perform Any Non-Aggregate Calculations
Test conditional statements before you output to
target
Example
EMPLOYEE
EMP_SALARY
EMP_ID
EMP_ID
EMP_NAME
EMP_NAME
ROLE_CODE
ROLE_CODE
BASIC_SALA
RY
Gross
Salary=
Basic
Salary *
3.5
GROSS_SALARY
Aggregator Transformation
Allows you to perform aggregate calculations,
such as averages and sums
While the Expression is on a row-by-row basis,
the aggregator can perform calculations on
groups
Example
REVENUE
PU_REVNUE
PU_CODE
PU_CODE
PROJECT_CO
DE
REVENUE
Aggregator
Transformat
ion
MAX_REVENUE
MIN_REVENUE
AVG_REVENUE
Max Revenue =
Max (Revenue)
Min Revenue =
Min(Revenue) Avg
Revenue = Avg
(Revenue)
Filter Transformation
Provides the means for filtering rows in a
mapping
– Employees who are currently working in the
project “NML” of “WENA” as “SE”
Only rows that meet the condition pass
through the mapping.
Filter Transformation
All ports are input/output
Returns TRUE or FALSE for each row passed
through the mapping based on the condition
Discarded rows do not appear in the session log
or reject files
The input ports must only come from one
transformation
Filter vs Source Qualifier (SQ)
SQ provides better performance
SQ only lets you filter rows from relational
sources, Filter Transformation filters rows from
any source
SQ only uses standard SQL, Filter can use any
statement or function that returns True/False
Example
EMPLOYEE
NM_EMP_DETAILS
EMP_ID
EMP_ID
EMP_NAME
EMP_NAME
PROJECT_CO
DE
Filter
Transform
ation
PU_CODE
ROLE_CODE
Where Project =
“NML” and PU =
“WENA” and Role
= “SE”
PROJECT_CODE
PU_CODE
ROLE_CODE
Router Transformation
Groups data into many groups
Routes rows of data that do not meet any
condition to a default group
Can enter any expression that returns a single
value
Condition returns True or False for each row
If the condition = NULL, the row is assumed as
FALSE
Router Transformation
• One Group can be connected to One
transformation or target
• One Output Port can be connected to multiple
transformations or targets
• Multiple Output ports in One Group can be
connected to multiple transformations or
targets
• CANNOT Connect more than One Group to One
Transformation or Target
Lookup Transformation
Looks up data in a relational table
– Can be the Source, Target, or any database that the
Informatica Client and Server can connect to
– Lookup table can be a single table or can join multiple
tables
Lookups can:
– Get a related value (your source include Employee_ID
and you want Employee_Name),
– Perform a calculation
– Update a slowly changing dimension table (check if
records exist on a target)
Lookup Transformation
For each input row, the Informatica Server queries
the lookup table based on the lookup ports and
the condition in the transformation
– The Informatica Server can return values from that
lookup (static cache)
OR
– The Informatica Server inserts a row into the cache to
flag rows as new or existing (dynamic cache)
Connected and Unconnected
Lookup Transformations
CONNECTED LOOKUP
Part of the mapping data flow
Returns multiple values (by
linking output ports to another
transformation)
Executed for every record
passing through the
transformation
More visible, shows where the
lookup values are used
Default values are used
UNCONNECTED LOOKUP
Separate from the mapping data flow
Returns one value (by checking the
Return (R) port option for the output
port that provides the return value)
Only executed when the lookup
function is called
Less visible, as the lookup is called
from an expression within another
transformation
Default values are ignored
Example
EMPLOYEE_PROJECT
EMP_ID
NM_EMP_DETAILS
EMP_ID
EMP_NAME
EMP_NAME
PROJECT_CO
DE
PROJECT_CODE
PROJECT
PROJECT_CO
DE
PROJECT_DE
SC
LOOK UP
Transform
ation
PROJECT_DESC
Get
PROJECT.PROJECT_DESC
Where
PROJECT.PROJECT_CODE
=
NM_EMP_DETAILS.PROJE
CT_CODE
Update Strategy
Two Ways Of doing
Within a Session
Within a Mapping
Update Strategy
Within a Session
Instruct the Informatica Server how to treat
the rows when the session is configured
Within a Mapping
Use the update strategy transformation to
flag records for insert, delete, update, or
reject.
Constraint for each Database
Operation
Operation
Insert
Constant
DD_INSERT
Numeric Value
0
Update
DD_UPDATE 1
Delete
DD_DELETE 2
Reject
DD_REJECT 3
Joiner Transformation
Active Transformation
Join two flat files
Join two tables from different databases
Join a flat file with a relational table
Transformation Overview
Three views:
Iconized View -- shows transformation in relation to mapping
Normal View -- shows data flow through transformation
Edit View -- shows transformation properties and allows for
editing
Transformation Overview
Normal view shows data flow through the
transformation
Data passes
through
I/O ports
unchanged
DATE_ENTERED
passes into transformation
through an input port.
It is used in MTH port to extract month.
Month is passed through MTH output port to
another transformation.
Transformation Overview
Edit view provides flexibility in defining transformation rules
Define port level handling
-Switch between
transformation
s
-Enter comments
-Make reusable
Define transformation level
properties
Transformations and Expressions
Calculation or conditional statement
Used in Expression, Aggregator, Rank, Filter, Update
Strategy
Performs calculation based on ports, functions,
operators, variables, literals, constants, and return
values from other transformations
Mapplets
•
•
•
•
Reusable Object
Include multiple transformations
Include Source definitions
Multiple groups of output ports
Mapping
Move and transform data from sources to targets
Includes
source definitions
target definitions
transformations.
Source
Transformations
Mapping
Target
Mapping Designer
Transformation Toolbar
Mapping List
Iconized
Status Bar
Validation
Three different levels of validation:
Connection validation
Expression validation
Mapping validation
Connection Validation
Connecting ports with mismatched datatypes
Connecting output ports to a source
Connecting a source to anything but a Source
Qualifier or Normalizer
Connecting an output to a output, or an input
to a input
Connecting more than one active
transformation to another transformation
Copying columns to a target definition
Expression Validation
Parse the current expression, with remote port
searching (references to a port in another
transformation are resolved)
Parse expression attributes such as filter
condition, lookup condition, SQL Query, etc.
Parse default values
Mapping Validation
Mapping validation will take place with menu
commands:
Mapping | Validate
Repository | Save
Mapping validation will:
Perform connection validation
Perform expression validation
Check the mapping flow validation
Data from Source Qualifier mapped to a target
Targets are connected to transformations
Informatica Server
• Reads information from the Repository
• Extracts data from the Sources and stores the data in
memory while it applies the transformation rules you
created
• Loads the transformed data into the mapping targets
Transformation Process
Repository
Session Metadata
Source Def
Mapping
Target Def
Session
Server Manager
source information
target information
mapping
scheduling
error handling
pre- / post-session scripts
tuning parameters
output log information
transformation overrides
Server
Targets
Sources
Definitions
• Session - A set of instructions that tells the
Informatica Server how and when to move data from
sources to targets
• Batch - A group of Sessions which are to be run
together
Server Manager
Navigator
Configure Window
Monitor Window
Output Window
Process
Configure server
Create session
Run session
Monitor session
View logs
Tune session
Server Configuration
Server Variables
Establish default directories for files and
caches.
Variables are server specific.
Allows for easy deployment.
Can be overridden at session level.
Changing the variable updates sessions.
Directories must exist prior to session
launch.
Server Output
Control
.dat
.perf
.idx
Target
Data
E-mail
.bad
.log
Error
Log
Event
Log
Source Settings - Session Wizard
Source
Select source type
File, Relational, Heterogeneous
Source Database
Source Type: Relational
Database connection
Treat rows as:
Source Type: Relational, File
Tells server how to treat source rows
Insert, update, delete, data driven
Works in conjunction with Target Options
Source Options...
Source Type: Relational
Database name
Source Type: File
fixed / delimited properties
file list
FTP properties
Target Settings - Session Wizard
Target
Select target type
File, Relational
Target Database
Target Type: Relational
Database connection
Flat File Options:
File properties
FTP properties
Loader properties
Target Options:
Target type: Relational
Tell server how to load target
Insert, Update Delete
Truncate target
Bulk, Normal, Test
Launching Sessions
Manual start - Manually launch a session
from within the Server Manager
Session Schedule - Schedule a session using
business cycle start, stop, and repeat intervals
Batching - Use batches to run session
concurrently or sequentially
Event based - Configure a session to launch
based upon the appearance of an indicator file
Command Line - Launch a session from the
command line prompt
After the Session Launches...
Poll/Refresh Session Status
View Session Details
View Performance Monitor
View Logs
Tune the Session
Correct Session Problems
Monitor Session
Select Server Requests | Session Details
Number of rows loaded/failed
Read/Write throughput
Most current Server message
Audit trail in repository or session log
Monitor Session
Performance Monitor
Select Server Requests | Session Performance
Details or
open file <sessionname>.perf
Need to configure session properties to save the
performance detail counters
Help determine where session performance can
be improved
Log Files - Session Wizard
Log Files...
Log file path and name
Reject file path and name
Session log archive options
Connectivity Overview