Informatica Overview

Download Report

Transcript Informatica Overview

Informatica Overview
Contents
•
•
•
•
•
Introduction
Clients
Server(s)
Source, Target, Repository
Connectivity
What is Informatica?
• Allows you to load data into a centralized
location, such as a datamart or data warehouse.
• ETL Tool
– Extract data from multiple sources
– Transform the data according to business logic
and needs
– Load the transformed data into file and
relational targets
Example
EMPLOYEE
EMP_ID
EMP_DETAILS
EMP_NAME
EMP_NAME
EMP_CITY
EMP_CITY
EMP_STATE
EMP_STATE
EMP_COUNT
RY
EMP_DATE_O
F_JOINING
EMP_COUNTRY
EMP_ID
Transform
Date of
Joining to
Yrs of
Service
YRS_OF_SRV
Data Warehousing
Developer
End User
Metadata
Operational Sources
Extract
Transform
Load
Data Warehouse
Informatica Architecture
 Design Process
 Client Tool Review
– Repository Manager
– Designer
– Server Manager

Informatica’s Architecture
Data Models
Designer
Repository Manager
Server Manager
Targets 1-n
Sources 1-n
PowerPlugs
Repository
Server
Informatica Design Process
2.
1.
3.
Source Def
5.
Mapping
Sessions
4.
Target Def
1. Create Repository
2. Import Source Definitions
3. Create Target Schema
4. Create Mappings
5. Load Data
Informatica Client

Repository Manager – Can view much of the metadata in
the Repository through the Repository Manager.

Designer – Create Source-to-Target mappings that contain
transformation instructions for the Informatica Server.

Server Manager – Create, schedule, and monitor sessions.
You create a session based on a transformation and
schedule it to run on the Informatica Server.
Informatica Client
Repository Manager
Metadata Repository







Information about the data mart system
Catalogs the repository
Directs the server
Contains record of user access
Can be shared
Can be searched and reported
Bridged through Metadata Exchange
Repository Manager
Navigator Window
Analysis Window
Dependency Window
Output Window
Folder Attributes
• FOLDER OWNER - user who serves as focal point for
folder permissions
• PERMISSIONS - rights to read, write, and/or execute
objects in a folder
• SHARED - property that allows you to make shortcuts to
objects in a folder
• SHORTCUT - a dynamic link to an object stored in a
shared folder
• VERSIONS - folder iterations that indicate development
stages
Informatica Client
Designer
Designer Workspace
Open Folder List
Navigator
Workspace
Workbook Tabs
Output Window
Status Bar
Designer Options
General
workspace options
reload objects on open
group source definitions
Tables
columns viewed
column size
object size
object colors
Format
workspace colors
import keys
automatic Source
Qualifier creation
Informatica Client
Server Manager
Server Manager
Navigator
Configure Window
Monitor Window
Output Window
Designer
•
•
•
•
•
Source Analyzer
Warehouse Designer
Transformation Developer
Mapplet Designer
Mapping Designer
Source Analyzer
• Identify the sources used to build the warehouse.
• Create repository definitions for these sources
Analyzing Sources

Relational – Oracle, Sybase, Informix, IBM DB2,
Microsoft SQL Server, and Teradata

File – Fixed and delimited flat file, COBOL file, and XML

Other – Microsoft Excel, Microsoft Access

Extended – PeopleSoft, SAP R/3, Sieble, and IBM
MQSeries (need to purchase additional products for these
sources)

Mainframe – Need to purchase additional products.
Warehouse Designer
• Create relational tables in Target database
• Edit target definitions
• Preview relational target data
Targets
• Relational – Oracle, Sybase, Sybase IQ, Informix, IBM DB2,
Microsoft SQL Server, and Teradata
• File – Fixed and delimited flat files and XML
• Extended – SAP BW, IBM MQ Series (need to purchase
additional products for these targets)
• Other - Microsoft Access
Mixing Sources and Targets
• You can combine data from different platforms and
source types.
Oracle
Sybase
Flat
File
Transformation Developer
• Generates ,modifies, passes data through ports
• 12 objects for transforming data
Transformations Types
• Source Qualifier represents all data queried from the
source
• Normalizer normalizes records from VSAM or
relational sources
• Expression performs simple calculations
• Filter serves as a conditional filter
• Aggregator performs aggregate calculations
• Rank limits records to top or bottom range
Transformations Types contd…
• Update Strategy allows for logic to insert, update,
delete, or reject data
• Lookup looks up values and passes to other
objects
• Stored Procedure calls a stored procedure and
captures return values
• External Procedure calls a procedure defined in a
shared library
• Sequence Generator generates unique ID values
• Joiner allows for heterogeneous joins
SourceQualifier Transformation
• Represents records that Informatica server reads
when it runs a session
• Automatically attached when a Source is added to a
mapping
Use a Source Qualifier to:

Filter Records when the Informatica Server reads
source data

Specify sorted ports
– Order by clause

Select only distinct values from a source

Create a custom query for the Informatica Server
to read source data
Expression Transformation
Calculate values in a single row
– Adjust employee salaries, concatenate first and
last names, convert string to number…
 Perform Any Non-Aggregate Calculations
 Test conditional statements before you output to
target

Example
EMPLOYEE
EMP_SALARY
EMP_ID
EMP_ID
EMP_NAME
EMP_NAME
ROLE_CODE
ROLE_CODE
BASIC_SALA
RY
Gross
Salary=
Basic
Salary *
3.5
GROSS_SALARY
Aggregator Transformation
Allows you to perform aggregate calculations,
such as averages and sums
 While the Expression is on a row-by-row basis,
the aggregator can perform calculations on
groups

Example
REVENUE
PU_REVNUE
PU_CODE
PU_CODE
PROJECT_CO
DE
REVENUE
Aggregator
Transformat
ion
MAX_REVENUE
MIN_REVENUE
AVG_REVENUE
Max Revenue =
Max (Revenue)
Min Revenue =
Min(Revenue) Avg
Revenue = Avg
(Revenue)
Filter Transformation

Provides the means for filtering rows in a
mapping
– Employees who are currently working in the
project “NML” of “WENA” as “SE”

Only rows that meet the condition pass
through the mapping.
Filter Transformation

All ports are input/output

Returns TRUE or FALSE for each row passed
through the mapping based on the condition

Discarded rows do not appear in the session log
or reject files

The input ports must only come from one
transformation
Filter vs Source Qualifier (SQ)

SQ provides better performance

SQ only lets you filter rows from relational
sources, Filter Transformation filters rows from
any source

SQ only uses standard SQL, Filter can use any
statement or function that returns True/False
Example
EMPLOYEE
NM_EMP_DETAILS
EMP_ID
EMP_ID
EMP_NAME
EMP_NAME
PROJECT_CO
DE
Filter
Transform
ation
PU_CODE
ROLE_CODE
Where Project =
“NML” and PU =
“WENA” and Role
= “SE”
PROJECT_CODE
PU_CODE
ROLE_CODE
Router Transformation

Groups data into many groups

Routes rows of data that do not meet any
condition to a default group

Can enter any expression that returns a single
value

Condition returns True or False for each row

If the condition = NULL, the row is assumed as
FALSE
Router Transformation
• One Group can be connected to One
transformation or target
• One Output Port can be connected to multiple
transformations or targets
• Multiple Output ports in One Group can be
connected to multiple transformations or
targets
• CANNOT Connect more than One Group to One
Transformation or Target
Lookup Transformation

Looks up data in a relational table
– Can be the Source, Target, or any database that the
Informatica Client and Server can connect to
– Lookup table can be a single table or can join multiple
tables

Lookups can:
– Get a related value (your source include Employee_ID
and you want Employee_Name),
– Perform a calculation
– Update a slowly changing dimension table (check if
records exist on a target)
Lookup Transformation

For each input row, the Informatica Server queries
the lookup table based on the lookup ports and
the condition in the transformation
– The Informatica Server can return values from that
lookup (static cache)
OR
– The Informatica Server inserts a row into the cache to
flag rows as new or existing (dynamic cache)
Connected and Unconnected
Lookup Transformations
CONNECTED LOOKUP
Part of the mapping data flow
Returns multiple values (by
linking output ports to another
transformation)
Executed for every record
passing through the
transformation
More visible, shows where the
lookup values are used
Default values are used
UNCONNECTED LOOKUP
Separate from the mapping data flow
Returns one value (by checking the
Return (R) port option for the output
port that provides the return value)
Only executed when the lookup
function is called
Less visible, as the lookup is called
from an expression within another
transformation
Default values are ignored
Example
EMPLOYEE_PROJECT
EMP_ID
NM_EMP_DETAILS
EMP_ID
EMP_NAME
EMP_NAME
PROJECT_CO
DE
PROJECT_CODE
PROJECT
PROJECT_CO
DE
PROJECT_DE
SC
LOOK UP
Transform
ation
PROJECT_DESC
Get
PROJECT.PROJECT_DESC
Where
PROJECT.PROJECT_CODE
=
NM_EMP_DETAILS.PROJE
CT_CODE
Update Strategy

Two Ways Of doing

Within a Session

Within a Mapping
Update Strategy

Within a Session


Instruct the Informatica Server how to treat
the rows when the session is configured
Within a Mapping
 Use the update strategy transformation to
flag records for insert, delete, update, or
reject.
Constraint for each Database
Operation
Operation
Insert
Constant
DD_INSERT
Numeric Value
0
Update
DD_UPDATE 1
Delete
DD_DELETE 2
Reject
DD_REJECT 3
Joiner Transformation
 Active Transformation
Join two flat files
Join two tables from different databases
Join a flat file with a relational table
Transformation Overview
Three views:
 Iconized View -- shows transformation in relation to mapping
 Normal View -- shows data flow through transformation
 Edit View -- shows transformation properties and allows for
editing
Transformation Overview
Normal view shows data flow through the
transformation
Data passes
through
I/O ports
unchanged
 DATE_ENTERED
passes into transformation
through an input port.
 It is used in MTH port to extract month.
 Month is passed through MTH output port to
another transformation.
Transformation Overview
Edit view provides flexibility in defining transformation rules
Define port level handling
-Switch between
transformation
s
-Enter comments
-Make reusable
Define transformation level
properties
Transformations and Expressions
Calculation or conditional statement
Used in Expression, Aggregator, Rank, Filter, Update
Strategy
Performs calculation based on ports, functions,
operators, variables, literals, constants, and return
values from other transformations
Mapplets
•
•
•
•
Reusable Object
Include multiple transformations
Include Source definitions
Multiple groups of output ports
Mapping
Move and transform data from sources to targets
 Includes
 source definitions
 target definitions
 transformations.

Source
Transformations
Mapping
Target
Mapping Designer
Transformation Toolbar
Mapping List
Iconized
Status Bar
Validation
 Three different levels of validation:
 Connection validation
 Expression validation
 Mapping validation
Connection Validation
 Connecting ports with mismatched datatypes
 Connecting output ports to a source
 Connecting a source to anything but a Source
Qualifier or Normalizer
 Connecting an output to a output, or an input
to a input
 Connecting more than one active
transformation to another transformation
 Copying columns to a target definition
Expression Validation
 Parse the current expression, with remote port
searching (references to a port in another
transformation are resolved)
 Parse expression attributes such as filter
condition, lookup condition, SQL Query, etc.
 Parse default values
Mapping Validation

Mapping validation will take place with menu
commands:
 Mapping | Validate
 Repository | Save

Mapping validation will:
 Perform connection validation
 Perform expression validation
 Check the mapping flow validation
 Data from Source Qualifier mapped to a target
 Targets are connected to transformations
Informatica Server
• Reads information from the Repository
• Extracts data from the Sources and stores the data in
memory while it applies the transformation rules you
created
• Loads the transformed data into the mapping targets
Transformation Process
Repository
Session Metadata
Source Def
Mapping
Target Def
Session
Server Manager
source information
target information
mapping
scheduling
error handling
pre- / post-session scripts
tuning parameters
output log information
transformation overrides
Server
Targets
Sources
Definitions
• Session - A set of instructions that tells the
Informatica Server how and when to move data from
sources to targets
• Batch - A group of Sessions which are to be run
together
Server Manager
Navigator
Configure Window
Monitor Window
Output Window
Process






Configure server
Create session
Run session
Monitor session
View logs
Tune session
Server Configuration
Server Variables






Establish default directories for files and
caches.
Variables are server specific.
Allows for easy deployment.
Can be overridden at session level.
Changing the variable updates sessions.
Directories must exist prior to session
launch.
Server Output


Control

.dat
.perf

.idx

Target
Data
E-mail

.bad

.log
Error
Log
Event
Log
Source Settings - Session Wizard
Source
 Select source type
 File, Relational, Heterogeneous
Source Database
 Source Type: Relational
 Database connection
Treat rows as:
 Source Type: Relational, File
 Tells server how to treat source rows
 Insert, update, delete, data driven
 Works in conjunction with Target Options
Source Options...
 Source Type: Relational
 Database name
 Source Type: File
 fixed / delimited properties
 file list
 FTP properties
Target Settings - Session Wizard
Target
 Select target type
 File, Relational
Target Database
 Target Type: Relational
 Database connection
Flat File Options:
 File properties
 FTP properties
 Loader properties
Target Options:
 Target type: Relational
 Tell server how to load target
 Insert, Update Delete
 Truncate target
 Bulk, Normal, Test
Launching Sessions

Manual start - Manually launch a session
from within the Server Manager

Session Schedule - Schedule a session using
business cycle start, stop, and repeat intervals

Batching - Use batches to run session
concurrently or sequentially

Event based - Configure a session to launch
based upon the appearance of an indicator file

Command Line - Launch a session from the
command line prompt
After the Session Launches...






Poll/Refresh Session Status
View Session Details
View Performance Monitor
View Logs
Tune the Session
Correct Session Problems
Monitor Session
 Select Server Requests | Session Details
 Number of rows loaded/failed
 Read/Write throughput
 Most current Server message
 Audit trail in repository or session log
Monitor Session
Performance Monitor
 Select Server Requests | Session Performance
Details or
open file <sessionname>.perf
 Need to configure session properties to save the
performance detail counters
 Help determine where session performance can
be improved
Log Files - Session Wizard
Log Files...
 Log file path and name
 Reject file path and name
 Session log archive options
Connectivity Overview