LOCO Extract – Transform – Load
Installation
Component Structure - Server
[Diagram: data flows from the Source System into the Extractor, through the Transformation Engine (including the Translation Engine and Transformers) to the Loader, and on to the Target System. Responses flow from the Target System back through the Response Translator (including stylesheet translation) to the Source System. The Response Translator is only configured for Source Systems that require a response.]
Pre-Installation Requirements
Loco ETL is a Java-based application which requires the following to be installed or available prior to installation:
Java 6 SDK or later
Apache Tomcat version 6 or later
A directory into which the ETL will be installed, referred to as the ETL_HOME directory
The application uses a Derby database, so an options directory, referred to as the OPTIONS_DIRECTORY, must be made available
During the installation, OpenJMS and ActiveMQ can be selectively installed if required; these will be installed into
the OPTIONS_DIRECTORY along with Derby. They can also be downloaded freely via the links provided on the LOCO
website www.loco-etl.com; the download sites also provide links to their documentation.
Installation
Confirm the ETL_HOME and OPTIONS_DIRECTORY directories exist and are writeable by the operator.
Download the zip or tar file using the link provided by LOCO-ETL into a temporary folder and unzip it.
Run the install script; it will prompt the operator to enter the two required directories and then install the
application into them.
As the Configuration Client is provided as a war file, a few manual steps are required:
1. Copy the etl.war file from the webapp directory to the tomcat/webapps directory.
2. If Tomcat is already running, it will explode the etl.war file into a webapps/etl directory; if Tomcat is not running, start it by running the tomcat/bin/startup script.
3. Edit the webapps/etl/WEB-INF/classes/log4j.xml file to set the etlFileLogger appender to point to the required log directory.
4. Edit the webapps/etl/WEB-INF/spring/service.xml file to set the baseDirectory property to ETL_HOME and the etlHome property to ETL_HOME/mapper.
5. Restart Tomcat.
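As a sketch, the service.xml edit above might look like the following fragment. The bean id, class, and directory paths are illustrative placeholders; only the two property names (baseDirectory, etlHome) come from the steps above:

```xml
<!-- webapps/etl/WEB-INF/spring/service.xml (illustrative fragment only) -->
<bean id="etlService" class="com.example.EtlService"> <!-- id/class are placeholders -->
    <!-- baseDirectory must point at your ETL_HOME -->
    <property name="baseDirectory" value="/opt/etl"/>
    <!-- etlHome must point at ETL_HOME/mapper -->
    <property name="etlHome" value="/opt/etl/mapper"/>
</bean>
```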
The Derby configuration database server can be run manually using the scripts provided in the
ETL_HOME/mapper/bin directory, but it should be configured by your system administrator to run
automatically. Before proceeding to the next step, ensure Derby is running.
Configuration
Open a browser window and enter the URL http://<hostname>:<port>/etl/index.jsf; this will connect to the ETL
Configuration Client.
Open the Tools/ETL Licence tab, enter the provided licence key into the New ETL Licence Key field and
save.
Open the ETL Configuration/Environment tab, ensure the ETL Home field is correct and select Save; this will
update the directory structure fields.
Before continuing with the configuration the user should be familiar with the Component Structure outlined.
Configuration - Overview
The ETL configuration is encapsulated in an object referred to as a Server, this includes:
Extractor,
Transformation Engine including Transformer objects,
Loader and
Response Translator
The Server object is multi-threaded to achieve maximum performance, but take care when determining the number
of threads to use: too many can leave the JVM spending more time switching between threads than actually
running them.
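The thread-count trade-off above can be sketched with standard Java. The class and method names here are illustrative, not part of LOCO ETL; the only real API used is Runtime.availableProcessors():

```java
public class ThreadSizing {
    // For CPU-bound transformation work, running more threads than there are
    // cores mostly adds context-switching overhead, so cap the requested count.
    public static int suggestThreads(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.min(requested, cores);
    }

    public static void main(String[] args) {
        // A request for 64 threads is capped at the machine's core count.
        System.out.println(suggestThreads(64));
    }
}
```

I/O-bound servers (e.g. waiting on a slow target database) can often benefit from somewhat more threads than cores; the cap above is only a conservative starting point.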
Multiple servers can be run simultaneously; for instance, one server can run as an on-line service handling
financial trades while another instance operates as a batch process loading CSV files from another source
system.
If the source system requires a response from the target system, even if it is only pass or fail, a Response Translator
should be configured. If no response is required, e.g. for a batch process, the Response Translator does not
need to be configured.
Configuration – Templates and Maps
The first step is to define the data format provided by the source system and required by the target system; to do
this, Templates must be created.
Where either data format is XML, the template can be created automatically using the Tools -> Template Import
feature. To use this feature, example files must be placed in specific directories, i.e. ETL_HOME/template/source
or ETL_HOME/template/target. Currently these are set up for Financial trades sub-divided into Asset Classes;
however, they can easily be changed as required.
Where the source system is a database, the template relates to the columns returned by the SQL query that
extracts the data. Refer to the guidelines for more detail, but essentially, aliases should be used for the column
names, as this allows the structure of the query to be altered without affecting the template.
Where the target is a database, the single table column names should be entered into the template.
Where the source or target is a delimited file, the column position and description should be entered into the
template; the description is informational only, and the column position must start from 0.
Where the source or target is a Fixed Field Width file, there is a screen under the Tools -> Configuration
Assistant tab to help with this.
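As an illustration of the alias guideline above, a hypothetical extraction query (table and column names are invented for this sketch) might alias its columns so the template always sees stable names:

```sql
-- The aliases (TRADE_ID, TRADE_DATE, NOTIONAL) are what the template
-- references, so the underlying joins or column list can change
-- without touching the template.
SELECT t.id       AS TRADE_ID,
       t.trade_dt AS TRADE_DATE,
       t.amount   AS NOTIONAL
FROM   trades t
WHERE  t.status = 'NEW';
```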
Once the templates have been defined, the next step is to create a Map, which simply identifies the Source and
Target templates and which fields map to each other.
Configuration – Server
The server object encapsulates the ETL process as outlined below so this section brings it all together.
The important thing to note is that the datatype is the key to how data is processed.
Basic
This section sets the basic environment the server runs in and is fairly self-explanatory; however, the Error logging section needs a little
explanation.
The error logging referred to here is in addition to the normal logging and is used to capture failed XML format messages, which can be
modified and resubmitted via the Message mechanism available from the Tools menu.
Extractor
This section defines the type and connection properties of the Extractor object ie. the object that is the link between the source system
and the ETL process.
Where appropriate, e.g. when the source is a database, the operator will be prompted to enter the details of the datatype, i.e.:
The processing order, applicable when this server processes multiple datatypes from the same source
The specific datatype
The format of DATE data, in Java style, examples are given in the Guidelines tab
The map to be used for this datatype and
The SQL query filename to be used to extract the data.
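"Java style" in the DATE format item above means a java.text.SimpleDateFormat pattern. A minimal sketch (the pattern "yyyy-MM-dd" is only an example; use the pattern that matches the DATE strings your source actually produces):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DatePatternDemo {
    public static void main(String[] args) {
        // SimpleDateFormat pattern letters: yyyy = year, MM = month, dd = day.
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        // Formatting the Unix epoch in UTC:
        System.out.println(f.format(new Date(0L))); // 1970-01-01
    }
}
```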
Loader
This section defines the type and connection properties of the Loader object, i.e. the object that is the link between the back office
(target) system and the ETL process. Where the source system requires a response, the Response Translator object is also defined here.
Where appropriate, the operator will be prompted to provide details of the datatype, i.e.:
The specific datatype
The format of DATE data, in Java style, examples are given in the Guidelines tab
The name of the file to be generated
When a Response Translator is to be configured, it should be noted that the communication link type can be different from the link
between the Loader object and the Back Office System.
Response Translator Stylesheet
When a Response Translator is configured it may or may not require a stylesheet (Java XSLT) to be defined. This stylesheet will be used
to translate the response from the back office system format into the format required by the source system; it should be noted that
complex stylesheets can have a performance impact.
Datatype Specification
This defines how the datatype of a message is determined. During configuration the operator will be presented with two different
sets of parameters depending on the type of datasource; these are covered by the following:
Source Datatype Field - The field in the Source system data that identifies the datatype of the data
Datatype - The datatype used throughout the ETL to identify the correct Map etc
Use Value of Source Datatype Field - If the field exists use the value to identify the identified datatype
Source Datatype Field has Specific Value - If the field exists, does it have a specific value
Source Datatype Field has Value Containing Specific Value - If the field exists, does it contain a specific value
Source Datatype Field does NOT Contain Specific Value - If the field exists, does it NOT contain a specific value
Source Datatype Field exists a specific number of times - If the field exists a specific number of times
Source Base Element does NOT Exist - The source does not contain an element whose Base Name is specified
Source Datatype Field Specific Value or Count or Base Name - If the field exists, the specific value required or the number of times
it exists, e.g. 1
Data Map - The map used to transform Source format to Target format data
Internal Message ID Field - The field in the Mapped Message that contains the unique data identifier supplied by the Source system
Target Response Message ID Field - Where the Source requires a response, this identifies the field in the Recipient system response
containing the unique data identifier
Target Response Datatype Field - Where the Source requires a response, this identifies the field in the Recipient system response
containing the datatype
NOTE: the Base name of an element is the last sub-element, i.e. given an element named customer.address, the Base name is
address.
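The Base-name rule in the note above can be sketched in a few lines of Java (the method name is illustrative):

```java
public class BaseName {
    // The Base name is the text after the last '.', or the whole
    // element name when there is no dot at all.
    static String baseName(String element) {
        return element.substring(element.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("customer.address")); // address
        System.out.println(baseName("price"));            // price
    }
}
```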
Configuration - Server Transformers
The package ships with a selection of Transformers which are used to massage the message as it proceeds
through the ETL process, the available transformers can be accessed from the ETL Configuration -> Transformers
tab.
Each transformer includes a set of applicable parameters.
The transformer list includes two database lookups, i.e. static table data and SQL-generated data; both of these
can be created using the Tools -> Configuration Tools -> Lookup screens.
When creating a server configuration, transformers can be allocated to a particular datatype together with
defining the appropriate values for each parameter from the ETL Configuration -> Server screen.
To add a new Transformer, the operator should create a new Java package containing the Transformer class and
add the jar file to the ETL Classpath. Source code for a transformer will be made available to be used as an
example.
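A custom Transformer might be shaped roughly like the sketch below. The interface name, method signature, and class names are assumptions for illustration only; the actual contract is defined by the example source code LOCO provides:

```java
// Hypothetical contract: the real Transformer interface ships with LOCO-ETL
// and its name and signature may differ from this sketch.
interface Transformer {
    String transform(String fieldValue);
}

// Example transformer: normalise a currency-code field to upper case.
class UpperCaseTransformer implements Transformer {
    @Override
    public String transform(String fieldValue) {
        return fieldValue == null ? null : fieldValue.trim().toUpperCase();
    }
}

class TransformerDemo {
    public static void main(String[] args) {
        System.out.println(new UpperCaseTransformer().transform(" usd ")); // USD
    }
}
```

The compiled class would then be packaged into a jar and added to the ETL Classpath as described above.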
Configuration – Directory Structure
The LOCO – ETL tool uses several pre-defined directories which are sub-directories of the ETL_HOME directory:
Under the Tools tab there is a facility to import Source and Target system XML format data templates; these
templates should be placed in the ETL_HOME/template/source (or ETL_HOME/template/target) directory.
The Response Translator uses XSLT stylesheets; these should be placed in ETL_HOME/mapper/xsl.
Database source extractors use SQL queries; these can be developed anywhere but should then be placed in the
ETL_HOME/mapper/data/requests directory.