OGSA-DAI: Data Access and Integration

Enabling Grids for E-sciencE
OGSA-DAI
Data Access and Integration
Marek Ciglan
Institute of Informatics, Slovak Academy of Sciences
www.eu-egee.org
INFSO-RI-508833
Motivation
• Different users / applications store data in different formats
– Plain files
– XML databases
– Relational databases (PostgreSQL, Oracle, DB2, MySQL)
• Difficult to work with a lot of different data formats
• Difficult to integrate data from heterogeneous resources
Grid Application Development, Bratislava, 10.03.05
OGSA-DAI - Overview
• Support different types of data models
– Files
– XML databases
– Relational databases
• Allow data to be accessed through uniform interfaces
• Provide an extensible framework for integrating data resources on the Grid
• Allow metadata about the data, and about the data resources in which it is stored, to be obtained
• Facilitate the integration of data from various sources to obtain the required information
Architecture
Data Resource Activities
• Relational Activities
– Run an SQL query statement
– Run an SQL update statement
– …
• XML Activities
– Run an XPath statement against an XML database
– Run an XUpdate statement against an XML database
– …
• File Activities
– Access a directory
– Read data from a file
– Manipulate files in a directory
– Write data into a file
Delivery Activities
• Retrieve data from a URL
• Deliver data to a URL
• Deliver data to a GridFTP server
• Retrieve data from a GridFTP server
• Deliver results to a stream
• …
Transformation Activities
• ZIP compress the results
• GNU-ZIP compress the results
• GNU-ZIP decompress results
• Transform data using an XSLT
• Break a single block into multiple blocks based on a set of separator characters
• Aggregate multiple blocks into a single block
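The block-level transformations listed above can be illustrated outside OGSA-DAI with a few lines of Python. This is only a sketch of the ideas (split on separator characters, aggregate, GNU-ZIP compress and decompress) using the standard library; the function names are hypothetical and are not OGSA-DAI activity names.

```python
import gzip
import re


def split_block(block, separators):
    """Break one block into several, cutting at any of the separator characters."""
    if not separators:
        return [block]
    # Build a character class from the separators; re.escape protects ']' and '-'.
    pattern = "[" + re.escape(separators) + "]"
    return [part for part in re.split(pattern, block) if part]


def aggregate_blocks(blocks):
    """Join several blocks back into a single block."""
    return "".join(blocks)


def gzip_compress(block):
    """GNU-ZIP compress a text block into bytes."""
    return gzip.compress(block.encode("utf-8"))


def gzip_decompress(data):
    """Reverse gzip_compress and return the original text block."""
    return gzip.decompress(data).decode("utf-8")
```

For instance, `split_block("a,b;c", ",;")` cuts the block at both the comma and the semicolon, and `gzip_decompress(gzip_compress(text))` returns the original text.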
Data integration
• Data sources: MySQL, XML database, PostgreSQL, Text File
• Target: Oracle Data Warehouse
Data integration
• Data sources: MySQL, XML database, PostgreSQL, Text File
• Target: Oracle Data Warehouse
• How to integrate all this heterogeneous data into a central data warehouse?
Data integration
• Integration layer: OGSA-DAI
• Data sources: MySQL, XML database, PostgreSQL, Text File
• Target: Oracle Data Warehouse
Data integration
• Integration layer: OGSA-DAI
• Data sources: MySQL, XML database, PostgreSQL, Text File
• Activity chain:
– Select data; Write data into file; Compress file; Transfer zip file
• Target: Oracle Data Warehouse
Data integration
• Integration layer: OGSA-DAI
• Data sources: MySQL, XML database, PostgreSQL, Text File
• Activity chains:
– Select data; Write data into file; Compress file; Transfer zip file
– Read subset of file; Transform; Compress file; Transfer zip file
• Target: Oracle Data Warehouse
Data integration
• Integration layer: OGSA-DAI
• Data sources: MySQL, XML database, PostgreSQL, Text File
• Activity chains:
– Select data; Write data into file; Compress file; Transfer zip file
– Read subset of file; XSLT Transform; Compress file; Transfer zip file
– Select data; Write data into file; Compress file; Transfer zip file
– Read subset of file; Transform; Compress file; Transfer zip file
• Target: Oracle Data Warehouse
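To make the "Select data, Write data into file, Compress file, Transfer zip file" chain concrete, here is a minimal stand-alone sketch in plain Python. It does not use OGSA-DAI: an in-memory SQLite table stands in for the source database, a local copy stands in for the network transfer, and the `name` column, file names and function names are assumptions made for illustration.

```python
import csv
import shutil
import sqlite3
import tempfile
import zipfile
from pathlib import Path


def export_query_as_zip(conn, query, work_dir, target_dir):
    """Run the four steps from the diagram for one data source."""
    work_dir.mkdir(parents=True, exist_ok=True)
    target_dir.mkdir(parents=True, exist_ok=True)

    # 1. Select data
    cursor = conn.execute(query)
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()

    # 2. Write data into file
    csv_path = work_dir / "export.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

    # 3. Compress file
    zip_path = work_dir / "export.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(csv_path, arcname=csv_path.name)

    # 4. Transfer zip file (a local copy stands in for the network transfer)
    return Path(shutil.copy(zip_path, target_dir))


def demo():
    """Build a tiny in-memory table, export it, and return the transferred lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO littleblackbook VALUES (?, ?)", [(1, "Ada"), (2, "Bob")]
    )
    with tempfile.TemporaryDirectory() as tmp:
        transferred = export_query_as_zip(
            conn,
            "SELECT id, name FROM littleblackbook ORDER BY id",
            Path(tmp) / "work",
            Path(tmp) / "warehouse_inbox",
        )
        with zipfile.ZipFile(transferred) as archive:
            text = archive.read("export.csv").decode("utf-8")
    conn.close()
    return text.splitlines()
```

Writing such a script by hand for every source is the kind of per-source glue code that a shared set of data, transformation and delivery activities is meant to replace.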
Data integration
• How to perform data integration?
– Write a specialized Java application for data integration
– Use OGSA-DAI perform documents
• Perform documents
– XML documents
– Describe the activities to be performed
<sqlQueryStatement name="myQuery">
  <expression>
    select * from littleblackbook where id=10
  </expression>
  <webRowSetStream name="myQueryOutput"/>
</sqlQueryStatement>
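A client does not have to write this XML by hand. As a hedged sketch, the fragment above can be generated with Python's standard `xml.etree.ElementTree`; the element and attribute names are taken from the slide, while the helper name `build_sql_query_activity` is hypothetical. Any enclosing root element or namespaces that a real service expects are not shown on the slide and are left out here.

```python
import xml.etree.ElementTree as ET


def build_sql_query_activity(name, sql, output_name):
    """Build an sqlQueryStatement element shaped like the one on the slide."""
    activity = ET.Element("sqlQueryStatement", {"name": name})
    expression = ET.SubElement(activity, "expression")
    expression.text = sql  # special characters are escaped on serialization
    ET.SubElement(activity, "webRowSetStream", {"name": output_name})
    return activity


activity = build_sql_query_activity(
    "myQuery",
    "select * from littleblackbook where id=10",
    "myQueryOutput",
)
print(ET.tostring(activity, encoding="unicode"))
```

Generating the element this way also means that a query containing `<` or `&` is escaped automatically when the document is serialized.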
Perform documents
• Chaining activities within a perform document
<sqlQueryStatement name="myQuery">
  <expression>
    select * from littleblackbook where id &lt; 100
  </expression>
  <webRowSetStream name="myQueryOutput"/>
</sqlQueryStatement>
<deliverToGDT name="deliverQueryResults">
  <fromLocal from="myQueryOutput"/>
  <toGDT streamId="otherServiceInput" mode="full">
    http://localhost:8080/ogsa/services/ogsadai/SomeDAIService
  </toGDT>
</deliverToGDT>
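The two activities are linked only by name: the delivery activity's `fromLocal from="myQueryOutput"` refers to the query's `webRowSetStream name="myQueryOutput"`. The sketch below checks that wiring with the standard library. The `<perform>` wrapper element and the function name are assumptions added so that the fragment parses as one document, and the check covers only the two element types shown on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical <perform> wrapper so that the two activities form one XML document.
FRAGMENT = """
<perform>
  <sqlQueryStatement name="myQuery">
    <expression>select * from littleblackbook where id &lt; 100</expression>
    <webRowSetStream name="myQueryOutput"/>
  </sqlQueryStatement>
  <deliverToGDT name="deliverQueryResults">
    <fromLocal from="myQueryOutput"/>
    <toGDT streamId="otherServiceInput" mode="full">
      http://localhost:8080/ogsa/services/ogsadai/SomeDAIService
    </toGDT>
  </deliverToGDT>
</perform>
"""


def unresolved_inputs(xml_text):
    """Return the fromLocal names that no webRowSetStream output provides."""
    root = ET.fromstring(xml_text.strip())
    produced = {element.get("name") for element in root.iter("webRowSetStream")}
    consumed = [element.get("from") for element in root.iter("fromLocal")]
    return [name for name in consumed if name not in produced]
```

An empty result means every delivery input names an existing output; a misspelled stream name would show up in the returned list.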
Data Security
• Role mapping is the process of authorizing a client's request to access a data resource
• It is a two-step process:
– Check whether the client is allowed to access the data resource
– Determine the database user name and password (or role) to be used for this client
• A role map document contains the information required to carry out this process
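The two steps can be sketched as a lookup over an in-memory role map. This is an illustration of the idea only, not OGSA-DAI's implementation; the resource name, distinguished name and credentials below are made-up placeholders.

```python
from typing import NamedTuple, Optional


class Credentials(NamedTuple):
    userid: str
    password: str


# Hypothetical role map: data resource -> client DN -> database credentials.
ROLE_MAP = {
    "exampleResource": {
        "/O=Example/CN=alice": Credentials("reader", "secret"),
    },
}


def map_role(role_map, resource, client_dn) -> Optional[Credentials]:
    """Return database credentials for the client, or None if access is denied."""
    users = role_map.get(resource, {})
    # Step 1: check whether the client is allowed to access the data resource.
    if client_dn not in users:
        return None
    # Step 2: determine the database user name and password for this client.
    return users[client_dn]
```

A client whose DN is not listed for the resource gets `None` back, so the request can be rejected before any database connection is opened.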
Data Security
• Simple OGSA-DAI Role Map Documents
<DatabaseRoles>
  <Database name="jdbc:mysql://host:6502/otherData">
    <User dn="No Certificate Provided"
          userid="myUser" password="123"/>
    <User dn="/C=UK/O=eScience/OU=Aspatria/L=AeSC/CN=tom"
          userid="superUser" password="myPassword"/>
  </Database>
</DatabaseRoles>
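As a rough sketch of how such a document could be read, the following Python loads the role map shown above into a nested dictionary keyed by database name and client DN. The element and attribute names come from the slide; the function name `load_role_map` is hypothetical, and a real deployment may use a different schema.

```python
import xml.etree.ElementTree as ET

ROLE_MAP_XML = """
<DatabaseRoles>
  <Database name="jdbc:mysql://host:6502/otherData">
    <User dn="No Certificate Provided" userid="myUser" password="123"/>
    <User dn="/C=UK/O=eScience/OU=Aspatria/L=AeSC/CN=tom"
          userid="superUser" password="myPassword"/>
  </Database>
</DatabaseRoles>
"""


def load_role_map(xml_text):
    """Parse a role map document into {database: {dn: (userid, password)}}."""
    root = ET.fromstring(xml_text.strip())
    role_map = {}
    for database in root.iter("Database"):
        users = {
            user.get("dn"): (user.get("userid"), user.get("password"))
            for user in database.iter("User")
        }
        role_map[database.get("name")] = users
    return role_map
```

In this example a client without a certificate is mapped to the `myUser` account, while the certificate holder with `CN=tom` is mapped to `superUser`.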
The End 
Thank you for your attention.