Neil Chue Hong - National e

OGSA-DAI
Data Access and Integration for the Grid
Neil Chue Hong
[email protected]
http://www.ogsadai.org.uk
Overview
Motivation
Goals
Partners
Features
Projects
Further information
Overview and demo of FirstDIG/INWA
OGSA-DAI Motivation
 Entering an age of data
– Data Explosion
• CERN: LHC will generate 1GB/s = 10PB/y
• VLBA (NRAO) generates 1GB/s today
• Pixar generates 100 TB per movie
– Storage getting cheaper
 Data stored in many different ways
– Data resources
• Relational databases
• XML databases
• Flat files
 Need ways to facilitate
– Data discovery
– Data access
– Data integration
 Empower e-Business and e-Science
– The Grid is a vehicle for achieving this
Goals for OGSA-DAI
 Aim to deliver application mechanisms that:
– Meet the data requirements of Grid applications
• Functionality, performance and reliability
• Reduce development cost of data centric Grid applications
• Provide consistent interfaces to data resources
– Are acceptable to, and supportable by, database providers
• Trustworthy, impose an acceptable load, etc.
• Provide a standard framework that satisfies standard requirements
 A base for developing higher-level services
– Data federation
– Distributed query processing
– Data mining
– Data visualisation
Integration Scenario
A patient moves hospital
[Figure: data sources A, B and C, stored in DB2, Oracle and a CSV file, are combined into an amalgamated patient record]
A: (PID, name, address, DOB)
B: (PID, first_contact)
C: (PID, first_name, last_name, address, first_contact, DOB)
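The amalgamation step in this scenario can be sketched as a join on the shared PID column. This is a minimal illustration in plain Java, not OGSA-DAI code: the class name, record types and sample values are hypothetical, and the rows are assumed to have been fetched from sources A and B already.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: combine rows from source A and source B on PID.
class PatientMerge {
    // Row shapes follow the schemas on the slide; the field types are assumptions.
    record RowA(String pid, String name, String address, String dob) {}
    record RowB(String pid, String firstContact) {}
    record PatientRecord(String pid, String name, String address,
                         String dob, String firstContact) {}

    static List<PatientRecord> merge(List<RowA> a, List<RowB> b) {
        // Index B by PID so each A row can be matched with one map lookup.
        Map<String, String> firstContactByPid = new HashMap<>();
        for (RowB row : b) {
            firstContactByPid.put(row.pid(), row.firstContact());
        }
        List<PatientRecord> merged = new ArrayList<>();
        for (RowA row : a) {
            // firstContact stays null when B has no row for this PID.
            merged.add(new PatientRecord(row.pid(), row.name(), row.address(),
                    row.dob(), firstContactByPid.get(row.pid())));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration only.
        List<RowA> a = List.of(new RowA("p1", "Ann Lee", "1 High St", "1970-01-01"));
        List<RowB> b = List.of(new RowB("p1", "2001-05-17"));
        System.out.println(merge(a, b));
    }
}
```

Keeping every row from A, even without a match in B, is a design choice made for this sketch; a real integration would need an explicit policy for missing or conflicting values.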
Why OGSA-DAI?
 Why use OGSA-DAI over JDBC?
– Language independence at the client end
• Do not need to use Java
– Platform independence
• Do not have to worry about connection technology and drivers
– Can handle XML and file resources
– Can embed additional functionality at the service end
• Transformations, Compression, Third party delivery
• Avoiding unnecessary data movement
– Provision of Metadata is powerful
– Usefulness of the Registry for service discovery
• Dynamic service binding process
– The quickest way to make data accessible on the Grid
• Installation and configuration of OGSA-DAI are fast and straightforward
Project Partners
[Figure: partner logos under the heading "Powered by …"]
Funded by the Grid Core Programme
OGSA-DAI
– £3 million, 18 months, from Feb 2002
– Three major releases, three interim releases
DAIT (DAI-Two)
– Keep the OGSA-DAI brand name
– £1.5 million, 24 months, from Oct 2003
– Four major releases
GGF DAIS WG
– Strong involvement
– Standardise the interfaces
– OGSA-DAI to be a reference implementation
Core features
 An extensible framework for building applications
– Supports relational, XML and some file-based resources
• MySQL, Oracle, DB2, SQL Server, PostgreSQL, Xindice, CSV, EMBL
– Supports various delivery options
• SOAP, FTP, GridFTP, HTTP, files, email, inter-service
– Supports various transforms
• XSLT, ZIP, GZip
– Supports message-level security using X.509 certificates
– Client Toolkit library for application developers
– Comprehensive documentation and tutorials
 Third production release is coming in November
– OGSI/GT3 based
– Also previews of WS-I and WS-RF/GT4 releases
Activities are the drivers
Express a task to be performed by a GDS
Three broad classes of activities:
– Statement
– Transformations
– Delivery
Extensible:
– Easy to add new functionality
– Does not require modification to the service interface
– Extensions operate within the OGSA-DAI framework
Functionality:
– Implemented at the service
– Work where the data is (no need to move the data back)
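The three classes of activity can be pictured as stages in a chain, each consuming the output of the one before it. The sketch below is illustrative only and is not the OGSA-DAI API: the Activity interface, the method names and the canned query result are all hypothetical. GZIP stands in for a transformation, and an in-memory buffer stands in for a delivery target such as FTP or HTTP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of an activity chain; not the OGSA-DAI API.
class ActivityPipeline {
    // Each activity consumes the previous activity's output.
    interface Activity {
        byte[] run(byte[] input) throws IOException;
    }

    // Statement: stands in for a query and returns a canned result.
    static Activity statement(String cannedResult) {
        return input -> cannedResult.getBytes(StandardCharsets.UTF_8);
    }

    // Transformation: GZIP-compress the data before it leaves the service.
    static Activity gzip() {
        return input -> {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
                out.write(input);
            }
            return buffer.toByteArray();
        };
    }

    // Delivery: copies the data to a sink and passes it on unchanged.
    static Activity deliverTo(ByteArrayOutputStream sink) {
        return input -> {
            sink.write(input);
            return input;
        };
    }

    // Run the activities in order, feeding each one the previous output.
    static byte[] perform(List<Activity> activities) throws IOException {
        byte[] data = new byte[0];
        for (Activity activity : activities) {
            data = activity.run(data);
        }
        return data;
    }

    // Helper used only to check what was delivered.
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        perform(List.of(statement("pid,name\np1,Ann Lee\n"), gzip(), deliverTo(sink)));
        System.out.println(gunzip(sink.toByteArray()));
    }
}
```

Adding a new kind of activity here means writing one more implementation of the interface; the perform method does not change, which mirrors the point that extensions do not require changes to the service interface.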
Client Toolkit
Why? Nobody wants to write XML!
A programming API which makes writing
applications easier
– Now: Java
– Next: Perl, C, C#?, ML!?
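// Create a query (SQLQueryString holds the SQL text to run)
SQLQuery query = new SQLQuery(SQLQueryString);
ActivityRequest request = new ActivityRequest();
request.addActivity(query);
// Perform the query (gds is a Grid Data Service handle obtained earlier)
Response response = gds.perform(request);
// Display the result (displayResultSet is assumed to be an application helper)
ResultSet rs = query.getResultSet();
displayResultSet(rs, 1);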
e-Digital MammOgraphy National Database
Built a prototype of a national database of mammographic images in support of the UK breast screening programme
Employs Grid technologies to facilitate this process
[Figure: eDiaMoND architecture across four sites (CHU, KCL, UED, UCL). Labelled components: Data Load App, Training App, Core & Training API, Core Services, Training Services, OGSA-DAI, Content Manager, DB2, DB2 Federation. Legend: Database, Files.]
GeneGrid
Grid-Based Framework for Bioinformatics – Virtual Bioinformatics Laboratory
– Integration of Existing Technologies & Data Sets
– Gene Study in Silico
– Develop Specialist Data Sets
– Grid Services for Commercial or 3rd Party Use
Data resources as XML collections (Xindice), flat files and relational databases (MySQL)
– OGSA-DAI plus custom extensions
– Beta testers for file based activities
 http://www.qub.ac.uk/escience/projects/genegrid/
Distributed Query Processing
 Queries mapped to algebraic expressions for evaluation
 Parallelism represented by partitioning queries
– Use exchange operators
 Prototype available from:
– http://www.ogsadai.org.uk
[Figure: partitioned query plan. Operators shown: table_scan (protein), table_scan termID=S92 (proteinTerm), reduce, exchange, hash_join (proteinId), op_call (Blast). Partitions are labelled 1, 2 and 3,4.]
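One plausible reading of the plan in the figure is a hash join of protein and proteinTerm on proteinId, with exchange operators routing rows to partitions so that each partition can be joined independently. The sketch below illustrates that idea in plain Java under stated assumptions. It is not the DQP prototype's code: the class, record and method names and the sample rows are hypothetical, the exchange is modelled as in-memory hash partitioning, and the op_call (Blast) step is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of a partitioned hash join; not the DQP prototype's code.
class PartitionedHashJoin {
    record Protein(String proteinId, String sequence) {}
    record ProteinTerm(String proteinId, String termId) {}
    record Joined(String proteinId, String sequence, String termId) {}

    // "Exchange": route each row to a partition chosen by hashing its join key.
    static <T> List<List<T>> exchange(List<T> rows, Function<T, String> key, int partitions) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>());
        }
        for (T row : rows) {
            out.get(Math.floorMod(key.apply(row).hashCode(), partitions)).add(row);
        }
        return out;
    }

    // Hash join within one partition: build a table on terms, probe with proteins.
    static List<Joined> hashJoin(List<Protein> proteins, List<ProteinTerm> terms) {
        Map<String, List<ProteinTerm>> build = new HashMap<>();
        for (ProteinTerm t : terms) {
            build.computeIfAbsent(t.proteinId(), k -> new ArrayList<>()).add(t);
        }
        List<Joined> result = new ArrayList<>();
        for (Protein p : proteins) {
            for (ProteinTerm t : build.getOrDefault(p.proteinId(), List.of())) {
                result.add(new Joined(p.proteinId(), p.sequence(), t.termId()));
            }
        }
        return result;
    }

    static List<Joined> query(List<Protein> proteins, List<ProteinTerm> terms,
                              String termId, int partitions) {
        // Scan of proteinTerm with the termID selection applied.
        List<ProteinTerm> selected = terms.stream()
                .filter(t -> t.termId().equals(termId))
                .collect(Collectors.toList());
        List<List<Protein>> left = exchange(proteins, Protein::proteinId, partitions);
        List<List<ProteinTerm>> right = exchange(selected, ProteinTerm::proteinId, partitions);
        // Equal keys hash to the same partition, so partitions can be joined in parallel.
        return IntStream.range(0, partitions).parallel()
                .mapToObj(i -> hashJoin(left.get(i), right.get(i)))
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample rows are invented for illustration only.
        List<Protein> proteins = List.of(new Protein("P1", "MKV"), new Protein("P2", "GAT"));
        List<ProteinTerm> terms = List.of(new ProteinTerm("P1", "S92"), new ProteinTerm("P2", "S10"));
        System.out.println(query(proteins, terms, "S92", 2));
    }
}
```

Partitioning both inputs with the same hash function is what makes the per-partition joins independent; in a distributed setting each partition would run on a different node rather than in a local parallel stream.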
GridMiner
 Test application area: medical
– Traumatic brain injury treatment
– Predicting outcomes for seriously ill patients
– Analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
 Target:
– Provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
– Building on and extending OGSA-DAI
 http://www.gridminer.org/
GridMiner Scenario
 Heterogeneities:
– Name in A is "First Last" (the target format)
– Name in C has to be combined from first_name and last_name
 Distribution:
– 3 data sources
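The name heterogeneity can be resolved with a small mapping step applied to rows from source C before they are merged. This is a minimal sketch with a hypothetical class and method name; it assumes both name parts are present and non-null.

```java
// Hypothetical sketch: bring source C's split name into source A's "First Last" format.
class NameNormaliser {
    static String toTargetFormat(String firstName, String lastName) {
        // Trim stray whitespace so the result compares equal to values from A.
        return (firstName.trim() + " " + lastName.trim()).trim();
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration only.
        System.out.println(toTargetFormat(" Ann", "Lee "));
    }
}
```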
Future work
 Architecture review
– better concurrency model
– better AAA framework
– better definition of extensibility points
• security, activities, dynamic configuration, mobile code,…
 Improved support for
– WS Security profiles
– Stored procedures
– Data transport
– XQuery
– Database specific datatypes and SQL
 Additionally
– JDBC and ODBC driver for OGSA-DAI
– Contribution process
Further information
The OGSA-DAI Project Site:
– http://www.ogsadai.org.uk
The DAIS-WG site:
– http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI Users Mailing list
– [email protected]
– General discussion on grid DAI matters
Formal support for OGSA-DAI releases
– http://www.ogsadai.org.uk/support
– [email protected]
OGSA-DAI training courses
Project Membership
[Figure: project organisation chart. Roles shown: Principal Investigators, Research Team, Programme Management Board Chair, Technical Review Board Chair, Project Manager, EPCC Team, IBM Development Team, IBM Dissemination Team. People named: Malcolm, Kostas, Norman, Paul, Neil (shown twice), Charaka, Mike, Ally, Mario, Amy, Tom, Andy, Simon, Dave, Patrick.]
The End
Questions?