The OGSA-DAI Project Databases and the Grid

The OGSA-DAI Project
Databases and the Grid
Neil Chue Hong
Project Manager
EPCC, Edinburgh
[email protected]
http://www.ogsadai.org.uk
What is OGSA-DAI?
It is a project:
– OGSA Data Access and Integration: funded by the UK
eScience Grid Core Programme
It is a vision:
– From simple database access to truly virtualised data
resources
It is a standard:
– The GridDataService Specification from the Data Access and
Integration Working Group (DAIS-WG) of the Global Grid
Forum (GGF)
It is software that you can use:
– Current version is R2.5
OGSA-DAI Objective
To define:
– open standards and
– open source based
– uniform service interfaces
– for accessing heterogeneous data sources
– within the Open Grid Services Architecture (OGSA) framework
Why?
– Because we increasingly want to integrate data sources from
different organisations
– The Grid, and OGSA, appear to provide a framework for
producing software to do this
Who are we?
Contributing to the global grid computing community
Partners:
– EPCC & NeSC
– IBM UK
– IBM USA
– Manchester e-SC
– Newcastle e-SC
– Oracle
[Figure: map with site labels: IBM USA, EPCC & NeSC, Glasgow, Newcastle, Belfast, Manchester, Daresbury Lab, Oxford, Cardiff, IBM Hursley, RAL, Cambridge, Oracle, Hinxton, London, Southampton]
373 man months
£3 million, 18 months, started February 2002
Funded by the Grid Core Programme
What are we doing?
[Figure: layered diagram, built up over four slides; layers from top to bottom]
– Data Intensive Applications
– Scientific Data Mining & Integration Technology
– Monitoring, Diagnosis, Logging, Scheduling, Accounting, Authorisation, Data Integration, Data Access
– Grid Plumbing & Security Infrastructure
– Data & Storage Resources (Distributed, Structured Data)
Roles shown alongside the layers: Data Intensive Application Scientists, App. Developers, Tech. Developers, Operations Team, Owners, Data Providers, Data Curators
DAIS WG
GridDatabaseService Specification
– DAIS WG of the GGF
– Aim to produce a V1.0 specification by early 2004
– Defines an interface for a GridDatabaseService
– Many contributors, not just OGSA-DAI Project
– OGSA-DAI (the software) seeks to be a reference
implementation of this standard
• But does not necessarily track it exactly just now
– Requirements and Overview Informational documents also
published
The OGSA-DAI Approach
Reuse existing technologies and standards
– OGSA, Query languages, Java, transport
Three key services:
– GridDataService
– GridDataServiceFactory
– DAIServiceGroupRegistry
Benefits:
– Location independence
– Hides heterogeneity
– Scalable
– Flexible
– Dynamic
OGSA-DAI Positioning - Today
[Figure: layered positioning diagram; layers from top to bottom]
– OGSA-DAI Distributed Query
– OGSA-DAI Basic Services: GDS, GDSF, DAISGR, Delivery, Data Format, Query (Create, Retrieve, Update, Delete), Drivers, Meta Data
– OGSA: Notification, Lifetime, Location
– Database, Communication, OS… Technology
OGSA-DAI To Date
Assuming that OGSA becomes the standard
framework
– Have adopted the OGSA approach
Have first concentrated on data access
– Released software has only limited data integration so far
– Distributed query processor prototype due in July
Implementation provides focus on basic
functionality first
– But architecturally we have tried to answer many pertinent
questions
– Functionality will increase over subsequent releases
GDS in action
[Figure: interactions between Analyst (client), Registry (DAISGR), Factory (GDSF), Grid Data Service (GDS), Database and Consumer]
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with SQL, XPath, XQuery etc
3b. GDS interacts with database (Xindice, MySQL, Oracle, DB2)
3c. Results of query returned to client as XML
OR
3d. Results of query delivered to consumer as XML
Legend: SOAP/HTTP, service creation, API interactions
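The registry, factory and service sequence on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the classes Registry, Factory and GridDataService below are hypothetical in-process stand-ins rather than the OGSA-DAI API, an in-memory SQLite database replaces the real back ends, and the "name" column and the "contacts" topic are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: wraps one database connection."""

    def __init__(self, connection):
        self._connection = connection

    def perform(self, sql, params=()):
        # 3b. The service, not the client, talks to the database.
        cursor = self._connection.execute(sql, params)
        columns = [d[0] for d in cursor.description]
        # 3c. Results go back to the caller as XML text.
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical stand-in for a GDSF: creates a GDS on request."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        # 2b and 2c. Create a service instance and hand it back.
        return GridDataService(self._connection)


class Registry:
    """Hypothetical stand-in for a DAISGR: maps topics to factories."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        # 1a and 1b. Look up a factory for data about the topic.
        return self._factories[topic]


def demo():
    db = sqlite3.connect(":memory:")
    # Assumed schema: only the table name comes from the slides.
    db.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
    db.execute("INSERT INTO littleblackbook VALUES (10, 'Ada')")
    registry = Registry()
    registry.register("contacts", Factory(db))

    factory = registry.find("contacts")   # steps 1a, 1b
    service = factory.create_service()    # steps 2a to 2c
    return service.perform(               # steps 3a to 3c
        "SELECT * FROM littleblackbook WHERE id=?", (10,))
```

The point of the indirection is that the client only ever holds handles: it never needs to know where the database lives or which product it is.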
Activities
OGSA-DAI is structured around the concept of
activities
This framework allows new functionality to be
added easily
Three types of activity at present:
– statement (e.g. SQLQuery, Xupdate)
– transformation (e.g. XSL translation, compression)
– delivery (e.g. GridFTP)
OGSA-DAI provides implementations of
common functionality, others can extend
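One way to picture the activity idea is a small registry of named steps that are chained together. The sketch below is an assumption-laden illustration, not OGSA-DAI code: the decorator, the activity names and the context dictionary are all hypothetical, SQLite stands in for the database, zlib for the compression transformation, and a plain list for a delivery target such as GridFTP.

```python
import sqlite3
import zlib

# Hypothetical registry: activity name -> callable(data, context).
ACTIVITIES = {}


def activity(name):
    """Register a function as a named activity."""
    def register(func):
        ACTIVITIES[name] = func
        return func
    return register


@activity("sqlQuery")
def sql_query(data, context):
    # Statement activity: run the SQL text and return rows as lines.
    rows = context["db"].execute(data).fetchall()
    return "\n".join(",".join(str(v) for v in row) for row in rows)


@activity("compress")
def compress(data, context):
    # Transformation activity: compress the previous output.
    return zlib.compress(data.encode("utf-8"))


@activity("deliverToList")
def deliver_to_list(data, context):
    # Delivery activity: a list stands in for a remote target.
    context["sink"].append(data)
    return data


def run(names, data, context):
    # Each activity consumes the output of the one before it.
    for name in names:
        data = ACTIVITIES[name](data, context)
    return data


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
    context = {"db": db, "sink": []}
    run(["sqlQuery", "compress", "deliverToList"],
        "SELECT id, name FROM t ORDER BY id", context)
    return zlib.decompress(context["sink"][0]).decode("utf-8")
```

Adding new functionality in this sketch means registering one more function; the runner itself does not change.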
Documents
Accessing a Grid Data Resource is done using Documents
– a <gridDataServicePerform> document
– caveat: this may change
A document allows you to:
– define parameters
– execute activities
– deliver results
Written in XML, normally used by a client.
<gridDataServicePerform>
  <request name="myRequest">
    <parameter name="idname">
      <value name="idvalue">10</value>
    </parameter>
    <sqlQueryStatement name="myStatement">
      <sqlParameter position="1" from="idvalue"/>
      <expression>
        SELECT * FROM littleblackbook WHERE id=?
      </expression>
      <webRowSetStream name="statementresult"/>
    </sqlQueryStatement>
    <deliverToResponse name="d1">
      <fromLocal from="statementresult"/>
    </deliverToResponse>
  </request>
</gridDataServicePerform>
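To make the roles of the elements concrete, here is a minimal sketch of how such a document could be interpreted: named values are bound to the "?" placeholders by position, the statement runs, and the named output is handed to the delivery step. The interpreter function is hypothetical and is not how OGSA-DAI itself processes documents; SQLite, the "name" column and the sample rows are assumptions for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

PERFORM_DOCUMENT = """
<gridDataServicePerform>
  <request name="myRequest">
    <parameter name="idname">
      <value name="idvalue">10</value>
    </parameter>
    <sqlQueryStatement name="myStatement">
      <sqlParameter position="1" from="idvalue"/>
      <expression>
        SELECT * FROM littleblackbook WHERE id=?
      </expression>
      <webRowSetStream name="statementresult"/>
    </sqlQueryStatement>
    <deliverToResponse name="d1">
      <fromLocal from="statementresult"/>
    </deliverToResponse>
  </request>
</gridDataServicePerform>
"""


def run_perform_document(document, connection):
    """Hypothetical interpreter for the document shown on the slide."""
    request = ET.fromstring(document).find("request")
    # Collect the named values declared under <parameter>.
    values = {v.get("name"): v.text for v in request.iter("value")}
    statement = request.find("sqlQueryStatement")
    # Bind values to the '?' placeholders in position order.
    bindings = sorted(statement.findall("sqlParameter"),
                      key=lambda p: int(p.get("position")))
    params = [values[b.get("from")] for b in bindings]
    sql = statement.find("expression").text.strip()
    # Store the rows under the output name the statement declares.
    output_name = statement.find("webRowSetStream").get("name")
    outputs = {output_name: connection.execute(sql, params).fetchall()}
    # <deliverToResponse> names which output goes back to the caller.
    source = request.find("deliverToResponse/fromLocal").get("from")
    return outputs[source]


def demo():
    db = sqlite3.connect(":memory:")
    # Assumed schema: only the table name comes from the slide.
    db.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO littleblackbook VALUES (?, ?)",
                   [(10, "Ada"), (11, "Bob")])
    return run_perform_document(PERFORM_DOCUMENT, db)
```

The names "idvalue" and "statementresult" are what tie the three parts of the document together: a parameter feeds the statement, and the statement's output feeds the delivery.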
OGSA-DAI Core Services
OGSA-DAI Release 2.5 – out now
– Java, Tomcat, Globus Toolkit 3 Beta
– Supports MySQL, DB2, Xindice; SQL92, XPath, Xupdate
OGSA-DAI Release 3 – end July
– Java, Tomcat, Globus Toolkit 3.0
– Supports MySQL, DB2, Oracle, Xindice; SQL92, XPath,
Xupdate
– Adds Notification, Internationalisation, Transactions, Caching
Continue to track Globus Toolkit 3 releases
– Experimental, then production, GT3 grids will help
Asynchronous Delivery
[Figure: two message-flow diagrams between Client, GDS, GDS Instance, DB and Consumer, each with steps numbered 1 to 3]
– Asynchronous delivery – Pull: labels Q; Rs; DT; D + GDH; GSH/R + data id; Ra; GDT
– Asynchronous delivery – Push: labels Q + D + GSH/R; Rs; DT; GDT; GSH/R; Ra
GDS Composition
[Figure: five numbered composition patterns (1 to 5) built from Client, GDS, Operation and DB nodes]
Distributed Query Service
A higher level service:
– Extension of Polar* query processor, partitions and schedules
queries
– Sits on top of OGSA and OGSA-DAI
Defines new portTypes and services
– GridDistributedQuery(GDQ) PortType
– GridDistributedQueryService(GDQS) – wraps Polar*
– GridQueryEvaluatorService(GQES) – perform subqueries
Currently based on OGSA-DAI Release 1.5
DQS Architecture
DQP in action
DQS: the future
The GridDistributedQueryService
– is an example of a higher level data integration service which utilises
OGSA-DAI core services
– Assumes that GDSF, GDQS Factory and client live in different
containers
– Really requires a well-defined meta-model for the physical schema of
a database
• Being partially addressed in DAIS WG
– Shows how a GDS can be both client and service
• Service hierarchy and composition
DAIT (proposed follow-on to OGSA-DAI) would
produce a robust reference implementation of the DQP
components
Projects using OGSA-DAI
Industry:
– FirstDIG: business process analysis (with First Transport Group)
• OGSA-DAI with datamining
Collaborative:
– Bridges: database integration over six geographically distributed
genomics research sites (with IBM UK)
• OGSA-DAI with DiscoveryLink
– eDIKT: porting OGSA-DAI to other platforms
• OGSA-DAI with performance
– DEISA: linking Europe’s HPC centres
• OGSA-DAI with distributed accounting
– MS .Net Grid: porting OGSA-DAI to the .Net framework (with
Microsoft Research UK)
• OGSA-DAI with .Net
ODD Genes
OGSA-DAI used to query gene expression
data resources at GTI and HGU
– One data resource: low spatial resolution, high gene resolution
– Other resource: high spatial resolution, low gene resolution
– Query one database and use data to find correct data resource
to run more detailed query and produce visualisation
– Simple example of data integration at work
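The two-step pattern described above (a first query whose result selects the resource for a second, more detailed query) can be sketched as follows. Everything concrete here is an assumption made for illustration: the table names, columns, gene names and resource names are invented, SQLite databases stand in for the GDS-fronted resources, and this is not the ODD Genes implementation.

```python
import sqlite3


def make_db(schema, insert, rows):
    """Build an in-memory database standing in for one data resource."""
    db = sqlite3.connect(":memory:")
    db.execute(schema)
    db.executemany(insert, rows)
    return db


def two_step_query(index_db, resources, gene):
    # Step 1: coarse query to find which resource holds the detail.
    row = index_db.execute(
        "SELECT resource FROM gene_index WHERE gene=?", (gene,)).fetchone()
    if row is None:
        return []
    detail_db = resources[row[0]]
    # Step 2: detailed query against the resource found in step 1.
    return detail_db.execute(
        "SELECT region, level FROM expression WHERE gene=? ORDER BY region",
        (gene,)).fetchall()


def demo():
    # Hypothetical schemas and data, for illustration only.
    index_db = make_db(
        "CREATE TABLE gene_index (gene TEXT, resource TEXT)",
        "INSERT INTO gene_index VALUES (?, ?)",
        [("geneA", "detail1")])
    detail_db = make_db(
        "CREATE TABLE expression (gene TEXT, region TEXT, level REAL)",
        "INSERT INTO expression VALUES (?, ?, ?)",
        [("geneA", "r1", 0.5), ("geneA", "r2", 0.9), ("geneB", "r1", 0.1)])
    return two_step_query(index_db, {"detail1": detail_db}, "geneA")
```

The rows returned by the second query are what a rendering step would then visualise.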
[Figure: Client; GDS (Query) at GTI; GDS (Query) at HGU; Render at EPCC]
Project Timeline
[Figure: timeline with axis marks Feb ’02, May ’02, Jul ’02, Sep ’02, Dec ’02, Feb ’03, May ’03, Sep ’03, and "today" marked]
Milestones, in the order listed on the slide:
– WS + GSI UK support ( > 100 downloads)
– XML + OGSA Prototypes for Early Adopters
– Design Documents & Demos for DAIS WG @ GGF5
– XML + OGSA Prototype Available
– RDB + GT2 / OGSA Prototypes Available
– GGF6 WG Papers & Prototypes
– Early Adopters Workshop @ NeSC
– Ship Release 1 (Jan 15th 2003)
– OGSA-DAI Tutorial @ NeSC
– Release 1.5 (Feb 28th 2003)
– Tutorial @ GGF7
– Release 2
– Tutorial @ NeSC
– Release 2.5
– Release 3
Releases shown along the axis: TP4, TP5, GT3 A1, GT3 A2, GT3 A3, GT3 A4, GT3 Beta, GT3 Final
Phases: Phase 1 Starts, Phase 2 Starts
A DAIT for the Future
DAIT (Data Access and Integration Two)
– follow on project from OGSA-DAI, funded for two years
– continue to research, prototype and productise
– release every six months, R4 in December 2003
– R4:
• support for SQL Server and structured filesystems
• extended DBMS management functionality (e.g. archive)
• bulk load operations (where supported)
• support for DFDL file access
• triggers exposed through notification
– R5
• Distributed Query Processing, Distributed Transactions
• Virtualised views across databases
Further information
The OGSA-DAI Project Site:
– http://www.ogsadai.org.uk
The DAIS-WG site:
– http://cs.man.ac.uk/grid-db
OGSA-DAI Users Mailing list
– [email protected]
– General discussion on grid data access and integration
Formal support for OGSA-DAI releases
– http://www.ogsadai.org.uk/support + [email protected]
OGSA-DAI training courses
– http://www.ogsadai.org.uk/courses/