OGSA-DAI-3-Introduct.. - Center for Computation & Technology

Introduction to OGSA-DAI
The OGSA-DAI Team
[email protected]
http://www.ogsadai.org.uk
The OGSA-DAI Project
A generic framework for integrating data access and computation
– Uniform interface to relational, XML, flat file data resources
Using the grid to take specific classes of computation nearer to the data
Kit of parts for building tailored access and integration applications
Investigations to inform DAIS-WG
One reference implementation for DAIS
Releases publicly available NOW
Project Partners
Powered by ….
Funded by the
Grid Core Programme
Project Membership
Malcolm
Kostas
Norman
Paul
Principal Investigators
Research Team
Programme Management
Board Chair
Neil
Technical Review Board
Chair
Charaka
Mike
Ally
Mario
Project Manager
Amy
Charaka
EPCC Team
Andy
Simon
Dave
IBM Development Team
Brian
Neil
Patrick
IBM Dissemination Team
Project Status
Current release 4.0
– Globus Toolkit 3.2 compliant
– Platform and language independent
• Java 1.4
• Document model
Work concentrated on data access
– Wraps data resources without hiding underlying data model
– Provide base for higher-level services
• Distributed Query Processing (DQP)
• Data federation services
Supported Data Resources
Relational
– MySQL
– DB2
– Oracle
– PostgreSQL
– SQLServer
XML
– Xindice
– eXist
Other
– Files
– ?
Web Service Architecture
[Diagram: Service Registry, Service Consumer and Service Provider; the consumer binds to the provider.]
OGSA-DAI Service Architecture
[Diagram: DAISGR, Service Consumer, GDSF and GDS, with a Bind link from the consumer.]
OGSA-DAI Services
OGSA-DAI uses three main service types
– DAISGR (registry) for discovery
– GDSF (factory) to represent a data resource
– GDS (data service) to access a data resource
[Diagram: the DAISGR locates a GDSF, the GDSF creates a GDS, and the GDS sits in front of the Data Resource. Annotation: "This will change".]
GDSF and GDS
Grid Data Service Factory (GDSF)
– Represents a data resource
– Persistent service
• Currently static (no dynamic GDSFs)
– Cannot instantiate new services to represent other/new databases
– Exposes capabilities and metadata
– May register with a DAISGR
Grid Data Service (GDS)
– Created by a GDSF
– Generally transient service
– Required to access data resource
– Holds the client session
DAISGR
DAI Service Group Registry (DAISGR)
– Persistent service
– Based on OGSI ServiceGroups
– GDSFs may register with DAISGR
– Clients access DAISGR to discover
• Resources
• Services (may need specific capabilities)
– Support a given portType or activity
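Discovery by capability can be pictured as a filtered lookup over registered factories. The sketch below is a toy in-memory model and not the OGSA-DAI or OGSI API: the class names `Registry` and `FactoryEntry`, the `gsh://` handle strings, and the portType and activity labels are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FactoryEntry:
    """Hypothetical registry record describing one GDSF."""
    handle: str        # stand-in for a Grid Service Handle (GSH)
    resource: str      # name of the data resource the factory represents
    port_types: frozenset = field(default_factory=frozenset)
    activities: frozenset = field(default_factory=frozenset)


class Registry:
    """Toy stand-in for a DAISGR: factories register, clients search."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def find(self, port_type=None, activity=None):
        """Return the entries that support the requested portType and/or activity."""
        matches = []
        for entry in self._entries:
            if port_type is not None and port_type not in entry.port_types:
                continue
            if activity is not None and activity not in entry.activities:
                continue
            matches.append(entry)
        return matches


registry = Registry()
registry.register(FactoryEntry("gsh://host-a/frogs", "Frogs",
                               frozenset({"GDSFactory"}), frozenset({"sqlQuery"})))
registry.register(FactoryEntry("gsh://host-b/docs", "Docs",
                               frozenset({"GDSFactory"}), frozenset({"xPathQuery"})))

# A client that needs SQL support only sees the factories that advertise it.
sql_capable = registry.find(activity="sqlQuery")
print([entry.handle for entry in sql_capable])
```

In this model the quality of discovery depends entirely on what each factory advertises, which is why adequate metadata matters.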
Location
[Diagram: Registry (DAISGR), Factory (GDSF) and Analyst, linked by registerService and findServiceData calls.]
Data resource publication through registry
Data location hidden by factory
Data resource metadata available through Service Data Elements
Interaction Model: Start up
1. Start OGSI containers with persistent services.
2. Here GDSF represents Frog database.
[Diagram: one OGSI Container hosting the DAISGR, another hosting the GDSF.]
Interaction Model: Registration
3. GDSF registers with DAISGR.
[Diagram: the DAISGR now holds the entry "Frogs: GSH" for the GDSF.]
Interaction Model: Discovery
4. Client wants to know about frogs. Can:
(i) Query the GDSF directly if known or
(ii) Identify suitable GDSF through DAISGR.
[Diagram: a client thinking "Mmmmm … Frogs?" beside the two OGSI Containers; the DAISGR holds "Frogs: GSH".]
Interaction Model: Service Creation
5. Having identified a suitable GDSF, the client asks for a GDS to be created.
[Diagram: the OGSI Container hosting the GDSF now also hosts a GDS.]
Interaction Model: Perform
6. Client interacts with GDS by sending Perform documents.
7. GDS responds with a Response document.
8. Client may terminate GDS when finished or let it die naturally.
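The eight steps of the interaction model can be sketched end to end with a toy simulation. This is not the OGSA-DAI client API: the classes `Registry`, `Factory` and `DataService`, the dictionary shapes standing in for Perform and Response documents, and the sample frog rows are all hypothetical.

```python
import itertools


class DataService:
    """Toy GDS: transient, tied to one client, answers Perform requests."""

    def __init__(self, service_id, rows):
        self.service_id = service_id
        self._rows = rows
        self.alive = True

    def perform(self, request):
        """Take a dict standing in for a Perform document and return a
        dict standing in for a Response document."""
        if not self.alive:
            raise RuntimeError("service has been terminated")
        wanted = request["where"]
        result = [row for row in self._rows
                  if all(row.get(key) == value for key, value in wanted.items())]
        return {"status": "COMPLETE", "result": result}

    def terminate(self):
        self.alive = False


class Factory:
    """Toy GDSF: persistent, represents one data resource, creates GDSs."""

    def __init__(self, resource_name, rows):
        self.resource_name = resource_name
        self._rows = rows
        self._ids = itertools.count(1)

    def create_service(self):
        return DataService(f"{self.resource_name}-gds-{next(self._ids)}", self._rows)


class Registry:
    """Toy DAISGR: maps a resource name to the factory that represents it."""

    def __init__(self):
        self._factories = {}

    def register(self, factory):
        self._factories[factory.resource_name] = factory

    def locate(self, resource_name):
        return self._factories[resource_name]


# Steps 1-3: persistent services start and the factory registers itself.
frogs = [{"species": "tree frog", "colour": "green"},
         {"species": "poison dart frog", "colour": "blue"}]
registry = Registry()
registry.register(Factory("Frogs", frogs))

# Steps 4-5: the client discovers the factory and asks it for a data service.
factory = registry.locate("Frogs")
gds = factory.create_service()

# Steps 6-7: the client sends a request and receives a response.
response = gds.perform({"where": {"colour": "green"}})
print(response["status"], response["result"])

# Step 8: the client terminates the service when finished.
gds.terminate()
```

Keeping the factory persistent and the data service transient means per-client state lives only in the short-lived object, which matches the "holds the client session" role described for the GDS.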
[Diagram: the client exchanges documents with the GDS created by the GDSF; the DAISGR still lists "Frogs: GSH".]
Interaction Model: Summary
Only described an access use case
– Client not concerned with connection mechanism
– Similar framework could accommodate service-service interactions
Discovery aspect is important
– Probably requires a human
– Needs adequate definition of metadata
• Definitions of ontologies and vocabularies - not something that OGSA-DAI is doing …
More Complex Behaviour
– Deliver data back to the client.
– Deliver data to another GDS.
[Diagram: a Client and two Containers, each holding a GDS and a GDT, with their Data Resources.]
And there's a lot more that you can do …
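The two delivery options can be contrasted in a small sketch: either the result travels back in the response, or it is pushed to a second service and the client only receives a status. The `DataService` class, its `receive` method (standing in for a GDT endpoint) and the response fields are hypothetical and do not reflect the real OGSA-DAI interfaces.

```python
class DataService:
    """Toy GDS with a receiving endpoint standing in for a GDT."""

    def __init__(self, name, rows=()):
        self.name = name
        self.rows = list(rows)

    def receive(self, rows):
        """Accept rows pushed from another service; return how many arrived."""
        self.rows.extend(rows)
        return len(rows)

    def perform(self, predicate, deliver_to=None):
        """Run a query and either return the rows or push them elsewhere."""
        result = [row for row in self.rows if predicate(row)]
        if deliver_to is None:
            # Option 1: deliver the data back to the client in the response.
            return {"status": "COMPLETE", "result": result}
        # Option 2: deliver the data to another service; the client sees only a status.
        count = deliver_to.receive(result)
        return {"status": "COMPLETE",
                "delivered_rows": count,
                "delivered_to": deliver_to.name}


source = DataService("source", [{"id": 1, "size": 5},
                                {"id": 2, "size": 50},
                                {"id": 3, "size": 70}])
sink = DataService("sink")

to_client = source.perform(lambda row: row["size"] > 10)
to_sink = source.perform(lambda row: row["size"] > 10, deliver_to=sink)
print(to_client)
print(to_sink)
```

In the second call the rows never pass through the client, which is the point of delivering from one service to another.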
Usage Patterns
[Diagrams: four usage patterns: Retrieve, Update/Insert, Pipeline and Data Flow.]
Actors (drawn as OGSI or non-OGSI processes): A - Analyst, C - Consumer, G - GDS, P - Producer
Messages (call, response, data): Q - Query, D - Delivery, S - Status, R - Result, U - Update, I - Data id
Projects Using OGSA-DAI
Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-80=80)
BioSimGrid (http://www.biosimgrid.org/)
AstroGrid (http://www.astrogrid.org/)
BioGrid (http://www.biogrid.jp/)
GEON (http://www.geongrid.org/)
eDiaMoND (http://www.ediamond.ox.ac.uk/)
OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
GeneGrid (http://www.qub.ac.uk/escience/projects.php#genegrid)
FirstDig (http://www.epcc.ed.ac.uk/~firstdig/)
myGrid (http://www.mygrid.org.uk/)
INWA (http://www.epcc.ed.ac.uk/)
ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
IU RGRBench (http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)
Project classification
Categories: Physical Sciences, Biological Sciences, Computer Sciences, Commercial Applications
Projects: Bridges, BioGrid, ODD-Genes, AstroGrid, BioSimGrid, GEON, eDiaMoND, myGrid, GeneGrid, MCS, N2Grid, OGSA-WebDB, GridMiner, IU RGRBench, FirstDig, INWA
[Diagram: the projects arranged by category around OGSA-DAI.]
Points to Note
Feedback from users largely positive
– Good suggestions
– Fair criticisms
– How OGSA-DAI is being used
– Where it succeeds and where it fails
– Helping us to capture requirements
Hope to allow user contributions
– Plan to establish a policy/framework for this
Engage more with User Community
– Meetings scheduled for this year
• OGSA-DAI mini-workshop at AHM 2004
• OGSA-DAI tutorials at various meetings/locations
e-Digital MammOgraphy National Database
– Mammogram - X-ray of the breast
Built prototype of a national database of mammographic images
– In support of the UK Breast screening programme
Employed Grid technologies to facilitate process
Thanks to the eDiaMoND project and the Digital Database for Screening Mammography for this image.
Breast screening in the UK began in 1988
– Women aged 50-64 screened every 3 years
– Women aged 50-70 from 2004
– 1 view/breast → 2 views by 2003
UK has
– Over 90 breast screening units throughout the UK
– Each one deals with about 45000 women on average p.a.
Each centre sees 5000-20000 images/year
In 2001-02 → 2002-03
– Screened: 1.4M → 1.5M
– Recalled for assessment: 77911 → 79441
– Cancers detected: 10003 → 10467
– Lives per year saved: 300 → 1250 (by 2010)
Distributed team of doctors perform the analysis
[Diagram: sites CHU, KCL, UED and UCL, each with Data Load and Training applications, a Core & Training API, Core Services, OGSA-DAI, and Content Manager on DB2. Also labelled: DB2 Federation, Database, Files.]
eDiaMoND Findings:
– OGSA-DAI provides a flexible framework
– Dynamically configure the system through discovery
– Activities can operate with different levels of granularity
– Federation can be introduced at various levels
– Good documentation on how to extend the framework
• Extended Activities to access IBM DB2 Content Manager
– Changes between versions broke some things
• Low level XML issues
FirstDIG
Data mining with the First Transport Group, UK
– Example: "When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%"
– "The results of this exercise will revolutionise the way we do things in the bus industry." Darren Unwin, Divisional Manager, First South Yorkshire.
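A finding phrased like the bus example ("when X holds there is an N% chance of Y") is the confidence of an association rule: the fraction of records satisfying X that also satisfy Y. The sketch below shows that calculation only. The records are made up for illustration and are not FirstDIG data, and the field names `delay_min` and `revenue_change_pct` are hypothetical.

```python
def rule_confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of records.

    Returns None when no record satisfies the antecedent.
    """
    matching = [record for record in records if antecedent(record)]
    if not matching:
        return None
    hits = sum(1 for record in matching if consequent(record))
    return hits / len(matching)


def is_late(record):
    """Bus was more than 10 minutes late."""
    return record["delay_min"] > 10


def revenue_dropped(record):
    """Revenue fell by at least 10%."""
    return record["revenue_change_pct"] <= -10


# Made-up sample records, for illustration only.
records = [
    {"delay_min": 12, "revenue_change_pct": -15},
    {"delay_min": 15, "revenue_change_pct": -11},
    {"delay_min": 11, "revenue_change_pct": -2},
    {"delay_min": 3, "revenue_change_pct": 1},
    {"delay_min": 20, "revenue_change_pct": -12},
]

confidence = rule_confidence(records, is_late, revenue_dropped)
print(f"confidence = {confidence:.0%}")
```

With these five invented records, four buses are late and three of those show a drop, so the rule's confidence is 75%.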
[Diagram: a Data Mining Application, an OGSA-DAI Client Application, and four OGSA-DAI services.]
INWA
Innovation Node: Western Australia
– Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
Project
– Run from Nov 2003 - Aug 2004
– Involved 10 partners (6 UK + 4 Australia)
Aim
– Data mine commercially sensitive data
– Security an absolute MUST
– Employ Grid technologies
– Need access to data and computational resources
Demonstrator using:
– OGSA-DAI
• Incorporate data resources
– Sun DCG's TOG (Transfer-queue Over Globus)
• Handle job submission to analyse micro array data
INWA
[Diagram: two sites, EPCC (UK) and Curtin (Australia), each running TOG and Grid Engine. OGSA-DAI services expose Bank data, Telco data, UK property data and Australian property data; user@edinburgh and user@australia each use a Data Browser.]
INWA: Lessons Learned
Performing Data Integration:
– TimeZone date problems
Security issues:
– Bugs in
• JavaCoG in GT3
• OGSA-DAI could not switch security for Grid data transfers
• TOG had no security option
– All of these have been fixed
Middleware not mature enough for commercial deployment
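The slides do not say what the time zone date problems were. One common failure when integrating data from the UK and Western Australia is that records stored as local wall-clock times disagree on the calendar date for the same instant. The sketch below illustrates that assumed scenario using fixed offsets (UTC+8 for Perth, UTC+0 for the UK in winter); the timestamps and the helper `to_utc` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the example self-contained; real data would need a
# proper time zone database to handle daylight saving changes.
AWST = timezone(timedelta(hours=8), "AWST")  # Perth
GMT = timezone.utc                           # UK outside summer time


def to_utc(naive_local, tz):
    """Attach the site's time zone to a naive timestamp and convert to UTC."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


# The same instant, recorded as local wall-clock time at each site.
perth_record = datetime(2004, 3, 2, 6, 30)   # 06:30 on 2 March in Perth
uk_record = datetime(2004, 3, 1, 22, 30)     # 22:30 on 1 March in the UK

# Compared naively, the two records fall on different dates.
print(perth_record.date() == uk_record.date())

# Normalised to UTC, they are the same instant.
print(to_utc(perth_record, AWST) == to_utc(uk_record, GMT))
```

Joining on dates before normalising would silently mismatch rows like these, so converting to a common zone at ingestion is the usual remedy.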
Why OGSA-DAI?
Why use OGSA-DAI over JDBC?
– Can embed additional functionality at the service end
• Transformations, compressions
• Third party delivery
• The extensible activity framework
– Avoiding unnecessary data movement
– Common interface to heterogeneous data resources
• Relational, XML databases, and files
– Usefulness of the Registry for service discovery
• Dynamic service binding process
• Provision of good meta-data is necessary
– Language independence at the client end
• Do not need to use Java
– Platform independence
• Do not have to worry about connection technology, drivers, etc
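The argument for embedding functionality at the service end can be illustrated with a chain of activities that filters, transforms and compresses data before anything is sent to the client. This is a sketch of the idea only, not the OGSA-DAI activity framework: the function names, the sample table and the use of JSON with zlib are assumptions made for the example.

```python
import json
import zlib
from functools import partial


def select_rows(rows, predicate):
    """Query activity: keep only the rows the client asked for."""
    return [row for row in rows if predicate(row)]


def project_columns(rows, columns):
    """Transformation activity: drop columns the client does not need."""
    return [{column: row[column] for column in columns} for row in rows]


def compress(rows):
    """Compression activity: serialise and compress the result for delivery."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def run_activities(data, activities):
    """Feed the output of each activity into the next, service-side."""
    for activity in activities:
        data = activity(data)
    return data


# Hypothetical table held by the data resource, with a bulky column.
table = [
    {"id": 1, "species": "tree frog", "notes": "x" * 200},
    {"id": 2, "species": "toad", "notes": "y" * 200},
    {"id": 3, "species": "bullfrog", "notes": "z" * 200},
]

pipeline = [
    partial(select_rows, predicate=lambda row: row["id"] != 2),
    partial(project_columns, columns=["id", "species"]),
    compress,
]

payload = run_activities(table, pipeline)   # what crosses the network
received = json.loads(zlib.decompress(payload).decode("utf-8"))
print(len(json.dumps(table)), "bytes at the resource,", len(payload), "bytes delivered")
print(received)
```

With a plain database connection the client would typically fetch the rows first and then transform them locally; here only the reduced, compressed payload moves.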