20080117-OGSADAI

Grids, Grid Data Services and OGSA-DAI
Mike Mineter
NeSC-TOE
[email protected]
Acknowledgement
• Many slides from OGSA-DAI team.
• (Some slides from me.)
EU project: RIO31844-OMII-EUROPE
Contents
• What is a Grid?
• What is a Grid Data Service?
• Why the “OGSA-DAI” acronym?!
• Why does OGSA-DAI matter?!
• When should we use OGSA-DAI?
What is a Grid? - 1
• This is
What is a Grid?
Computers
Data
People
A Grid is all about the sharing of Resources
A Grid is…
• … all about the sharing of Resources
– Within and between virtual organisations (= collaborations)
• Resources accessed by abstractions
– User wants a job to run, wants to access data,…
» Rarely cares where this happens
• … a set of resources (and enabling services) that share mechanisms for
– Authentication: communicate identity of user/provider
• X.509 certificate commonly used in “production grids”
– Authorisation: what can this user be allowed to do
• Member of which VO, which group,…
– Underpinned by agreement across VOs and resource providers
• … infrastructure that builds on the Internet to permit orchestration of services across administrative domains
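The split between authentication (who the user is) and authorisation (what their VO and group membership allows) can be sketched as follows. This is a minimal illustration only: the `User` class, the `POLICY` table, the resource names and the certificate subject are all hypothetical, and a production grid would establish identity from an X.509 certificate rather than a hand-built object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """An already-authenticated identity (hypothetical stand-in for a certificate)."""
    subject: str                      # hypothetical certificate subject name
    vo: str                           # virtual organisation the user belongs to
    groups: frozenset = field(default_factory=frozenset)


# Hypothetical policy agreed between VOs and a resource provider:
# resource name -> set of (VO, group) pairs allowed to use it.
POLICY = {
    "cluster-a": {("biomed", "analysts"), ("physics", "members")},
    "survey-db": {("biomed", "curators")},
}


def is_authorised(user: User, resource: str) -> bool:
    """Return True if any of the user's (VO, group) pairs is allowed on the resource."""
    allowed = POLICY.get(resource, set())
    return any((user.vo, group) in allowed for group in user.groups)


alice = User("/C=UK/O=eScience/CN=alice", vo="biomed", groups=frozenset({"analysts"}))
print(is_authorised(alice, "cluster-a"))
print(is_authorised(alice, "survey-db"))
```

The decision depends only on VO and group membership, not on where the resource runs, which matches the point that users rarely care where a job or data access happens.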
Web services – software components that are…
• Accessible across a network
• Loosely coupled, defined by the messages they receive / send
• Service description that can be used to create client software
• Based on standards (for which tools do / could exist)
• Developed in anticipation of new uses
[Figure: a Client connected to several Services]
Globus Toolkit 4 Web Services Core
[Figure: layered stack inside the GT4 Container. Labels: User Applications; Custom Web Services; Custom WSRF Web Services; GT4 WSRF Web Services; WS-Addressing, WSRF, WS-Notification; WSDL, SOAP, WS-Security; Registry; Administration]
Thanks to J. Schopf, ANL
Focus on Data
Data
OGSA-DAI enables the sharing of Data Resources
Types of data services
• Many user communities manage data in grid vaults (aka storage elements)
– Experimental data
• ….
– Replicated for resilience
– And to be close to where computation will happen
• Many new user communities have more diverse data resources
• To facilitate new research, data need to be accessible from Grid infrastructures
• Resources:
– May pre-date Grids
– Providers may have current ways to distribute data to users
– May not be able to replicate data
– Need AuthN and AuthZ
Motivation
• Grid is about sharing resources
• OGSA-DAI is about sharing structured data resources
[Figure: three kinds of structured data resource: Relational Database, XML Database, Indexed File]
Web: www.omii.ac.uk
Email: [email protected]
Life before OGSA-DAI….
• A few examples follow of alternative approaches to sharing data.
Sharing data via web site download
• ZIP up data and put it on a web site
• Pros
o Easy distribution for providers
o Easy access for consumers
• Cons
o Consumers have to download all the data
o Consumers have to load data into local databases to use it
o Static snapshot
o Security
Sharing data via direct access
• Providers tell consumers
o Database URL – mycomputer.epcc.ed.ac.uk:3306
o Username – userID
o Password – password
• Pros
o Consumers have direct access
• Cons
o Firewall issues
o User and password management is hard
o No consistent security model
o Hard to use in grid/web service workflows
Sharing data via direct access
• Cons (continued)
o No server-side layer in which to standardize database heterogeneities
o Myriad drivers
o Different APIs across different data types
• Relational and JDBC
• XML and XMLDB
• Indexed files and Lucene
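To make the "different APIs across different data types" point concrete, the sketch below puts three unlike resources behind one hypothetical interface. It is an illustration under stated assumptions: Python's `sqlite3`, `xml.etree.ElementTree` and a plain dictionary stand in for JDBC, XMLDB and Lucene, and the `DataResource` interface and its `titles_matching` method are invented for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List, Protocol


class DataResource(Protocol):
    """Hypothetical uniform interface that a server-side layer could expose."""
    def titles_matching(self, word: str) -> List[str]: ...


class RelationalResource:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def titles_matching(self, word: str) -> List[str]:
        rows = self._conn.execute(
            "SELECT title FROM books WHERE title LIKE ? ORDER BY title",
            (f"%{word}%",),
        )
        return [title for (title,) in rows]


class XmlResource:
    def __init__(self, document: str) -> None:
        self._root = ET.fromstring(document)

    def titles_matching(self, word: str) -> List[str]:
        titles = (el.text or "" for el in self._root.iter("title"))
        return sorted(t for t in titles if word.lower() in t.lower())


class IndexedFileResource:
    def __init__(self, index: Dict[str, List[str]]) -> None:
        self._index = index  # keyword -> titles, standing in for a text index

    def titles_matching(self, word: str) -> List[str]:
        return sorted(self._index.get(word.lower(), []))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT)")
conn.execute("INSERT INTO books VALUES ('Grid Basics')")

resources: List[DataResource] = [
    RelationalResource(conn),
    XmlResource("<library><title>Grid Data</title><title>Other</title></library>"),
    IndexedFileResource({"grid": ["Indexing the Grid"]}),
]
# The caller uses one method regardless of the underlying technology.
all_titles = [t for r in resources for t in r.titles_matching("grid")]
print(all_titles)
```

With direct access, each consumer would have to write the three back-end-specific branches themselves; a server-side layer lets that work be done once.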
Domain-specific web services
• Manipulate data using domain-specific operations, e.g.
o Book findByISBN(ISBN)
o List<Book> findByAuthor(Author)
o List<Book> findByKeyword(Word)
• Pros
o Fits with grid/web service approach
o Abstraction hides back-end database details
o Web services are programming language neutral
o Operations likely to map well to authorization policies
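A minimal sketch of such a domain-specific service is given below, with the three operations from the slide rendered as Python methods. The `Book` fields, the `BookService` class and the sample data are assumptions made for illustration; an in-memory list stands in for the back-end database that a real service would hide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Book:
    isbn: str
    title: str
    author: str


class BookService:
    """Hypothetical domain-specific service: callers see book operations, not SQL."""

    def __init__(self, books: List[Book]) -> None:
        # A real service would sit in front of a database; a list stands in here.
        self._books = list(books)

    def find_by_isbn(self, isbn: str) -> Optional[Book]:
        return next((b for b in self._books if b.isbn == isbn), None)

    def find_by_author(self, author: str) -> List[Book]:
        return [b for b in self._books if b.author == author]

    def find_by_keyword(self, word: str) -> List[Book]:
        needle = word.lower()
        return [b for b in self._books if needle in b.title.lower()]


service = BookService([
    Book("111", "Grid Computing Basics", "A. Author"),
    Book("222", "Databases on the Grid", "B. Writer"),
])
print(service.find_by_keyword("grid"))
```

Because each operation is a named, narrow entry point, an authorization policy can be attached per operation, which is harder to do when consumers send arbitrary queries.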
Domain-specific web services
• Cons
o Slower than direct access
• Web service layer
• SOAP transport overhead – especially for large result sets
o Domain-specific API prevents use of generic data exploration, mining and manipulation tools
[Figure: a Generic Data Linking Application combines the Books, Cancer and University Employees data resources to answer questions such as “Books written by University employees” and “University employees in 1932 who have since died of cancer”]
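The "Books written by University employees" case can be sketched as a generic link across two independent resources. This is an illustration only: the `query` and `link` helpers, the table layouts and the sample rows are assumptions, and two in-memory SQLite databases stand in for resources held by different providers.

```python
import sqlite3
from typing import Dict, List

Row = Dict[str, object]


def query(conn: sqlite3.Connection, sql: str) -> List[Row]:
    """Generic access: run any SQL and return rows as plain dictionaries."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, values)) for values in cursor.fetchall()]


def link(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Join two row sets from different resources on matching key values."""
    by_key: Dict[object, List[Row]] = {}
    for row in right:
        by_key.setdefault(row[right_key], []).append(row)
    return [{**a, **b} for a in left for b in by_key.get(a[left_key], [])]


# Two independent, hypothetical data resources.
books = sqlite3.connect(":memory:")
books.execute("CREATE TABLE book (title TEXT, author TEXT)")
books.executemany("INSERT INTO book VALUES (?, ?)",
                  [("Grid Basics", "A. Smith"), ("Cooking", "Z. Jones")])

staff = sqlite3.connect(":memory:")
staff.execute("CREATE TABLE employee (name TEXT, department TEXT)")
staff.execute("INSERT INTO employee VALUES ('A. Smith', 'Informatics')")

linked = link(query(books, "SELECT title, author FROM book"),
              query(staff, "SELECT name, department FROM employee"),
              left_key="author", right_key="name")
print(linked)
```

A service that only offered `findByISBN`-style operations would give a linking tool nothing to join on; generic query access is what makes this kind of cross-resource question possible.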
OGSA-DAI generic web services
• Manipulate data using OGSA-DAI’s generic web services
[Figure: a request goes to OGSA-DAI, which sits in front of a Relational Database, an XML Database and an Indexed File, and data comes back]
Importance of workflows
• OGSA-DAI server is close to data
• 3 activities in the workflow: Query -> Transform -> DeliverToFTP
[Figure labels: Access, OGSA-DAI service, Transform Web Service, FTP Server]
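The three-activity workflow can be sketched as a chain of streaming steps. This is not the OGSA-DAI client API: the function names are hypothetical, an in-memory SQLite database stands in for the data resource, and an `io.StringIO` buffer stands in for the FTP server.

```python
import io
import sqlite3
from typing import Iterable, Iterator, TextIO, Tuple


def query_activity(conn: sqlite3.Connection, sql: str) -> Iterator[Tuple]:
    """Activity 1: stream rows out of the data resource."""
    yield from conn.execute(sql)


def transform_activity(rows: Iterable[Tuple]) -> Iterator[str]:
    """Activity 2: turn each row into a comma-separated line (naive, no quoting)."""
    for row in rows:
        yield ",".join(str(value) for value in row)


def deliver_activity(lines: Iterable[str], sink: TextIO) -> int:
    """Activity 3: write lines to a sink and report how many were delivered."""
    count = 0
    for line in lines:
        sink.write(line + "\n")
        count += 1
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (station TEXT, value REAL)")
conn.executemany("INSERT INTO reading VALUES (?, ?)", [("A", 1.5), ("B", 2.0)])

sink = io.StringIO()  # stands in for the FTP server
delivered = deliver_activity(
    transform_activity(
        query_activity(conn, "SELECT station, value FROM reading ORDER BY station")
    ),
    sink,
)
print(delivered, repr(sink.getvalue()))
```

Because each step consumes the previous one lazily, rows flow from query to delivery without the whole result set being held at once, and the client only has to describe the chain. That is the benefit of running the workflow on a server close to the data.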
Usage Scenarios
[Figure: usage scenarios in which Clients and OGSA-DAI servers exchange data messages and control messages. Labels: Data Source; Data Source 1, Data Source 2, … Data Source n; OGSA-DAI; Client; FTP Server]
OGSA-DAI 3.0
[Figure: OGSA-DAI 3.0 architecture. Top-row labels: OMII, GT, Axis, UNICORE, WS-DAI, ?, gLite, Embedded. OGSA-DAI Core labels: Resource management, Data Resources, Activity management, Workflow engine, Activities, Persistence and Configuration]
Typical roles
• Researcher
– Wants to use data from context of known application, easy portal, workflow…
• Data publisher
– Deploys OGSA-DAI server
– Determines AuthN and AuthZ policies for their data
– Establishes activities (= workflow components)
• Informatician / Application developer
– Deploys client software
– Uses Java to build workflow
– Exposes client for…
OGSA-DAI 3.0
• OGSA-DAI has evolved constantly since February 2002
• OGSA-DAI 2.2 released April 2006
• As the number of users grew so did the requirements
– More effective data streaming
– Standardisation of activity inputs and outputs
– Targeting multiple data resources in a single workflow
– Supporting application-specific presentation layers
• OGSA-DAI 2.2 was not suitable for addressing these
• OGSA-DAI 3.0
– A complete re-design and re-implementation of OGSA-DAI
– A stable framework for the future
– Released September 2007
Where might OGSA-DAI not be suitable?
• OGSA-DAI is not
– A complete solution to every data-related problem
– A replacement for or competitor to JDBC
– Just about accessing relational databases
• It is not suitable if
– You have a single data resource that isn’t going to change
– You have no data transformation requirements
– You want rapid access to data in a single data resource
What is OGSA-DAI?
• An extensible framework
• accessed via web services
• that executes data-centric workflows
• involving heterogeneous data resources
• for the purposes of data access, integration, transformation and delivery
• within a grid
• and is intended as a toolkit for building higher-level application-specific data services
Thank you!
http://www.ogsadai.org.uk
http://omii-europe.org