EPR - Pegasus Workflow Management System


Web and Grid Services
Slides taken from a variety of sources:
GT4 tutorial, by Borja Sotomayor
http://gdp.globus.org/gt4-tutorial/
International Summer School on Grid Computing 2007,
by Malcolm Atkinson http://www.iceageeu.org/issgc07/index.cfm
HP introduction to WSRF by Sanjay Dahiya
http://foss.in/slides/lb2004/wsrf-intro.ppt
USC Viterbi School of Engineering
• Service Oriented Architecture
• Web Services
• WS-RF
• Globus implementation of WS-RF
• OGSA-DAI
Service Oriented Architectures: Three Components
• Registries, Service Consumers, Services
• Services register an available service with a registry
– Send name & description
• Consumers request a service from a registry
– Send a description
• The registry returns a set (possibly empty) of matching services
• The consumer requests a service operation from a service
• The service returns a result or an error
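The interaction between the three components can be sketched in a few lines of Python. This is only an illustration of the register / lookup / invoke pattern: the class and function names (`Registry`, `ServiceEntry`, `consume`) and the keyword-matching rule are assumptions made for this sketch, not part of any SOA standard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceEntry:
    name: str
    description: str
    invoke: Callable[[str], str]  # the service operation a consumer can call


class Registry:
    """Holds service entries and answers lookups by description keyword."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        # A service registers itself by sending its name and description.
        self._entries[entry.name] = entry

    def find(self, keyword: str) -> List[ServiceEntry]:
        # A consumer sends a description; the registry returns a
        # (possibly empty) list of matching services.
        return [e for e in self._entries.values()
                if keyword.lower() in e.description.lower()]


def consume(registry: Registry, keyword: str, request: str) -> str:
    matches = registry.find(keyword)
    if not matches:
        return "error: no matching service"
    # Request the service operation and return the result or an error.
    try:
        return matches[0].invoke(request)
    except Exception as exc:
        return f"error: {exc}"


registry = Registry()
registry.register(ServiceEntry("upper", "converts text to upper case", str.upper))
print(consume(registry, "upper case", "hello"))  # HELLO
print(consume(registry, "weather", "hello"))     # error: no matching service
```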
Composed behaviour
• Services are themselves consumers
– They may compose and wrap other services
• The registry is itself a consumer
• A federation of registries may address registry
service reliability & performance
• Observer services may report on quality of services
and help with diagnostics
• Agreements between services may be set up
– Service-Level Agreements
– Permitting sustained interaction
Web Services
• Enable communication between machines
• Provide any kind of functionality
• Can be found and invoked
• Self-describing: tell you how they can be invoked
Web Services
• Web Services are platform-independent and language-independent, since they use standard XML languages
• Most Web Services use HTTP for transmitting messages
(such as the service request and response)
– Good for going through firewalls
• Issues:
– Overhead: XML is not as efficient for data transmission as using a
proprietary binary code. What you win in portability, you lose in
efficiency.
– Limited functionality: no explicit state management or lifecycle
management
Typical Service Invocation
The Web Services architecture
A typical web service invocation
• Whenever a client needs to communicate with a service, it calls a client
stub
• The client stub will turn this 'local invocation' into a proper SOAP
request
• The SOAP request is sent over a network using the HTTP protocol. The
server receives the SOAP request and hands it to the server stub, which
invokes the service implementation
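The stub mechanism described above can be sketched as follows. This is a minimal illustration, not how Apache Axis or any real SOAP engine is implemented: the HTTP transport is left out (the request bytes are passed directly), the `add` operation is hypothetical, and the message omits the namespaces and type information a real WSDL-driven stub would generate.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def client_stub_build_request(operation: str, params: dict) -> bytes:
    """Turn a 'local invocation' into a SOAP request document."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope)


def add(a: int, b: int) -> int:
    """Hypothetical service implementation."""
    return a + b


OPERATIONS = {"add": add}


def server_stub_dispatch(request: bytes) -> int:
    """Parse the SOAP request and invoke the service implementation."""
    envelope = ET.fromstring(request)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    op = body[0]  # the first child of Body names the operation
    args = {child.tag: int(child.text) for child in op}
    return OPERATIONS[op.tag](**args)


request = client_stub_build_request("add", {"a": 2, "b": 3})
print(server_stub_dispatch(request))  # 5
```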
Web Server
• Web service: software that exposes a set of operations
• SOAP engine: knows how to handle SOAP requests and responses (e.g. Apache Axis)
• Application server: provides a space for applications that multiple clients
can access, i.e. a container (e.g. Jakarta Tomcat server)
• HTTP server: knows how to handle HTTP messages
Sometimes a container is described as: SOAP engine + application server + HTTP server
Limitations of Web Services
• No explicit state management or lifecycle
management
• Web services are usually stateless
• Many services do not need state
Adding state to services
• Need to allow clients to access appropriate state
“Stateless” vs. “Stateful” Services
[Diagram: Client calls move (A to B) on FileTransferService]
• Without state, how does the client:
– Determine what happened (success/failure)?
– Find out how many files completed?
– Receive updates when interesting events arise?
– Terminate a request?
• Few useful services are truly “stateless”, but WS interfaces
alone do not provide built-in support for state
Open Grid Services Architecture
Before OGSA
• Grid services before OGSA
– Resource management (Globus GRAM)
– Resource discovery (Globus MDS)
– Data Management (Globus GridFTP, RLS)
– Security
• All had
– Different mechanisms for access
– Different mechanisms for discovery
– Different mechanisms for management
OGSA
• Brings “order” to distributed services
• Promotes “open” standards: defined in GGF (now OGF),
OASIS
• Enables Virtualization
– Encapsulation behind a common interface of diverse
implementations
• Allows the composition of lower-level services to form more
sophisticated services
• Defines common behaviors that all services must have:
– Naming
– Lifetime management
– State management
– Notification
WSRF
FileTransferService (without WSRF)
[Diagram: Client calls move (A to B) on FileTransferService and receives a transferID; the service holds the state and also exposes whatHappen, tellMeWhen and cancel]
• Developer reinvents wheel for each new service
– Custom management and identification of state: transferID
– Custom operations to inspect state synchronously (whatHappen)
and asynchronously (tellMeWhen)
– Custom lifetime operation (cancel)
WSRF in a Nutshell
• Service
• State representation
– Resource
– Resource Property
• State identification
– Endpoint Reference (EPR)
• State Interfaces
– GetRP, QueryRPs,
GetMultipleRPs, SetRP
• Lifetime Interfaces
– SetTerminationTime
– ImmediateDestruction
• Notification Interfaces
– Subscribe
– Notify
• ServiceGroups
FileTransferService (w/ WSRF)
[Diagram: Client calls createResource (A to B) on FileTransferService and receives an EPR for a Transfer resource with RPs; getRP, queryRPs and destroy operate on that resource]
• Developer specifies custom method to createResource and leaves the
rest to WSRF standards:
– State exposed as Resource + Resource Properties and identified by
Endpoint Reference (EPR)
– State inspected by standard interfaces (GetRP, QueryRPs)
– Lifetime management by standard interfaces (Destroy)
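The division of labour above can be sketched as follows. This is an in-memory illustration only: the `EPR` class is a simplified stand-in for a WS-Addressing endpoint reference, the address is a placeholder, and `get_rp` / `destroy` merely mimic the standard operation names; none of this is the GT4 or WSRF API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class EPR:
    """Simplified endpoint reference: service address plus a resource key."""
    address: str
    resource_key: str


@dataclass
class TransferResource:
    properties: Dict[str, object] = field(default_factory=dict)


class FileTransferService:
    ADDRESS = "http://example.org/FileTransferService"  # placeholder address

    def __init__(self) -> None:
        self._resources: Dict[str, TransferResource] = {}

    # The only custom operation: create a resource and return its EPR.
    def create_resource(self, source: str, dest: str) -> EPR:
        key = uuid.uuid4().hex
        self._resources[key] = TransferResource(
            {"source": source, "dest": dest, "status": "pending", "filesCompleted": 0})
        return EPR(self.ADDRESS, key)

    # Generic operations, named after the standard GetRP and Destroy.
    def get_rp(self, epr: EPR, name: str) -> object:
        return self._resources[epr.resource_key].properties[name]

    def destroy(self, epr: EPR) -> None:
        del self._resources[epr.resource_key]


service = FileTransferService()
epr = service.create_resource("A", "B")
print(service.get_rp(epr, "status"))  # pending
service.destroy(epr)
```

The client never sees a service-specific transferID: it holds an EPR and uses the same inspection and lifetime calls it would use for any other resource.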
Grid Infrastructure: Open Standards
Applications of the framework
(Compute, network, storage provisioning,
job reservation & submission, data management,
application service QoS, …)
WS-Agreement
(Agreement negotiation)
WS Distributed Management
(Lifecycle, monitoring, …)
WS-Resource Framework & WS-Notification*
(Resource identity, lifetime, inspection, subscription, …)
Web services
(WSDL, SOAP, WS-Security, WS-ReliableMessaging, …)
WS-Resource Properties
• Each resource has a Resource Properties document.
• The Resource Properties document is referenced in the
service portType in the WSDL.
• Defined in XML schema.
• Each element in the Resource Properties document
is a Resource Property (RP).
• Resource properties can be queried using multiple
query dialects.
• Independent of back-end implementation.
Accessing Resource Properties
• Pull
– Client can query the RP document, using query engines.
• GetResourceProperty
• GetMultipleResourceProperties
• QueryResourceProperties
• Push
– Allows services to send changes in their resources’ RPs
to interested parties.
• WS-Notification
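The pull operations can be illustrated against a small Resource Properties document. The document content below is hypothetical, and the three functions only mimic the intent of GetResourceProperty, GetMultipleResourceProperties and QueryResourceProperties; they are not the WSRF message formats. The query function uses the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical Resource Properties document for a file transfer resource.
RP_DOCUMENT = ET.fromstring("""
<TransferProperties>
  <Status>running</Status>
  <FilesCompleted>3</FilesCompleted>
  <File name="a.dat" done="true"/>
  <File name="b.dat" done="false"/>
</TransferProperties>
""")


def get_resource_property(name):
    """Pull the values of one property by element name."""
    return [e.text for e in RP_DOCUMENT.findall(name)]


def get_multiple_resource_properties(names):
    """Pull several properties in one call."""
    return {n: get_resource_property(n) for n in names}


def query_resource_properties(path):
    """Evaluate a path expression over the whole RP document."""
    return RP_DOCUMENT.findall(path)


print(get_resource_property("Status"))  # ['running']
print(get_multiple_resource_properties(["Status", "FilesCompleted"]))
pending = query_resource_properties("File[@done='false']")
print([f.get("name") for f in pending])  # ['b.dat']
```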
WS-Notification
• Subscriber indicates interest in a particular “Topic” by
issuing a “subscribe” request
• Broker (intermediary) permits
decoupling Publisher and Subscriber
• “Subscriptions” are WS-Resources
• Publisher need NOT be a Web Service
• Notification may be “triggered” by:
– WS Resource Property value changes
– Other “situations”
• Broker examines current subscriptions
• Brokers may
– “Transform” or “interpret” topics
– Federate to provide scalability
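The brokered pattern can be sketched as a small in-process publish/subscribe broker. This is an assumption-laden illustration: the `Broker` class, the integer subscription id (standing in for the subscription WS-Resource) and the topic string are invented for the sketch and do not reflect the WS-Notification message exchanges.

```python
from collections import defaultdict
from typing import List


class Broker:
    """Intermediary that decouples publishers from subscribers."""

    def __init__(self) -> None:
        self._next_id = 0
        # topic -> {subscription id -> consumer callback}
        self._subscriptions = defaultdict(dict)

    def subscribe(self, topic, consumer) -> int:
        # The returned id stands in for the subscription resource.
        self._next_id += 1
        self._subscriptions[topic][self._next_id] = consumer
        return self._next_id

    def unsubscribe(self, topic, subscription_id) -> None:
        self._subscriptions[topic].pop(subscription_id, None)

    def notify(self, topic, message) -> int:
        # Examine the current subscriptions and deliver to each consumer.
        consumers = list(self._subscriptions[topic].values())
        for consumer in consumers:
            consumer(topic, message)
        return len(consumers)


received: List[str] = []
broker = Broker()
sub = broker.subscribe("transfer/status", lambda t, m: received.append(f"{t}: {m}"))
broker.notify("transfer/status", "completed")
broker.unsubscribe("transfer/status", sub)
broker.notify("transfer/status", "ignored")
print(received)  # ['transfer/status: completed']
```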
WS-Notification
• Characteristics of WS-Notification:
– Web services integration of traditional enterprise
publish/subscribe messaging patterns
• Composes with other Web services technologies
• Facilitates integration between different messaging middleware
environments
– Standardizes the role of Brokers, Publishers,
Subscribers and Consumers
– Provides two forms of publish/subscribe:
direct publishing and brokered publishing
– Standardizes Web service message exchanges for
publishing, subscribing and notification delivery
– Defines XML model of Topics and TopicSpaces to
categorize and organize notification messages
WS-Resource Lifetime
• Creating new resources.
• Destroying old resources.
– Immediate destruction.
– Scheduled destruction using termination time.
• Soft-state lifetime management
– Lifetime extension.
• Example :
– jobs in a batch submission system could be represented as
resources
– submitting a new job causes a new resource to be created
– when the job is completed, the resource is destroyed.
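Soft-state lifetime management can be sketched with a table of termination times. The `LifetimeManager` class and its method names are hypothetical, and time is passed in explicitly as plain numbers so the behaviour is easy to follow; a real container would use wall-clock time and a background sweeper.

```python
from typing import Dict, List, Optional


class LifetimeManager:
    """Tracks resources and their termination times (seconds on a caller-supplied clock)."""

    def __init__(self) -> None:
        self._termination: Dict[str, Optional[float]] = {}

    def create(self, key: str, termination_time: Optional[float] = None) -> None:
        # E.g. submitting a new job creates a new resource.
        self._termination[key] = termination_time

    def set_termination_time(self, key: str, termination_time: float) -> None:
        # Lifetime extension: a client that still cares pushes the time forward.
        self._termination[key] = termination_time

    def destroy(self, key: str) -> None:
        # Immediate destruction.
        del self._termination[key]

    def sweep(self, now: float) -> List[str]:
        # Scheduled destruction: remove every resource whose time has passed.
        expired = [k for k, t in self._termination.items()
                   if t is not None and t <= now]
        for k in expired:
            del self._termination[k]
        return expired

    def alive(self) -> List[str]:
        return sorted(self._termination)


manager = LifetimeManager()
manager.create("job-1", termination_time=10.0)
manager.create("job-2", termination_time=10.0)
manager.set_termination_time("job-2", 30.0)  # client renews its interest
print(manager.sweep(now=15.0))  # ['job-1']
print(manager.alive())          # ['job-2']
```

A resource whose client disappears is cleaned up automatically once its termination time passes, which is the point of the soft-state approach.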
WS-ServiceGroups
• Service groups maintain information about
collections of services or resources.
• ServiceGroupRegistration
– Add new members to group using WS invocation.
• Represent service groups as resources.
• MembershipContentRules
– Imposes restrictions on services that can become part of
service group like implementing an interface.
• WS-Notifications for service group changes.
• Example: resource registries
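A membership rule of the kind described above can be sketched as a check performed when a member is added. The `ServiceGroup` class, the use of operation names as the "interface", and the listener callbacks are assumptions for illustration, not the WS-ServiceGroup schema.

```python
class ServiceGroup:
    """Collection of members that must satisfy a membership rule."""

    def __init__(self, required_operations):
        # Membership content rule: every member must offer these operations.
        self.required_operations = set(required_operations)
        self.members = {}
        self.listeners = []  # callbacks told about membership changes

    def add(self, name, operations):
        missing = self.required_operations - set(operations)
        if missing:
            raise ValueError(f"{name} lacks required operations: {sorted(missing)}")
        self.members[name] = set(operations)
        for listener in self.listeners:
            listener("added", name)

    def remove(self, name):
        del self.members[name]
        for listener in self.listeners:
            listener("removed", name)


events = []
group = ServiceGroup(required_operations={"GetRP", "Destroy"})
group.listeners.append(lambda kind, name: events.append((kind, name)))
group.add("transfer-1", {"GetRP", "QueryRPs", "Destroy"})
try:
    group.add("legacy", {"GetRP"})
except ValueError as err:
    print(err)  # legacy lacks required operations: ['Destroy']
print(events)  # [('added', 'transfer-1')]
```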
WS-BaseFaults
• XML based fault transmission.
• Associated with an operation in WSDL.
• Includes standard datatypes for transmitting web service
faults
– Originator, Timestamp, etc.
Example:
<wsdl:portType name="pt">
  <wsdl:operation name="op">
    <!-- WSDL operation fault elements for each distinct fault -->
    <wsdl:input … />
    <wsdl:output … />
    <wsdl:fault name="aFault" message="tns:aFaultMessage"/>
    <wsdl:fault name="BaseFault" message="wsbf:BaseFaultMessage"/>
  </wsdl:operation>
</wsdl:portType>
Globus Toolkit implementation of
WSRF
Globus Toolkit: Open Source Grid Infrastructure
Globus Toolkit v4 (www.globus.org)
• Security: Authentication Authorization, Community Authorization, Delegation, Credential Mgmt
• Data Mgmt: GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access & Integration
• Execution Mgmt: Grid Resource Allocation & Management, Workspace Management, Community Scheduling Framework, Grid Telecontrol Protocol
• Info Services: Index, Trigger, WebMDS
• Common Runtime: Java Runtime, C Runtime, Python Runtime
GT4 WS Core in a Nutshell
• Tools for building WSRF services
• Implementation of WSRF: Resources, EndpointReferences, ResourceProperties
• Operation Providers: pre-built implementations of WSRF operations
• Notification implementation: Topics, TopicSet, embedded Notification Consumer service
[Diagram: a Service addressed by EPRs, with Resources and RPs behind the operations GetRP, GetMultRPs, SetRP, QueryRPs, Subscribe, SetTermTime, Destroy]
GT4 WS Core in a Nutshell
• Service Container: hosts multiple services in one container, in one JVM process
• More details: based on the AXIS service container; processes SOAP messages
[Diagram: a Service Container holding several Services, each with its ResourceHome, Resources, RPs and EPRs]
GT4 WS Core in a Nutshell
• Secure Communication: Transport, Message, Conversation (Transport demonstrates best performance)
• Configurable Security Policies: Policy Information Points (PIPs) and Policy Decision Points (PDPs), chained
• Example authorization PDPs: GridMap, SAML implementations
[Diagram: the Service Container from the previous slide with PIP and PDP added]
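The chaining of PIPs and PDPs can be sketched as follows. This is not the GT4 security API: the function names, the dictionary-based request, the `GRIDMAP` contents and the "all PDPs must permit" combining rule are assumptions chosen for illustration.

```python
# Hypothetical grid-map: certificate subject -> local account.
GRIDMAP = {"/O=Grid/CN=Alice": "alice"}


def gridmap_pip(request):
    """PIP: collect an attribute (the mapped local account, if any)."""
    enriched = dict(request)
    enriched["local_user"] = GRIDMAP.get(request["subject"])
    return enriched


def gridmap_pdp(request):
    """PDP: permit only callers that map to a local account."""
    return request.get("local_user") is not None


def read_only_pdp(request):
    """PDP: permit only inspection operations."""
    return request["operation"] in {"GetRP", "QueryRPs"}


def authorize(request, pips, pdps):
    # Run the PIPs first to gather attributes, then require every PDP
    # in the chain to permit the request.
    for pip in pips:
        request = pip(request)
    return all(pdp(request) for pdp in pdps)


chain = ([gridmap_pip], [gridmap_pdp, read_only_pdp])
print(authorize({"subject": "/O=Grid/CN=Alice", "operation": "GetRP"}, *chain))    # True
print(authorize({"subject": "/O=Grid/CN=Alice", "operation": "Destroy"}, *chain))  # False
print(authorize({"subject": "/O=Grid/CN=Eve", "operation": "GetRP"}, *chain))      # False
```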
GT4 WS Core in a Nutshell
• WorkManager: “thread pool”, site-independent “work” manager
• Apache Database Connection Pool library (JDBC “DataSource” implementation)
• JNDI Directory: manages internal, shared objects (ResourceHomes, WorkManager, Configuration objects, …)
[Diagram: the Service Container with WorkManager, DB Conn Pool and JNDI Directory added]
GT4 WS Core in a Nutshell
• Deploy the Service Container “standalone” or within Apache Tomcat
[Diagram: the Service Container hosted inside Apache Tomcat]
Relationship Between OGSA, GT4, WSRF & Web Services
Dealing with Data
OGSA-DAI
OGSA-DAI
• An extensible framework
• accessed via web services
• that executes data-centric workflows
• involving heterogeneous data resources
• for the purposes of data access, integration,
transformation and delivery within a Grid
• and is intended as a toolkit for building higher-level
application-specific data services
Motivation
• Grid is about sharing resources
• OGSA-DAI is about sharing structured data resources
[Diagram: Relational Database, XML Database, Indexed File]
Sharing data via website download
• ZIP up data and put it on a website
• Pros
– Easy distribution for providers
– Easy access for consumers
• Cons
– Consumers have to download all the data
– Consumers have to load data into local databases to use
it
– Static snapshot
– Security
Sharing data via direct access
• Providers tell consumers
– Database URL – mycomputer.epcc.ed.ac.uk:3306
– Username – userID
– Password – password
• Pros
– Consumers have direct access, so it should be faster
• Cons
– Firewall issues
– User and password management is hard
– No consistent security model
– Hard to use in grid/web service workflows
Sharing data via direct access
• Cons (continued)
– No server-side layer in which to standardize database
heterogeneities
– Client needs to know, and have installed, correct driver
for the database.
– Different drivers for Java, C#, C++, Fortran etc.
– Totally different API for different database types, e.g.
JDBC for Relational, XMLDB for XML, Lucene for
indexed files.
Domain-specific web services
• Manipulate data using domain-specific operations,
e.g.
– Book findByISBN(ISBN)
– List<Book> findByAuthor(Author)
– List<Book> findByKeyword(Word)
• Pros
– Fits with grid/web service approach
– Abstraction hides back-end database details
– Web services are programming language neutral
– Operations likely to map well to authorization policies
Domain-specific web services
• Cons
– Slower than direct access
• Web service layer
• SOAP transport overhead – especially for large result sets
– Domain-specific API prevents use of generic data exploration,
mining and manipulation tools
[Diagram: a Generic Data Linking Application combining Books, Cancer and University Employees data, e.g. “Books written by University employees” and “University employees in 1932 who have since died of cancer”]
OGSA-DAI generic web services
• Manipulate data using OGSA-DAI’s generic web services
• Clients see the data in its ‘raw’ format, e.g.
– Tables, columns, rows for relational data
– Collections, elements etc. for XML data
• Clients can obtain the schema of the data
• Clients send queries in appropriate query language, e.g.
SQL, XPath
[Diagram: a client sends a request to OGSA-DAI and receives data; OGSA-DAI fronts a Relational Database, an XML Database and an Indexed File]
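The generic style of access, where a client can obtain the schema and send a query in the resource's own query language, can be sketched with Python's built-in `sqlite3` module standing in for a relational data resource. This is not the OGSA-DAI client API: the function names and the `book` table are hypothetical, and the web service layer is omitted.

```python
import sqlite3


def open_demo_resource():
    """Create an in-memory relational resource with hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, author TEXT)")
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
        ("111", "Grid Basics", "A. Author"),
        ("222", "Web Services", "B. Author"),
    ])
    return conn


def get_schema(conn):
    """Return {table name: [column names]} so generic tools can explore the data."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


def run_query(conn, sql):
    """Run a client-supplied SQL query and return column names plus raw rows."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


conn = open_demo_resource()
print(get_schema(conn))  # {'book': ['isbn', 'title', 'author']}
print(run_query(conn, "SELECT title FROM book WHERE author = 'B. Author'"))
```

Unlike a domain-specific operation such as findByAuthor, nothing here is specific to books: the same two calls work for any table the resource exposes.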
OGSA-DAI
• Pros:
– Fits with grid/web service approach.
– Web services are programming language neutral.
– Access to schema and raw data supports generic tools.
• Cons:
– Slower than direct connection mainly due to SOAP
overhead.
– One more layer between client and data
– Data not transferred in efficient binary format.
Reducing the SOAP effect – workflows
[Diagram: OGSA-DAI, a Transform Web Service and an FTP Server, compared with a single OGSA-DAI workflow Query -> Transform -> DeliverToFTP that delivers to the FTP Server]
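The idea of chaining Query -> Transform -> DeliverToFTP inside one request can be sketched as a pipeline of three functions. This is an illustration under stated assumptions, not OGSA-DAI's activity framework: an in-memory SQLite table stands in for the data resource, CSV rendering stands in for the transformation, and a Python list stands in for the FTP server.

```python
import csv
import io
import sqlite3


def query(conn, sql):
    """Query activity: run SQL against the data resource."""
    return conn.execute(sql).fetchall()


def transform(rows):
    """Transform activity: render rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def deliver(payload, destination):
    """Deliver activity: a list stands in for the FTP server."""
    destination.append(payload)
    return len(payload)


def run_workflow(conn, sql, destination):
    # One request drives all three activities on the server side; only a
    # small status record travels back to the client.
    delivered = deliver(transform(query(conn, sql)), destination)
    return {"status": "done", "characters": delivered}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
ftp_stub = []
print(run_workflow(conn, "SELECT id, label FROM sample ORDER BY id", ftp_stub))
print(ftp_stub[0])
```

Because the intermediate results never pass through the client, the bulk data avoids the SOAP encoding cost described on the earlier slides.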