
Recycling Services and Workflows through Discovery and Reuse
Chris Wroe¹, Phillip Lord¹, Simon Miles², Juri Papay², Luc Moreau², Carole Goble¹
¹ University of Manchester
² University of Southampton
Workflows
[Figure: Taverna workflow diagram]
• myGrid has concentrated on making workflows as straightforward as possible to build from existing web services.
But…
• That doesn’t mean they are disposable
• Scientists build more complex protocols
– Time investment
– Workflows become a useful resource capturing which services work well together to achieve a goal
Common bioinformatics tasks
• Task 1
– I’ve got a sequence; what genes may be in it?
• Task 2
– I’ve got a gene; what is known about this gene?
• Task 3
– I’ve got a gene; what is known about the protein coded for by that gene?
• Task 4
– I’ve got a microarray result; analyse it for upregulated genes, downregulated genes and clusters of gene regulation.
Promoting sharing
• Workflows are intended to be more
straightforward to share
– Re-locatable
– Explainable
• How do you promote sharing?
– How do you find workflows and their component
services in the first place?
• Need a way of describing what they do
• Need a place to put descriptions
• Need a way of searching for them
Requirements
• User centric model for describing services
and workflows (bioinformatics focus)
• Architecture & middleware components
• User applications for describing and
searching for workflows and services
myGrid’s model of services
[Diagram: operation (name, description, input, output, task, method, resource, application)]
• Compatible with UDDI & WSDL
• Compatible with OWL-S
• Compatible with bioMoby
• User centric
• Data centric
• Operation centric
myGrid’s model of services
[Diagram: operation (name, description, input, output, task, method, resource, application); subClassOf operation: workflow, WSDL operation, Soaplab service, bioMoby service; relation labelled "contains" on workflow]
• Simple description of workflows
– overall inputs and outputs
– component operations
– not the sequencing of operations
myGrid’s model of services
[Diagram: operation (name, description, input, output, task, method, resource, application); input and output are parameters; parameter (name, description, semantic type, format, transport type, collection type, collection format); kinds of operation: workflow, WSDL operation, Soaplab service, bioMoby service]
myGrid’s model of services
[Diagram: service (name, description, author, organisation); operation (name, description, input, output, task, method, resource, application); input and output are parameters; parameter (name, description, semantic type, format, transport type, collection type, collection format); kinds of operation: workflow, WSDL operation, Soaplab service, bioMoby service; kind of service: WSDL service]
A Blast Description
Service Name: Blast
Operation: execute
task: pairwise_local_aligning
resource: EMBL
application: blastn
Parameter:
Input:
Name: accession
semantic type: EMBL Nucleotide sequence id
transport data type: string
Output:
Name: Result
semantic type: sequence alignment report
transport data type: string
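The service model and the Blast description above can be sketched as plain Python data classes. This is only an illustrative sketch, not myGrid's actual schema: the class and field names are taken from the slide labels, while the assumption that a service holds a list of operations, and that a workflow holds an unordered list of component operations, is ours.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    # Field names follow the "parameter" box on the slide.
    name: str
    semantic_type: str
    transport_type: str
    description: str = ""
    format: Optional[str] = None
    collection_type: Optional[str] = None
    collection_format: Optional[str] = None


@dataclass
class Operation:
    # Field names follow the "operation" box on the slide.
    name: str
    description: str = ""
    task: Optional[str] = None
    method: Optional[str] = None
    resource: Optional[str] = None
    application: Optional[str] = None
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)


@dataclass
class Workflow(Operation):
    # A workflow is described like any other operation, plus the
    # operations it contains. No execution order is recorded.
    contains: List[Operation] = field(default_factory=list)


@dataclass
class Service:
    # Assumption: a service groups one or more operations.
    name: str
    description: str = ""
    author: Optional[str] = None
    organisation: Optional[str] = None
    operations: List[Operation] = field(default_factory=list)


# The Blast description from the slide, expressed in this sketch.
blast = Service(
    name="Blast",
    operations=[
        Operation(
            name="execute",
            task="pairwise_local_aligning",
            resource="EMBL",
            application="blastn",
            inputs=[Parameter("accession", "EMBL Nucleotide sequence id", "string")],
            outputs=[Parameter("Result", "sequence alignment report", "string")],
        )
    ],
)
```

Because `Workflow` subclasses `Operation`, anything that searches operations by task or by input and output types would find workflows in the same way.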
View Service Architecture
[Architecture diagram. Components: Taverna Workbench, Discovery Client, Semantic Find Component, Personalised View Component, Workflow Registry, two Service Registries]
Diagram annotations:
• Discovery by describing services required
• Personalised discovery using UDDI clients and publishing of personal metadata
• Extract service descriptions to reason over
• Pull service adverts from global registries
Registries and views
• Registry
– Stores structured description of service or
workflow
– UDDI & WSDL data model with extensibility
– Represented as RDF in Jena repository
– Query using RDQL
• Views
– Local views aggregate a filtered set of
registrations from multiple registries
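To give a feel for what storing a description as RDF and querying it by pattern means, here is a minimal pure-Python stand-in. It does not use Jena or RDQL; the triple list, the predicate names (`hasOperation`, `performsTask` and so on) and the `match` helper are all hypothetical and chosen only for illustration.

```python
# A description stored as (subject, predicate, object) triples.
# Predicate names and identifiers are hypothetical, not the myGrid vocabulary.
TRIPLES = [
    ("svc:Blast", "hasOperation", "op:execute"),
    ("op:execute", "performsTask", "pairwise_local_aligning"),
    ("op:execute", "usesResource", "EMBL"),
    ("op:execute", "usesApplication", "blastn"),
]


def match(pattern, triples):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# "Which operations perform pairwise local aligning?"
ops = [s for s, _, _ in match((None, "performsTask", "pairwise_local_aligning"), TRIPLES)]

# "Which services offer one of those operations?"
services = [s for op in ops for s, _, _ in match((None, "hasOperation", op), TRIPLES)]
```

A real RDF query language expresses the same idea declaratively, with variables in place of the `None` wildcards.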
Views over registries
[Diagram]
Organisational view: Blast @ Soton, Blast @ NCBI
– Notification of workflows and services with a performance indicator > 90
External registry 1: Blast @ NCBI
External registry 2: Blast @ DDBJ
Local registry 2: Blast @ Soton
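The view in the diagram above can be read as a filter applied across several registries. The sketch below shows that idea under stated assumptions: the registry and service names come from the slide, but the `performance` values are invented purely so the filter has something to act on, and `build_view` is a hypothetical helper rather than part of the myGrid view component.

```python
# Registrations held by each registry. The performance numbers are
# made up for illustration; the slide gives only the threshold (> 90).
registries = {
    "External registry 1": [{"name": "Blast @ NCBI", "performance": 95}],
    "External registry 2": [{"name": "Blast @ DDBJ", "performance": 80}],
    "Local registry 2": [{"name": "Blast @ Soton", "performance": 92}],
}


def build_view(registries, keep):
    """Aggregate the registrations from every registry that pass the filter."""
    return [
        dict(entry, registry=registry_name)  # remember where each entry came from
        for registry_name, entries in registries.items()
        for entry in entries
        if keep(entry)
    ]


# Organisational view: only entries with a performance indicator above 90.
view = build_view(registries, lambda entry: entry["performance"] > 90)
names = sorted(entry["name"] for entry in view)
```

With these invented numbers the view holds the Soton and NCBI entries and leaves out DDBJ, which mirrors the membership shown in the diagram.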
FETA
• Registries are domain independent
• But many queries and indexes of services
and workflows are domain dependent
• FETA provides a domain dependent
indexing component that works in
concert with the registry.
• Uses ontologies as a source of domain
knowledge
FETA Example
• Domain dependent query
– “Find a workflow or service that performs
nucleotide sequence alignment”
• = performs task aligning or more specific
• + accepts input nucleotide sequence or more
general
[Ontology fragment]
Biological data
  Bio Sequence data
    Nucleotide sequence data
    Protein sequence data
    …
Task
  Aligning
    Local aligning
      Pairwise local aligning
      …
    Global aligning
    …
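The query above combines two hierarchy checks: the advertised task must be "aligning" or something more specific, and the advertised input must be "nucleotide sequence" or something more general. A minimal sketch of that matching, assuming a simple child-to-parent map built from the ontology fragment on the slide, could look like this. The service entries and the `is_a` and `find` helpers are hypothetical; FETA itself reasons over the full ontology rather than a hand-written table.

```python
# Child -> parent links read off the ontology fragment on the slide.
# The nesting of "Pairwise local aligning" under "Local aligning" is assumed.
PARENT = {
    "Bio Sequence data": "Biological data",
    "Nucleotide sequence data": "Bio Sequence data",
    "Protein sequence data": "Bio Sequence data",
    "Aligning": "Task",
    "Local aligning": "Aligning",
    "Pairwise local aligning": "Local aligning",
    "Global aligning": "Aligning",
}


def is_a(term, ancestor):
    """True if term is the ancestor itself or lies somewhere below it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


# Hypothetical service and workflow descriptions (names invented).
DESCRIPTIONS = [
    {"name": "blast_like", "task": "Pairwise local aligning", "input": "Nucleotide sequence data"},
    {"name": "generic_aligner", "task": "Global aligning", "input": "Bio Sequence data"},
    {"name": "protein_aligner", "task": "Local aligning", "input": "Protein sequence data"},
    {"name": "annotator", "task": "Task", "input": "Nucleotide sequence data"},
]


def find(task, input_type):
    """Task must be `task` or more specific; input must be `input_type` or more general."""
    return [
        d["name"]
        for d in DESCRIPTIONS
        if is_a(d["task"], task) and is_a(input_type, d["input"])
    ]


matches = find("Aligning", "Nucleotide sequence data")
```

Note the direction of the two checks: the task test walks up from the advertised task, while the input test walks up from the data the user holds, so a service that accepts any biological sequence still matches a nucleotide query.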
Annotation
[Diagram. Actors: Ontologists, Annotation providers, Service Providers. Components: Ontology Store, Pedro Annotation tool, Taverna Workbench with Registry plug-in, Registry (Personalised View), two further Registries. Service sources: WSDL, Soaplab, Others. Labels on the arrows: Vocabulary; Description extraction; Interface Description; Annotation/description]
Pedro Data Entry Tool
Discovery in Taverna
• User chooses services
• A common ontology is
used to annotate and
query any myGrid object
including services.
• Discover workflows and
services described in the
registry via Taverna.
• Look for all workflows that
accept an input of
semantic type nucleotide
sequence
• Aim to have semantic
discovery over a public view
on the Web.
Reuse in Taverna
• Drag a workflow
entry into the
explorer pane
and the workflow
loads.
• Drag a service/
workflow to the
scavenger
window for
inclusion into the
workflow
Uptake & availability
• Have we succeeded in promoting uptake?
– Too early to say
– Plan to deploy a public registry in the Autumn
• Software availability
– Taverna – available on SourceForge
– Ontology – available on http://www.mygrid.org.uk
– View – available on the myGrid website
– FETA – prototype version from myGrid CVS
– PEDRo – available on SourceForge
• CAUTION – we are currently reassessing and reimplementing the communication between these
components. If you are interested in deploying this
system, wait or contact us for advice.
Acknowledgements
An EPSRC funded UK eScience Program Pilot Project
Particular thanks to the other members of the
Taverna project, http://taverna.sf.net
myGrid
People
Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro
Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris
Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip
Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri
Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena
Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick
Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics School of
Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester,
UK
• Steve Kemp, Liverpool, UK
Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith
Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)
Collaborators
• Keith Decker
Gratuitous Advertising – SOFG2