Taverna: From Biology to Astronomy

Dr Katy Wolstencroft
University of Manchester
myGrid
OMII-UK
What is Taverna?
• An environment for workflow design and
execution
• User interface to a larger suite of middleware –
myGrid
• Designed to support in silico experiments in
biology
• Open source
OMII
Open Middleware Infrastructure Institute
• University of Manchester joined with the Universities of
Edinburgh and Southampton in March 2006
• OMII-UK aims to provide software and support to enable
a sustained future for the UK e-Science community and
its international collaborators.
• A guarantee of development and support
The Life Science Community
In silico Biology is an open Community
• Open access to data
• Open access to resources
• Open access to tools
• Open access to applications
Global in silico biological research
The Community Problems
• Everything is Distributed
– Data, Resources and Scientists
• Heterogeneous data
• Very few standards
– I/O formats, data representation, annotation
– Everything is a string!
Integration of data and interoperability of
resources is difficult
Lots of Resources
NAR 2007 – 968 databases
Traditional Bioinformatics
[Figure: flat-file nucleotide sequence listing, positions 12181–12781]
Workflows as a Solution
• Describes what you want to do, not how you
want to do it
• High level description of the experiment
• Easier to explain, share, relocate, reuse and
repurpose.
• Workflow <=> Model
• Workflow is the integrator of knowledge
• The METHODS section of a scientific publication
Taverna Workflow Components
Scufl Simple Conceptual Unified Flow Language
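The idea of a workflow as a high-level description (processors joined by data links) can be sketched in a few lines of Python. This is only an illustration, not Scufl or Taverna's engine: the three processors, the `workflow` dictionary layout and `run_workflow` are all hypothetical names, and the "services" are local functions standing in for remote ones.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical processors: local functions standing in for remote services.
def fetch_sequence(seq_id):
    # A real workflow would call a database service; this returns fixed data.
    return "acatttctac"

def reverse_complement(seq):
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    return "".join(pairs[base] for base in reversed(seq))

def gc_content(seq):
    return sum(base in "gc" for base in seq) / len(seq)

# The workflow says WHAT is connected to what: each processor lists the
# upstream nodes that feed its inputs. "input" is the workflow input port.
workflow = {
    "fetch": (fetch_sequence, ["input"]),
    "revcomp": (reverse_complement, ["fetch"]),
    "gc": (gc_content, ["revcomp"]),
}

def run_workflow(workflow, workflow_input):
    """Run every processor once its upstream results are available."""
    results = {"input": workflow_input}
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    for name in TopologicalSorter(graph).static_order():
        if name == "input":
            continue  # already supplied by the caller
        func, deps = workflow[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results

results = run_workflow(workflow, "example-id")
```

The execution order is derived from the data links rather than written out by hand, which is the sense in which the description states what to do and leaves the how to the engine.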
Taverna in an Open World
• Open domain services and resources.
• Taverna accesses 3000+ services
• Third party – we don’t own them – we didn’t build them
• All the major providers
– NCBI, DDBJ, EBI …
• Enforce NO common data model.
• Quality Web Services considered desirable
What can you do with myGrid?
• ~33,000 downloads
• Users worldwide
– US, Singapore, UK, Europe, Australia
• Systems biology
• Proteomics
• Gene/protein annotation
• Microarray data analysis
• Medical image analysis
• Heart simulations
• High throughput screening
• Genotype/Phenotype studies
• Health Informatics
• Astronomy
• Chemoinformatics
• Data integration
Examples – Early Pioneers
Williams-Beuren Syndrome
Identifying new human genome sequence, and the genes contained within it, in an
area of the genome associated with the disease
Improve understanding of the relationship between genotype and phenotype
[Figure: map of the Williams-Beuren Syndrome region; labels: CTA-315H11, RP11-622P13, CTB51J22, RP11-148M21, RP11-731K22, ELN, WBSCR28, WBSCR27, CLDN4, CLDN3, WBSCR21, STX1A, WBSCR22, WBSCR18, WBSCR24, WBSCR14]
Four workflow cycles totalling ~10 hours
The gap was correctly closed and all known features identified
314,004 bp extension
All nine known genes identified (40/45 exons identified)
Trypanosomiasis in Africa
Resistance to parasites in
different breeds of cattle
Involves:
• Microarray analysis
• Classical genetics
• Biochemical pathway analysis
Large data sets, large results sets
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
Is Taverna Just for Biologists?
• Nothing in the code is specific to biology
• The default list of services ARE bio services, but
Taverna doesn’t care what they are
• Services from other science disciplines can
simply be slotted in
Other Examples
• Medical imaging
– MIAS-GRID – investigating cartilage thickness during
drug trials
– 2D and 3D brain image registration
• Chemoinformatics
– CDK-Taverna – project to provide the CDK
chemoinformatics tool set as web services
– Chimatica – Virtual Drug Candidate Production
Environment
• Health informatics
– PsyGrid – investigating first episode psychosis
What Taverna Gives you
• Automation
• Implicit iteration
• Implicit parallelisation
• Support for nested workflow construction
• Error handling
– Retry, failover and automatic substitution of alternates
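Two of the features listed above, implicit iteration and error handling by retry and failover to an alternate, can be illustrated with a small Python sketch. It is not Taverna's implementation: `call_with_retry`, `run_processor`, `flaky_primary` and `alternate` are hypothetical names, and the services are local functions used in place of remote calls. Implicit parallelisation is not shown.

```python
import time

def call_with_retry(services, value, retries=2, delay=0.0):
    """Try each service in order, retrying a failing one before failing over."""
    last_error = None
    for service in services:            # failover: move on to the next alternate
        for _ in range(retries + 1):    # first attempt plus `retries` retries
            try:
                return service(value)
            except Exception as error:
                last_error = error
                time.sleep(delay)
    raise RuntimeError("all services failed") from last_error

def run_processor(services, data, **kwargs):
    """Implicit iteration: a list input results in one call per element."""
    if isinstance(data, list):
        return [run_processor(services, item, **kwargs) for item in data]
    return call_with_retry(services, data, **kwargs)

# Hypothetical services: the primary always fails, the alternate works.
def flaky_primary(seq):
    raise ConnectionError("primary service unavailable")

def alternate(seq):
    return seq.upper()

single = run_processor([flaky_primary, alternate], "gacc")
many = run_processor([flaky_primary, alternate], ["acat", "ttct"])
```

The caller passes either a single value or a list and never writes the loop or the fallback logic, which is the point of having the engine provide them.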
Extensibility
• Accepts many types of services:
– web services, Beanshell scripts, local Java scripts, JDBC
connections, etc.
• Easy to add your own services
• Plug-in architecture
– Easy to build new processor types
– Easy to extend to include alternative results viewers
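As a rough illustration of what a plug-in architecture for processor types can look like, the Python sketch below keeps a registry of factories keyed by type name. It does not reproduce Taverna's plug-in API: `PROCESSOR_TYPES`, `processor_type`, `build_processor` and the two example types are assumptions made for the example.

```python
# Registry mapping a processor type name to a factory that builds it.
PROCESSOR_TYPES = {}

def processor_type(name):
    """Decorator that registers a factory under a type name."""
    def register(factory):
        PROCESSOR_TYPES[name] = factory
        return factory
    return register

@processor_type("string_constant")
def make_constant(spec):
    # Hypothetical type: ignores its inputs and always emits spec["value"].
    return lambda *_: spec["value"]

@processor_type("local_function")
def make_local(spec):
    # Hypothetical type: wraps an ordinary Python callable as a processor.
    return spec["function"]

def build_processor(spec):
    """Look up the factory for spec["type"] and build a callable processor."""
    try:
        factory = PROCESSOR_TYPES[spec["type"]]
    except KeyError:
        raise ValueError(f"unknown processor type: {spec['type']}") from None
    return factory(spec)

constant = build_processor({"type": "string_constant", "value": "ELN"})
length = build_processor({"type": "local_function", "function": len})
```

Adding a new kind of service then means registering one more factory; the code that builds and runs workflows does not change.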
Could Taverna be used for Astronomy?
• Lots of data (although individual data items
might be bigger)
• Distributed data
• Chains of analyses
• MORE standards for data formatting/exchange
Investigated by AstroGrid and SAMPO
Sampo - European Southern
Observatory project
Workflows for data reduction
Reasons for choosing Taverna
Open source
Free
Allows customisation
Easy to use and adapt
Designed for science
Most workflow engines are meant
for business applications
Very robust
Actively developed
Good support for web services
AstroGrid Workflows
Evaluation of Taverna
Building plug-ins for AstroGrid project
In the process of gathering AstroGrid
requirements
Still things to address…
Coming soon… Taverna 2
A complete redesign of Taverna from the ground
up to enable:
• Streaming data
• Management of large volumes of data
• Better remote workflow execution
• Integration with grid resources
• Monitoring and steering
Beta release due at the end of summer 2007
myGrid acknowledgements
Carole Goble, Norman Paton, Robert Stevens, Anil Wipat, David De Roure, Steve Pettifer
• OMII-UK Tom Oinn, Katy Wolstencroft, Daniele Turi, June Finch, Stuart Owen, David
Withers, Stian Soiland, Franck Tanoh, Matthew Gamble, Alan Williams
• Research Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Antoon Goderis,
Alastair Hampshire, Qiuwei Yu, Wang Kaixuan.
• Current contributors Matthew Pocock, James Marsh, Khalid Belhajjame, PsyGrid
project, Bergen people, EMBRACE people.
• User Advocates and their bosses Simon Pearce, Claire Jennings, Hannah Tipney,
May Tassabehji, Andy Brass, Paul Fisher, Peter Li, Simon Hubbard, Tracy Craddock,
Doug Kell, Marco Roos, Matthew Pocock, Mark Wilkinson
• Past Contributors Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil
Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Chris
Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Phillip Lord, Darren
Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Juri Papay, Savas Parastatidis,
Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick
Sharman, Victor Tan, Paul Watson, and Chris Wroe.
• Industrial Dennis Quan, Sean Martin, Michael Niemi (IBM), Chimatica.
• Funding EPSRC, Wellcome Trust.