Macromolecular Structure Database group


EMBL-EBI
Structural Proteomics Automatic Target Selection
Gordon Whamond
Project Overview
Aim:
• Provide a resource that automatically selects potential targets
for protein structure determination, keeping human interaction
with the software to a minimum (and only where required).
Input:
• Raw amino acid sequence
• UniProt accession number
• UniProt accession number and a sequence range
Output:
• Query sequence showing possible domains
• All candidates for structure determination
• Recommendation for which sequence to use
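The three input forms listed above could be told apart before any service is called. The following is a minimal sketch, not the project's implementation: the accession pattern is the one UniProt publishes, while the `ACCESSION:START-END` range syntax, the `Query` record and the `classify_query` name are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Accession-number pattern as published by UniProt.
ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)
ACCESSION_RE = re.compile(rf"^{ACCESSION}$")
# Hypothetical range syntax: ACCESSION:START-END (1-based, inclusive).
RANGE_RE = re.compile(rf"^({ACCESSION}):(\d+)-(\d+)$")
# The 20 standard amino acids plus X for an unknown residue.
SEQUENCE_RE = re.compile(r"^[ACDEFGHIKLMNPQRSTVWYX]+$")


@dataclass
class Query:
    kind: str  # "sequence", "accession" or "accession_range"
    value: str
    residues: Optional[Tuple[int, int]] = None


def classify_query(text: str) -> Query:
    """Decide which of the three input forms a user-supplied string is."""
    cleaned = "".join(text.split()).upper()  # drop whitespace and line breaks

    match = RANGE_RE.match(cleaned)
    if match:
        start, end = int(match.group(2)), int(match.group(3))
        if start < 1 or end < start:
            raise ValueError(f"invalid sequence range: {start}-{end}")
        return Query("accession_range", match.group(1), (start, end))
    if ACCESSION_RE.match(cleaned):
        return Query("accession", cleaned)
    if SEQUENCE_RE.match(cleaned):
        return Query("sequence", cleaned)
    raise ValueError("input is neither a UniProt accession nor a sequence")
```

An accession always contains digits and a raw sequence never does, so the order of the last two checks does not affect the result.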
Considerations
• Is there a known structure?
• Are there Classified Structural (CATH, SCOP) Domains?
• Are there Known Sequence (Pfam) Domains?
• Are there Predicted Structural (Gene3D, Superfamily) Domains?
• Do Domain Boundaries Conform to Secondary Structure
Restrictions?
• Which Species has a Representative Domain that is the Most
Compactly Folded?
• The core implementation needs to be extensible and
easily maintainable.
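One of the considerations above, whether domain boundaries conform to secondary structure restrictions, can be illustrated with a small check. This is a sketch under an assumed rule (a boundary must not cut through a helix or strand). The `Element` record and the function name are hypothetical, and the secondary structure elements would come from whichever prediction method is eventually chosen.

```python
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    kind: str   # "helix" or "strand"
    start: int  # first residue, 1-based inclusive
    end: int    # last residue, 1-based inclusive


def boundaries_in_elements(
    boundaries: List[int], elements: List[Element]
) -> List[Tuple[int, Element]]:
    """Return (boundary, element) pairs where a cut would split an element.

    A boundary b means "cut between residue b and residue b + 1".
    """
    conflicts = []
    for b in boundaries:
        for element in elements:
            # The cut splits the element if b and b + 1 both lie inside it.
            if element.start <= b < element.end:
                conflicts.append((b, element))
    return conflicts
```

A boundary placed directly after the last residue of a helix is accepted, because the cut then falls in the following loop.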
Taverna
The software is to be implemented using the Taverna workbench.
This is a tool that can be used to formulate the workflow and implement
each of the processes as distributed web services.
Advantages:
• Distributed computing reduces resource requirement.
• Easily extensible system
• Maintenance issues shifted to external providers
Disadvantages:
• Learning curve
• Convincing service providers to adopt a standard format
• Maintenance issues shifted to external providers
Tom Oinn - http://taverna.sourceforge.net/
Taverna
The prototype workflow:
When expanded to show all of the incorporated sub-workflows,
it is quite complex.
Fortunately, Taverna can also provide a top-level view.
Taverna
Dealing With DAS
Taverna
Process Data
Secondary Structure Elements:
(Method not yet chosen)
Sequence Domains:
Pfam, Gene3D, Superfamily etc
Protein Folding:
RONN, FoldIndex, DisEMBL
Rank Target Selection:
Based on loop lengths, folding predictions, etc
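The ranking step could combine these signals into a single score per candidate. The sketch below is purely illustrative: the `Candidate` fields, the scoring formula and the `loop_weight` value are all assumptions, not the criteria the project uses. A lower score is treated as a better target.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    length: int               # number of residues in the construct
    disordered_residues: int  # residues predicted to be unfolded
    longest_loop: int         # longest loop, in residues


def score(candidate: Candidate, loop_weight: float = 0.01) -> float:
    """Penalise predicted disorder and long loops (hypothetical weighting)."""
    disorder_fraction = candidate.disordered_residues / candidate.length
    return disorder_fraction + loop_weight * candidate.longest_loop


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from most to least promising."""
    return sorted(candidates, key=score)
```

Keeping the scoring function separate from the sort means a different weighting can be swapped in without touching the rest of the workflow.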
Starting the Process
Monitoring Progress
Assess Data
Review Results
Extensibility
Java Services
• Straightforward to provide as a web service using Tomcat and Axis
• WSDL (describing the service) can be generated automatically
Legacy Software
• Any command-line tool can be wrapped as a web service
using Soaplab
• For example, the EMBOSS tools are already available
Extensibility
Output Format:
To keep services interchangeable, it helps to define a common
results format. For this reason we are using the e-Family service schema
(http://www.efamily.org.uk/)
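The benefit of a common results format can be shown with a small example: if every predictor's output is reduced to the same kind of record, downstream steps need no tool-specific code. The sketch below does not reproduce the e-Family schema. The `Feature` record, the 0.5 threshold and the assumption that a higher score means more disorder are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Feature:
    source: str        # name of the tool that produced the feature
    feature_type: str  # e.g. "disorder"
    start: int         # 1-based inclusive
    end: int           # 1-based inclusive


def disorder_regions(
    source: str, scores: Sequence[float], threshold: float = 0.5
) -> List[Feature]:
    """Convert per-residue scores into contiguous disorder regions."""
    features: List[Feature] = []
    region_start: Optional[int] = None
    for position, value in enumerate(scores, start=1):
        if value >= threshold and region_start is None:
            region_start = position  # a new region opens here
        elif value < threshold and region_start is not None:
            features.append(Feature(source, "disorder", region_start, position - 1))
            region_start = None
    if region_start is not None:  # region runs to the end of the sequence
        features.append(Feature(source, "disorder", region_start, len(scores)))
    return features
```

Results from different predictors can then be concatenated into one list and handled uniformly by the ranking and viewing steps.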
Current collaborators include:
The Weizmann Institute - FoldIndex
University of Oxford - RONN
Results Viewers
http://www.efamily.org.uk/software/dasclients/spice/
Conclusions
Taverna and Web Services:
• Taverna facilitates the provision of complex distributed systems that
utilise web services
• This reduces maintenance overheads and keeps technology
requirements at a reasonable level
• It is also easily extensible to accommodate new services
Availability:
• We hope the core system will be ready by the end of the year
• It will provide a basic workflow that users can customise
to their needs
Acknowledgments
Thanks to:
Tom Oinn
Andreas Prlic
The RONN and FoldIndex teams
The MSD Group