Systems Biology Data Sharing in SysMO-DB
SysMO-SEEK: Sharing Data and Models in Systems Biology
Katy Wolstencroft
Stuart Owen
Jacky Snoep
University of Manchester
SysMO-DB Project
A data access, model handling and data integration platform for Systems Biology:
To support and manage the diversity of data, models and experimental protocols from a consortium
Web based
Standards compliant
Systems Biology of Microorganisms
http://www.sysmo.net
Pan European collaboration
13 individual projects, >100 institutes
Different research outcomes
A cross-section of microorganisms, incl. bacteria, archaea and yeast
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models
Pool research capacities and know-how
Already running since April 2007
Runs for 3-5 years
This year, 2 new projects join and 6 leave
Types of data
Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics
Images
Molecular biology
Reaction kinetics
Models: metabolic, gene network, kinetic
Relationships between data sets/experiments
Procedures, experiments, data, results and models
Analysis of data
Challenges
Heterogeneous data and models
Distributed groups of researchers
Modellers and experimentalists have different skills, training, experience
Scientists want to remain in control
Scientists reluctant to share
Social and technical challenges
SysMO-DB Dev Team
Sergejs Aleksejevs
Wolfgang Müller
Heidelberg Institute for Theoretical Studies, Germany
Carole Goble
Olga Krebs
Katy Wolstencroft
University of Manchester, UK
Stuart Owen
Jacky Snoep
Franco du Preez
University of Stellenbosch, South Africa
University of Manchester, UK
Finn Bacall
Social Challenge: Focus Group
SysMO PALs
Show what is there
Suggest what is possible
Ask for requirements
Give requirements
Tell priorities
Rate outcomes
Suggest improvements
DB team
Double check
Transmit
Disseminate
Collect answers
Focus Group
Projects
Technical Challenge
Rapid and incremental development
Driven by the PALs
Just enough and just in time, not just in case
No reinvention
Sustainable and extensible
Migrate to standards
Fitting in with normal lab practices
What do we share
Protocols for Models
Protocol Title
Authors
Keywords
Description
Assumptions
Equations
Numerical Methods/Algorithms
Computational Tools
Parameter Estimation Techniques
Limitations
References
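The protocol headings above map naturally onto a simple record type. The following is a minimal sketch, assuming a hypothetical `ModelProtocol` class (not part of SysMO-SEEK) whose attribute names mirror the slide's headings; the example values are illustrative only.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ModelProtocol:
    """Hypothetical record mirroring the 'Protocols for Models' headings."""
    title: str
    authors: List[str]
    keywords: List[str] = field(default_factory=list)
    description: str = ""
    assumptions: List[str] = field(default_factory=list)
    equations: List[str] = field(default_factory=list)
    numerical_methods: List[str] = field(default_factory=list)
    computational_tools: List[str] = field(default_factory=list)
    parameter_estimation: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only; not taken from any SysMO project.
protocol = ModelProtocol(
    title="Example glycolysis model protocol",
    authors=["A. Modeller"],
    equations=["v = Vmax * S / (Km + S)"],
)
print(protocol.missing_sections())
```

Listing the empty sections, rather than rejecting an incomplete record, lets a protocol be filled in incrementally.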
All SysMO Assets
Methods + Models + Data + Results
A Tree View of Assets
Investigation
Studies
SOP
Assay
SOP
ISA infrastructure provides a directory structure for experiments
http://isatab.sourceforge.net/
SOP
Construction
Validation
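The tree view follows the ISA idea of nesting assays inside studies inside an investigation. A minimal sketch of that nesting, assuming hypothetical class names and assuming that SOPs can attach to both studies and assays (the slide shows SOP at both levels but does not spell out the rule):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    name: str
    sops: List[str] = field(default_factory=list)


@dataclass
class Study:
    name: str
    sops: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    name: str
    studies: List[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> List[str]:
    """Flatten the hierarchy into indented lines, one per node."""
    lines = [f"Investigation: {inv.name}"]
    for study in inv.studies:
        lines.append(f"  Study: {study.name}")
        lines.extend(f"    SOP: {s}" for s in study.sops)
        for assay in study.assays:
            lines.append(f"    Assay: {assay.name}")
            lines.extend(f"      SOP: {s}" for s in assay.sops)
    return lines


# Illustrative names only.
inv = Investigation(
    name="Model construction",
    studies=[Study(
        name="Validation",
        sops=["sampling.doc"],
        assays=[Assay(name="Enzyme activity", sops=["assay.doc"])],
    )],
)
print("\n".join(tree_lines(inv)))
```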
Incentives for sharing
Safe haven for data
Credit and attribution
Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE, etc.)
A repository for “supplementary materials” in publications
Linking publications and data
Access other resources through a SEEK gateway
Just Enough Sharing
Access
Permissions
...we don’t talk about security
Just Enough Sharing
[Diagram labels: SysMOLab Wiki; COSMIC; Alfresco; MOSES Wiki; another data store; fetch on request; direct upload; SOP]
How do we share
“Just Enough Results Model”
What type of data is it: microarray, growth curve, enzyme activity…
What was measured: gene expression, OD, metabolite concentration…
What do the values in the datasets mean: units, time series, repeats…
Based on:
Minimum information models, e.g. MIAME, MIAPE, MIRIAM
Biological ontologies, e.g. Gene Ontology, MGED, SBO
BioPortal web service used in SysMO-SEEK for:
Concept lookup and visualisation
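The three JERM questions (type of data, what was measured, what the values mean) can be read as a small set of required metadata fields. A minimal sketch of such a check, assuming hypothetical field names; the slides name the questions but not a concrete schema:

```python
from typing import Dict, List

# Hypothetical field names standing in for the three JERM questions.
REQUIRED_FIELDS = ("data_type", "measured", "units")


def missing_jerm_fields(metadata: Dict[str, str]) -> List[str]:
    """Return required fields that are absent or blank in `metadata`."""
    return [name for name in REQUIRED_FIELDS
            if not metadata.get(name, "").strip()]


# Illustrative dataset description; units deliberately left blank.
dataset = {
    "data_type": "growth curve",
    "measured": "OD",
    "units": "",
}
print(missing_jerm_fields(dataset))  # → ['units']
```

Reporting what is missing, instead of refusing the upload, is in keeping with the "just enough" approach.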
How do we share
Share JERM templates developed by SysMO-DB, PALs and consortium
Spreadsheet templates
Database Schemas
Encourage uptake throughout SysMO
transcriptomics, metabolomics, proteomics, etc.
RightField: Annotation by Stealth
Identifying Biological Objects
What do you have in your data?
Proteins/enzymes, genes/expression levels, metabolites
Where/how do these objects interact?
Pathways, flux, experimental conditions
What models describe these interactions?
Possible when using common frameworks, naming schemes and controlled vocabularies
Following Standards
We recommend formats but we do not enforce them
Protocols and SOPs – Nature Protocols
Data – JERM models and community minimum information models
Models – SBML and related standards
Publications – PubMed and DOI
If you follow the prescribed formats, you get more out, but if you don’t, you can still participate
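"Recommend but do not enforce" can be sketched as a classifier that recognises a preferred format without rejecting anything else. The function below is a hypothetical illustration using only Python's standard library; it checks the root element name and is not a full SBML validation, nor SEEK's actual upload logic.

```python
import xml.etree.ElementTree as ET


def classify_model_upload(text: str) -> str:
    """Label an upload 'sbml' if its root element is <sbml>, else 'other'.

    Nothing is rejected: unrecognised files are still accepted, they simply
    miss out on format-specific features.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "other"
    # ElementTree renders namespaced tags as '{namespace}local'.
    local_name = root.tag.rsplit("}", 1)[-1]
    return "sbml" if local_name == "sbml" else "other"


sbml_text = (
    '<sbml xmlns="http://www.sbml.org/sbml/level2/version4" '
    'level="2" version="4"><model id="m"/></sbml>'
)
print(classify_model_upload(sbml_text))
print(classify_model_upload("plain text notes about a model"))
```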
Lowering the adoption barrier
SEEK, the eLaboratory
A dynamic resource for analysis as well as browsing
Automatic comparison of data from inside files
Understanding where and how data and models are linked
Running simulations with new experimental data
Running analyses and workflows over the data and models
Workflows from myExperiment
Data preparation, annotation and analysis
Systems Biology workflow Pack on myExperiment
Microarray analysis and text mining
Created by Afsaneh Maleki-Dizaji from SUMO, University of Sheffield
Based on previous work by Paul Fisher, University of Manchester
http://www.myexperiment.org/workflows/187
SEEK as a data analysis and meta analysis service
SBML model construction and population
Calibration workflow
Data requirements
Parameterised SBML model
Experimental data
Metabolite concentrations from key results database
Calibration by COPASI
web service
Peter Li
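Calibration here means adjusting the parameters of a model until its simulated output matches the experimental data. The slides delegate this step to a COPASI web service; the sketch below does not call COPASI. It is a stand-in on a toy first-order decay model, with hypothetical function names and synthetic data, that picks the rate constant minimising the sum of squared errors over a list of candidates.

```python
import math
from typing import List, Sequence, Tuple


def simulate(k: float, s0: float, times: Sequence[float]) -> List[float]:
    """Toy model: first-order decay S(t) = s0 * exp(-k * t)."""
    return [s0 * math.exp(-k * t) for t in times]


def sum_squared_error(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Sum of squared differences between prediction and observation."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs))


def calibrate(times: Sequence[float], observed: Sequence[float],
              s0: float, candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (best_k, error) over the candidate rate constants."""
    def error_for(k: float) -> float:
        return sum_squared_error(simulate(k, s0, times), observed)

    best = min(candidates, key=error_for)
    return best, error_for(best)


times = [0.0, 1.0, 2.0, 4.0]
observed = simulate(0.5, 10.0, times)          # synthetic "measurements", not real data
candidates = [i / 100 for i in range(1, 201)]  # k from 0.01 to 2.00
best_k, error = calibrate(times, observed, 10.0, candidates)
print(best_k, error)
```

A real calibration would use an optimiser over several parameters of an SBML model; the exhaustive scan is used here only because it is easy to follow.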
Data analysis and meta analysis
SEEK Analysis Service with pre-cooked analysis tools.
Load model:
Load data:
GO
Why it works for us
A solution that fits in with current practices
Start simple, show benefits, add more
Engage with the people actually doing the work
PhD students, Post-docs
Build to the PALs' requirements
Respect publication cycles
Respect cultural differences
Scientists stay in control
SysMO Methods Spreading
Virtual Liver
Mueller, via HITS
Lungsys
SBCancer
EraSysBio+
Eukaryotic organisms
Interactions between host and pathogen
Human disease
Multi scale modelling
Acknowledgements
SysMO-DB Team
SysMO-PALS
myGrid, HITS and JWS Online
EMBL-EBI, MCISB
http://www.sysmo-db.org