Transcript SysMO-DB
SysMO-DB: Sharing and Exchanging
Data and Models in Systems Biology
Katy Wolstencroft
University of Manchester
SysMO-DB
DB
A data access, model handling and data integration
platform for Systems Biology:
To support and manage the diversity of
Data, Models and experimental protocols
Local data management systems
That promotes shared understanding
Using a common platform and common
technologies
Systems Biology Challenges
Interdisciplinary work
Heterogeneous data and models
Modellers and experimentalists have different
skills, training, experience
Modellers and experimentalists have different
vocabularies and jargon
Working together
Systems Biology of Microorganisms
http://www.sysmo.net
Pan European collaboration
Eleven individual projects, 91 institutes
Different research outcomes
A cross-section of microorganisms, incl.
bacteria, archaea and yeast
Record and describe the dynamic
molecular processes occurring in
microorganisms in a comprehensive way
Present these processes in the form of
computerized mathematical models
Pool research capacities and know-how
Already running since April 2007
Runs for 3-5 years
This year, 2 new projects join and 6 leave
The Problem
No one concept of
experimentation or modelling
No planned, shared infrastructure
for pooling
Types of data
Multiple omics
Images
Molecular biology
Reaction Kinetics
Models
Metabolic, gene network, kinetic
Relationships between data sets/experiments
genomics, transcriptomics
proteomics, metabolomics
fluxomics, reactomics
Procedures, experiments, data, results and models
Analysis of data
DB
SysMO-DB
Started in June 2008
Web-based solution to facilitate:
exchange
of data, models and processes
(intra- and inter- consortia)
search for data, models and processes
across the initiative
maximisation of the "shelf life" and
utility of the data, models and
processes generated
dissemination of results
SysMO-DB Team
Hits,
Germany
Wolfgang
Müller
SABIO-RK
Stuart Owen
Carole Goble
Isabel Rojas Olga Krebs
Katy Wolstencroft
University of Manchester, UK
Finn Bacall
Taverna
myExperiment
Jacky Snoep
JWS Online
University of Stellenbosch, South Africa
University of Manchester, UK
SysMO-DB PALS team
Power Contributors.
Audits and Sharing.
21 Postdocs and PhD students
Design and technical
collaboration team
Intense collaboration
UK and Continental PALS
Chapters
Methods, data, models,
standards, software, schemas,
spreadsheets, SOPs…..
20 questions
Deployment into Projects
Principles…
A series of small victories
Realistic
Don‘t reinvent
Sustainable and extensible
Migrate to standards
Provide instant gratification
Incremental development
Fitting in with normal lab
practices
The Lowest Hanging Fruit
SysMO SEEK – a catalogue of assets
SysMO Yellow Pages
The people and their expertise
The institutions and their facilities
Data – experimental data sets
Data – analysed results
Data – external reference data sets
Models
Processes – laboratory protocols and bioinformatics
analyses
Publications
The catalogue references assets held elsewhere
SEEK screenshot?
Harvesters
SysMOLab
Wiki
COSMIC
Alfresco
MOSES
Wiki
BaCellSysMO
Alfresco
ANOTHER
A DATA
STORE
Why not a central Warehouse?
Protective of models
Reluctant to share data
in progress vs published models.
Access and Version management
Curator-Rival conflict
Even within their own projects
Legacy spreadsheets dominate
Curation practices vary
Centralised archive take-up
Point to Point Exchange
Nature 461, 145 (10 Sept09)
People don’t mind sharing methods
People want to advertise publications
Just Enough Sharing
Access
Permissions
Reusing myExperiment
SysMO-DB Architecture
SysMO-SEEK web interface
Processes
Models
Data
SysMO DB
Assets and Yellow
Pages Catalogues
JERM
Making use of the Assets
Understanding the content of the data
Linking assets together
Linking assets to experimental context
Running comparisons between data files
Running model simulations
Running data analysis pipelines
What is the JERM?
JERM “Just Enough Results Model”
Minimum information to exchange data
What type of data is it
What was measured
Gene expression, OD, metabolite concentration….
What do the values in the datasets mean
Microarray, growth curve, enzyme activity…
Units, time series, repeats….
Which experiment does it relate to?
How does it relate to models?
How was the data created
SOPs and protocols
Minimum
Information
Models
CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry
Experiments
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX : Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
MGED: Microarray Experimental Conditions
http://www.mibbi.org/index.php/MIBBI_portal
The Idea
1
ISATAB
For each data type…..
Transcriptomics
Proteomics
Metabolomics
Single Cell Data
Define a JERM…..
2
Top down analysis of standards
Bottom up analysis of practice
Generate and apply….
3
JERM template
JERM extractor for data host
Subset registered in SEEK
Access / export through JERM interface / template
SEEK + JERM
Experimental
Data
Metadata
Investigation
Homogenised
terminology
and values in
the datasets
themselves
Study
People
Projects
Assay
Experimental
conditions
Factors
studied
Models
SOPs
Workflows
Based on ISA-TAB
For publishing
JERM data needs to be related to SOPs,
experimental context (ISA) and other data
JERM must be “MIBBI” compliant for exporting to
public repositories
e.g. Microarray data needs to be MIAME compliant
ISA-TAB
Relating data and its experimental
context
Investigation, Study, Assay
TAB = tabular
A format suitable for spreadsheets
http://isatab.sourceforge.net/
ISA Provides....
A common framework for relating different
types of data e.g. microarrays and proteomics
Facilitates submission to international public
repositories of genomics, transcriptomics and
proteomics studies
Identifying Biological Objects
What do you have in your data?
Where/how do these objects interact?
Proteins/enzymes, genes/expression levels,
metabolites
Pathways, flux, experimental conditions
What models describe these interactions
Possible when using common frameworks,
naming schemes and controlled vocabularies
BioPortal Integration for Searching
Repository for submitting and sharing Biological
ontologies http://bioportal.bioontology.org/
Search for concepts across all or selected
ontologies
BioPortal provides a number of Restful
Webservices
Search
Concept lookup
Visualisation
Integrated within SEEK as a plugin
Tools to help manage data:
Annotation standards by stealth
Controlled vocabulary plug in
BioPortal
Following Standards
We recommend formats but we do not enforce
them
Protocols and SOPs – Nature Protocols
Data – JERM models and community minimum
information models
Models – SBML and related standards
Publications – PubMed and DOI
If you follow the prescribed formats, you get
more out, but if you don’t, you can still
participate
lowering the adoption barrier
Off the shelf
Except for the JERM, we have only used
community resources, vocabularies and services
You can get a long way by implementing
community practices and providing ways to
integrate them
SysMO-DB and Models
Nicolas Le Novere, Data Integration in the Life Sciences, Manchester, 2009
Models: Incentives for using Standards
Models can be shared in SysMO-SEEK in any format
SBML is the recommended format
We also recommend MIRIAM compliance and SBO annotation
If you use SBML, you can use JWS Online to run simulations
in SEEK
Screenshot of JWS Online
JWS Online Plugin
•online simulator, runs in
your browser
•upload models in SBML
format
•Web Service enabled
•SBGN schemas, with
annotations and external
links
Falko Krause, Humboldt-University, Berlin
http://www.semanticsbml.org/aym
Models Resources
Models can be published in public repositories
Models can be annotated
JWS-Online, BioModels
SBML, MIRIAM, SBO
No public resources currently for sharing models
with associated data, or for loading new data into
models
Linking Data to Models
Relating data and models
Where did the data come from for developing the
model?
Where did the data come from for validating the
model?
What were the results of model simulations?
Current Functionality in SEEK
Show all data used for construction together with
the model, such that process can be repeated
Uploaded models loaded with this data by
default
Manually alter parameters and run simulations
Next Steps: Model Validation
Test/compare model with experimental data for
complete system
Find data in SEEK
Upload data from elsewhere
Automatically load into model
Run simulations and compare with original results
JERM for models
Mapping tools – allows you to identify
columns/rows in spreadsheets containing the
right information
ISA for Models
Modelling and experimental work intersect
Investigations, Study, Assay.....or modelling
analysis.....
Modelling analysis types
Modelling type
Metabolic models, gene networks
ODE, algebraic
Studies – combinations of experimental assays,
modelling analyses, and informatics analyses
SysMO-DB the e-Laboratory
An e-Laboratory is an information system for
bringing together people, data and analytical
methods at the point of investigation or
decision-making
Current Status
Finding things so that we can compare them
Understanding who has what
Understanding what can be compared with what
– the experimental context
Where we are going…
A dynamic resource for analysis as well as browsing
Automatic comparison of data from inside files
Understanding where and how data and models
are linked
Running simulations with new experimental data
Running analyses and workflows over the data
and models
Workflows from myExperiment
Data preparation, annotation and analysis
Systems Biology workflow Pack on myExperiment
Microarray analysis and text mining
Created by Afsaneh Maleki-Dizaji
from SUMO, University of Sheffield
Based on previous work by Paul
Fisher, University of Manchester
http://www.myexperiment.org/workflows/187
SEEK as a data analysis and
meta analysis service
SBML model construction and population
Calibration workflow
Data requirements
Parameterised SBML model
Experimental data
Metabolite
concentrations from key
results database
Calibration by COPASI
web service
Peter Li
Data analysis and meta analysis
SEEK Analysis Service with pre-cooked analysis tools.
Calibration workflow
Data requirements
Parameterised SBML model
Experimental data
Metabolite
concentrations from key
results database
Load model:
Load data:
GO
Calibration by COPASI
web service
Peter Li
New Directions
Opening SysMO Out
Using SysMO as a dissemination space for the
SysMO consortium
Packaging software so that others can use it
Easy to install a SEEK for yourself
Packaging and exchanging JERM Templates
Supplementary material in publications
Data citation
Helping with standardisation
Promotion and example work with SBRML and
data and models linkage
SysMO-DB Approach in Other projects
SysMO2 – new projects and legacy
EraSysBio+
Lungsys and SBCancer
Virtual Liver
New Considerations
Eukaryotic organisms
Interactions between host and pathogen
Human disease
multicellular interactions, tissues, organs
multiscale modelling
Outstanding Issues
Keeping data at project sites has responsibilities
Reliability - Sites available continuously and promptly
Support - Must be proof against virus attacks, etc.
Archiving - Beyond the lifetime of the project.
How it works
Find a solution that fits in with current practices
Start simple, show benefits, add more
Engage with the people actually doing the work
PhD students, Post-docs
Let the scientists retain control over their data
and who can see it
Don’t reinvent. Use available vocabularies,
minimal model standards
Help prevent people duplicating work by linking
the people as well as the resources
Acknowledgements
SysMO-DB Team
SysMO-PALS
myGrid, Hits and JWS Online teams
EMBL-EBI, MCISB
http://www.sysmo-db.org