The Functional Genomics Experiment Object Model

The Functional Genomics
Experiment Object Model (FuGE)
Andrew Jones,
School of Computer Science,
University of Manchester
MGED Society
What is FuGE?
• Various groups have tried to fuse MAGE and
PEDRo in the past
– Such a model would be difficult to manage
• FuGE is a model of the common components of
functional genomics experiments
• Aims to help the development of data standards
• Should allow some cross-compatibility between
different ‘omics experiments
• Microarray & proteome standards will use parts
of FuGE for some data formats
So, what is FuGE?
• An object model in UML (close to 1st stable
release)
• An XML Schema (in development)
• A software API (will be created from UML)
• FuGE uses ontologies extensively, such as the
MGED Ontology or its successor (FuGO)
Developed by members of MGED / PSI with input
from cross-omics experimentalists e.g. RSBI
What is FuGE not…?
• Not an effort to create one data standard
for all lab techniques
– This problem is hard at a technical level, and getting
agreement from all groups is very hard
• Not a model for metabolomics metadata
– But it might help in the development of one
– …and we would like to encourage input from
the metabolomics community
FuGE Structure
• 2 sections: Common and Bio
• Common – components that aid the
development of a rich data standard
– Protocols, external references, auditing and security
settings
• Bio – biology-specific components
– Biological (or chemical) materials, bio sequences
– Summary of an investigation structure
– References to the data models specific to each domain
Protocols
• Protocols have a set of ordered atomic actions
– Actions are user-entered text or ontology terms
• Protocols can be associated with Software and
Equipment
• Protocols, Software and Equipment can have a
set of defined Parameters
• Mechanism for defining a standard protocol, and
an instance of a protocol (date, operator…)
• Nested protocols can be defined for
representing complex procedures
– An Action can be a reference to another Protocol
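The protocol structure described above can be sketched as a small data model. This is only an illustration of the ideas on the slide (ordered actions, parameters, nesting, and the split between a protocol definition and one run of it); the class names `Parameter`, `Action`, `Protocol` and `ProtocolRun`, and all example values, are assumptions made for this sketch and are not taken from the FuGE model or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Parameter:
    """A defined parameter of a protocol (hypothetical class)."""
    name: str
    unit: Optional[str] = None


@dataclass
class Action:
    """One step: free text / ontology term, or a reference to another Protocol."""
    step: Union[str, "Protocol"]


@dataclass
class Protocol:
    """A standard protocol: an ordered list of actions plus parameters."""
    name: str
    actions: List[Action] = field(default_factory=list)  # order matters
    parameters: List[Parameter] = field(default_factory=list)

    def flatten(self) -> List[str]:
        """Expand nested protocols into one ordered list of atomic steps."""
        steps: List[str] = []
        for action in self.actions:
            if isinstance(action.step, Protocol):
                steps.extend(action.step.flatten())
            else:
                steps.append(action.step)
        return steps


@dataclass
class ProtocolRun:
    """One instance of a protocol: operator, date and parameter values."""
    protocol: Protocol
    operator: str
    date: str
    parameter_values: Dict[str, float] = field(default_factory=dict)


# Illustrative values only.
wash = Protocol("wash", [Action("add buffer"), Action("centrifuge")])
extraction = Protocol(
    "extraction",
    [Action("lyse cells"), Action(wash), Action("collect supernatant")],
    [Parameter("spin_time", "min")],
)
run = ProtocolRun(extraction, operator="operator-1", date="2005-01-01",
                  parameter_values={"spin_time": 5})
print(extraction.flatten())
```

Keeping the definition (`Protocol`) separate from the run (`ProtocolRun`) means one standard protocol can be shared by many dated, operator-specific instances.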
FuGE Workflow
[Diagram: a chain of Materials linked by Treatments, followed by Data Acquisition and Data Transformation steps that produce Data]
• Material = inputs and outputs of Protocols
• Treatment = instance of some Protocol
FuGE Workflow
[Diagram repeated: Materials, Treatments, Data Acquisition, Data Transformation, Data]
• Materials defined using terms from ontologies
• Treatments defined by Protocols
• Data represented in domain-specific format
• FuGE is the “glue” for sticking components together
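The workflow idea, in which each step is an instance of a protocol that consumes and produces materials or data, might be sketched as follows. The names `Material`, `Data`, `Step` and `trace`, the ontology term labels and the example chain are all hypothetical; the sketch only shows how such "glue" lets a data item be traced back to its source material.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Material:
    name: str
    ontology_term: str  # hypothetical ontology term label


@dataclass
class Data:
    name: str
    data_format: str  # domain-specific format, named only by a label here


Item = Union[Material, Data]


@dataclass
class Step:
    """One applied protocol with its inputs and outputs."""
    protocol: str
    inputs: List[Item]
    outputs: List[Item]


def trace(item: Item, steps: List[Step]) -> List[str]:
    """Return the names of everything `item` was derived from, nearest first."""
    lineage: List[str] = []
    for step in steps:
        if any(out is item for out in step.outputs):
            for source in step.inputs:
                lineage.append(source.name)
                lineage.extend(trace(source, steps))
    return lineage


# Illustrative chain: treatment, data acquisition, data transformation.
sample = Material("sample", "tissue")
extract = Material("extract", "protein extract")
raw = Data("raw spectra", "domain-specific")
processed = Data("peak list", "domain-specific")

steps = [
    Step("treatment", [sample], [extract]),
    Step("data acquisition", [extract], [raw]),
    Step("data transformation", [raw], [processed]),
]
print(trace(processed, steps))
```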
Other useful components
• Each object can be tagged with audit info:
– Who made a change, when, what type of change
• Security information:
– users, groups for accessing/changing data
• Consistent mechanism for identifying objects
– Life Science Identifiers (LSIDs) used to uniquely ID
components
– Objects can be referenced across documents
• Mechanism for linking to external databases,
literature refs and ontologies
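As a rough illustration of identification and auditing, the sketch below parses an LSID of the general form `urn:lsid:authority:namespace:object[:revision]`, attaches an audit trail to an identified object, and resolves a reference through a shared registry. The classes `Lsid`, `AuditEntry` and `Identifiable`, the `registry` dictionary and the `example.org` identifier are assumptions for this example and do not reproduce FuGE's own classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lsid:
    """A parsed Life Science Identifier (hashable, so usable as a dict key)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "Lsid":
        parts = text.split(":")
        prefix = [p.lower() for p in parts[:2]]
        if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
            raise ValueError(f"not an LSID: {text!r}")
        if not all(parts[2:5]):
            raise ValueError(f"empty LSID component in {text!r}")
        return cls(*parts[2:])


@dataclass
class AuditEntry:
    who: str     # who made the change
    when: str    # when it was made
    action: str  # what type of change


@dataclass
class Identifiable:
    lsid: Lsid
    audit_trail: List[AuditEntry] = field(default_factory=list)


# Hypothetical registry so objects can be referenced across documents.
registry: Dict[Lsid, Identifiable] = {}

protocol = Identifiable(Lsid.parse("urn:lsid:example.org:protocol:42"))
protocol.audit_trail.append(AuditEntry("operator-1", "2005-01-01", "creation"))
registry[protocol.lsid] = protocol

# Another document holding only the identifier string can resolve the object.
resolved = registry[Lsid.parse("urn:lsid:example.org:protocol:42")]
print(resolved.audit_trail[0].action)
```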
Investigation model
• Stores a summary of the investigation to
facilitate queries
• Purpose of investigation (hypothesis)
• Design of the investigation
– e.g. strain differences, gene knockout, drug doses,
time course
• Stores the important variables
– Values from ontology e.g. gene names, units etc…
• Links from variables to relevant data items
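One way to picture the investigation summary is as a small record that holds the hypothesis, the design, and the important variables, each variable linking its values to the relevant data items. The names `Factor`, `Investigation` and `data_for`, and every value in the example, are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Factor:
    """An important variable of the investigation, e.g. a drug dose."""
    name: str
    unit: str = ""
    # Maps each value of the variable to identifiers of relevant data items.
    data_by_value: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Investigation:
    hypothesis: str
    design: str
    factors: List[Factor] = field(default_factory=list)

    def data_for(self, factor_name: str, value: str) -> List[str]:
        """Query: which data items belong to a given variable value?"""
        for factor in self.factors:
            if factor.name == factor_name:
                return list(factor.data_by_value.get(value, []))
        return []


# Made-up example values.
study = Investigation(
    hypothesis="Treatment changes protein expression",
    design="drug doses",
    factors=[Factor("dose", "mg/kg",
                    {"0": ["data:1"], "10": ["data:2", "data:3"]})],
)
print(study.data_for("dose", "10"))
```

Because the summary carries links rather than the data itself, a query can locate the relevant data items without understanding their domain-specific formats.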
Benefits of shared components
• Queries over common annotation
– Samples, hypotheses, protocols
• Shared software for experimental annotation
and analysis
– Microarrays, proteomics and metabolomics (and other
experiments!) performed in same lab
• Developing standards for each technique is a
hard problem
– Shared resources could alleviate the problems (audit,
security, identifying objects, ontologies)
Using FuGE in Practice
1. Import parts of the UML or XML Schema and
extend them with domain-specific components
– Example: attempting to integrate FuGE with our
Manchester metabolomics database
2. Reference a FuGE entry for investigation
structure and bio samples
3. Define ontologies and use FuGE as it is for
experimental metadata
– This would not include a format for mass spec or
NMR data, which would also be needed
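The first option, extending shared components with domain-specific ones, can be illustrated with plain inheritance. `Material` here is a stand-in for a generic shared class and `MetabolomicsSample` is a hypothetical domain extension; neither the class names nor the fields come from FuGE or from the Manchester database.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Material:
    """Stand-in for a generic, shared component (hypothetical)."""
    identifier: str
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class MetabolomicsSample(Material):
    """Domain-specific extension adding fields the shared class lacks."""
    extraction_solvent: str = ""
    instrument: str = ""


def describe(material: Material) -> str:
    """Shared code works on any Material, including domain subclasses."""
    return f"{material.identifier}: {material.name}"


sample = MetabolomicsSample("sample-1", "leaf extract",
                            extraction_solvent="methanol", instrument="NMR")
print(describe(sample))
```

Software written against the shared class keeps working on the extended one, which is the benefit the extension route is meant to deliver.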
Conclusions
• FuGE was created to solve the general problem:
– What are the common requirements for a “functional
genomics” data standard?
• MGED will use FuGE for generating MAGE
version 2
• PSI evaluating FuGE for protein separation
standard format
• FuGE-based systems being implemented by a
number of organisations
• FuGE could help develop a metabolome format
http://fuge.sourceforge.net
Acknowledgements
• FuGE has been developed in collaboration
with many groups, including:
– Angel Pizarro (U Penn)
– Paul Spellman (Lawrence Berkeley)
– Michael Miller (Rosetta)
– Members of Fred Hutchinson CRC, Seattle
– RSBI
– Various other members of MGED and PSI
Common.Description
[Class diagram labels: Describable, Identifiable; Protocol, Audit, Investigation, Material]
• Many classes inherit from Describable
• Link to Audit / Security details
• URI and text description
Common.Data
• Ordered set of Dimensions
• Data stored in Matrix
• Matrix must be extended with subclasses
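The data component described above (an ordered set of dimensions, with values held in a matrix that must be subclassed) could look roughly like this. `Dimension`, `Matrix` and `InMemoryMatrix` are names chosen for this sketch, and the row-major storage is an assumption; the slide does not say how FuGE subclasses store their values.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Dimension:
    name: str
    labels: List[str]


class Matrix(ABC):
    """Abstract matrix: knows its ordered dimensions, not its storage."""

    def __init__(self, dimensions: Sequence[Dimension]):
        self.dimensions = list(dimensions)  # order is significant

    @property
    def shape(self) -> Tuple[int, ...]:
        return tuple(len(d.labels) for d in self.dimensions)

    @abstractmethod
    def get(self, *index: int) -> float:
        """Return the value at one position; subclasses decide how."""


class InMemoryMatrix(Matrix):
    """Concrete subclass storing values in a flat, row-major list."""

    def __init__(self, dimensions: Sequence[Dimension], values: Sequence[float]):
        super().__init__(dimensions)
        expected = 1
        for size in self.shape:
            expected *= size
        if len(values) != expected:
            raise ValueError(f"expected {expected} values, got {len(values)}")
        self._values = list(values)

    def get(self, *index: int) -> float:
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        for i, size in zip(index, self.shape):
            if not 0 <= i < size:
                raise IndexError("index out of range")
            offset = offset * size + i  # row-major position
        return self._values[offset]


# Illustrative 2 x 3 matrix.
dims = [Dimension("sample", ["s1", "s2"]), Dimension("feature", ["a", "b", "c"])]
m = InMemoryMatrix(dims, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(m.shape, m.get(1, 2))
```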