Annotating Microarray Data with the MGED Ontology
NCI Center for Bioinformatics
April 15, 2004
P. L. Whetzel, A. Pizarro, E. Manduchi, J. Liu, H. He, G. Grant, M. Mailman, C. Stoeckert
Center for Bioinformatics
University of Pennsylvania
Science 298:601-604, 2002
Science 298:597-600, 2002
To compare experiments, you need some minimum
information about the microarray experiments.
Ivanova et al. Science 2003
Microarray Information to be Shared
Figure from:
David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14
The Computational View of Microarray Information
MGED Society
www.mged.org
International organization
Composed of biologists, computer scientists, and data analysts
Aims to facilitate the sharing and evaluation of microarray data
Establish standards for microarray data annotation
Create microarray databases
Promote sharing of high-quality, well-annotated data
Generalize to data generated by functional genomics and proteomics experiments
MGED Standardization Efforts
MIAME
 The formulation of the minimum information about a microarray experiment required to interpret and verify the results. (Brazma et al. Nature Genetics 2001)
MAGE-OM
 The establishment of a data exchange format and object model for microarray experiments. (Spellman et al. Genome Biol. 2002)
MGED Ontology
 The development of an ontology for microarray experiment description and biological material (biomaterial) annotation in particular. (Stoeckert & Parkinson, Comp. Funct. Genom. 2003)
Transformations
 The development of recommendations regarding microarray data transformations and normalization methods.
MGED Ontology (MO)
Purpose
 Provide standard terms for the annotation of microarray experiments
 Not to model biology but to provide descriptors for experiment components
Benefits
 Unambiguous description of how the experiment was performed
 Structured queries can be generated
Ontology concepts derived from the MIAME guidelines/MAGE-OM
MGED Ontology development
http://mged.sourceforge.net/ontologies/MGEDontology.php
OILed
File formats
 DAML file
 HTML file
NCI DTS Browser
Changes
 Notes
 Term Tracker
Relationship of MO to MAGE-OM
MO class hierarchy follows that of MAGE-OM
 Association to OntologyEntry
MO provides terms for these associations by:
 Instances internal to MO
 Instances from external ontologies
  Take advantage of existing ontologies
MGED Ontology Class Hierarchy
MGED CoreOntology
 Coordinated development with MAGE-OM
 Ease of locating appropriate class to select terms from
MGED ExtendedOntology
 Classes for additional terms as the usage of genomics technologies expands
MAGE and MO
Main focus of MGED Ontology
Structured and rich description of BioMaterials
BioMaterial
+characteristics
OntologyEntry
+associations
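The BioMaterial/OntologyEntry labels above can be read as a small data structure. The following is a minimal sketch, assuming an OntologyEntry is a category/value pair that may carry nested entries; the `source` field, the `values_for` helper, and all sample values are hypothetical and are not taken from the MAGE-OM specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyEntry:
    """A category/value pair, e.g. category='Sex', value='female'."""
    category: str
    value: str
    # Hypothetical field: the ontology or vocabulary the term comes from.
    source: str = "MGED Ontology"
    # Nested entries that qualify this one (the "+associations" role).
    associations: List["OntologyEntry"] = field(default_factory=list)


@dataclass
class BioMaterial:
    name: str
    # Ontology-backed descriptors (the "+characteristics" role).
    characteristics: List[OntologyEntry] = field(default_factory=list)

    def values_for(self, category: str) -> List[str]:
        """Return every annotated value for one category."""
        return [c.value for c in self.characteristics if c.category == category]


# Illustrative sample only; names and values are made up for this sketch.
sample = BioMaterial(
    name="sample_1",
    characteristics=[
        OntologyEntry("Organism", "Mus musculus", source="external taxonomy"),
        OntologyEntry("Sex", "female"),
        OntologyEntry("Age", "8", associations=[OntologyEntry("TimeUnit", "weeks")]),
    ],
)
```

Keeping the unit as a nested entry rather than free text is what lets a later query compare ages across samples.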
MO and References to
External Ontologies
Use MGED Ontology for Structured Descriptions (MAGE-ML)
http://www.sofg.org
Desirable Microarray Queries
Return all experiments with species X examined at developmental stage Y
 Sort by platform type
Which are untreated? Treated?
 Treated with what compound?
How comparable are these?
What can these experiments tell me?
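The first few queries above become simple filters once each experiment is annotated with controlled terms. This is a sketch under stated assumptions: the `Experiment` record, its field names, and the sample data are hypothetical and do not reflect any real database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Experiment:
    name: str
    species: str
    developmental_stage: str
    platform_type: str
    compound: Optional[str] = None  # None means the sample was untreated


def find_experiments(experiments: List[Experiment], species: str, stage: str) -> List[Experiment]:
    """Species X at developmental stage Y, sorted by platform type."""
    hits = [e for e in experiments
            if e.species == species and e.developmental_stage == stage]
    return sorted(hits, key=lambda e: (e.platform_type, e.name))


def split_by_treatment(experiments: List[Experiment]) -> Dict[str, List[Experiment]]:
    """Separate untreated experiments from treated ones."""
    groups: Dict[str, List[Experiment]] = {"untreated": [], "treated": []}
    for e in experiments:
        groups["untreated" if e.compound is None else "treated"].append(e)
    return groups


# Made-up annotations for illustration.
EXPERIMENTS = [
    Experiment("exp_a", "Mus musculus", "adult", "spotted cDNA array"),
    Experiment("exp_b", "Mus musculus", "adult", "oligonucleotide array", "compound_1"),
    Experiment("exp_c", "Mus musculus", "embryo", "spotted cDNA array"),
    Experiment("exp_d", "Homo sapiens", "adult", "oligonucleotide array"),
]

hits = find_experiments(EXPERIMENTS, "Mus musculus", "adult")
groups = split_by_treatment(hits)
```

The exact-match filters only work because every annotator picks from the same term list; free-text entries such as "mouse" and "Mus musculus" would silently fall into different groups.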
MO and Structured Queries
RAD: RNA Abundance Database
http://www.cbil.upenn.edu/RAD
RAD is part of GUS (Genomics Unified Schema)
The GUS platform maximizes the utility of stored data by warehousing them in a schema that integrates the genome, transcriptome, gene regulation and networks, ontologies and controlled vocabularies, and gene expression
Relational schema (implemented in Oracle)
Stores data from gene expression arrays and SAGE
Comes with a suite of web-annotation forms (StudyAnnotator)
MAGE-RAD Translator (MR_T) generates MAGE-ML files for export
Manduchi et al. 2004 Bioinformatics 20:452-459.
GUS (Genomics Unified Schema)
http://www.gusdb.org
Namespace   Domain                    Features
RAD         Gene Expression           MIAME/MAGE-OM
SRes        Shared Resources          Ontologies
DoTS        Sequence and annotation   Central dogma
Core        Data Provenance           Documentation
TESS        Gene regulation           Grammars
RAD Schema
About 65 tables and 30 views
Assay to Quantification tables
Study Design tables
BioMaterials tables
(Tables populated by the Study-Annotator)
Platform tables
Quantification Result tables
Processing tables
Analysis Result tables
Misc tables: Protocol, Contact*, Ontologies*
Meta tables*: for data privacy and history tracking
Integrity Checks tables
* These are used by RAD, but belong to common GUS components
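To illustrate how a relational schema can tie study, biomaterial, and ontology tables together, here is a minimal sketch using Python's built-in `sqlite3` module. Every table and column name below is hypothetical; this is not the RAD schema (which the slides describe as roughly 65 tables in Oracle), only the general pattern of linking biomaterials to ontology entries through a join table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified tables; not the actual RAD definitions.
conn.executescript("""
CREATE TABLE ontology_entry (
    ontology_entry_id INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (category, value)
);
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE biomaterial (
    biomaterial_id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES study(study_id),
    name TEXT NOT NULL
);
CREATE TABLE biomaterial_characteristic (
    biomaterial_id INTEGER NOT NULL REFERENCES biomaterial(biomaterial_id),
    ontology_entry_id INTEGER NOT NULL REFERENCES ontology_entry(ontology_entry_id)
);
""")

# Made-up rows for illustration.
conn.executemany("INSERT INTO ontology_entry VALUES (?, ?, ?)", [
    (1, "Organism", "Mus musculus"),
    (2, "Organism", "Homo sapiens"),
    (3, "DevelopmentalStage", "adult"),
])
conn.executemany("INSERT INTO study VALUES (?, ?)", [(1, "study_a"), (2, "study_b")])
conn.executemany("INSERT INTO biomaterial VALUES (?, ?, ?)",
                 [(1, 1, "sample_1"), (2, 2, "sample_2")])
conn.executemany("INSERT INTO biomaterial_characteristic VALUES (?, ?)",
                 [(1, 1), (1, 3), (2, 2), (2, 3)])


def studies_with(connection, category, value):
    """Names of studies having a biomaterial annotated with category/value."""
    rows = connection.execute("""
        SELECT DISTINCT s.name
        FROM study s
        JOIN biomaterial b ON b.study_id = s.study_id
        JOIN biomaterial_characteristic bc ON bc.biomaterial_id = b.biomaterial_id
        JOIN ontology_entry oe ON oe.ontology_entry_id = bc.ontology_entry_id
        WHERE oe.category = ? AND oe.value = ?
        ORDER BY s.name
    """, (category, value)).fetchall()
    return [row[0] for row in rows]
```

Calling `studies_with(conn, "Organism", "Mus musculus")` walks from the ontology term back to the studies that used it, which is the kind of structured query the annotation is meant to support.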
RAD Study-Annotator
Covers all relevant parts of the MIAME checklist
Exploits the MGED Ontology
Allows entering of very specific details of an
experiment
Web-based forms:
Modular structure
Written in PHP
Front-end data integrity checks using JavaScript
Manages Data Privacy based on Project/Group
selections present in GUS schema
Available at http://www.cbil.upenn.edu/RAD/RADinstallation.htm
RAD Study-Annotator Logical Flow
[Flow diagram: New User Registration, Login, Data Preferences (Project, Group); Module I, Module II, Module III covering Study, Misc, From Assay to Quantification, Study Design, and BioMaterials (samples, treatments)]
Experiment Annotation: Study Design
BioMaterial Annotation: Conceptual View
RAD Study Annotator: BioMaterial Module
RAD Study Annotator: BioSource Form
RAD Study Annotator: Treatment Form
Using the Ontologies
new terms can be proposed
RAD
Ontology instances propagated to
annotation web forms
OntologyEntry
RAD Study-Annotator
SRES
MGED Ontology
ExternalDatabases
MGED Ontology
Anatomy
DevelopmentalStage
Disease
Lineage
PATOAttribute
Phenotype
Taxon
Sources of New Terms in OntologyEntry
MGED Ontology
 Continued development of new classes and terms
Shared Resources (SRes)
 Contains controlled vocabularies and ontologies
External Database Sources
 Annotated term provided by user
Adding New Terms
1 Add term from SRes
2 Add term from External Database
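One way to picture the term-adding steps is a lookup that records where each term came from. This is a rough sketch, not the Study-Annotator's actual logic: the `SOURCES` list, its contents, the `add_term` function, and the "proposed" fallback label are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical vocabularies standing in for the three sources of terms.
SOURCES: List[Tuple[str, Dict[str, Set[str]]]] = [
    ("MGED Ontology", {"Sex": {"female", "male"}}),
    ("SRes", {"Taxon": {"Mus musculus", "Homo sapiens"}}),
    ("External Database", {"Disease": {"example_disease_term"}}),
]

Entry = Tuple[str, str, str]  # (category, value, source)


def add_term(entries: List[Entry], category: str, value: str) -> Entry:
    """Add a term to the entry list, tagging it with the first source that knows it."""
    for source_name, vocabulary in SOURCES:
        if value in vocabulary.get(category, set()):
            entry = (category, value, source_name)
            break
    else:
        # No source knows the term: keep it, flagged as a proposal for review.
        entry = (category, value, "proposed")
    if entry not in entries:
        entries.append(entry)
    return entry


ontology_entries: List[Entry] = []
from_sres = add_term(ontology_entries, "Taxon", "Mus musculus")
from_external = add_term(ontology_entries, "Disease", "example_disease_term")
unknown = add_term(ontology_entries, "Sex", "unlisted_value")
```

Recording the source alongside each term keeps externally sourced and user-proposed terms distinguishable from those already in the ontology.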
Future Issues
Burning Issues
 Developing MO in sync with related efforts (MAGE-OM v.2.0)
 Use/presentation in annotation forms
 Coverage of other technologies and biological domains
Flame retardant structure
 ExtendedOntology
  Space to add new classes, terms and their relationships to one another
A Functional Genomics View
A. Jones et al. submitted
A Functional Genomics Object Model (FGE-OM)
Separate out common components from technology-specific ones
Allow new domains to be added as new modules to the model
Incorporate ideas from SysBio-OM (Xirasagar et al. Bioinformatics in press)
Jones et al. Bioinformatics in press
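The idea of separating common components from technology-specific ones, with new domains added as modules, can be sketched as a base class plus a registry. Nothing here comes from the FGE-OM specification: the class names, fields, and the `register_module` decorator are hypothetical and only illustrate the design principle.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, Type


@dataclass
class Assay:
    """Common component: fields every technology shares."""
    name: str
    technology: ClassVar[str] = "generic"

    def describe(self) -> str:
        return f"{self.technology}: {self.name}"


# Registry of technology-specific modules, keyed by technology name.
MODULES: Dict[str, Type[Assay]] = {}


def register_module(cls: Type[Assay]) -> Type[Assay]:
    """Plug a new domain into the model without touching the common core."""
    MODULES[cls.technology] = cls
    return cls


@register_module
@dataclass
class MicroarrayAssay(Assay):
    array_design: str = ""
    technology: ClassVar[str] = "microarray"


@register_module
@dataclass
class ProteomicsAssay(Assay):
    separation_method: str = ""
    technology: ClassVar[str] = "proteomics"


assay = MODULES["microarray"]("assay_1", array_design="design_x")
```

Adding another domain then means defining one more decorated subclass; the shared `Assay` fields and any code written against them stay unchanged.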
Proposed Development of FGE-OM
[Diagram labels: Informal specification, Formal specification; Microarray Standards, Proteomics Standards, Functional Genomics Standards; MIAME, MIAME-Tox, MAGE-OM, MIAPE, MIAPE-OM, Pedro, FGE-OM, MGED Ontology, Use Cases; Strong type system, Immutable type system]
Acknowledgements

MGED Ontology Working Group
 Chris Stoeckert, Trish Whetzel (Penn)
 Helen Parkinson (EBI)
 Joe White (TIGR)
 Gilberto Fragoso, Liju Fan, Mervi Heiskanen (NCI)
 Many others!
