MGED Ontology Working Group Report


MGED Ontology:
An Ontology of Biomaterial
Descriptions for Microarrays
Microarray Data Analysis and Management:
Bio-ontologies for Microarrays
EMBL-EBI, Hinxton, Cambridge, UK
Dec. 5, 2001
Chris Stoeckert, U. Penn
Ontology Usage for Genes in
EpoDB
• EpoDB is a prototype system of genes
expressed during erythropoiesis
• Built before microarrays were readily
available
• Illustrate usage of an ontology of gene parts
and controlled vocabularies of gene (and
gene family) names
EpoDB “Gene Ontology”
http://www.cbil.upenn.edu/EpoDB
Stoeckert, Salas, Brunk, Overton (1999) Nucl. Acids Res. 26:288
EpoDB Gene Landmark Query
What is an ontology?
(In the computer science, not the philosophical, sense)
• An ontology is a specification of concepts
that includes the relationships between
those concepts.
• Removes ambiguity. Provides semantics
and constraints.
• Allows for computational inferences and
reliable comparisons
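Here is a minimal Python sketch of what "concepts plus relationships plus constraints" gives you computationally. The class names and the has_organism_part property are hypothetical illustrations, not terms from any published ontology. An assertion is accepted only when its subject and object fall under the property's declared domain and range. IS-A links let that check reason over subclasses.

```python
# Hypothetical mini-ontology: concepts linked by IS-A (child -> parent).
IS_A = {
    "BioSource": "BioMaterial",
    "BioSample": "BioMaterial",
    "BioMaterial": "Concept",
    "OrganismPart": "Concept",
}

# Hypothetical property -> (domain class, range class).
PROPERTIES = {
    "has_organism_part": ("BioSource", "OrganismPart"),
}


def is_a(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or reaches it via IS-A links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def check_assertion(subject_class: str, prop: str, object_class: str) -> bool:
    """Accept (subject, property, object) only if domain and range are respected."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_class, domain) and is_a(object_class, range_)


print(check_assertion("BioSource", "has_organism_part", "OrganismPart"))  # True
print(check_assertion("BioSample", "has_organism_part", "OrganismPart"))  # False
```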
Types of Ontologies
• Taxonomy
– Tree structure. IS-A hierarchy
– Variants - Gene Ontology (DAG)
• Frame-based (object-oriented)
– Classes and attributes
– EcoCyc
• Description logic (DL)
– Reasoning about concept (class) relationships
– Combine terms with constraints (sanctioning)
– GRAIL (GALEN, TAMBIS)
• Ontology Inference Layer (OIL)
– Combines Frames and DLs
– Uses Web standards XML and RDF
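The practical difference between a strict tree and the Gene Ontology's DAG is that a term may have more than one parent. Finding every ancestor of a term therefore needs a graph traversal, not a single walk up the tree. The Python sketch below uses made-up term names, not real GO identifiers:

```python
from collections import deque

# Hypothetical DAG: term -> list of parent terms.
PARENTS = {
    "term_C": ["term_A", "term_B"],  # two parents: allowed in a DAG, not in a tree
    "term_A": ["root"],
    "term_B": ["root"],
    "term_D": ["term_C"],
    "root": [],
}


def ancestors(term: str) -> set[str]:
    """All terms reachable by following parent links (breadth-first search)."""
    seen: set[str] = set()
    queue = deque(PARENTS.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # shared ancestors are visited only once
            seen.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


print(sorted(ancestors("term_D")))  # ['root', 'term_A', 'term_B', 'term_C']
```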
Taxonomy
• Terms for common usage
– Homo sapiens, not human, not homo sapeins
– NCBI ID = 9606
• Hierarchy provides unambiguous levels of
equivalence
– Homo sapiens and Mus musculus are of the class
Mammalia but Drosophila melanogaster is not.
• Can use taxonomic hierarchies for other types of
information
– e.g., Human Developmental Anatomy (U. of Edinburgh)
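The Python sketch below illustrates both taxonomy points. Free-text variants are mapped to one canonical term, and the IS-A chain answers class-membership questions such as the Mammalia example. The synonym table and the shortened hierarchy are hypothetical simplifications, with intermediate ranks omitted. Only the ID 9606 for Homo sapiens comes from the slide.

```python
# Hypothetical synonym table: free-text variants -> canonical taxon name.
CANONICAL = {
    "homo sapiens": "Homo sapiens",
    "human": "Homo sapiens",
    "homo sapeins": "Homo sapiens",  # misspelling that the controlled term replaces
    "mus musculus": "Mus musculus",
    "mouse": "Mus musculus",
    "drosophila melanogaster": "Drosophila melanogaster",
}

# Only the Homo sapiens ID is taken from the slide.
NCBI_ID = {"Homo sapiens": 9606}

# Simplified IS-A links; intermediate taxonomic ranks are omitted.
PARENT = {
    "Homo sapiens": "Mammalia",
    "Mus musculus": "Mammalia",
    "Drosophila melanogaster": "Insecta",
    "Mammalia": "Metazoa",
    "Insecta": "Metazoa",
}


def normalize(name: str) -> str:
    """Map a free-text organism name to its canonical term (KeyError if unknown)."""
    return CANONICAL[name.strip().lower()]


def in_class(name: str, taxon: str) -> bool:
    """True if the organism, after normalization, falls under `taxon`."""
    node = normalize(name)
    while node is not None:
        if node == taxon:
            return True
        node = PARENT.get(node)
    return False


print(NCBI_ID[normalize("human")])                      # 9606
print(in_class("Mus musculus", "Mammalia"))             # True
print(in_class("Drosophila melanogaster", "Mammalia"))  # False
```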
Microarray Information to be Captured
Figure from:
David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14
Tables Describing Samples in RAD (RNA Abundance Database)
Figure: schema of RAD tables: Devel. Stage, Disease, Sample, Label, Taxon, Anatomy, ExperimentSample, Experiment, Exp.ControlGenes, Treatment, Hybridization Conditions, ControlGenes, Groups, ExpGroups, RelExperiments
CBIL Anatomy Hierarchy
Anatomy Table Used by RAD
Usage of Anatomy Hierarchy to Query RAD
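The idea behind querying RAD through the anatomy hierarchy is that a query for a broad term also returns samples annotated with any narrower term beneath it. The hierarchy, sample IDs, and function names below are hypothetical and do not reproduce RAD's actual tables.

```python
# Hypothetical anatomy hierarchy: term -> broader (parent) term.
ANATOMY_PARENT = {
    "thymus": "lymphoid organ",
    "spleen": "lymphoid organ",
    "lymphoid organ": "hematopoietic and immune system",
    "bone marrow": "hematopoietic and immune system",
    "liver": "digestive system",
}

# Hypothetical samples, each annotated with one anatomy term.
SAMPLES = [("S1", "thymus"), ("S2", "spleen"), ("S3", "bone marrow"), ("S4", "liver")]


def falls_under(term: str, query: str) -> bool:
    """True if `term` is `query` or a descendant of it in the hierarchy."""
    while term is not None:
        if term == query:
            return True
        term = ANATOMY_PARENT.get(term)
    return False


def query_samples(query: str) -> list[str]:
    """IDs of samples annotated with `query` or any narrower anatomy term."""
    return [sid for sid, term in SAMPLES if falls_under(term, query)]


print(query_samples("lymphoid organ"))                   # ['S1', 'S2']
print(query_samples("hematopoietic and immune system"))  # ['S1', 'S2', 'S3']
```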
Standardisation of Microarray Data
and Annotations - MGED Group
The MGED group is a grassroots movement initially
established at the Microarray Gene Expression
Database meeting MGED 1 (14-15 November, 1999,
Cambridge, UK). The goal of the group is to
facilitate the adoption of standards for DNA-array
experiment annotation and data representation, as
well as the introduction of standard experimental
controls and data normalisation methods. Members
are from around the world in academia, government,
and industry.
http://www.mged.org
MGED Working Groups
• Annotation: Experiment description and data
representation standards (Alvis Brazma, EMBL-EBI)
• Format: Microarray data XML exchange format
(Paul Spellman, UC Berkeley)
• Ontology: Ontologies for sample description
(Chris Stoeckert, U Penn)
• Normalization: Normalization, quality control and
cross-platform comparison (Gavin Sherlock,
Stanford U)
MGED Documents
• Annotation -> Minimal Information About a
Microarray Experiment (MIAME)
– What should go into a microarray database
– Brazma et al. Nature Genetics 29:365-371, 2001
• Format -> Microarray Gene Expression
(MAGE) Object Model and XML DTD
– How microarray databases will talk to each other
Relationship of MGED Efforts
Figure: the Annotation (MIAME), Format (MAGE), and Ontologies (MGED Ontology, External Ontologies/CVs) efforts, with labels for internal and external use and for the database (DB).
Ontologies provide common terms and their definitions for
describing microarray experiments.
MGED Ontology Working Group Goals
1. Identify concepts
2. Collect available controlled vocabularies
and ontologies for concepts
3. Define concepts
4. Formalize concept relationships
http://www.cbil.upenn.edu/Ontology/
Figure: Species, Resources, Concept Definitions
Usage of Concepts and Resources for
Microarrays
• MIAME glossary
– Provide definitions for types of information
(concepts) listed in MIAME
• MIAME qualifier, value, source
– Provide pointers to relevant sources that can be
used to
MIAME Section on Sample Source and Treatment
sample source and treatment ID as used in section 1
organism (NCBI taxonomy)
additional "qualifier, value, source" list; the list includes:
cell source and type (if derived from primary source(s))
sex
age
growth conditions
development stage
organism part (tissue)
animal/plant strain or line
genetic variation (e.g., gene knockout, transgenic variation)
individual
individual genetic characteristics (e.g., disease alleles, polymorphisms)
disease state or normal
target cell type
cell line and source (if applicable)
in vivo treatments (organism or individual treatments)
in vitro treatments (cell culture conditions)
treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
compound
is additional clinical information available (link)
separation technique (e.g., none, trimming, microdissection, FACS)
laboratory protocol for sample treatment
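MIAME's "qualifier, value, source" entries can be held as simple triples. That structure also makes it easy to see which recommended qualifiers a sample description still lacks. The Python sketch below is illustrative only: the QVS type, the REQUIRED subset, and the example entries are assumptions, not a MIAME-mandated format.

```python
from typing import NamedTuple, Optional


class QVS(NamedTuple):
    """One qualifier, value, source entry."""
    qualifier: str
    value: str
    source: Optional[str] = None  # ontology or controlled vocabulary, if any


# Illustrative subset of the qualifiers listed above.
REQUIRED = ["organism", "sex", "age", "development stage", "organism part"]


def missing_qualifiers(entries: list[QVS], required: list[str]) -> list[str]:
    """Qualifiers in `required` that no entry supplies, in `required` order."""
    present = {e.qualifier for e in entries}
    return [q for q in required if q not in present]


def unsourced(entries: list[QVS]) -> list[str]:
    """Qualifiers whose values do not cite a source vocabulary."""
    return [e.qualifier for e in entries if e.source is None]


sample = [
    QVS("organism", "Mus musculus", "NCBI taxonomy"),
    QVS("sex", "female", "MGED"),
    QVS("organism part", "thymus", "GXD Mouse Anatomical Dictionary"),
    QVS("growth conditions", "normal"),
]

print(missing_qualifiers(sample, REQUIRED))  # ['age', 'development stage']
print(unsourced(sample))                     # ['growth conditions']
```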
Excerpts from a Sample Description
courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences
Organism: Mus musculus [ NCBI taxonomy browser ]
Cell source: in-house bred mice (contact: [email protected])
Sex: female [ MGED ]
Age: 3 - 4 weeks after birth [ MGED ]
Growth conditions: normal
controlled environment
20-22 °C average temperature
housed in cages according to German and EU legislation
specified pathogen free conditions (SPF)
14 hours light cycle
10 hours dark cycle
Developmental stage: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]
Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]
Strain or line: C57BL/6 [International Committee on Standardized Genetic Nomenclature for Mice]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This
substrain is now probably the most widely used of all inbred strains. Substrains 6 and 10 differ at the H9,
Igh2 and Lv loci. Maint. by J, N, Ola. [International Committee on Standardized Genetic Nomenclature
for Mice ]
Treatment: in vivo [MGED] intraperitoneal injection of Dexamethasone into mice, 10 µg per
25 g of mouse body weight
Compound: drug [MGED] synthetic glucocorticoid Dexamethasone, dissolved in PBS
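Expressing a dose per kilogram of body weight makes it comparable across experiments, for example when ordering them by compound measurement. For the injection described above, assuming the stated 10 µg is given per 25 g of body weight, the conversion works out as:

```latex
\frac{10\ \mu\mathrm{g}}{25\ \mathrm{g}}
  = \frac{10 \times 10^{-3}\ \mathrm{mg}}{25 \times 10^{-3}\ \mathrm{kg}}
  = 0.4\ \mathrm{mg/kg}
```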
MGED Biomaterial Ontology
• Under construction
– Using OilEd (not tied to any one tool)
– Generate multiple formats: RDFS, DAML+OIL
• Define classes, provide relations and
constraints, identify instances
• Motivated by MIAME and coordinated with
MAGE
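As a rough illustration of the "generate multiple formats" step, class definitions can be serialized to RDFS as N-Triples using only the Python standard library. The namespace URI, class list, and instance are hypothetical. BioSource and BioSample are borrowed from the MAGE BioMaterial model, and CompoundBasedTreatment from the use cases below. Only the RDF and RDFS vocabulary URIs are standard. This sketch is not OilEd's own export.

```python
# Standard RDF/RDFS vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespace for the ontology's own terms.
BASE = "http://example.org/mged-ontology#"

# Hypothetical class -> parent class (None for top-level classes).
CLASSES = {
    "BioMaterial": None,
    "BioSource": "BioMaterial",
    "BioSample": "BioMaterial",
    "Treatment": None,
    "CompoundBasedTreatment": "Treatment",
}

# Hypothetical instances: individual -> class it belongs to.
INSTANCES = {"thymus_sample_1": "BioSource"}


def to_ntriples(classes: dict, instances: dict) -> str:
    """Serialize classes, subclass links, and instances as N-Triples lines."""
    lines = []
    for name, parent in classes.items():
        lines.append(f"<{BASE}{name}> <{RDF_TYPE}> <{RDFS_CLASS}> .")
        if parent is not None:
            lines.append(f"<{BASE}{name}> <{RDFS_SUBCLASS_OF}> <{BASE}{parent}> .")
    for individual, cls in instances.items():
        lines.append(f"<{BASE}{individual}> <{RDF_TYPE}> <{BASE}{cls}> .")
    return "\n".join(lines)


print(to_ntriples(CLASSES, INSTANCES))
```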
MAGE BioMaterial Model
Building a Microarray Ontology
http://www.cbil.upenn.edu/Ontology/Build_Ontology2.html
Ontology Available as RDFS
Ontology in Browseable Form
Example of Internal Terms
Example of External Terms
Example of Combined Internal
and External: Treatment
OWG Use Cases
• Return a summary of all experiments that use a specified
type of biosource.
– Use “age” to select and order experiments
– Use Mouse Anatomical Dictionary Stage 28 to pick experiments
according to “organism part”
• Return a summary of all experiments done examining
effects of a specified treatment
– e.g., look for “CompoundBasedTreatment”, “in vivo”
– Select “Compound” based on CAS registry number
– Order based on “CompoundMeasurement”
• Build gene networks based on biomaterial description
– Generate a distance metric based on biosource and use in
calculation of correlation with gene expression level
– Generate an error estimation based on biosample (i.e., even when
biosources are identical, there will be variation resulting from
different treatments)
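The first two use cases come down to filtering on biomaterial attributes and sorting on a measured value. The third needs some distance between biosources. The Python sketch below rests on hypothetical choices: the Experiment fields, attribute values, and placeholder CAS identifiers are invented for illustration. The mismatch-fraction distance is a simple Hamming-style measure, not a method proposed in this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    name: str
    biosource_type: str
    organism_part: str
    age_weeks: float
    treatment: Optional[str] = None            # e.g. "CompoundBasedTreatment"
    treatment_site: Optional[str] = None       # e.g. "in vivo"
    cas_number: Optional[str] = None           # placeholder identifiers below
    compound_mg_per_kg: Optional[float] = None


EXPERIMENTS = [
    Experiment("exp1", "whole organism", "thymus", 4, "CompoundBasedTreatment", "in vivo", "CAS_A", 0.4),
    Experiment("exp2", "whole organism", "thymus", 3),
    Experiment("exp3", "whole organism", "liver", 3, "CompoundBasedTreatment", "in vivo", "CAS_A", 0.1),
    Experiment("exp4", "cell line", "thymus", 0, "CompoundBasedTreatment", "in vitro", "CAS_A"),
]


def by_biosource(exps, biosource_type, organism_part):
    """Use case 1: select by biosource type and organism part, order by age."""
    hits = [e for e in exps
            if e.biosource_type == biosource_type and e.organism_part == organism_part]
    return sorted(hits, key=lambda e: e.age_weeks)


def by_treatment(exps, cas_number):
    """Use case 2: in vivo compound treatments with a given CAS number, ordered by dose."""
    hits = [e for e in exps
            if e.treatment == "CompoundBasedTreatment"
            and e.treatment_site == "in vivo"
            and e.cas_number == cas_number
            and e.compound_mg_per_kg is not None]  # sorting needs a measured dose
    return sorted(hits, key=lambda e: e.compound_mg_per_kg)


def biosource_distance(a, b, fields=("biosource_type", "organism_part", "age_weeks")):
    """Use case 3 (sketch): fraction of biosource attributes on which a and b differ."""
    differing = sum(getattr(a, f) != getattr(b, f) for f in fields)
    return differing / len(fields)


print([e.name for e in by_biosource(EXPERIMENTS, "whole organism", "thymus")])  # ['exp2', 'exp1']
print([e.name for e in by_treatment(EXPERIMENTS, "CAS_A")])                     # ['exp3', 'exp1']
print(biosource_distance(EXPERIMENTS[0], EXPERIMENTS[1]))                       # 0.3333333333333333
```

A distance like this could then be compared against correlations in gene expression levels, as the third use case suggests. Richer measures could weight attributes or use the ontology's hierarchies instead of exact matches.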
Ontology Working Group Highlights
• First pass ontology of biomaterial
descriptions
• Participated in Bio-ontologies Consortium
Meeting at ISMB 2001.
• Mailing list of about 200 subscribers
Ontology Working Group Plans
• Finish building biomaterial description
ontology
• Expand efforts to include remaining parts of
a microarray experiment
• Demonstrate usage to the microarray
community
Acknowledgements
• Past and present members of CBIL for their
work on EpoDB and RAD
• The members of the MGED Ontology Working
Group for their contributions
• The Bio-Ontologies Consortium for
encouragement and guidance
• This presentation is available at
http://www.cbil.upenn.edu/Ontology/MGEDOntology1201.ppt