Rocca-Serra presentation

MGED ontology for consistent annotation of microarray experiments
Manchester Bioinformatics Week
Ontologies Workshop1
March 23-24th 2002
Philippe Rocca-Serra
Microarray Informatics Team
EBI-EMBL, Hinxton Cambridge
The European Bioinformatics Institute
ArrayExpress: a database for Gene Expression Studies
[Figure: gene expression data matrix, genes × samples]
ArrayExpress goals

• To create a public repository for gene expression data:
– apply a standard format
– apply curation to the data (high quality control)
– easy access to information
– search and retrieve information
• To compare experiments.
• To perform analysis and data mining using complex querying.
What kind of data should be stored?
[Figure: gene expression data matrix with its annotations: sample annotations; experiment (platform, conditions…); genes & transcription units]
Important issues about data annotation
• Sufficient annotation of the experiment, genes and samples
• Efficient annotation:
– Machine processable: effective mining agents
– Homogeneous: consistent annotation
– Unambiguous: accurate description, sample discrimination
MIAME Requirements:
addressing the issue of sufficient annotation
Recorded info should be sufficient to interpret and replicate the experiment:
• Experimental design: the set of hybridisation experiments as a whole
• Array design: each array used and each element (spot) on the array
• Samples: samples used, extract preparation and labelling
• Hybridisations: procedures and parameters
• Measurements: images, quantitation, specifications
• Normalisation controls: types, values, specifications
(Brazma et al, Nature Genetics, 2001)
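The six MIAME sections listed above lend themselves to a simple completeness check. A minimal sketch in Python, assuming a submission is held as a plain dictionary keyed by section name. The key names and the `missing_miame_sections` helper are hypothetical and are not part of MIAME or ArrayExpress:

```python
# Hypothetical section keys mirroring the six MIAME parts listed above.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridisations",
    "measurements",
    "normalisation_controls",
)


def missing_miame_sections(submission):
    """Return the MIAME sections that are absent or empty in a submission."""
    return [name for name in MIAME_SECTIONS if not submission.get(name)]


# Illustrative submission with only two of the six sections filled in.
submission = {
    "experimental_design": "the set of hybridisation experiments as a whole",
    "samples": "samples used, extract preparation and labelling",
}
print(missing_miame_sections(submission))
```

A check like this only tests that each section is present; whether its content is actually sufficient to interpret and replicate the experiment still needs curation.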
Second Challenge
Addressing the issue of annotation efficiency
One of MGED's main goals is to facilitate the adoption of standards for DNA-array experiment annotation and data representation.
• This requires machine-understandable annotations:
– Avoid free text and natural language
– Avoid synonyms: adrenaline / epinephrine
– General use of CVs (controlled vocabularies) and ontologies
• Gene annotation using e.g. GO and pathway analysis
• Create a new ontology where necessary:
– Task assigned to MGED for Biomaterial (sample)
description
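Why synonyms get in the way of machine processing can be shown in a few lines. A minimal sketch, assuming a small hand-made synonym table. The table, the choice of "epinephrine" as the preferred term, and the `normalise_term` helper are illustrative assumptions, not MGED artefacts:

```python
# Illustrative synonym table: free-text variant -> preferred CV term.
SYNONYMS = {
    "adrenaline": "epinephrine",
    "epinephrine": "epinephrine",
}


def normalise_term(free_text):
    """Map a free-text term to its preferred CV term, or raise if unknown."""
    key = free_text.strip().lower()
    if key not in SYNONYMS:
        raise KeyError(f"term not in controlled vocabulary: {free_text!r}")
    return SYNONYMS[key]


# Three spellings that a plain string comparison would treat as different.
annotations = ["Adrenaline", "epinephrine ", "ADRENALINE"]
print({normalise_term(a) for a in annotations})
```

Rejecting unknown terms rather than passing them through is a design choice here: it keeps free text out of the annotation, at the cost of requiring the vocabulary to be extended when a new term is needed.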
Ontology integration in the object model describing ArrayExpress database
• ArrayExpress DB is an implementation of the MAGE-OM model (a UML model)
• MAGE model by construction includes the use of ontology entries:
– 37 locations for an “Ontology Entry”
– 36 cases of simple Controlled Vocabularies, e.g. Image Format (TIFF, JPEG)
– 1 has required development of specific modelling: Biomaterial (sample) description
MAGE BioMaterial Model
Facts about MGED biomaterial ontology
• Authors: developed by Chris Stoeckert (U. Penn) and Helen Parkinson (EBI); coordinated with the ArrayExpress database model (mapping available)
• Technical choices: use of the OIL language
– A new standard for building ontologies that provides support for formal semantics and reasoning
– Class/property modelling primitives based on frame-based systems
– Semantics capture based on description logics
– Syntax for encoding primitives and semantics based on existing Web languages: XML
• Availability: http://mged.sourceforge.net/Ontologies.shtml
MGED ontology: features & complexity
• Facts about the ontology:
– 75 classes
– 70 slots
– 98 individuals
– more individuals to be added
Using MGED Ontology: a Browseable Form
MGED defined concepts: internal terms
Linking to external ontologies: an application
MGED Ontology             Instances                           External References
BioMaterialDescription
Biosource Property
Organism                  Mus musculus musculus id: 39442     NCBI Taxonomy
Age                       7 weeks after birth
DevelopmentStage          Stage 28                            Mouse Anatomical Dictionary
Sex                       Female
StrainOrLine              C57BL/6                             International Committee on Standardized Genetic Nomenclature for Mice
BiosourceProvider         Charles River, Japan
OrganismPart              Liver                               Mouse Anatomical Dictionary
BioMaterialManipulation
EnvironmentalHistory
CultureCondition
Temperature               22 ± 2 °C
Humidity                  55 ± 5%
Light                     12 hours light/dark cycle
PathogenTests             Specified pathogen free conditions
Water                     ad libitum
Nutrients                 MF, Oriental Yeast, Tokyo, Japan
Treatment
CompoundBasedTreatment
(Compound)                Fenofibrate, CAS 49562-28-9         ChemIDplus
(Treatment_application)   in vivo, oral gavage
(Measurement)             100 mg/kg body weight
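The annotated mouse liver sample above pairs each MGED ontology class with an instance value and, where one exists, an external resource. A minimal sketch of how such annotations could be held in code, using only values from the slide. The `Annotation` class and the `by_external_ref` helper are hypothetical names chosen for illustration and do not come from MAGE-OM or the MGED ontology:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    ontology_class: str                 # MGED ontology class, e.g. "Organism"
    value: str                          # instance value recorded for the sample
    external_ref: Optional[str] = None  # external resource, if any


# A subset of the annotations shown on the slide.
sample = [
    Annotation("Organism", "Mus musculus musculus id: 39442", "NCBI Taxonomy"),
    Annotation("Age", "7 weeks after birth"),
    Annotation("DevelopmentStage", "Stage 28", "Mouse Anatomical Dictionary"),
    Annotation("Sex", "Female"),
    Annotation(
        "StrainOrLine",
        "C57BL/6",
        "International Committee on Standardized Genetic Nomenclature for Mice",
    ),
    Annotation("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
]


def by_external_ref(annotations):
    """Group annotated classes by the external resource they point to."""
    grouped = {}
    for a in annotations:
        if a.external_ref is not None:
            grouped.setdefault(a.external_ref, []).append(a.ontology_class)
    return grouped


for resource, classes in by_external_ref(sample).items():
    print(resource, "->", classes)
```

Keeping the external resource as a separate field, rather than folding it into the value string, is what lets a query ask for "all terms drawn from the Mouse Anatomical Dictionary" without parsing free text.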
Referencing to external ontologies
• NCBI taxonomy database
• Jackson Lab mouse strains and genes
• Edinburgh mouse atlas anatomy
• GO Gene Ontology
• HUGO nomenclature for human genes
• Chemical and compound ontologies - Merck index
• TAIR
• FlyBase
• … and many more: www.mged.org/ontology/
Planning MGED ontology’s future
[Diagram: “Ontology availability made simple?” Labels: External Ontologies; MGED/ArrayExpress ontology; Other submitters, submission via MIAMExpress; Large centres LIMS, direct submission in MAGE-ML; Curation DB; ArrayExpress DB]
Planning MGED ontology’s future
• Making the ontology available where it’s needed:
– Develop a browser or other interface for the ontology and link to LIMS
– Incorporate the ontology into submission/annotation and curation tools (MIAMExpress)
• Further ontology development: new instances, class refinement
• Better integration of available ontologies
• Writing guidelines on how to use ontologies for annotating data
• Developing use cases (non-trivial task)
Resources
• List of ontology resources from MGED pages
• MAGE-MIAME-ontology mappings, MIAME glossary
• Schemas for both ArrayExpress and MIAMExpress
• Annotation examples in MAGE-ML
URLs: www.mged.org | www.ebi.ac.uk/microarray
Mailing lists:
[email protected]
[email protected]
Acknowledgements
EBI-EMBL: H. Parkinson, S. Sansone, E. Holloway, A. Brazma
University of Pennsylvania: C. Stoeckert
And the Microarray Informatics Team.