2005-06_AnnotCamp_IntroGO_panel1


The Gene Ontologies
A Common Language for Annotation of
Genes from
Yeast, Flies and Mice
…and Plants and Worms
…and Humans
…and anything else!
Gene Ontology Objectives
• GO represents concepts used to classify specific
parts of our biological knowledge:
– Biological Process
– Molecular Function
– Cellular Component
• GO develops a common language applicable to
any organism
• GO terms can be used to annotate gene products
from any species, allowing comparison of
information across species
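As a rough illustration of the cross-species comparison described above, the sketch below groups annotations by term and reports terms shared by more than one species. The gene names and term IDs are placeholders invented for the example, not real annotations, and the function names are hypothetical.

```python
from collections import defaultdict

# Placeholder annotations: (species, gene product, term ID).
# Gene names and term IDs are invented for illustration only.
annotations = [
    ("yeast", "geneA", "TERM:1"),
    ("fly", "geneB", "TERM:1"),
    ("mouse", "geneC", "TERM:2"),
]


def group_by_term(rows):
    """Map each term ID to the set of (species, gene) pairs annotated to it."""
    by_term = defaultdict(set)
    for species, gene, term_id in rows:
        by_term[term_id].add((species, gene))
    return dict(by_term)


def shared_terms(rows):
    """Return the term IDs that carry annotations from more than one species."""
    result = []
    for term_id, pairs in group_by_term(rows).items():
        species_seen = {species for species, _ in pairs}
        if len(species_seen) > 1:
            result.append(term_id)
    return sorted(result)
```

Because every species uses the same term IDs, a shared term directly identifies gene products that can be compared across organisms.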
Expansion of Sequence Info
Entering the
Genome Sequencing Era
Eukaryotic Genome Sequences      Year   Genome Size (Mb)   # Genes
Yeast (S. cerevisiae)            1996   12                 6,000
Worm (C. elegans)                1998   97                 19,100
Fly (D. melanogaster)            2000   120                13,600
Plant (A. thaliana)              2001   125                25,500
Human (H. sapiens, 1st Draft)    2001   ~3000              ~35,000
Baldauf et al. (2000)
Science 290:972
Comparison of sequences
from 4 organisms
MCM3
MCM2
CDC46/MCM5
CDC47/MCM7
CDC54/MCM4
MCM6
These proteins form a hexamer in the species that have been examined
http://www.geneontology.org/
Outline of Topics
• Introduction to the Gene Ontologies (GO)
• Annotations to GO terms
• GO Tools
• Applications of GO
What is an Ontology? (from OED)
1721 BAILEY, Ontology, an Account of being in the Abstract.
1733 (title) A Brief Scheme of Ontology or the Science of
Being in General. a1832 BENTHAM Fragm. Ontol. Wks. 1843
VIII. 195 The field of ontology, or as it may otherwise be
termed, the field of supremely abstract entities, is a yet
untrodden labyrinth. 1884 BOSANQUET tr. Lotze's Metaph. 22
Ontology..as a doctrine of the being and relations of all
reality, had precedence given to it over Cosmology and
Psychology, the two branches of enquiry which follow the
reality into its opposite distinctive forms.
Srinija Srinivasan, Chief Ontologist, Yahoo!
The ontology. Dividing human knowledge
into a clean set of categories is a lot like
trying to figure out where to find that
suspenseful black comedy at your corner
video store. Questions inevitably come up,
like are Movies part of Art or
Entertainment? (Yahoo! lists them under the
latter.) -Wired Magazine, May 1996
The 3 Gene Ontologies
• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are carbohydrate
binding and ATPase activity
• Biological Process = biological goal or objective
– broad biological goals, such as mitosis or purine metabolism, that are accomplished
by ordered assemblies of molecular functions
• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples include
nucleus, telomere, and RNA polymerase II holoenzyme
Example: Gene Product = hammer
Function (what)              Process (why)
Drive nail (into wood)       Carpentry
Drive stake (into soil)      Gardening
Smash roach                  Pest Control
Clown’s juggling object      Entertainment
Biological Examples
Biological Process
Molecular Function
Cellular Component
Terms, Definitions, IDs
term: MAPKKK cascade (mating sensu Saccharomyces)
goid: GO:0007244
definition: OBSOLETE. MAPKKK cascade involved in transduction of
mating pheromone signal, as described in Saccharomyces.
definition_reference: PMID:9561267
comment: This term was made obsolete because it is a gene
product specific term. To update annotations, use the biological
process term 'signal transduction during conjugation with cellular
fusion ; GO:0000750'.
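A minimal sketch of how such a term record might be held in code, using the fields shown on this slide. The class name, the `obsolete` flag, the `suggested_replacement` field, and the `usable_id` helper are assumptions made for illustration; they are not a GO file format or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GOTerm:
    goid: str
    name: str
    definition: str
    definition_reference: str
    obsolete: bool = False
    comment: str = ""
    # Hypothetical field: the term named in the comment as the one to use instead.
    suggested_replacement: Optional[str] = None


term = GOTerm(
    goid="GO:0007244",
    name="MAPKKK cascade (mating sensu Saccharomyces)",
    definition=("OBSOLETE. MAPKKK cascade involved in transduction of "
                "mating pheromone signal, as described in Saccharomyces."),
    definition_reference="PMID:9561267",
    obsolete=True,
    comment="This term was made obsolete because it is a gene product specific term.",
    suggested_replacement="GO:0000750",
)


def usable_id(t: GOTerm) -> str:
    """Return the ID to annotate with: the term itself, or its replacement if obsolete."""
    if not t.obsolete:
        return t.goid
    if t.suggested_replacement is None:
        raise ValueError(f"{t.goid} is obsolete and names no replacement")
    return t.suggested_replacement
```

Keeping the obsolete term and its ID on record, rather than deleting it, lets existing annotations be found and updated.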
Directed Cyclic Graph
Figure 4.1. Life cycles of heterothallic and homothallic strains of S. cerevisiae. Heterothallic strains can be
stably maintained as diploids and haploids, whereas homothallic strains are stable only as diploids,
because the transient haploid cells switch their mating type, and mate.
An Introduction to the Genetics and Molecular Biology of the Yeast Saccharomyces cerevisiae Fred Sherman 2000;
Modified from: F. Sherman, Yeast genetics. The Encyclopedia of Molecular Biology and Molecular Medicine, pp. 302-325, Vol. 6. Edited by R.
A. Meyers, VCH Pub., Weinheim, Germany,1997.
Parent-Child Relationships
The cell component term Nucleus has 5 children:
• Nucleoplasm
• Nuclear envelope
• Nucleolus
• Chromosome
• Perinuclear space
A child is a subset of a parent’s elements
“Tree” Relationships
Derivation of Romance languages from Latin.
From R.A. Hall Jr., Introductory Linguistics; originally published by Chilton Books,
now distributed by Rand McNally & Co.
Ontology Relationships
Directed Acyclic Graph
http://www.ebi.ac.uk/ego
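The difference between a tree and a directed acyclic graph can be sketched in a few lines: in a DAG a child may have more than one parent. The example below reuses the Nucleus children from the earlier slide. The second parent given to "perinuclear space" and the root term "cell" are assumptions added only to show a multi-parent child; the function names are hypothetical.

```python
# Each term maps to the set of its parents. A tree would allow at most one
# parent per term; a DAG allows several.
PARENTS = {
    "cell": set(),          # assumed root, for illustration
    "nucleus": {"cell"},
    "nucleoplasm": {"nucleus"},
    "nuclear envelope": {"nucleus"},
    "nucleolus": {"nucleus"},
    "chromosome": {"nucleus"},
    # Second parent is an assumption, added to illustrate multiple parents.
    "perinuclear space": {"nucleus", "nuclear envelope"},
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def children(term, parents=PARENTS):
    """List the direct children of a term, sorted by name."""
    return sorted(child for child, ps in parents.items() if term in ps)
```

Because a child is a subset of its parents, a gene product annotated to a term is implicitly annotated to every term returned by `ancestors`.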
Evidence Codes for GO
Annotations
http://www.geneontology.org/doc/GO.evidence.html
IEA   Inferred from Electronic Annotation
ISS   Inferred from Sequence or Structural Similarity
IEP   Inferred from Expression Pattern
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IPI   Inferred from Physical Interaction
IDA   Inferred from Direct Assay
RCA   Inferred from Reviewed Computational Analysis
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IC    Inferred by Curator
ND    No biological Data available
IEA
Inferred from Electronic Annotation
• Sequence Similarity (BLAST)
• Automatic transfer from mappings
(InterPro2GO, EC2GO etc.)
-> Not manually reviewed
ISS
Inferred from Sequence or Structural
Similarity
• Sequence similarity
• Recognized domains
• Structural similarity
-> Use of ‘with’ column recommended
IEP
Inferred from Expression Pattern
• Transcript levels (Northerns, microarrays)
• Protein levels (Western blots)
-> Timing or localization of expression
-> Biological process annotations
IMP
Inferred from Mutant Phenotype
• Gene mutation/knockout
• Overexpression/ectopic expression
• Anti-sense experiments
• RNAi experiments
• Specific protein inhibitors
IGI
Inferred from Genetic Interaction
• Suppressors, synthetic lethals…
• Functional complementation
• Rescue experiments
-> Use of ‘with’ column recommended
IPI
Inferred from Physical Interaction
• 2-hybrid interactions
• Co-purification
• Co-immunoprecipitation
• Ion/complex/protein binding experiments
-> Use of ‘with’ column recommended
IDA
Inferred from Direct Assay
• Enzyme assays
• In vitro reconstitution (e.g. transcription)
• Immunofluorescence (for cell. comp.)
• Cell fractionation (for cell. comp.)
• Physical interaction/binding assay
RCA
Inferred from Reviewed Computational
Analysis
• Non-sequence-based computational methods
• Genome-wide analyses (e.g. 2-hybrid)
• Combinations of large-scale experiments
TAS
Traceable Author Statement
• Support from review article
• Textbook ‘common knowledge’
-> Data that can be ‘traced’ back
NAS
Non-traceable Author Statement
• Database entries that don't cite a paper
-> Data that cannot be ‘traced’ back
IC
Inferred by Curator
• Not supported by any direct evidence
• Inferred from other GO annotations
-> GO term in ‘with/from’ column required
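The ‘with’ column guidance on the preceding evidence-code slides (recommended for ISS, IGI and IPI, required for IC) could be checked mechanically. The following is a sketch under that reading; the function name and the "error"/"warning"/"ok" return values are hypothetical choices, not part of any GO tool.

```python
# Codes taken from the slides above: IC requires a 'with/from' value,
# while ISS, IGI and IPI only recommend one.
WITH_REQUIRED = {"IC"}
WITH_RECOMMENDED = {"ISS", "IGI", "IPI"}


def check_with_column(evidence_code, with_value):
    """Return 'error', 'warning' or 'ok' for one annotation's 'with' column."""
    has_with = bool(with_value and with_value.strip())
    if has_with:
        return "ok"
    if evidence_code in WITH_REQUIRED:
        return "error"
    if evidence_code in WITH_RECOMMENDED:
        return "warning"
    return "ok"
```

For example, an IC annotation with an empty ‘with/from’ column is reported as an error, while an ISS annotation in the same state only produces a warning.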
ND
No biological Data available
Curator found no information supporting any annotation
• molecular function unknown GO:0005554
• biological process unknown GO:0000004
• cellular component unknown GO:0008372
Term Hierarchy
TAS/IDA
IMP/IGI/IPI
ISS/IEP
NAS
IEA
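Reading the list above from top (strongest) to bottom (weakest), which is an assumption about the slide's layout, the ordering can be used to pick the best-supported evidence code for a gene product. The rank numbers and the function name are hypothetical; codes not shown on this slide (RCA, IC, ND) are left unranked.

```python
# Lower number = higher in the list on the slide. Codes on the same line share a rank.
EVIDENCE_RANK = {
    "TAS": 0, "IDA": 0,
    "IMP": 1, "IGI": 1, "IPI": 1,
    "ISS": 2, "IEP": 2,
    "NAS": 3,
    "IEA": 4,
}


def best_evidence(codes):
    """Return the highest-ranked code among those given, or None if none is ranked.

    When several codes share the best rank, the first one in the input wins.
    """
    ranked = [code for code in codes if code in EVIDENCE_RANK]
    if not ranked:
        return None
    return min(ranked, key=lambda code: EVIDENCE_RANK[code])
```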
Qualifiers
The qualifier modifies the interpretation of a GO term
NOT: explicit note that a gene product is not
associated with a GO term
colocalizes_with: only transient localization,
or low resolution of an assay
contributes_to: gene product that is part of a complex
can be annotated to the process/function of the complex
http://www.geneontology.org/GO.annotation.shtml#qual
http://www.geneontology.org/doc/GO.evidence.html
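A short sketch of why the NOT qualifier matters when annotations are queried: a NOT annotation must be excluded, not counted, when listing the gene products associated with a term. The record layout, gene names, and function name are assumptions for illustration; only the three qualifier names come from the slide above.

```python
from dataclasses import dataclass

# Qualifier names as listed on the slide.
KNOWN_QUALIFIERS = {"NOT", "colocalizes_with", "contributes_to"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str   # placeholder names below, not real genes
    go_id: str
    evidence: str
    qualifiers: frozenset = frozenset()


def associated_gene_products(annotations, go_id):
    """List gene products annotated to go_id, skipping NOT annotations."""
    return [
        a.gene_product
        for a in annotations
        if a.go_id == go_id and "NOT" not in a.qualifiers
    ]


annotations = [
    Annotation("geneA", "GO:0000750", "IMP"),
    Annotation("geneB", "GO:0000750", "IDA", frozenset({"NOT"})),
    Annotation("geneC", "GO:0000750", "IDA", frozenset({"contributes_to"})),
]
```

Here `associated_gene_products(annotations, "GO:0000750")` keeps geneA and geneC but drops geneB, whose annotation explicitly states that it is not associated with the term.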