Gene Ontology Annotations:
What they mean and where they come from
Judith A. Blake, David P. Hill, Barry Smith
BioOntologies SIG: Vienna
July 20, 2007
GO Consortium Project Goals
1. We will maintain comprehensive, logically
rigorous and biologically accurate ontologies.
*2. We will comprehensively annotate reference
genomes in as complete detail as possible.
*3. We will support annotation across all organisms.
4. We will provide our annotations and tools to the
research community.
GO terms are used for functional annotations
Brain development [GO:0007420] (141 genes, 207 annotations)
GO Stats:
GO Annotations
Total experimental GO annotations: 388,633
Total proteins with manual annotations: 80,402
Contributing groups (including MGI): 19
Total PubMed references: 346,002
Total number of predicted annotations: 17,029,553
Total number of taxa: 129,318
Total number of distinct proteins: 2,971,374
April 24, 2007
Annotations are assertions
Annotations provide the connection between genomic
information and the GO.
Experiments provide the data that enables us to
annotate gene products with terms from the ontologies.
Annotations for App: amyloid beta (A4) precursor protein
We use evidence codes to describe the
basis of the annotation
Direct Experiment
IDA: Inferred from direct assay
IPI: Inferred from physical interaction
IMP: Inferred from mutant phenotype
IGI: Inferred from genetic interaction
IEP: Inferred from expression pattern
NO Direct Experiment
ISS: Inferred from sequence or structural similarity
TAS: Traceable author statement
NAS: Non-traceable author statement
IC: Inferred by curator
RCA: Reviewed Computational Analysis
IEA: Inferred from electronic annotation
ND: No data available
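A minimal Python sketch of the evidence codes as a lookup table. The code names and descriptions come from the slide; the assignment of the first five codes to "Direct Experiment" is an assumption based on the slide's two labels, and the names `EVIDENCE_CODES`, `DIRECT_EXPERIMENT`, and `is_direct_experiment` are illustrative only, not part of any GO tool.

```python
# Evidence codes and descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "ISS": "Inferred from sequence or structural similarity",
    "TAS": "Traceable author statement",
    "NAS": "Non-traceable author statement",
    "IC": "Inferred by curator",
    "RCA": "Reviewed Computational Analysis",
    "IEA": "Inferred from electronic annotation",
    "ND": "No data available",
}

# Assumption: these five codes are the ones the slide labels "Direct Experiment".
DIRECT_EXPERIMENT = frozenset({"IDA", "IPI", "IMP", "IGI", "IEP"})


def is_direct_experiment(code: str) -> bool:
    """Return True if the evidence code is in the direct-experiment group."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code!r}")
    return code in DIRECT_EXPERIMENT
```

A filter such as `[c for c in EVIDENCE_CODES if is_direct_experiment(c)]` would then select only the experimentally supported codes.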
Examples of how we connect instances with
knowledge representation in the GO
What follows are examples of annotation of
the biomedical literature using GO types,
gene product types and evidence codes
Example #1: Molecular Function using IDA
Figure from
Zhang M, Chen W, Smith SM, Napoli JL.
Molecular characterization of a mouse short chain dehydrogenase/reductase active
with all-trans-retinol in intact cells, mRDH1.
J Biol Chem. 2001 Nov 23;276(47):44083-90.
The Observation
[Figure; surviving labels: NAD+, NADH, H+]
The Annotation:
What are the instances in this experiment?

Gene product instances


Molecules of retinol dehydrogenase
Molecular function instances


Instances of execution of the molecular function revealed
by the assay
Instances of molecular function associated with instances
of retinol dehydrogenase. These instances are the potential
of a molecule of retinol dehydrogenase to execute the
function retinol dehydrogenase activity.
What knowledge are we trying to capture?
We are interested in understanding how gene
products contribute to the biology of an
organism.
How do wet-bench biologists learn about gene products?
They do experiments!
Experiments are designed to study the properties of gene
product instances.
Experimental biologists take on “The Burden of Proof”.
How do we represent the accumulated knowledge?
We* make annotations!
Annotations connect what wet-bench biologists see in
the lab with how we represent our current
understanding of biological reality
* GO curators
So, where are the instances?
The instances are in the lab. We use what
people report about instances, but we never
actually deal with them directly
What do we mean by gene product?

Gene Product Type

Stands proxy for the ‘gene’



Genes are what we have in MODs
Types = what instances have in common
Gene Product Instance

A molecule of a gene product


It can be physically isolated
It takes up space
What do we mean by annotations?

An annotation



Asserts that instances of molecules of a type of gene
product have propensity to act as designated by the
terms in an ontology such as the GO
Is created on the basis of observations of the instances
of such types in experiments and of the inferences
drawn from such observations
Note: comprehensive experimental details are
embedded in biomedical publications and in specialized
databases
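One way to picture an annotation as described above is as a small record that ties a gene product type to a GO term, with an evidence code and a source. The sketch below is hypothetical: the class `Annotation`, its field names, the gene product `GeneX`, and the placeholder reference are all invented for illustration and do not reproduce any GO Consortium file format. The GO ID and term name are the brain development example given earlier in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An assertion about a gene product type (hypothetical layout)."""

    gene_product: str   # the gene product type, standing proxy for the gene
    go_id: str          # GO term identifier
    go_term: str        # human-readable GO term name
    evidence_code: str  # basis of the annotation, e.g. IDA or IMP
    reference: str      # where the experimental details are published

    def assertion(self) -> str:
        # Spell out what the annotation asserts about instances of the type.
        return (
            f"Instances of {self.gene_product} have the propensity to act as "
            f"described by '{self.go_term}' ({self.go_id}); "
            f"evidence: {self.evidence_code}; source: {self.reference}"
        )


# 'GeneX' and the reference are placeholders, not real data.
example = Annotation(
    gene_product="GeneX",
    go_id="GO:0007420",
    go_term="brain development",
    evidence_code="IMP",
    reference="PMID:placeholder",
)
print(example.assertion())
```

The record stores only a pointer to the publication, in line with the note that comprehensive experimental details stay in the literature and in specialized databases.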
Example #2: Molecular Function using IMP
Figure from
Schulz S, Lopez MJ, Kuhn M, Garbers DL.
Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but
heat-stable enterotoxin-resistant mice.
J Clin Invest. 1997 Sep 15;100(6):1590-5.
The Observation
The Annotation:
IMP
What are the instances in this experiment?

Gene product instances



Molecules of GUCY2C protein
The lack of functional molecules of GUCY2C in mutants
Molecular function instances


The execution of the molecular function, measured by
the accumulation of cGMP
The potential of a molecule of GUCY2C to execute the
molecular function

Revealed by the correlation between a lack of molecules
and a lack of executions of molecular function
The Curator Perspective: Annotation Process
1. Identification of relevant experimental data
- Biomedical literature as primary source
- Annotations inferred from experiments
performed in other organisms or inferred
from sequence structure
The Curator Perspective: Annotation Process
1. Identification of relevant experimental data
2. Identification of the appropriate ontology
annotation term
- Experimental assay influences the limit of
resolution/granularity of term assignment
available to use
- Differences in expertise among curators
should result in close, but not necessarily exact,
GO term annotations
The Curator Perspective: Annotation Process
1. Identification of relevant experimental data
2. Identification of the appropriate ontology
annotation term
3. Employment of annotation quality control
processes for
- Correct formal structure
- Evaluate annotation consistency
- Harvest emerging knowledge to refine and
extend the GO
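The "correct formal structure" check in step 3 could be sketched roughly as follows. This is a guess at what such a check might look like, not the Consortium's actual quality control pipeline: the function `structural_problems`, the dictionary keys, and the specific rules (required fields, a `GO:` prefix followed by seven digits, a known evidence code) are assumptions made for illustration.

```python
import re

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")

# Evidence codes listed earlier in the talk.
KNOWN_CODES = {
    "IDA", "IPI", "IMP", "IGI", "IEP", "ISS",
    "TAS", "NAS", "IC", "RCA", "IEA", "ND",
}

# Hypothetical field names for an annotation record.
REQUIRED_FIELDS = ("gene_product", "go_id", "evidence_code", "reference")


def structural_problems(record: dict) -> list:
    """Return a list of formal-structure problems; empty means none found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    go_id = record.get("go_id", "")
    if go_id and not GO_ID_PATTERN.match(go_id):
        problems.append(f"malformed GO ID: {go_id}")
    code = record.get("evidence_code", "")
    if code and code not in KNOWN_CODES:
        problems.append(f"unknown evidence code: {code}")
    return problems
```

A check like this only covers form. Evaluating annotation consistency and harvesting new knowledge for the ontology depend on curator judgment and are not captured here.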
Example #3: Biological Process Using IMP
Washington Smoak I, Byrd NA, Abu-Issa R, Goddeeris MM, Anderson R, Morris J, Yamamura K, Klingensmith J, Meyers EN.
Sonic hedgehog is required for cardiac outflow tract and neural crest cell development.
Dev Biol. 2005 Jul 15;283(2):357-72.
The Observation
The Annotation:
IMP
What are the instances in this Experiment?

Gene product instances

Molecules of the Shh gene
Non-functional molecules of the Shh gene
Biological Process instances

The development of a mouse heart
Molecular Function instances

The execution of a molecular function by a
molecule of the Shh gene
So, when a biological process occurs,
it is the result of molecules
of a gene product(s) executing
their molecular function(s)
How do wet-bench biologists learn about gene
products?
They do experiments!
Experiments are designed to study the properties of gene
product instances.
Experimental biologists take on “The Burden of Proof”.
They make conclusions
about gene product types
based on the accumulated
experimental data!
If experiments show:



All instances of a gene product studied have the
potential to execute the function tyrosine kinase
Instances of the same gene product are involved
in the biological process limb development
All instances of the same gene product are
found in instances of the cytoplasm
A wet-bench biologist would conclude:
The gene product of this gene is a
tyrosine kinase that functions in the
cytoplasm and the tyrosine kinase
functioning is used in limb
development
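The conclusion above combines one annotation from each GO aspect: molecular function, cellular component, and biological process. A small sketch of that combination is shown here, assuming annotations are available as simple tuples. The gene product name `GeneX`, the tuple layout, and the `summarize` helper are hypothetical; the three terms are taken from the slide.

```python
from collections import defaultdict

# (gene product, GO aspect, term) tuples; 'GeneX' is a placeholder name.
records = [
    ("GeneX", "molecular_function", "tyrosine kinase"),
    ("GeneX", "biological_process", "limb development"),
    ("GeneX", "cellular_component", "cytoplasm"),
]

ASPECT_ORDER = ("molecular_function", "cellular_component", "biological_process")


def summarize(gene, annotation_records):
    """Group one gene product's terms by aspect and join them into one line."""
    by_aspect = defaultdict(list)
    for product, aspect, term in annotation_records:
        if product == gene:
            by_aspect[aspect].append(term)
    parts = []
    for aspect in ASPECT_ORDER:
        terms = by_aspect.get(aspect)
        if terms:
            parts.append(f"{aspect}: {', '.join(sorted(terms))}")
    return f"{gene} -> " + "; ".join(parts)


print(summarize("GeneX", records))
```

The summary only restates what the annotations assert. Whether the kinase activity is actually what is used in limb development remains a conclusion for the biologist to draw from the experiments.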
If we comprehensively annotate genes, can
we make the same conclusions?
This is the basis of biological discovery!
Analysis of gene product annotations leads to
new hypotheses for wet-bench biologists to
test
Development of GO depends on intersection
of curation with ontology refinements



Process of annotation brings new experimental
results into perspective with existing scientific
knowledge captured in the ontology
New results may stand in conflict with current
version of ontology
One of the strengths of the GO development paradigm is
that it is primarily a task of biologist-curators who
are experts in understanding the experimental
systems
[Figure: diagram with the labels: hypothesis generation; data mining and prediction using ontologies; experiments and data analysis using GO, etc.; experimental literature; informatics resources; improved annotations in MODs, UniProt; refine bio-ontologies]
Summary




Gene product annotation is an integral aspect of the work
of the GO Consortium
Annotations reflect conclusions from experiments as
interpreted by the biologist and reviewed by peers
The structure of the GO depends upon accumulated
knowledge from many experiments resulting in a
representation of current thought about biological reality
As experimental data changes our view of reality, the
ontology must change as well