Mouse Genome Informatics

Gene Ontology
Overview and Perspective
Lung Development Ontology Workshop
A biological ontology is:

A (machine) interpretable representation of some aspect of biological reality
- what kinds of things exist?
- what are the relationships between these things?

[Figure: example terms and relationships: eye is_a sense organ; sclera part_of eye; eye develops from optic placode. Image source: http://www.macula.org/anatomy/eyeframe.html]
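To make "machine interpretable" concrete, here is a minimal sketch of the eye example as a set of typed edges that a program can traverse. The edge directions are a reading of the slide's diagram, and the names `EDGES`, `build_index`, and `reachable` are illustrative assumptions, not part of any GO software.

```python
from collections import defaultdict

# Typed edges (source term, relationship, target term).
# Assumption: directions are read off the slide's eye diagram.
EDGES = [
    ("eye", "is_a", "sense organ"),
    ("sclera", "part_of", "eye"),
    ("eye", "develops_from", "optic placode"),
]


def build_index(edges):
    """Map each term to its outgoing (relationship, target) pairs."""
    index = defaultdict(list)
    for source, relation, target in edges:
        index[source].append((relation, target))
    return index


def reachable(term, index, relations=("is_a", "part_of")):
    """Return all terms reachable from `term` via the given relationship types."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, target in index.get(current, []):
            if relation in relations and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


index = build_index(EDGES)
print(sorted(reachable("sclera", index)))  # ['eye', 'sense organ']
```

Following part_of and then is_a from "sclera" answers a question such as "which broader structures does this term belong to?" without a human reading the diagram.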
Gene Ontology (GO) Consortium
www.geneontology.org
- Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge.
- Seeks to achieve a mutual understanding of the definition and meaning of any word used; thus we are able to support cross-database queries.
- Members agree to contribute gene product annotations and associated sequences to GO database; thus facilitating data analysis and semantic interoperability.
Gene Ontology widely adopted
AgBase
GO represents three biological domains

Molecular Function = elemental activity/task
- the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

Biological Process = biological goal or objective
- broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Cellular Component = location or complex
- subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
Terms are defined graphically relative to other terms
The Gene Ontology (GO)
1. Build and maintain logically rigorous and biologically accurate ontologies
2. Comprehensively annotate reference genomes
3. Support genome annotation projects for all organisms
4. Freely provide ontologies, annotations and tools to the research community
Building the ontologies
- The GO is still developing daily, both in ontological structures and in domain knowledge
- Ontology development workshops focus on specific domains needing revision and bring together ontology developers and domain experts
- Currently running ~2 workshops / year
1. Metabolism and cell cycle (Aug 2004)
2. Immunology and defense response (Nov 2005, Apr 2006)
3. Early CNS development (June 2006)
4. Peripheral nervous system development (Feb 2007)
5. Blood Pressure Regulation (June 2007)
6. Muscle Development (July 2007)
Building the ontology: Immune System Process
- 725 new terms related to immunology
- 127 new terms added to cell type ontology
[Figure: ontology graph; red = part_of, blue = is_a]
Alex Diehl
Annotating Gene Products using GO
- Gene Product: P05147
- GO Term: GO:0047519
- Evidence: IDA
- Reference: PMID:2976880
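The four parts of the annotation on this slide can be sketched as a small record type. This is only an illustration: the four-column tab-separated layout and the names `Annotation` and `parse_annotation` are assumptions made for the example, not the consortium's actual annotation file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One GO annotation: a gene product linked to a term, with its support."""
    gene_product: str  # e.g. a protein accession
    go_term: str       # GO identifier
    evidence: str      # evidence code such as IDA
    reference: str     # source of the evidence, e.g. a PubMed ID


def parse_annotation(line: str) -> Annotation:
    """Parse a hypothetical 4-column, tab-separated annotation line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    return Annotation(*fields)


# Values taken from the slide.
example = parse_annotation("P05147\tGO:0047519\tIDA\tPMID:2976880")
print(example)
```

Keeping the evidence code and reference on every record is what lets an annotation be treated as a checkable assertion rather than a bare label.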
Annotations are assertions
- There is evidence that this gene product can be best classified using this term
- The source of the evidence and other information is included
- There is agreement on the meaning of the term
Annotations are assertions
- Annotations are the connections between genomic information and the GO.
- Experiments provide the data that enable us to annotate gene products with terms from the ontologies.
[Figure: Annotations for App: amyloid beta (A4) precursor protein]
We use evidence codes to describe the basis of the annotation
- IDA: Inferred from direct assay
- IPI: Inferred from physical interaction
- IMP: Inferred from mutant phenotype
- IGI: Inferred from genetic interaction
- IEP: Inferred from expression pattern
- IEA: Inferred from electronic annotation
- ISS: Inferred from sequence or structural similarity
- TAS: Traceable author statement
- NAS: Non-traceable author statement
- IC: Inferred by curator
- RCA: Reviewed Computational Analysis
- ND: no data available
[Figure labels grouping the codes: Direct Experiment in organism; NO Direct Experiment; Inferred from evidence]
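The evidence codes listed on this slide lend themselves to a simple lookup table. The sketch below is an assumption-laden illustration: the names `EVIDENCE_CODES`, `EXPERIMENTAL`, and `is_experimental` are invented here, and the experimental grouping (IDA, IPI, IMP, IGI, IEP) follows the GO Consortium's experimental evidence category rather than the slide's own grouping, which is not fully recoverable from the transcript.

```python
# Evidence codes and descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "IEA": "Inferred from electronic annotation",
    "ISS": "Inferred from sequence or structural similarity",
    "TAS": "Traceable author statement",
    "NAS": "Non-traceable author statement",
    "IC": "Inferred by curator",
    "RCA": "Reviewed Computational Analysis",
    "ND": "No data available",
}

# Assumption: codes backed by a direct experiment on the gene product.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def is_experimental(code: str) -> bool:
    """Return True if `code` is a known, experimentally supported evidence code."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"unknown evidence code: {code}")
    return code in EXPERIMENTAL


for code in ("IDA", "IEA"):
    print(code, EVIDENCE_CODES[code], is_experimental(code))
```

A filter like this is one way an analysis could restrict itself to annotations backed by a direct experiment.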
GO Annotation Stats (April 24, 2007)
GO Annotations
- Total manual GO annotations: 388,633
- Total proteins with manual annotations: 80,402
- Contributing groups (including MGI): 19
- Total PubMed references: 346,002
- Total number of predicted annotations: 17,029,553
- Total number of taxa: 129,318
- Total number of distinct proteins: 2,971,374
Annotations of gene products to GO are genome specific
Now we can query across all annotations based on shared biological activity.
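As a rough illustration of querying across genome-specific annotations by shared biological activity, the sketch below groups gene products from several organisms under one GO term. All records are hypothetical placeholders (the organisms, gene names, and term identifiers are not real annotations), and `gene_products_by_term` is an illustrative helper, not an existing API.

```python
from collections import defaultdict

# Hypothetical placeholder records: (organism, gene product, GO term).
ANNOTATIONS = [
    ("mouse", "mouse_gene_A", "GO:term_1"),
    ("fly", "fly_gene_B", "GO:term_1"),
    ("yeast", "yeast_gene_C", "GO:term_2"),
    ("mouse", "mouse_gene_D", "GO:term_1"),
]


def gene_products_by_term(annotations, go_term):
    """Return {organism: sorted gene products} for everything annotated to `go_term`."""
    grouped = defaultdict(list)
    for organism, gene_product, term in annotations:
        if term == go_term:
            grouped[organism].append(gene_product)
    return {organism: sorted(products) for organism, products in grouped.items()}


print(gene_products_by_term(ANNOTATIONS, "GO:term_1"))
```

Because every group annotates with the same term identifiers, a single lookup spans all organisms without any per-database translation step.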
GO is a functional annotation system of great
utility to the data-driven biologist
GO enables genomic data analysis
- Microarrays allow biologists to record changes in gene function across entire genomes
- Result: vast amounts of gene expression data desperately needing cataloging and tagging
- Many data analysis tools use the GO graph structure to statistically evaluate clusters of co-expressed genes based on shared functional annotations
- 680 publications (of 1,517) on GO list
- 46 microarray tools contributed
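The slide does not say which statistic these tools use. One common choice for asking whether a cluster of co-expressed genes shares a GO term more often than chance is a one-sided hypergeometric test, sketched below for a single term. The function name, parameter names, and the numbers in the usage line are all hypothetical; the sketch also ignores the GO graph structure and multiple-testing correction, which real tools typically handle.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, cluster, overlap):
    """P(X >= overlap) when drawing `cluster` genes from `population` genes,
    of which `annotated` carry the GO term of interest."""
    if not (0 <= annotated <= population and 0 <= cluster <= population):
        raise ValueError("annotated and cluster must lie between 0 and population")
    upper = min(annotated, cluster)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and min(annotated, cluster)")
    total = comb(population, cluster)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1000 genes on the array, 50 annotated to the term,
# a cluster of 20 co-expressed genes, 5 of which carry the term.
print(hypergeom_enrichment_p(population=1000, annotated=50, cluster=20, overlap=5))
```

A small p-value suggests the term is over-represented in the cluster; with many terms tested at once, the raw value would need correcting before drawing conclusions.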
GO supports functional classifications
OCT 13, 2006
Cancer Genome Projects
GO is wildly successful
Nature: January 2007
FIGURE 3. Representative cell-type-specific genes and corresponding molecular functions.
Comprehensively annotate Reference Genomes
- Human
- Mouse
- Fly
- Rat
- Chicken
- Zebrafish
- Worm
- Dicty
- E. coli
- Saccharomyces cerevisiae
- Schizosaccharomyces pombe
- Arabidopsis thaliana
Reference Genome Annotation Project
- Priority genes: those implicated in human diseases
- Determine orthologs/homologs in reference genomes
- For these genes, comprehensively curate biomedical literature
Mary Dolan
Reference Genome Development Projects
- Shared annotation focus = Coordinated attention to ontology structure
- Orthology/homology set across primary model organisms
- Reference ID mappings including associations of sequences, gene/proteins, and human diseases
- Ultimately, transparent access to comprehensive information about genes among the primary data providers
Ongoing Challenges for the GO Consortium
1. Verifying and maintaining domain representations in the ontology that reflect best knowledge of the real world.
- Depends on the involvement of biologists (domain experts)
- Difficult to automate
- Must accommodate continuing changes in what we think we understand about biological systems
2. Providing comprehensive annotations, where experimental evidence is available, for all genes
- Dependent on the quality of annotations from experimental literature
- Combines manual curation by highly trained scientists with computationally predicted annotations
- Comprehensiveness may depend on changes in biomedical publishing
acknowledgements
MGI
Carol Bult
Janan Eppig
Jim Kadin
Joel Richardson
Martin Ringwald
Lois Maltais
TBK Reddy
Monica McAndrews-Hill
Nancy Butler
GO
Michael Ashburner (Cambridge)
J. Michael Cherry (Stanford)
Suzanna Lewis (LBNL)
Rex Chisholm (NWU)
David Hill (Jackson Lab)
Midori Harris (EBI)
Chris Mungall (LBNL)
Jane Lomax (EBI)
Eurie Hong (Stanford)
Jen Clark (EBI)
GO @ MGI
Alex Diehl
Mary Dolan
Harold Drabkin
David Hill
Li Ni
Dmitry Sitnikov
Gene Ontology
www.geneontology.org
Mouse Genome Informatics
www.informatics.jax.org
GO Consortium is supported by NIH-NHGRI and by the European Union RTD Programme
MGI projects are supported by NIH [NHGRI, NICHD, and NCI].
PRO is supported by NIGMS
Corpora is supported by NLM
Bar Harbor, Maine, USA