GO: The Gene Ontology


Lecture Four:
GO: The Gene Ontology
Infrastructure for Systems Biology
S. cerevisiae
D. melanogaster
C. elegans
Cells that normally survive: CED-9 ON; CED-3 and CED-4 OFF
Cells that normally die: CED-9 OFF; CED-3 and CED-4 ON
M. musculus
Comparison of sequences from 4 organisms
MCM3
MCM2
CDC46/MCM5
CDC47/MCM7
CDC54/MCM4
MCM6
These proteins form a hexamer in the species that have been examined
The Gene Ontologies
A Common Language for Annotation of Genes from Yeast, Flies and Mice
…and Plants and Worms
…and Humans
…and anything else!
Gene Ontology - 1998
• FlyBase (Drosophila): Cambridge, EBI, Harvard, Berkeley & Bloomington
• SGD (Saccharomyces): Stanford
• MGI (Mus): Jackson Labs., Bar Harbor
Gene Ontology - now
• Fruit fly - FlyBase
• Budding yeast - Saccharomyces Genome Database (SGD)
• Mouse - Mouse Genome Database (MGD & GXD)
• Rat - Rat Genome Database (RGD)
• Weed - The Arabidopsis Information Resource (TAIR)
• Worm - WormBase
• Dictyostelium discoideum - dictyBase
• InterPro/UniProt at EBI - InterPro
• Fission yeast - Pombase
• Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB Sanger
• Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
• Grasses - rice & maize - Gramene database
• Zebrafish - ZFIN
To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.
• Be open source
• Use open standards
• Make data & code available without constraint
• Involve your community
Gene Ontology Objectives
• GO represents concepts used to classify specific parts of our biological knowledge:
– Biological Process
– Molecular Function
– Cellular Component
• GO develops a common language applicable to any organism
• GO terms can be used to annotate gene products from any species, allowing comparison of information across species
GO: Three ontologies
For a gene product:
• Molecular Function: What does it do?
• Biological Process: What processes is it involved in?
• Cellular Component: Where does it act?
Content of GO
Molecular Function: 7,309 terms
Biological Process: 10,041 terms
Cellular Component: 1,629 terms
Total: 18,975 terms
Definitions: 94.9 %
Obsolete terms: 992
What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
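The three fields on this slide can be held in a small record. The following is a minimal sketch, assuming the term is written as one `key: value` field per line as shown above; the class name `GOTerm` and the function `parse_term` are illustrative names, not part of any GO tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOTerm:
    """One GO term: a stable identifier, a name, and a definition."""
    go_id: str
    name: str
    definition: str


def parse_term(text):
    """Parse 'key: value' lines laid out as on the slide (one field per line)."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so 'GO:0006094' stays intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return GOTerm(fields["id"], fields["term"], fields["definition"])


SLIDE_TEXT = """
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
"""

term = parse_term(SLIDE_TEXT)
print(term.go_id, term.name)
```

The identifier, not the name, is what annotations should store: names and synonyms may be reworded, while the ID stays fixed.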
Annotation of gene products with GO terms
Mitochondrial P450
Cellular component: mitochondrial inner membrane (GO:0005743)
Biological process: electron transport (GO:0006118)
substrate + O2 = CO2 + H2O product
Molecular function: monooxygenase activity (GO:0004497)
Other gene products annotated to monooxygenase activity (GO:0004497):
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)
What’s in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
• All refer to the process of making glucose from simpler components
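One way to picture what a controlled vocabulary buys here is a lookup that sends every one of these names to a single identifier. The sketch below assumes all five names are treated as synonyms of gluconeogenesis (GO:0006094, the term shown earlier in the lecture); the table and the function `resolve` are illustrative only.

```python
# Names from the slide, all treated here as synonyms of one GO term.
SYNONYMS = {
    "GO:0006094": [
        "gluconeogenesis",
        "glucose synthesis",
        "glucose biosynthesis",
        "glucose formation",
        "glucose anabolism",
    ],
}

# Invert the table once: lower-cased name -> GO identifier.
NAME_TO_ID = {
    name.lower(): go_id
    for go_id, names in SYNONYMS.items()
    for name in names
}


def resolve(name):
    """Return the GO ID for a name or synonym, or None if it is not known."""
    return NAME_TO_ID.get(name.strip().lower())


print(resolve("Glucose anabolism"))
print(resolve("glycolysis"))
```

Two databases that each pick a different name still end up storing the same identifier, which is what makes their annotations comparable.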
[Figure: a tree compared with a directed acyclic graph]
Parent-Child Relationships
A child is a subset of a parent's elements
The cell component term Nucleus has 5 children:
• Nucleoplasm
• Nuclear envelope
• Nucleolus
• Chromosome
• Perinuclear space
Ontology Relationships
Directed Acyclic Graph
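The difference between a tree and a directed acyclic graph is that a term in a DAG may have more than one parent. The sketch below stores each term's parents and walks upward to collect all ancestors. The five children of nucleus come from the slide; the terms "organelle envelope" and "cell", and the second parent given to "nuclear envelope", are assumptions added only to show a term with two parents.

```python
# child -> set of direct parents
PARENTS = {
    "nucleoplasm": {"nucleus"},
    "nuclear envelope": {"nucleus", "organelle envelope"},  # second parent is illustrative
    "nucleolus": {"nucleus"},
    "chromosome": {"nucleus"},
    "perinuclear space": {"nucleus"},
    "nucleus": {"cell"},             # illustrative
    "organelle envelope": {"cell"},  # illustrative
}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a shared ancestor is visited only once
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("nuclear envelope", PARENTS)))
```

Because a child is a subset of its parent's elements, a gene product annotated to a term is implicitly annotated to every ancestor returned by this walk.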
Evidence Codes for GO Annotations
http://www.geneontology.org/doc/GO.evidence.html
IEA: Inferred from Electronic Annotation
ISS: Inferred from Sequence Similarity
IEP: Inferred from Expression Pattern
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IPI: Inferred from Physical Interaction
IDA: Inferred from Direct Assay
RCA: Inferred from Reviewed Computational Analysis
TAS: Traceable Author Statement
NAS: Non-traceable Author Statement
IC: Inferred by Curator
ND: No biological Data available
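The code list above works naturally as a lookup table. This is a minimal sketch; the function names `describe` and `is_electronic` are illustrative, and treating IEA as the only electronic code follows from its name in this list.

```python
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def describe(code):
    """Return the full name of an evidence code; reject codes not in the list."""
    try:
        return EVIDENCE_CODES[code]
    except KeyError:
        raise ValueError(f"unknown evidence code: {code!r}") from None


def is_electronic(code):
    """True for annotations made without curator review of the individual gene product."""
    describe(code)  # validates the code first
    return code == "IEA"


print(describe("IDA"))
print(is_electronic("IEA"), is_electronic("TAS"))
```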
Annotation summaries
Meloidogyne incognita: McCarter et al. 2003
Two types of GO Annotations:

Electronic Annotation

Manual Annotation
All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association
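The two requirements above can be checked mechanically before an annotation is accepted. The following is a rough sketch, not a description of any curation tool: the `Annotation` record, the `problems` function, and the sample values (including the placeholder reference `PMID:0000000` and the choice of `IDA`) are all hypothetical.

```python
from dataclasses import dataclass

KNOWN_EVIDENCE = {
    "IEA", "ISS", "IEP", "IMP", "IGI", "IPI",
    "IDA", "RCA", "TAS", "NAS", "IC", "ND",
}


@dataclass
class Annotation:
    gene_product: str
    go_id: str
    evidence: str   # one of the evidence codes
    reference: str  # the source the annotation is attributed to


def problems(annotation):
    """List the rules an annotation breaks; an empty list means it passes."""
    issues = []
    if not annotation.reference.strip():
        issues.append("not attributed to a source")
    if annotation.evidence not in KNOWN_EVIDENCE:
        issues.append("missing or unknown evidence code")
    return issues


# Hypothetical records: the reference and evidence values are placeholders.
good = Annotation("mitochondrial P450", "GO:0004497", "IDA", "PMID:0000000")
bad = Annotation("mitochondrial P450", "GO:0004497", "", "")

print(problems(good))
print(problems(bad))
```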
Manual Annotations
• High-quality, specific gene/gene product associations made using:
• Peer-reviewed papers
• Evidence codes to grade evidence
BUT: manual annotation is very time-consuming and requires trained biologists
Manual Annotations: Methods
1. Extract information from published literature
2. Curators perform manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)
Finding GO terms
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…
Highlighted phrases: serine/threonine kinase activity; integral membrane protein; wound response
PubMed ID: 12374299
Function: protein serine/threonine kinase activity (GO:0004674)
Component: integral to plasma membrane (GO:0005887)
Process: response to wounding (GO:0009611)
Electronic Annotations
• Provide large coverage
• High-quality
BUT: annotations tend to use high-level GO terms and provide little detail.
Electronic Annotations: Methods
1. Database entries
• Manual mapping of GO terms to concepts external to GO (‘translation tables’)
• Proteins then electronically annotated with the relevant GO term(s)
2. Automatic sequence similarity analyses to transfer annotations between highly similar gene products
Electronic Annotations
Fatty acid biosynthesis (Swiss-Prot Keyword) > GO:Fatty acid biosynthesis (GO:0006633)
EC:6.4.1.2 (EC number) > GO:acetyl-CoA carboxylase activity (GO:0003989)
IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) > GO:acetyl-CoA carboxylase activity (GO:0003989)
Mappings of external concepts to GO
EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745
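A translation table of this kind is enough to drive electronic annotation: read each `external > GO:name ; GO:id` line, then give every protein that carries the external identifier the mapped GO term with evidence code IEA. The sketch below assumes that one-line-per-mapping layout; the protein names `protA` and `protB` and the function names are invented for illustration.

```python
MAPPING_LINES = [
    "EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022",
    "EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038",
    "EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617",
    "EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745",
]


def parse_mapping_line(line):
    """Split 'EC:x > GO:name ; GO:id' into (external_id, go_id, go_name)."""
    external_id, _, go_part = line.partition(">")
    go_name, _, go_id = go_part.rpartition(";")
    go_name = go_name.strip()
    if go_name.startswith("GO:"):
        go_name = go_name[3:]  # drop the 'GO:' prefix in front of the term name
    return external_id.strip(), go_id.strip(), go_name


def build_table(lines):
    """external_id -> (go_id, go_name)"""
    table = {}
    for line in lines:
        external_id, go_id, go_name = parse_mapping_line(line)
        table[external_id] = (go_id, go_name)
    return table


def annotate(protein_to_ec, table):
    """Return (protein, go_id, 'IEA') for every EC number that has a mapping."""
    result = []
    for protein, ec_numbers in protein_to_ec.items():
        for ec in ec_numbers:
            if ec in table:
                result.append((protein, table[ec][0], "IEA"))
    return result


table = build_table(MAPPING_LINES)
# Hypothetical proteins; EC:9.9.9.9 has no mapping and is skipped.
print(annotate({"protA": ["EC:1.1.1.105"], "protB": ["EC:9.9.9.9"]}, table))
```

This also shows why electronic annotations tend to be coarse: the protein receives only the term the external concept maps to, never anything more specific.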
Additional points
• A gene product can have several functions and cellular locations, and can be involved in many processes
• Annotation of a gene product to one ontology is independent of its annotation to other ontologies
• Annotations are only to terms reflecting a normal activity or location
• Usage of ‘unknown’ GO terms
Unknown vs. Unannotated
• “Unknown” is used when the curator has determined that there is no existing literature to support an annotation.
– Biological process unknown GO:0000004
– Molecular function unknown GO:0005554
– Cellular component unknown GO:0008372
• NOT the same as having no annotation at all
– No annotation means that no one has looked yet
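The distinction can be made explicit in code: an empty annotation list and a list holding only "unknown" terms are different states. A minimal sketch, using the three identifiers from the slide; the function name and the three status labels are illustrative.

```python
UNKNOWN_TERMS = {
    "GO:0000004",  # biological process unknown
    "GO:0005554",  # molecular function unknown
    "GO:0008372",  # cellular component unknown
}


def annotation_status(go_ids):
    """Classify a gene product by the GO IDs it has been annotated to."""
    if not go_ids:
        return "unannotated"  # no one has looked yet
    if all(go_id in UNKNOWN_TERMS for go_id in go_ids):
        return "unknown"      # a curator looked and found no supporting literature
    return "annotated"


print(annotation_status([]))
print(annotation_status(["GO:0005554"]))
print(annotation_status(["GO:0005554", "GO:0009611"]))
```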
Annotation of a genome
• GO annotations are always work in progress
• Part of normal curation process
– More specific information
– Better evidence code
• Replace obsolete terms
• “Last reviewed” date
How to access the Gene Ontology and its annotations
1. Downloads
• Ontologies
• Annotations : Gene association files
• Ontologies and Annotations
2. Web-based access
• AmiGO (http://www.godatabase.org)
• QuickGO (http://www.ebi.ac.uk/ego)
among others…
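Downloaded gene association files are tab-delimited text, one annotation per line. The sketch below reads a single line, assuming the column order of the GO annotation file format (GO ID in column 5, reference in column 6, evidence code in column 7, aspect in column 9) and treating lines that start with `!` as comments. The sample record, including `ExampleDB`, `EX0001`, `geneX` and the placeholder reference, is invented for illustration; check the format documentation shipped with the file you download.

```python
def parse_association_line(line):
    """Return the main fields of one annotation line, or None for comments and blanks."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 tab-separated columns, got {len(cols)}")
    return {
        "db": cols[0],
        "object_id": cols[1],
        "symbol": cols[2],
        "go_id": cols[4],
        "reference": cols[5],
        "evidence": cols[6],
        "aspect": cols[8],  # F = function, P = process, C = component
    }


# Hypothetical record with 15 columns; empty strings are optional fields left blank.
SAMPLE = "\t".join([
    "ExampleDB", "EX0001", "geneX", "", "GO:0004497",
    "PMID:0000000", "IDA", "", "F", "",
    "", "protein", "taxon:0000", "20050101", "ExampleDB",
])

record = parse_association_line(SAMPLE)
print(record["go_id"], record["evidence"], record["aspect"])
print(parse_association_line("!this is a header comment"))
```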
Group
A
C
D
E
H
M
S
Lecture Four: paper discussion (in-class discussion, about 5 minutes)