GO annotation - Gene Ontology


GO Further
24th Feb 2006
Jane Lomax
GO annotations
• Where do the links between genes and
GO terms come from?
GO annotations
• Contributing databases:
– Berkeley Drosophila Genome Project (BDGP)
– dictyBase (Dictyostelium discoideum)
– FlyBase (Drosophila melanogaster)
– GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
– UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
– Gramene (grains, including rice, Oryza)
– Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
– Rat Genome Database (RGD) (Rattus norvegicus)
– Reactome
– Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
– The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
– The Institute for Genomic Research (TIGR): databases on several bacterial species
– WormBase (Caenorhabditis elegans)
– Zebrafish Information Network (ZFIN) (Danio rerio)
Species coverage
• All major eukaryotic model organism
species
• Human via GOA group at UniProt
• Several bacterial and parasite species
through TIGR and GeneDB at Sanger
– many more in pipeline
Annotation coverage
Anatomy of a GO annotation
• Three key parts:
– gene name/id
– GO term(s)
– evidence for association
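The three parts listed above can be sketched as a small record type. This is only an illustration: the class name, field names and the gene identifier "EXAMPLE:0001" are hypothetical, and real annotation files carry more fields than these three. The GO id is the acetyl-CoA carboxylase activity term used elsewhere in these slides.

```python
import re
from dataclasses import dataclass

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOAnnotation:
    """One link between a gene (product) and a GO term (hypothetical layout)."""
    gene_id: str        # gene name/id
    go_id: str          # GO term
    evidence_code: str  # evidence for the association, e.g. "IDA" or "IEA"

    def __post_init__(self):
        # Reject malformed GO ids early rather than storing them.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"not a GO id: {self.go_id!r}")


# Hypothetical gene id, paired with a GO term taken from the slides.
example = GOAnnotation("EXAMPLE:0001", "GO:0003989", "IEA")
print(example)
```

A gene annotated to several GO terms would simply have several such records, one per term.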
Example annotation
• Breast cancer type 1 susceptibility protein gene
in humans
Types of GO annotation:
• Electronic annotation
• Manual annotation
Manual annotation
• Created by scientific curators
• High quality
• Small number
Manual annotation
In this study, we report the isolation and molecular
characterization of the B. napus PERK1 cDNA, that is predicted to
encode a novel receptor-like kinase. We have shown that like
other plant RLKs, the kinase domain of PERK1 has
serine/threonine kinase activity. In addition, the location of a
PERK1-GFP fusion protein to the plasma membrane supports the
prediction that PERK1 is an integral membrane protein … these
kinases have been implicated in early stages of wound response …
Electronic Annotation
• Annotation derived without human
validation
– mappings files, e.g. interpro2go, ec2go
– BLAST search ‘hits’
• Lower ‘quality’ than experimental codes
Mappings files
• Fatty acid biosynthesis (Swiss-Prot keyword)
– → GO:Fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number)
– → GO:acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry)
– → GO:acetyl-CoA carboxylase activity (GO:0003989)
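As a rough sketch of how a mappings file might be read, the snippet below parses lines of the form "external id > GO:term name ; GO:id", with "!" marking comment lines. That line layout is an assumption about the interpro2go/ec2go files, and the sample lines are built from the entries on this slide rather than copied from a real file. The function name parse_mappings is hypothetical.

```python
from collections import defaultdict

# Illustrative lines in the assumed "external id > GO:term name ; GO:id" layout.
SAMPLE = """\
! comment lines start with an exclamation mark
EC:6.4.1.2 > GO:acetyl-CoA carboxylase activity ; GO:0003989
InterPro:IPR000438 Acetyl-CoA carboxylase carboxyl transferase beta subunit > GO:acetyl-CoA carboxylase activity ; GO:0003989
"""


def parse_mappings(text):
    """Return {external id: [(GO id, GO term name), ...]}."""
    mappings = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        left, _, right = line.partition(" > ")
        if not right:
            continue  # no mapping on this line
        external_id = left.split()[0]  # drop any free-text description
        name, _, go_id = right.rpartition(" ; ")
        if name.startswith("GO:"):
            name = name[3:]
        mappings[external_id].append((go_id.strip(), name.strip()))
    return dict(mappings)


for external_id, terms in parse_mappings(SAMPLE).items():
    print(external_id, terms)
```

Any gene product carrying the external id (for example, an InterPro domain hit) can then be given the mapped GO terms with evidence code IEA.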
Evidence types
• ISS: Inferred from Sequence or Structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available
• IEA: Inferred from Electronic Annotation
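The split between manual and electronic annotation can be expressed as a lookup on the evidence code. In this sketch, IEA is treated as the only electronic code and every other code in the list as curator-assigned; the tuple layout and the gene ids are hypothetical.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence or Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}


def is_manual(code):
    """True for curator-assigned codes, False for IEA."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"unknown evidence code: {code!r}")
    return code != "IEA"


def manual_only(annotations):
    """Keep (gene id, GO id, evidence code) tuples with a manual code."""
    return [a for a in annotations if is_manual(a[2])]


# Hypothetical gene ids; the GO ids appear elsewhere in these slides.
annotations = [
    ("EXAMPLE:0001", "GO:0003989", "IEA"),
    ("EXAMPLE:0002", "GO:0006633", "IDA"),
]
print(manual_only(annotations))
```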
GO terms
• Where do GO terms come from?
– most GO terms are added by the GO editorial office
at EBI
– new terms are usually only added when they are
asked for by annotators
– GO editors work with experts to make major
ontology developments
• metabolism
• pathogenesis
• cell cycle
GO stats
• almost 20,000 GO terms
– 10452 biological_process
– 1687 cellular_component
– 7393 molecular_function
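A quick check that the three counts add up to "almost 20,000", with each ontology's share of the total. The counts are those on the slide; the variable names are illustrative.

```python
# Term counts per ontology, as given on the slide.
TERM_COUNTS = {
    "biological_process": 10452,
    "cellular_component": 1687,
    "molecular_function": 7393,
}

total = sum(TERM_COUNTS.values())
print("total terms:", total)  # 19532

# Percentage share of each ontology, rounded to one decimal place.
for ontology, count in TERM_COUNTS.items():
    print(f"{ontology}: {100 * count / total:.1f}%")
```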
Growth of GO
[Figure: GO term history 2001 - 2005. Number of terms (0 to 25,000) plotted against date (Jan-01 to Mar-05) for three series: defined terms, undefined terms, obsoletes.]
No GO Areas
• GO covers ‘normal’ functions and
processes
– No pathological processes
– No experimental conditions
• NO evolutionary relationships
• NO gene products
• NOT a system of nomenclature
Open Biomedical Ontologies (OBO)
• A repository for well-structured
controlled vocabularies for shared use
across different biological and medical
domains:
http://obo.sourceforge.net/
Open Biomedical Ontologies (OBO)
• Requirements for inclusion:
http://obo.sourceforge.net/crit.html
AmiGO exercise
Annotation exercise
• We have provided a Nature paper (PMID:
14961121) for you to annotate with GO terms
– This will help you to understand how the information
is extracted from papers and GO terms are applied
by the curators
– It will also give you the opportunity to use another
GO browser developed at EBI: QuickGO
Annotation exercise
• The gene you are annotating is VG5Q
– To make it easier we’ve highlighted some of
the most relevant passages in the text
• Use the GO browser QuickGO to look for
the most appropriate GO terms:
– http://www.ebi.ac.uk/ego/
Annotation exercise
• In QuickGO, you search for the GO terms
by name
http://www.ebi.ac.uk/ego/
Annotation exercise
• Remember, as well as the GO term, you
also need to assign an evidence code
– to remind you, we’ve included a list of the
evidence codes at the back of the paper
Annotation exercise
• To see how your annotations compared to
those done by the GO curator, search
QuickGO for Q8N302
– This is the UniProt accession for the VG5Q gene product
• Click ‘show only manual’ and this will
show you the annotations the curator
made