No Slide Title

Annotating with GO: an overview
http://www.geneontology.org/
What is a Gene Ontology (GO) annotation?
Databases external to GO make cross-links between GO terms and objects in their databases (typically, gene
products, or their surrogates, genes), and then provide tables of these links to GO. The GO itself contains no
information about genes or gene products. The GO annotation (‘gene association’) files are all publicly available:
http://www.geneontology.org/#annotations
A gene product is annotated to one or more terms in each of the three ontologies: biological process, cellular component and molecular function.
Gene products are annotated to the most specific GO term possible for the information available.
Database name abbreviations used by GO are described here:
http://www.geneontology.org/doc/GO.xrf_abbs
A gene product is annotated with terms reflecting only its normal activities, locations and processes.
When there is no information regarding one or more aspects of a gene product, the gene product is annotated to the GO term ‘unknown’.
Annotation of a gene product to one ontology is independent of its annotation to the other two ontologies.
The annotation of gene products to GO terms is performed according to two main principles: the recording of the source of the annotation and the type of evidence on which the annotation was based.

Example annotation:

Field                           Annotation 1             Annotation 2
DB                              SGD                      SGD
DB_Object_ID                    S0000296                 S0000296
DB_Object_Symbol                PHO3                     PHO3
[NOT]
go_id                           GO:0015888               GO:0003993
DB:Reference (|DB:Reference)    SGD:8789|PMID:2676709    SGD:8789|PMID:2676709
Evidence                        IMP                      IMP
With
Aspect                          P                        F
DB_Object_Name (|Name)
DB_Object_Synonym (|Synonym)    YBR092C                  YBR092C
DB_Object_Type                  gene                     gene
Taxon (|taxon)                  taxon:4932               taxon:4932
Date                            20001122                 20001122

Fields highlighted in grey are mandatory.

DB_Object_ID: Database Object identifier. A Database Object is usually a gene product, but can also be a gene or a transcript.
[NOT]: Used when the source specifies that a gene product is NOT associated with a particular GO term, e.g. “we have found that protein Z is not involved in the X cascade”.
go_id: Gene Ontology term identifier.
Aspect: P = biological process, F = molecular function and C = cellular component.
DB_Object_Type: Object type: gene, transcript or protein.
Taxon: Taxonomic identifier for gene product.
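
The fields of the example annotation can be read into a simple record, which makes the role of each column easier to see. The following is a minimal sketch, not the format definition: it assumes one annotation per line, tab-separated, with exactly the 14 fields named in the example and in that order, and it assumes that the empty columns in the example are [NOT], With and DB_Object_Name. The names `Annotation`, `split_multi` and `parse_annotation` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the 14 fields shown in the example annotation, in that order.
FIELD_COUNT = 14
VALID_ASPECTS = {"P", "F", "C"}  # biological process, molecular function, cellular component


@dataclass
class Annotation:
    db: str
    db_object_id: str
    db_object_symbol: str
    is_not: bool            # True when the [NOT] column is filled in
    go_id: str
    references: List[str]   # DB:Reference, pipe-separated in the file
    evidence: str
    with_: str
    aspect: str
    names: List[str]
    synonyms: List[str]
    object_type: str
    taxa: List[str]
    date: str


def split_multi(value: str) -> List[str]:
    """Split a pipe-separated field and drop empty entries."""
    return [part.strip() for part in value.split("|") if part.strip()]


def parse_annotation(line: str) -> Annotation:
    """Parse one tab-separated annotation line into an Annotation record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(cols)}")
    (db, obj_id, symbol, qualifier, go_id, refs, evidence,
     with_, aspect, name, synonym, obj_type, taxon, date) = cols
    if aspect not in VALID_ASPECTS:
        raise ValueError(f"unknown aspect: {aspect!r}")
    return Annotation(
        db=db,
        db_object_id=obj_id,
        db_object_symbol=symbol,
        is_not=qualifier.strip().upper() == "NOT",
        go_id=go_id,
        references=split_multi(refs),
        evidence=evidence,
        with_=with_,
        aspect=aspect,
        names=split_multi(name),
        synonyms=split_multi(synonym),
        object_type=obj_type,
        taxa=split_multi(taxon),
        date=date,
    )


# First example annotation; the placement of the empty columns is an assumption.
example_line = "\t".join([
    "SGD", "S0000296", "PHO3", "", "GO:0015888", "SGD:8789|PMID:2676709",
    "IMP", "", "P", "", "YBR092C", "gene", "taxon:4932", "20001122",
])
record = parse_annotation(example_line)
```

Here `record.references` holds the two references as separate strings, and `record.is_not` is false because the [NOT] column is empty in the example.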
The source of an annotation may be a literature reference, a database record or a type of computational analysis. Literature references are entered as an accession number from the database in question, from PubMed, or both. Annotations based on computational analysis include a reference to the method of analysis.
The evidence describes how the annotation was created, and provides a way of measuring its strength or reliability. GO has developed a set of standard evidence codes which form a loose hierarchy, with ‘inferred from electronic annotation’ (IEA) being the least reliable type of evidence, followed by ‘inferred from sequence similarity’ (ISS).
Evidence codes
IDA    inferred from direct assay
IMP    inferred from mutant phenotype
IGI    inferred from genetic interaction
IPI    inferred from physical interaction
ISS    inferred from sequence similarity
IC     inferred by curator
IEP    inferred from expression pattern
IEA    inferred from electronic annotation
TAS    traceable author statement
NAS    non-traceable author statement
ND     no biological data available
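
The evidence codes can be kept in a small lookup table so that annotations are easy to label and to sort by reliability. The sketch below encodes only what is stated here: IEA is treated as the least reliable code, followed by ISS. No ordering among the remaining codes is given, so they all share one rank. The names `EVIDENCE_CODES`, `LEAST_RELIABLE` and `reliability_rank` are hypothetical.

```python
# Evidence codes and their meanings, as listed above.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IMP": "inferred from mutant phenotype",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence similarity",
    "IC": "inferred by curator",
    "IEP": "inferred from expression pattern",
    "IEA": "inferred from electronic annotation",
    "TAS": "traceable author statement",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
}

# Only this part of the loose hierarchy is stated: IEA first, then ISS.
LEAST_RELIABLE = ["IEA", "ISS"]


def reliability_rank(code: str) -> int:
    """Return 0 for IEA, 1 for ISS, and 2 for every other known code.

    Assumption: codes other than IEA and ISS are not ranked against
    each other, so they all receive the same rank.
    """
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code!r}")
    if code in LEAST_RELIABLE:
        return LEAST_RELIABLE.index(code)
    return len(LEAST_RELIABLE)


# Sort a few codes from least reliable upward; the sort is stable, so
# unranked codes keep their input order.
codes = ["IMP", "ISS", "IEA", "IDA"]
ordered = sorted(codes, key=reliability_rank)
```

A filter such as `[c for c in codes if reliability_rank(c) == 2]` would keep only the annotations that are not based on electronic annotation or sequence similarity.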
Collaborating databases
Many important databases produce GO annotations and contribute to the development of the GO. These include:
FlyBase (database for the fruitfly Drosophila melanogaster)
Berkeley Drosophila Genome Project (Drosophila informatics; GO database & software)
Saccharomyces Genome Database (SGD) (database for the budding yeast Saccharomyces cerevisiae)
Mouse Genome Database (MGD) & Gene Expression Database (GXD) (databases for the mouse Mus musculus)
The Arabidopsis Information Resource (TAIR) (database for the brassica family plant Arabidopsis thaliana)
WormBase (database for the nematode Caenorhabditis elegans)
PomBase (database for the fission yeast Schizosaccharomyces pombe)
Rat Genome Database (RGD) (database for the rat Rattus norvegicus)
DictyBase (informatics resource for the slime mold Dictyostelium discoideum)
The Pathogen Sequencing Unit (The Wellcome Trust Sanger Institute)
Genome Knowledge Base (GKB) (Cold Spring Harbor Laboratory)
EBI: InterPro - SWISS-PROT - TrEMBL groups
The Institute for Genomic Research (TIGR)
Gramene (A Comparative Mapping Resource for Monocots)
Compugen (with its Internet Research Engine)