
Introduction to GO Annotation
Eurie Hong (SGD), Michelle Gwinn (TIGR),
Tanya Berardini (TAIR), Karen Pilcher (dictyBase),
Russell Collins (FlyBase), Carol Bastiani (WormBase),
Doug Howe (ZFIN), Stacia Engel (SGD)
What is a GO annotation?
• Gene (protein coding gene, functional RNA)
• GO Term
• Evidence code: IMP, IGI, IPI, ISS, IDA, IEP, TAS, NAS, ND, RCA, IC
• References
• Qualifiers: NOT, contributes_to, colocalizes_with
• With/From: supporting evidence for certain evidence codes
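The parts of an annotation listed on this slide can be sketched as a small record type. This is a minimal illustration, not the format any database actually uses: the class name `GOAnnotation` and its field names are hypothetical, and the only validation is against the evidence codes and qualifiers named on the slide. The example values come from the YakA case worked through later in this tutorial.

```python
from dataclasses import dataclass
from typing import Optional

# Evidence codes and qualifiers exactly as listed on the slide.
EVIDENCE_CODES = {"IMP", "IGI", "IPI", "ISS", "IDA", "IEP",
                  "TAS", "NAS", "ND", "RCA", "IC"}
QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}


@dataclass(frozen=True)
class GOAnnotation:
    """Hypothetical record holding the parts of one GO annotation."""
    gene: str                         # protein coding gene or functional RNA
    go_id: str                        # e.g. "GO:0004672"
    go_term: str                      # e.g. "protein kinase activity"
    evidence_code: str                # one of EVIDENCE_CODES
    reference: str                    # e.g. "PMID:9584128"
    qualifier: Optional[str] = None   # optional, one of QUALIFIERS
    with_from: Optional[str] = None   # supporting evidence for certain codes

    def __post_init__(self):
        if self.evidence_code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code}")
        if self.qualifier is not None and self.qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier}")


# The YakA annotation discussed later in this tutorial.
yaka = GOAnnotation(
    gene="YakA",
    go_id="GO:0004672",
    go_term="protein kinase activity",
    evidence_code="IDA",
    reference="PMID:9584128",
)
print(yaka)
```

Keeping the qualifier and With/From fields optional mirrors the slide: every annotation has a gene, a term, an evidence code and a reference, while the other two parts appear only when needed.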
• What is an annotation?
• Strategies for identifying literature to annotate
• Identifying the correct annotation
• Molecular Function
• Biological Process
• Cellular Component
• Extent of annotation for a single gene product
• Strategies for annotating a genome
Which type of literature is appropriate for annotation?
• Papers with experimental evidence for GO process, function or component annotation
– Mutant phenotype descriptions
– Enzymatic activity assays
– Localization studies
• Papers describing phylogenetic studies for GO function annotation (ISS)
• Reviews
• (Textbooks)
• (Meeting abstracts)
Strategies for reading a paper for annotation
• Abstract
• Results/Figures
• Materials and Methods
• Discussion
Which granularity of GO term
is appropriate for annotation?
Molecular Function
Souza et al. (1998)
YakA, a protein kinase required for the transition from
growth to development in Dictyostelium.
PMID: 9584128
Background
• YakA was identified as a developmental
mutant
• YakA is an ortholog of the yeast Yak1p
• The protein kinase domain of YakA is
similar to both serine/threonine kinases
and tyrosine kinases
YakA belongs to the DYRK family
YakA is a member of the DYRK family of protein kinases (dual-specificity tyrosine-phosphorylation-regulated kinases)
The Experiment
• Assay for YakA protein kinase activity
• YakA + γ32P-ATP + MBP (substrate)
• Look for presence of 32P in substrate in
the presence of YakA
The Result
GO Term for Annotation
protein kinase activity ; GO:0004672
Definition: Catalysis of the transfer of a phosphate
group, usually from ATP, to a protein substrate.
• MBP (myelin basic protein) is a generic substrate
• Kinase specificity not determined; no phosphotyrosine antibodies used, for example
Searching for Terms in DAG-Edit
Search term name that contains:
• kinase: 359 results
• protein kinase: 60 results
• protein kinase activity: 20 results
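The narrowing effect of a longer search string can be sketched with a plain case-insensitive substring match. This is only an illustration: how DAG-Edit actually matches term names is an assumption here, the function `search_terms` is hypothetical, and the short list of term names is a sample chosen for the example, so the counts will not match the ones on the slide.

```python
def search_terms(term_names, query):
    """Return the term names that contain `query` (case-insensitive)."""
    needle = query.lower()
    return [name for name in term_names if needle in name.lower()]


# Small illustrative sample of term names, not the full ontology.
SAMPLE_TERMS = [
    "kinase activity",
    "protein kinase activity",
    "protein serine/threonine kinase activity",
    "protein tyrosine kinase activity",
    "protein kinase binding",
    "protein kinase C activity",
]

for query in ("kinase", "protein kinase", "protein kinase activity"):
    hits = search_terms(SAMPLE_TERMS, query)
    print(f"{query!r}: {len(hits)} results")
```

Note that under a plain substring match the more granular names such as "protein serine/threonine kinase activity" do not contain the string "protein kinase", which is one reason to browse sibling and child terms after searching rather than rely on the search alone.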
Search Output in DAG-Edit
Sibling Terms in DAG-Edit
Child Terms in DAG-Edit
Parent Terms in AmiGO
Evidence Code
• The evidence code for the protein
kinase activity term is IDA (Inferred
from Direct Assay)
• Although endogenous substrates were
not tested, the authors clearly showed
kinase activity with a direct assay
Granular Terms Using ISS
(Inferred from Sequence or Structural Similarity)
protein serine/threonine kinase activity ; GO:0004674
protein tyrosine kinase activity ; GO:0004713
How is Biological Process different from Molecular Function?
Molecular Function…
“Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.”
… is about the protein.
… is the set of activities that a protein specifically and directly performs.
Biological Process…
“A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products.”
… is about the organism.
… is what the organism uses those activities for.
For example:
A hammer hammers nails… and builds houses.
Rho1 has GTPase activity… and the organism uses that activity for gastrulation, axon guidance, germ cell migration, etc.
Important points:
The process is a migration of germ (pole) cells.
It is the movement of cells from one side of the epithelium to the other.
It is one step in a three-step process.
Is a new term needed?
A new term might be appropriate because it would describe a discrete, separable process, thus providing additional useful information to the user.
Also, new terms would permit linking two similar processes that are currently separate in GO but are connected in the literature.
cell migration
(is a) transepithelial cell migration
(is a) pole cell transepithelial migration
(is a) cellular extravasation
cell migration
(is a) germ cell migration
(is a) pole cell migration
(part of) pole cell transepithelial migration
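How the proposed term links the two branches can be sketched as a small graph walk. The indentation of the slide was lost in this transcript, so the nesting below is an assumption: in particular, "cellular extravasation" is assumed to sit under "transepithelial cell migration". The names `PARENTS` and `ancestors` are hypothetical.

```python
# child term -> list of (relation, parent term); nesting is assumed,
# because the slide's indentation is not preserved in the transcript.
PARENTS = {
    "transepithelial cell migration": [("is_a", "cell migration")],
    "cellular extravasation": [("is_a", "transepithelial cell migration")],
    "germ cell migration": [("is_a", "cell migration")],
    "pole cell migration": [("is_a", "germ cell migration")],
    "pole cell transepithelial migration": [
        ("is_a", "transepithelial cell migration"),
        ("part_of", "pole cell migration"),
    ],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following is_a / part_of upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("pole cell transepithelial migration")))
```

Under these assumptions, a gene product annotated to the new term is reachable both from the transepithelial branch and from the germ cell branch, which is the linking effect described on the slide.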
Annotating to the Cellular Component Ontology
Carol Bastiani, Caltech
Experiment: Immunolocalization of
LIN-10 with a LIN-10 antibody.
Localization of LIN-10 by Immunofluorescence:
Vulval epithelial cells can be distinguished from ventral cord neurons by their larger size and the presence of stained cell junctions (red).
Figure 7. LIN-10 is expressed in neurons. (A-C) Wild-type, late L3 hermaphrodite stained with
anti-LIN-10 antibodies (green). LIN-10 is present in ventral cord processes (A, *), lateral neural cell
bodies and processes (A and B, arrowheads), and dorsal cord processes
Search MGI GO Browser for neuron:
Choosing the evidence code:
Further subcellular localization of LIN-10:
In neural cell bodies, a small amount of LIN-10 appears diffusely throughout the cytoplasm, whereas
the majority of LIN-10 is concentrated in discrete perinuclear structures (Figure 7, D and E), similar to
perinuclear structures observed in vulval epithelial cells. To determine whether these perinuclear
structures correspond to Golgi, we used ST-GFP as a marker for the trans-cisterna of the Golgi (Jamora
et al., 1997). We expressed ST-GFP in transgenic worms using a heat shock promoter and examined the
subcellular localization of LIN-10 and ST-GFP using anti-LIN-10 and anti-GFP antibodies. In single
neurons expressing both endogenous LIN-10 and transgenic ST-GFP, the subcellular pattern of LIN-10
staining is similar to that of ST-GFP staining. Deconvolution of images obtained in double-staining
experiments revealed that LIN-10 staining is closely associated with ST-GFP staining (Figure 7, F-I),
but LIN-10 staining is consistently offset (by 0.2-0.5 µm) from ST-GFP staining. These results indicate
that LIN-10 is localized in the trans-cisterna of the Golgi or is localized in a compartment closely
associated with the trans-cisterna, such as the trans-Golgi network.
LIN-10 is localized to:
1) Cytoplasm
2) Within or in association with a part of the Golgi apparatus / in close association with the trans-cisterna or trans-Golgi network
1) Annotate to cytoplasm:
2) Annotate to Golgi apparatus, evidence code IDA:
Qualifier to use “when the resolution of the assay is not accurate enough to say that the gene product is a bona fide component member”: colocalizes_with
Strategies for annotation of a genome
1. How to get a complete set of GO annotations
2. Updating GO annotations
3. Representative approaches
How to get a complete set of GO annotations
• Complete a first pass
– For all 3 aspects (MF, BP, CC)
– For all genes that get GO annotations
• Proteins, RNAs, pseudogenes
• NOT centromeres, telomeres, LTRs, retrotransposons, ARSs
– Unknowns are allowed
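The first-pass rule can be sketched as a check that every annotatable feature has something recorded for all three aspects. This is a minimal illustration under stated assumptions: the feature-type labels, the feature names (`geneA`, `geneB`, `CEN1`) and the function `first_pass_gaps` are all hypothetical, and an "unknown" annotation counts as filling a slot because unknowns are allowed.

```python
ASPECTS = ("MF", "BP", "CC")

# Feature types that get GO annotations, per the slide. Centromeres,
# telomeres, LTRs, retrotransposons and ARSs are deliberately absent.
ANNOTATED_FEATURE_TYPES = {"protein", "RNA", "pseudogene"}


def first_pass_gaps(features, annotated):
    """Return (feature, aspect) pairs still missing an annotation.

    features:  dict mapping feature name -> feature type
    annotated: set of (feature name, aspect) pairs already annotated,
               including annotations to "unknown"
    """
    gaps = []
    for name, feature_type in features.items():
        if feature_type not in ANNOTATED_FEATURE_TYPES:
            continue  # this feature type is not annotated with GO terms
        for aspect in ASPECTS:
            if (name, aspect) not in annotated:
                gaps.append((name, aspect))
    return gaps


# Hypothetical data for illustration only.
features = {"geneA": "protein", "geneB": "RNA", "CEN1": "centromere"}
annotated = {("geneA", "MF"), ("geneA", "BP"), ("geneA", "CC"),
             ("geneB", "MF")}
gaps = first_pass_gaps(features, annotated)
print(gaps)
```

The first pass is complete when this list is empty for the whole genome.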
Updating the complete set of GO annotations
• Second pass
– Replace unknowns
– Update where IEA was used
• Info with “better” evidence code, if available
– Update where other databases are referenced
• Primary literature is preferred
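The second-pass checks can be sketched as a function that lists the reasons an annotation deserves another look. Several details are assumptions made for the example: that unknowns carry the ND evidence code, that a reference beginning with "PMID:" stands for primary literature (reviews also have PMIDs, so a curator would still need to check), and the names `second_pass_reasons` and `DB:placeholder`.

```python
def second_pass_reasons(annotation):
    """Return the reasons an annotation should be revisited, if any.

    annotation: dict with at least "evidence_code" and "reference" keys.
    """
    reasons = []
    code = annotation["evidence_code"]
    if code == "ND":  # assumption: unknowns are recorded with ND
        reasons.append("replace unknown")
    if code == "IEA":
        reasons.append("look for a better evidence code")
    # Assumption: a PMID stands in for primary literature here.
    if not annotation["reference"].startswith("PMID:"):
        reasons.append("replace database reference with primary literature")
    return reasons


# Hypothetical annotations for illustration only.
examples = [
    {"gene": "geneA", "evidence_code": "IDA", "reference": "PMID:9584128"},
    {"gene": "geneB", "evidence_code": "IEA", "reference": "DB:placeholder"},
    {"gene": "geneC", "evidence_code": "ND", "reference": "DB:placeholder"},
]
for annotation in examples:
    print(annotation["gene"], second_pass_reasons(annotation))
```

An empty list means the annotation needs no second-pass change under these three checks; it can still be revised later as part of ongoing curation.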
Updating GO annotations - ongoing
• GO annotations will never be “done”
• Part of normal curation process
– More specific information
– Better evidence code
• Replace obsolete terms
• “Last reviewed” date
Representative approaches