GO Annotation for PRO
Harold J Drabkin
Senior Scientific Curator
The Jackson Laboratory
Mouse Genome Informatics
Bar Harbor, ME
http://www.geneontology.org/GO.annotation.html
What IS the GO?
• The Gene Ontology is a dictionary of concepts
used to describe (annotate) the normal
properties of a gene product
More than just a term list
– The dictionary is structured
• Multiple parentage
• Terms are related to each other
using the following
relationships:
– Is_a
– Part_of
– Regulates
– Negatively regulates
– Positively regulates
• The relationships between two
terms are directed
– Using a specific term automatically
uses all of its parent terms
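A minimal sketch of the structure described above, assuming a simple in-memory representation: each edge is directed from a child term to a parent term and carries one of the listed relationship types, and a term may have more than one parent. The class name Ontology and the term labels are hypothetical placeholders, not GO data.

```python
from collections import defaultdict

# Relationship types named on the slide.
RELATIONS = {"is_a", "part_of", "regulates",
             "negatively_regulates", "positively_regulates"}


class Ontology:
    """Toy GO-style graph with directed, typed edges from child to parent."""

    def __init__(self):
        self._parents = defaultdict(set)   # child  -> {(relation, parent)}
        self._children = defaultdict(set)  # parent -> {(relation, child)}

    def add_edge(self, child, relation, parent):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relationship: {relation}")
        self._parents[child].add((relation, parent))
        self._children[parent].add((relation, child))

    def parents(self, term):
        # .get avoids inserting empty entries for terms we have never seen
        return sorted(self._parents.get(term, set()))

    def children(self, term):
        return sorted(self._children.get(term, set()))


# Hypothetical terms: "term C" has two parents (multiple parentage).
go = Ontology()
go.add_edge("term C", "is_a", "term A")
go.add_edge("term C", "part_of", "term B")
go.add_edge("regulation of term B", "regulates", "term B")

print(go.parents("term C"))  # [('is_a', 'term A'), ('part_of', 'term B')]
print(go.parents("term A"))  # [] because edges only point child -> parent
```

Keeping separate parent and child indexes makes the direction explicit: asking for the parents of a term never returns its children.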
The GO Describes Three Domains
[Figure: molecular function, biological process, and cellular component, illustrated with mitochondrial P450 (CC24 PR01238; MITP450CC24)]
What is an annotation?
Smith et al. determined by transfection assays that Abc2 is a protein
kinase involved in signal transduction and is located in the
cytoplasm.
An annotation is a statement that a gene product …
…has a particular molecular function or
…is involved in a particular biological process or
…is located within a certain cellular component
…as determined by a particular method
…as described in a particular reference.
[Figure labels: Reference, GO Terms, Evidence Code]
Anatomy of an annotation
• Object: the gene product (or the transcript coding for it, or the gene that codes for it)
• GO Term
– GO Term Qualifier (optional)
• NOT, co_localizes_with, or contributes_to
• Evidence Code:
– IDA, IPI, IMP, IEP, IGI, EXP, ISS, ISA, ISO, IEA, TAS, NAS, or IC
• Evidence Code Qualifier (required for some codes), called
“inferred from” or “with”
– Used in combination with IPI, IMP, IGI, ISS, and IEA
» Seq_ID or DB_ID required for ISO, ISA, and IPI
• Reference: literature or database specific reference
– DB_ID or PMID
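One way to picture the anatomy above is as a record with a few validation rules. This is a hypothetical Python sketch, not a GO or MGI file format: the field names (obj, go_term, with_from, and so on) are assumptions, the evidence codes and qualifiers are the ones listed on the slide, and the "with" requirement follows the slide's rule for ISO, ISA, and IPI.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Evidence codes and qualifiers as listed on the slide.
EVIDENCE_CODES = {"IDA", "IPI", "IMP", "IEP", "IGI", "EXP", "ISS",
                  "ISA", "ISO", "IEA", "TAS", "NAS", "IC"}
QUALIFIERS = {"NOT", "co_localizes_with", "contributes_to"}
# Per the slide, a Seq_ID or DB_ID in the "with" field is required for these.
WITH_REQUIRED = {"ISO", "ISA", "IPI"}


@dataclass
class Annotation:
    obj: str                         # gene product, transcript, or gene ID
    go_term: str                     # e.g. "GO:0005515"
    evidence: str                    # one of EVIDENCE_CODES
    reference: str                   # DB_ID or PMID
    qualifier: Optional[str] = None  # optional GO term qualifier
    with_from: List[str] = field(default_factory=list)  # "inferred from"/"with"

    def problems(self) -> List[str]:
        """Return a list of rule violations (empty if none were found)."""
        errs = []
        if self.qualifier is not None and self.qualifier not in QUALIFIERS:
            errs.append(f"unknown qualifier: {self.qualifier}")
        if self.evidence not in EVIDENCE_CODES:
            errs.append(f"unknown evidence code: {self.evidence}")
        if self.evidence in WITH_REQUIRED and not self.with_from:
            errs.append(f"{self.evidence} requires a 'with' entry")
        if not self.reference:
            errs.append("missing reference")
        return errs


# Hypothetical record: the PMID and partner ID are placeholders.
a = Annotation("Abc2", "GO:0005515", "IPI", "PMID:placeholder")
print(a.problems())  # ["IPI requires a 'with' entry"]
a.with_from.append("UniProtKB:placeholder")
print(a.problems())  # []
```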
Various GO Annotation Strategies
• Electronic (IEA)
– Domain mappings, etc.
• Manual
– Reviewed electronic
– Literature based
• Direct experiments
• Sequence inferences
All PRO annotation will be manual
using the experimental literature.
GO Evidence Codes
Code   Definition                           Use
IEA    Inferred from electronic annotation
RCA    Reviewed computational analysis
IDA    Inferred from direct assay           Experiments
IPI    Inferred from physical interaction   Experiments
IMP    Inferred from mutant phenotype       Experiments
IGI    Inferred from genetic interaction    Experiments
EXP    Experimental                         Experiments
IC     Inferred by curator                  Manually annotated
ISS    Inferred from sequence similarity    Sequence Comparisons
ISO    Inferred from orthology              Sequence Comparisons
ISM    Inferred from sequence model         Sequence Comparisons
• The RACE-PRO interface will take care of the evidence codes.
• PRO will mostly require only one code: EXP (experimental; the
article contains an experiment to back the annotation).
• IPI will always be used with annotation to GO:0005515 protein
binding, and the binding partner will be documented.
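The evidence code table and the PRO conventions on this slide can be captured as a lookup plus a small check. This is a hypothetical sketch, not the RACE-PRO interface: pro_check is an invented helper, and reading "IPI will always be used with annotation to GO:0005515" as "IPI annotations go only to GO:0005515" is an assumption. The IEA rule reflects the earlier statement that all PRO annotation will be manual.

```python
# Evidence codes from the table (code -> definition).
EVIDENCE = {
    "IEA": "Inferred from electronic annotation",
    "RCA": "Reviewed computational analysis",
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "EXP": "Experimental",
    "IC": "Inferred by curator",
    "ISS": "Inferred from sequence similarity",
    "ISO": "Inferred from orthology",
    "ISM": "Inferred from sequence model",
}
PROTEIN_BINDING = "GO:0005515"


def pro_check(evidence, go_term, binding_partner=None):
    """Hypothetical check of the PRO conventions stated on the slides."""
    errs = []
    if evidence not in EVIDENCE:
        errs.append(f"unknown evidence code: {evidence}")
    if evidence == "IEA":
        errs.append("PRO annotation is manual; IEA is not used")
    if evidence == "IPI":
        if go_term != PROTEIN_BINDING:
            errs.append(f"IPI is used with {PROTEIN_BINDING} protein binding")
        if not binding_partner:
            errs.append("IPI annotation must document the binding partner")
    return errs


print(pro_check("EXP", "GO:0047519"))            # []
print(pro_check("IPI", PROTEIN_BINDING, "Btk"))  # []
print(pro_check("IPI", PROTEIN_BINDING))
# ['IPI annotation must document the binding partner']
```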
Manual Annotation
An Overview
1. Find a paper about the protein.
– Use a paper that gives direct
experimental evidence for the
normal function, process, or
cellular location of the gene
product.
Example: P05147, PMID: 2976880
2. READ the full papers!
– Abstracts alone can be very misleading:
• The species may not be specified. Sometimes a
paper uses human, mouse and rat interchangeably,
or uses human for one gene and mouse for a
different one.
• Specific isoforms used may not be noted.
• Not all experiments may be mentioned.
Find the GO term describing its
function, process, or location of action.
Example: GO:0047519
Getting the GO
• http://www.godatabase.org
• http://www.informatics.jax.org/searches/GO_form.html
• http://www.ebi.ac.uk/ego
Selecting GO Terms
• Read the definition
• Look at the term's placement in the ontology
(its parents and its children) as an aid.
• Remember: the parents of the term you
choose will also apply to the protein.
– If they don’t, it’s either the wrong term, or the
ontology is incorrect (email the GO!).
Annotate to finest granularity
Annotating a product to GO:0030047 automatically annotates it to all of
that term's parents; thus the product is annotated to both protein
modification AND cytoskeleton organization.
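A minimal sketch of why annotating to the finest-granularity term is enough: the annotation propagates up to every ancestor, but never down to a more specific term. The parent map is a simplification that models only the two parents named on the slide; the real GO graph may place intermediate terms between GO:0030047 and these terms. implied_terms is a hypothetical helper.

```python
# Simplified parent map for the slide's example. Only the two parents named on
# the slide are modelled; intermediate terms in the real GO graph are omitted.
PARENTS = {
    "GO:0030047": ["protein modification", "cytoskeleton organization"],
    "protein modification": [],
    "cytoskeleton organization": [],
}


def implied_terms(direct_terms, parents):
    """Direct annotations plus every ancestor reachable through `parents`."""
    result = set()
    stack = list(direct_terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents.get(term, []))
    return result


print(sorted(implied_terms({"GO:0030047"}, PARENTS)))
# ['GO:0030047', 'cytoskeleton organization', 'protein modification']
print(sorted(implied_terms({"protein modification"}, PARENTS)))
# ['protein modification']
```

The second call shows the asymmetry: a coarse annotation says nothing about its more specific children, which is why the specific term should be chosen whenever the evidence supports it.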
GO does not annotate substrates
• A gene product that has protein kinase
activity is also involved in the process of
protein phosphorylation
• The protein that gets phosphorylated is
NOT annotated to the process of protein
phosphorylation.
Example Annotations
• Abstract suggests that this paper demonstrates that Ibtk
– Binds to a protein kinase
– Inhibits kinase activity
– Inhibits calcium mobilization
– Inhibits transcription
Evidence used for function
Use most specific term
possible
Evidence used for process
Both IDA
Btk and Ibtk both have protein binding
activity toward each other: IPI evidence code
Abstract totally misses the
sub-cellular localization!!!
Some Special Cases
GO Term Qualifiers
– “NOT”
• Can be used with any term
– “contributes_to”
• Used only with molecular function
– “co_localizes with”
• Used only with cellular component
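The aspect restrictions listed above amount to a small lookup table. A hypothetical sketch: the aspect names follow the GO namespace labels (molecular_function, biological_process, cellular_component), and qualifier_allowed is an invented helper.

```python
# GO aspects each qualifier may be used with, as listed on the slide.
ALL_ASPECTS = {"molecular_function", "biological_process", "cellular_component"}
ALLOWED_ASPECTS = {
    "NOT": ALL_ASPECTS,
    "contributes_to": {"molecular_function"},
    "co_localizes_with": {"cellular_component"},
}


def qualifier_allowed(qualifier, aspect):
    """True if `qualifier` (or no qualifier) may be used with a term from `aspect`."""
    if qualifier is None:
        return aspect in ALL_ASPECTS
    return aspect in ALLOWED_ASPECTS.get(qualifier, set())


print(qualifier_allowed("contributes_to", "molecular_function"))     # True
print(qualifier_allowed("contributes_to", "cellular_component"))     # False
print(qualifier_allowed("co_localizes_with", "cellular_component"))  # True
```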
The “not” GO Term Qualifier
'NOT' is used to make an explicit note that the gene product is
not associated with the GO term. This is particularly important
in cases where associating a GO term with a gene product
should be avoided (but might otherwise be made, especially by
an automated method).
e.g. This protein does not have ‘kinase activity’ because the
author states that its ‘ATP binding’ domain is disrupted or
missing.
Also used to document conflicting claims in the literature.
NOT can be used with ALL three GO Ontologies.
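Because NOT is also used to document conflicting claims, a curator may want to list the terms that carry both a positive and a NOT annotation for the same product. A minimal sketch, assuming annotations are simple (product, term, is_not) tuples; the function name and the records are hypothetical.

```python
from collections import defaultdict


def conflicting_claims(annotations):
    """
    annotations: iterable of (product, go_term, is_not) tuples.
    Returns sorted (product, go_term) pairs that carry both a positive
    annotation and a NOT annotation.
    """
    polarity = defaultdict(set)  # (product, term) -> subset of {False, True}
    for product, term, is_not in annotations:
        polarity[(product, term)].add(is_not)
    return sorted(key for key, seen in polarity.items() if seen == {True, False})


# Hypothetical records: two papers disagree about "kinase activity".
records = [
    ("ProtX", "kinase activity", False),
    ("ProtX", "kinase activity", True),
    ("ProtX", "ATP binding", False),
]
print(conflicting_claims(records))  # [('ProtX', 'kinase activity')]
```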
The ‘contributes_to’ qualifier
Contributes_to: An individual gene product that is part of a complex can be
annotated to terms that describe the action (function or process) of the
complex.
This practice is colloquially known as annotating 'to the potential of the
complex'.
This qualifier allows us to distinguish the individual subunit from complex
functions, e.g. a subunit is annotated as contributes_to ribosome binding when it
is part of a complex with that activity but does not perform this function on its own.
All gene products annotated using 'contributes_to' must also be
annotated to a cellular component term representing the complex that
possesses the activity.
Only used with GO Function Ontology
The Qualifier documentation:
http://www.geneontology.org/GO.annotation.html
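The rule that every contributes_to annotation needs a companion cellular component annotation can be checked mechanically. This hypothetical sketch is a simplification: it only checks that the product has some cellular component annotation, not that the term actually represents the complex that possesses the activity. The subunit names and term labels are placeholders.

```python
def missing_complex_component(annotations):
    """
    annotations: iterable of (product, qualifier, aspect, go_term) tuples.
    Returns products that have a contributes_to annotation but no
    cellular_component annotation at all.
    """
    contributes, has_component = set(), set()
    for product, qualifier, aspect, _term in annotations:
        if qualifier == "contributes_to":
            contributes.add(product)
        if aspect == "cellular_component":
            has_component.add(product)
    return sorted(contributes - has_component)


# Hypothetical subunits and term labels.
records = [
    ("SubunitA", "contributes_to", "molecular_function", "ribosome binding"),
    ("SubunitA", None, "cellular_component", "complex X"),
    ("SubunitB", "contributes_to", "molecular_function", "ribosome binding"),
]
print(missing_complex_component(records))  # ['SubunitB']
```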
co_localizes_with
• Used where gene products transiently or
peripherally associate with an organelle or
complex in conjunction with a cellular
component term.
• Also used in cases where the resolution of an
assay is not accurate enough to say that the
gene product is a bona fide component
member.
http://www.geneontology.org/GO.annotation.html
Can’t find a term?