CACAO - Remote training
http://gowiki.tamu.edu/wiki/index.php/Category:CACAO
Gene Function and Gene Ontology
Fall 2011
“Scientists find gene that ...”
An avalanche of genes
• High throughput sequencing is finding genes faster than we can understand them
• Goals for annotation:
  – Where the genes are in the genome
  – What their functions are
Function annotation
• Allows us to
  – Infer the functions of genes
    • Related by common descent
    • Related by similar expression patterns
    • Related by phylogenetic profiles
    • ...
Function annotation
• Allows us to
  – Understand the capabilities of organisms' genomes
  – Understand patterns of gene expression
    • In different environments
    • In different tissues
    • In disease states
  – ...
Classic MODel
Literature
Database
Curators (rate limiting)
Datasets
Requirements
• Accurate functional annotation for as many genes as possible
• A system of assigning function that allows both humans and computers to compare, contrast, analyze, and predict gene function
• Curators to make and/or check these assignments
  – For CACAO, we will teach you what biocurators do.
CACAO
• Community
• Assessment
• Community
• Annotation with
• Ontologies
How well can you (with our coaching) assign gene functions using GO?
CACAO is competitive
• Teams get points for complete annotations
  – GO term (right level of specificity)
  – reference
  – evidence code
  – identify where in the paper the evidence comes from
• Teams can take away points from competitors by challenging annotations
– finding a problem
– suggesting a better alternative
What’s in it for you (besides credit)?
– We hope you will
  • learn how we think about gene function
  • gain skills that will help your future career
  • enjoy contributing to a resource used by people all over the world
  • have fun!
The gist of CACAO…
Finding evidence (in papers)
Making annotations
Using GO terms
GO = Gene Ontology
• Controlled vocabulary
  – Everyone uses the same terms
  – Terms have IDs that computers can understand
• Relationships between functions
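To illustrate how term IDs and relationships make the vocabulary usable by computers, here is a minimal Python sketch. The parent links form a simplified, illustrative chain rather than the full ontology (real GO terms can have several parents and more than one kind of relationship), and the names TOY_IS_A and ancestors are made up for this example.

```python
# Toy fragment of GO: child term ID -> one broader (parent) term ID.
# Simplified for illustration; the real ontology is a graph, not a chain.
TOY_IS_A = {
    "GO:0004713": "GO:0004672",  # protein tyrosine kinase activity -> protein kinase activity
    "GO:0004672": "GO:0016301",  # protein kinase activity -> kinase activity
    "GO:0016301": "GO:0003824",  # kinase activity -> catalytic activity
    "GO:0003824": "GO:0003674",  # catalytic activity -> molecular_function (root)
}


def ancestors(term_id, is_a=TOY_IS_A):
    """Return the broader terms above term_id, nearest first."""
    chain = []
    while term_id in is_a:
        term_id = is_a[term_id]
        chain.append(term_id)
    return chain


if __name__ == "__main__":
    # Walk from a specific term up to the root of the toy fragment.
    print(ancestors("GO:0004713"))
```

Because every term has a stable ID and a recorded parent, a program can tell that an annotation to a specific term also implies the broader terms above it.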
Gene Ontology
A common system for describing gene function
GO
• 3 aspects (ontologies) for gene products
1. Biological Process
2. Molecular Function
3. Cellular Component
• Used to make annotations
– aka Gene associations
– Term + qualifiers + evidence code + reference etc.
Molecular Function
• activities or “jobs” of a gene product
glucose-6-phosphate isomerase activity
Figure from GO Consortium (GOC) presentations
Biological Process
a commonly recognized series of events
cell division
Figure from Nature Reviews Microbiology 6, 28-40 (January 2008)
Cellular Component
• where a gene product acts
Key elements of a GO annotation
Submitted to GO consortium
Viewable on GONUTS
**Don’t worry - I will cover this again (several times)!
GO Annotation
• To make an annotation, you need to
  – Assign GO terms to genes (gene products)
    • At appropriate level of specificity
    • Sometimes with Qualifiers
      – NOT
      – Contributes_to
      – Colocalizes_with
  – Record the evidence
Record the evidence
• Where it came from:
  – Reference (database accession)
    • PMID:6987663
• Kind of evidence:
  – Evidence codes
    • IMP: Inferred from Mutant Phenotype
    • IDA: Inferred from Direct Assay
    • …
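The two pieces of evidence information above can be handled as plain strings. The sketch below, in Python, splits a reference into its database prefix and identifier and spells out an evidence code. It knows only the two codes named on this slide (the full list is longer), and the function names are hypothetical.

```python
# Only the two evidence codes named above; the full set of codes is longer.
EVIDENCE_CODES = {
    "IMP": "Inferred from Mutant Phenotype",
    "IDA": "Inferred from Direct Assay",
}


def parse_reference(reference):
    """Split a reference such as 'PMID:6987663' into (database, identifier)."""
    database, sep, identifier = reference.partition(":")
    if not sep or not database or not identifier:
        raise ValueError(f"expected DATABASE:ID, got {reference!r}")
    return database, identifier


def describe_evidence_code(code):
    """Return the spelled-out meaning of an evidence code known to this sketch."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"evidence code not in this sketch: {code!r}")
    return f"{code}: {EVIDENCE_CODES[code]}"


if __name__ == "__main__":
    print(parse_reference("PMID:6987663"))
    print(describe_evidence_code("IDA"))
```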
CACAO - the “Community Annotation” part
What I am going to tell you about next is:
1. How to choose proteins to annotate
2. Finding GO terms & navigating a GO term page
3. Finding UniProt accessions
4. Making gene pages on GONUTS & the anatomy of a gene page
5. How and where to add an annotation
6. Where to look for your annotations & other teams’ annotations … (& the challenges!)
http://gowiki.tamu.edu/wiki/index.php/
Deciding what to annotate
1. randomly
2. topics of interest (e.g., efflux pump proteins, biofilms)
3. papers you have come across while doing other stuff
4. methods you know or want to learn
5. phenotypes and mutants you are interested in
6. by author
7. by pathway or regulon
8. suggested by another (e.g., high IEA:manual annotation ratio)
9. current paper mentions another gene product
10. review papers (e.g., Annual Reviews are excellent sources)
EXAMPLE #1: let’s say you have a great paper (PMID:1111) that characterizes the tyrosine kinase activity of your favorite protein (human p53)…
Part I: Where do you search for GO terms? GONUTS
http://gowiki.tamu.edu
• CHICK - AgBase (Gallus gallus)
• dictyBase - dictyBase (Dictyostelium discoideum - slime mold)
• FB - FlyBase (Drosophila melanogaster)
• HUMAN - Reactome, BHF-UCL
• MGI - Mouse genome informatics (Mus musculus - house mouse)
• SGD - Saccharomyces genome database (Saccharomyces cerevisiae - yeast)
• TAIR - The Arabidopsis Information Resource (Arabidopsis thaliana)
• WB - WormBase (Caenorhabditis elegans)
• ZFIN - Zebrafish model organism database (Danio rerio)
What do you actually need once you have found the correct term?
GO:0004713
Part II: You now have a paper, a protein & you found a suitable GO term… what next?
• UniProt accession - http://www.uniprot.org
  - Search (“Query”) & find the correct UniProt accession for your protein
  - Looks something like: P012A9
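If you want a quick sanity check that what you copied has the shape of a UniProt accession, a regular expression is enough. The sketch below uses the accession pattern given in the UniProt help pages as we understand it; treat the pattern as an assumption and check uniprot.org for the current rule. Matching the pattern does not prove the accession exists or belongs to your protein.

```python
import re

# Accession pattern as described in the UniProt help pages (an assumption
# here; confirm against uniprot.org before relying on it).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(text):
    """True if text has the shape of a UniProt accession (not proof it exists)."""
    return UNIPROT_ACCESSION.fullmatch(text.strip().upper()) is not None


if __name__ == "__main__":
    for candidate in ("P012A9", "GO:0004713", "p53"):
        print(candidate, looks_like_uniprot_accession(candidate))
```

This catches common slips such as pasting a GO ID or a protein name where the accession belongs.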
Part III: Where are you going to add your annotations? GONUTS
http://gowiki.tamu.edu
How do you make a new gene page in GONUTS?
• Use the UniProt accession to make a page that you will be able to add your own annotation to.
• GoPageMaker will:
  1. Check if the page exists in GONUTS & take you there if it does.
  2. Make a page & pull all of the annotations from UniProt into a table that you can edit.
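The two steps GoPageMaker takes amount to a "get or create" pattern. The Python sketch below only illustrates that logic and is not how GONUTS is implemented: the dictionary stands in for the wiki, fetch_uniprot_annotations stands in for the UniProt import, and every name in it is hypothetical.

```python
def go_page_maker(accession, wiki_pages, fetch_uniprot_annotations):
    """Return (page, created) for accession, following the two steps above.

    wiki_pages: dict standing in for GONUTS, accession -> page (hypothetical).
    fetch_uniprot_annotations: callable standing in for the UniProt import.
    """
    # Step 1: if the page already exists, go straight to it.
    if accession in wiki_pages:
        return wiki_pages[accession], False
    # Step 2: otherwise make the page and seed its editable annotation table.
    page = {
        "accession": accession,
        "annotations": list(fetch_uniprot_annotations(accession)),
    }
    wiki_pages[accession] = page
    return page, True


if __name__ == "__main__":
    pages = {}

    def fake_fetch(accession):
        # Placeholder data; a real import would query UniProt.
        return [{"go_id": "GO:0004713", "evidence_code": "IEA"}]

    print(go_page_maker("P012A9", pages, fake_fetch))  # page is created
    print(go_page_maker("P012A9", pages, fake_fetch))  # existing page is reused
```

Either way you end up on one page per accession, so two teams working on the same protein add rows to the same table.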
Where do you add an annotation?
Add a row in the table.
What you must fill in (for every annotation)
• GO:0004713
• PMID:1111
• IDA: Inferred from direct assay
• Figure 2a
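A simple way to think about "complete" is as a checklist over the four required fields. The sketch below, in Python, is a hypothetical self-check you could run on a row before submitting; it is not the scoring code used in the competition, and the field names are invented for the example. The only format rule it applies is that a GO ID is "GO:" followed by seven digits.

```python
import re

# Field names are made up for this sketch; they mirror the four items above.
REQUIRED_FIELDS = ("go_id", "reference", "evidence_code", "evidence_location")
GO_ID = re.compile(r"GO:\d{7}")


def missing_or_malformed(annotation):
    """List the problems that would keep an annotation row from being complete."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if not str(annotation.get(name, "")).strip()
    ]
    go_id = annotation.get("go_id", "")
    if go_id and not GO_ID.fullmatch(go_id):
        problems.append(f"malformed GO ID: {go_id}")
    return problems


if __name__ == "__main__":
    row = {
        "go_id": "GO:0004713",
        "reference": "PMID:1111",
        "evidence_code": "IDA",
        "evidence_location": "Figure 2a",
    }
    print(missing_or_malformed(row))  # an empty list means nothing is missing
```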
What you might also have to fill in
Not sure? Check the competition guidelines. Ask a coach (Jim, Debby, Adrienne or usually me)!
Where will your annotation now show up?
1. In the “Annotation” table on the gene page you just edited
2. In the table on your user page
http://gowiki.tamu.edu/wiki/index.php/User:Siebenmc
3. In the table on your team page
http://gowiki.tamu.edu/wiki/index.php/Category:Team_Mu_subunits
4. As points on the scoreboard
http://gowiki.tamu.edu/wiki/index.php/Category:CACAO_Spring_2011
5. If challenged, it will show up in the “Submitted Challenges” table (below the scoreboard)
Questions?
At this point, you should be able to:
1. Find GO terms on GONUTS
2. Find UniProt accessions on UniProt
3. Make a gene page on GONUTS
4. Add an annotation
CACAO - the “Community Assessment” part
Scoreboard
Submitted Challenges
Moving through challenges
Closed Challenges
http://gowiki.tamu.edu/wiki/index.php/Category:Michigan_State_CACAO
Category:Team UCL1
Example starting from a paper
1. Hypothetically given a paper in another class on human gastric lipase by Wicker-Planquart et al (1999).
2. http://www.ncbi.nlm.nih.gov/pubmed/10411623
What is the molecular function of the protein?
What process is it involved in?
Where is it doing its job(s) in the cell?
Examples starting from a topic
1. Search PubMed for “biofilm genes”
2. Eighth paper is - Isolation of Genes Involved In Biofilm Formation of a Klebsiella pneumoniae…
3. http://www.ncbi.nlm.nih.gov/pubmed/21858144
What proteins are discussed in this paper?
• What is the molecular function of each protein?
• What processes are they involved in?
• Where are they doing their jobs?
• WHAT DO THE AUTHORS DEMONSTRATE IN THE PAPER?
Example starting from a protein
1. Searched UniProt for “biofilm”
2. Protein from E. coli - BssR
3. Search on PubMed for “bssR AND coli”
4. http://www.ncbi.nlm.nih.gov/pubmed/16597943
What is the molecular function of these proteins?
What process are they involved in?
Where are they doing their job(s) in the cell?
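The PubMed search in step 3 can also be written as a URL, which is handy if you want to save or share a query. The sketch below builds a search URL for NCBI's E-utilities ESearch service. Using E-utilities is an assumption made for this example (the slides only use the PubMed website), and the function name and retmax default are hypothetical choices. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service (assumed for this sketch).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(query, max_results=20):
    """Build an ESearch URL that searches PubMed for query."""
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(pubmed_search_url("bssR AND coli"))
```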