From Functional Genomics to Physiological Model: the Gene


Getting Started: a user’s guide to the GO
GO Workshop
3-6 August 2010
1. Provides structural annotation for agriculturally important genomes
2. Provides functional annotation (GO)
3. Provides tools for functional modeling
4. Provides bioinformatics & modeling support for research community
Avian Gene Nomenclature
Introduction to GO
• Anatomy of a GO term: a GO annotation example
• GO evidence codes
• Making annotations: literature biocuration & computational analysis
• ND vs no GO
Using the GO
• GO tools
• Functional modeling considerations
Gene Ontology (GO)
Not about genes!
• Gene products: genes, transcripts, ncRNA, proteins
• The GO describes gene product function
Not a single ontology
• Biological Process (BP or P)
• Molecular Function (MF or F)
• Cellular Component (CC or C)
What is the Gene Ontology?
“a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
• assign functions to gene products at different levels, depending on how much is known about a gene product
• is used for a diverse range of species
• structured to be queried at different levels, e.g.:
  • find all the chicken gene products in the genome that are involved in signal transduction
  • zoom in on all the receptor tyrosine kinases
• human readable GO function has a digital tag to allow computational analysis of large datasets
COMPUTATIONALLY AMENABLE ENCYCLOPEDIA OF GENE FUNCTIONS AND THEIR RELATIONSHIPS
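Querying "at different levels" can be sketched with a toy hierarchy. This is only an illustration: the term names, parent links, and gene products below are hypothetical placeholders, not real GO content, and real ontologies have more relationship types than the single parent link assumed here.

```python
# Toy hierarchy: child term -> parent terms (hypothetical placeholders).
PARENTS = {
    "signal transduction": [],
    "receptor signaling": ["signal transduction"],
    "receptor tyrosine kinase signaling": ["receptor signaling"],
    "metabolism": [],
}

# Hypothetical direct annotations: gene product -> most specific known term.
ANNOTATIONS = {
    "geneA": "receptor tyrosine kinase signaling",
    "geneB": "receptor signaling",
    "geneC": "signal transduction",
    "geneD": "metabolism",
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def gene_products_for(term):
    """Gene products annotated to the term or to any more specific term."""
    return sorted(g for g, t in ANNOTATIONS.items() if term in ancestors(t))


broad = gene_products_for("signal transduction")
narrow = gene_products_for("receptor tyrosine kinase signaling")
print(broad)   # the broad query
print(narrow)  # the "zoomed in" query
```

A gene product annotated to a specific term is also found by a query on any broader term above it, which is what lets one question be asked at several levels of detail.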
Ontologies: digital identifier (computers), description (humans), relationships between terms
As of ontology version 1.1348 (27/07/2010):
32,091 terms, 99.3% defined
* 19169 biological process
* 2745 cellular component
* 8736 molecular function
1441 obsolete terms (not included in figures above)
GO annotation example
NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
Biological Process (BP or P)
GO:0006633 fatty acid biosynthetic process TAS
GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
GO:0008610 lipid biosynthetic process IEA
Molecular Function (MF or F)
GO:0005504 fatty acid binding IDA
GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
GO:0016491 oxidoreductase activity TAS
GO:0000036 acyl carrier activity IEA
Cellular Component (CC or C)
GO:0005759 mitochondrial matrix IDA
GO:0005747 mitochondrial respiratory chain complex I IDA
GO:0005739 mitochondrion IEA
Parts of a GO annotation (NDUFAB1 example):
• GO:ID (unique)
• GO term name
• GO evidence code
• aspect or ontology
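The parts of an annotation can be pulled out of a line written the way the NDUFAB1 example is laid out on the slide. This is a minimal sketch that assumes the slide's "ID, name, evidence code" layout. It is not a parser for the official gene association file format, and `Annotation` and `parse_annotation` are names made up for the illustration.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    go_id: str      # unique identifier, e.g. GO:0006633
    term_name: str  # human-readable term name
    evidence: str   # GO evidence code, e.g. TAS
    aspect: str     # P, F or C (taken from the slide's section heading)


# "GO:" plus seven digits, the term name, then a trailing evidence code.
LINE_PATTERN = re.compile(r"^(GO:\d{7})\s+(.+)\s+([A-Z]{2,3})$")


def parse_annotation(line, aspect):
    """Split one slide-style annotation line into its parts."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not an annotation line: {line!r}")
    go_id, term_name, evidence = match.groups()
    return Annotation(go_id, term_name, evidence, aspect)


example = parse_annotation(
    "GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS", "P"
)
print(example)
```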
GO EVIDENCE CODES
Guide to GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
Direct Evidence Codes
IDA - inferred from direct assay
IEP - inferred from expression pattern
IGI - inferred from genetic interaction
IMP - inferred from mutant phenotype
IPI - inferred from physical interaction
Indirect Evidence Codes
inferred from literature
IGC - inferred from genomic context
TAS - traceable author statement
NAS - non-traceable author statement
IC - inferred by curator
inferred by sequence analysis
RCA - inferred from reviewed computational analysis
IS* - inferred from sequence*
• ISS - inferred from sequence or structural similarity
• ISA - inferred from sequence alignment
• ISO - inferred from sequence orthology
• ISM - inferred from sequence model
IEA - inferred from electronic annotation
Other
NR - not recorded (historical)
ND - no biological data available
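The evidence code groups make it easy to summarise where a gene product's annotations come from. The sketch below mirrors the grouping on this slide and tallies the ten evidence codes from the NDUFAB1 example. The dictionary layout and function name are illustrative assumptions.

```python
from collections import Counter

# Groups as laid out on the slide (IS* expanded into its four codes).
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "literature": {"IGC", "TAS", "NAS", "IC"},
    "sequence analysis": {"RCA", "ISS", "ISA", "ISO", "ISM", "IEA"},
    "other": {"NR", "ND"},
}


def group_of(code):
    """Return the slide's group name for an evidence code."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise KeyError(f"unknown evidence code: {code}")


# Evidence codes of the NDUFAB1 annotations, in slide order (BP, MF, CC).
NDUFAB1_CODES = ["TAS", "TAS", "IEA",
                 "IDA", "TAS", "TAS", "IEA",
                 "IDA", "IDA", "IEA"]

by_group = Counter(group_of(code) for code in NDUFAB1_CODES)
without_iea = [code for code in NDUFAB1_CODES if code != "IEA"]
print(by_group)
print(len(without_iea), "annotations remain if IEA is excluded")
```

Dropping IEA removes three of the ten NDUFAB1 annotations. That is why it matters whether an analysis tool includes IEA or not.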
Biocuration of literature
• detailed function
• “depth”
• slower (manual)
Biocuration of Literature: detailed gene function
P05147
1. Find a paper about the protein. PMID: 2976880
2. Read paper to get experimental evidence of function
3. Use most specific term possible
4. experiment assayed kinase activity: use IDA evidence code
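The last curation step, going from the kind of experiment to an evidence code, can be written as a lookup over the direct evidence codes. This is a hedged sketch: the record layout and the `curate` helper are invented for illustration, and the term string is a placeholder for whatever most specific GO term the paper supports.

```python
# Experiment type -> direct evidence code, following the evidence code list.
DIRECT_EVIDENCE = {
    "direct assay": "IDA",
    "expression pattern": "IEP",
    "genetic interaction": "IGI",
    "mutant phenotype": "IMP",
    "physical interaction": "IPI",
}


def curate(protein, pmid, term, experiment_type):
    """Build one literature-based annotation record (illustrative layout)."""
    return {
        "protein": protein,
        "reference": f"PMID:{pmid}",
        "term": term,  # placeholder: a curator picks the most specific term
        "evidence": DIRECT_EVIDENCE[experiment_type],
    }


record = curate("P05147", 2976880, "kinase activity", "direct assay")
print(record)
```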
Sequence analysis
• rapid (computational)
• “breadth” of coverage
• less detailed
Unknown Function vs No GO
ND – no data
• Biocurators have tried to add GO but there is no functional data available
• Previously: “process_unknown”, “function_unknown”, “component_unknown”
• Now: “biological process”, “molecular function”, “cellular component”
No annotations (including no “ND”): biocurators have not annotated
• this is important for your dataset: what % has GO?
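The "what % has GO?" question can be answered while keeping ND-only gene products separate from those with no annotation at all. The sketch below assumes annotations are available as a mapping from gene product to a list of evidence codes. The identifiers and the `go_coverage` helper are hypothetical.

```python
def go_coverage(dataset, annotations):
    """Percentage of the dataset that is annotated, ND-only, or has no GO.

    dataset: list of gene product identifiers
    annotations: gene product -> list of evidence codes (missing = none)
    """
    if not dataset:
        raise ValueError("dataset is empty")
    counts = {"annotated": 0, "nd_only": 0, "no_go": 0}
    for gene_product in dataset:
        codes = annotations.get(gene_product, [])
        if not codes:
            counts["no_go"] += 1      # biocurators have not annotated it
        elif all(code == "ND" for code in codes):
            counts["nd_only"] += 1    # curated, but no functional data found
        else:
            counts["annotated"] += 1
    return {name: 100.0 * n / len(dataset) for name, n in counts.items()}


# Hypothetical four-member dataset.
dataset = ["gp1", "gp2", "gp3", "gp4"]
annotations = {"gp1": ["IDA", "IEA"], "gp2": ["ND"], "gp4": ["TAS"]}
coverage = go_coverage(dataset, annotations)
print(coverage)
```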
Using the GO
• Decide on GO analysis tool
• How much GO is available for your species?
• Getting GO for your data set
• Adding GO for your data
http://www.geneontology.org/
However…
• many of these tools do not support non-model organisms
• the tools have different computing requirements
• it may be difficult to determine how up-to-date the GO annotations are…
Need to evaluate tools for your system.
Evaluating GO tools
Some criteria for evaluating GO Tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products has no GO?
6. Does it report both over/under represented GO groups and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
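For criterion 6, it helps to know what an over/under-representation test can look like. The sketch below uses a hypergeometric tail probability, which is one common choice. It is an assumption for illustration, not a description of how any tool named in this workshop evaluates enrichment. Real tools may use other statistics and usually correct for multiple testing. All counts are made up.

```python
from math import comb


def over_representation_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated gene products in a
    list of n, drawn from a background of N in which K carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total


def under_representation_p(k, n, K, N):
    """P(X <= k): chance of seeing at most k annotated gene products."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(0, k + 1)) / total


# Hypothetical counts: background of 10 gene products, 4 carry the term,
# and 2 of the 3 gene products in the list of interest carry it.
p_over = over_representation_p(k=2, n=3, K=4, N=10)
p_under = under_representation_p(k=2, n=3, K=4, N=10)
print(p_over, p_under)
```

When comparing tools, check which background set (N and K) each one uses. The same gene list can look enriched against one background and unremarkable against another.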
Some useful expression analysis tools:
Database for Annotation, Visualization and Integrated Discovery (DAVID)
• http://david.abcc.ncifcrf.gov/
AgriGO -- GO Analysis Toolkit and Database for Agricultural Community
• http://bioinfo.cau.edu.cn/agriGO/
• used to be EasyGO
• chicken, cow, pig, mouse, cereals, dicots
• includes Plant Ontology (PO) analysis
Onto-Express
• http://vortex.cs.wayne.edu/projects.htm#Onto-Express
• can provide your own gene association file
Funcassociate 2.0: The Gene Set Functionator
• http://llama.med.harvard.edu/funcassociate/
• can provide your own gene association file
Functional Modeling Considerations
Should I add my own GO?
• use GOProfiler to see how much GO is available for your species
• use GORetriever to find existing GO for your dataset
• Does analysis tool allow me to add my own GO?
Should I do GO analysis and pathway analysis and network analysis?
• different functional modeling methods show different aspects about your data (complementary)
• is this type of data available for your species (or a close ortholog)?
What tools should I use?
• which tools have data for your species of interest?
• what type of accessions are accepted?
• availability (commercial and freely available)
Overview of Functional Modeling Strategy
[Flowchart; labels recovered from the figure]
• Microarray IDs, ArrayIDer, Protein/Gene identifiers
• GORetriever: Genes/Proteins with GO annotations
• GOanna: no GO annotations
• GOSlimViewer: summarizes GO function
• GOModeler: hypothesis testing
• GO Enrichment analysis; Pathways and network analysis
• Tools shown: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID, EasyGO/AgriGO, Onto-Express, Onto-Express-to-go (OE2GO)
Yellow boxes represent AgBase tools
Green/Purple boxes are non-AgBase resources
For more information about GO

GO Evidence Codes:
http://www.geneontology.org/GO.evidence.shtml

gene association file information:
http://www.geneontology.org/GO.format.annotation.shtml

tools that use the GO:
http://www.geneontology.org/GO.tools.shtml

GO Consortium wiki:
http://wiki.geneontology.org/index.php/Main_Page
All websites are listed on the
AgBase workshop website.