From Functional Genomics
to Physiological Model:
Using the Gene Ontology
Fiona McCarthy, Shane Burgess, Susan Bridges
The AgBase Databases, Institute of Digital Biology,
Mississippi State University
From Functional Genomics to Physiological Model
1. A user’s guide to the Gene Ontology (GO)
2. Finding GO for farm animal species
3. Adding GO to your dataset
4. GO based tools for biological modeling
5. Examples: using GO for biological modeling
• Presentation available at AgBase
• Websites available as handout
1. A User’s Guide to GO
What is the Gene Ontology?
Emily Dimmer, GOA EBI:
“a controlled vocabulary that can be applied to all organisms
even as knowledge of gene and protein roles in cells is
accumulating and changing”
• assigns functions to gene products at different levels, depending on how much is known about a gene product
• is used for a diverse range of species
• is structured to be queried at different levels, e.g.:
  - find all the chicken gene products in the genome that are involved in signal transduction
  - zoom in on all the receptor tyrosine kinases
• each human-readable GO function has a digital tag to allow computational analysis of large datasets
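Querying at different levels can be pictured as a walk over the ontology graph. The following is a minimal sketch under stated assumptions: the parent links are a toy fragment written for illustration (not the real GO structure), and the gene product names `geneA` to `geneD` are hypothetical.

```python
# Toy ontology fragment: child term -> parent terms (illustrative only).
PARENTS = {
    "signal transduction": [],
    "receptor tyrosine kinase signaling": ["signal transduction"],
    "G-protein coupled receptor signaling": ["signal transduction"],
}

# Hypothetical annotations: gene product -> directly annotated terms.
ANNOTATIONS = {
    "geneA": ["receptor tyrosine kinase signaling"],
    "geneB": ["G-protein coupled receptor signaling"],
    "geneC": ["signal transduction"],
    "geneD": [],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def products_for(term):
    """Gene products annotated to the term or to any more specific term."""
    return sorted(
        product
        for product, terms in ANNOTATIONS.items()
        if any(term in ancestors(t) for t in terms)
    )


broad = products_for("signal transduction")
narrow = products_for("receptor tyrosine kinase signaling")
print(broad)
print(narrow)
```

The broad query picks up products annotated to any more specific term, while the narrow query "zooms in" on one branch.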

GO Mapping Example
NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
Biological Process (BP or P)
GO:0006633 fatty acid biosynthetic process TAS
GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
GO:0008610 lipid biosynthetic process IEA
Molecular Function (MF or F)
GO:0005504 fatty acid binding IDA
GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
GO:0016491 oxidoreductase activity TAS
GO:0000036 acyl carrier activity IEA
Cellular Component (CC or C)
GO:0005759 mitochondrial matrix IDA
GO:0005747 mitochondrial respiratory chain complex I IDA
GO:0005739 mitochondrion IEA
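Each annotation line in this example carries a GO:ID, a term name, and an evidence code, so it can be split into fields for computational use. A minimal sketch, assuming the "GO:ID name CODE" layout used on this slide (this is the slide's layout, not an official file format; the function name is illustrative):

```python
import re

# Annotation lines copied from the NDUFAB1 example.
LINES = [
    "GO:0006633 fatty acid biosynthetic process TAS",
    "GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS",
    "GO:0005739 mitochondrion IEA",
]

# GO:ID (seven digits), then the term name, then a 2-3 letter evidence code.
PATTERN = re.compile(r"^(GO:\d{7})\s+(.+?)\s+([A-Z]{2,3})$")


def parse_annotation(line):
    """Split one slide-style annotation line into its three parts."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not an annotation line: {line!r}")
    go_id, name, evidence = match.groups()
    return {"go_id": go_id, "name": name, "evidence": evidence}


records = [parse_annotation(line) for line in LINES]
for record in records:
    print(record)
```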
GO Mapping Example: parts of an annotation
NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
• aspect or ontology: Biological Process (BP or P), Molecular Function (MF or F), Cellular Component (CC or C)
• GO:ID (unique): e.g. GO:0005759
• GO term name: e.g. mitochondrial matrix
• GO evidence code: e.g. IDA
GO EVIDENCE CODES
Direct Evidence Codes
IDA - inferred from direct assay
IEP - inferred from expression pattern
IGI - inferred from genetic interaction
IMP - inferred from mutant phenotype
IPI - inferred from physical interaction
Indirect Evidence Codes
inferred from literature
IGC - inferred from genomic context
TAS - traceable author statement
NAS - non-traceable author statement
IC - inferred by curator
inferred by computational analysis
RCA - inferred from reviewed computational analysis
ISS - inferred from sequence or structural similarity
IEA - inferred from electronic annotation
Other
NR - not recorded (historical)
ND - no biological data available
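The evidence code groups can be used to filter annotations, for example to set aside electronically inferred (IEA) annotations. A minimal sketch, assuming the grouping shown on this slide; the function names are illustrative and the sample pairs come from the NDUFAB1 example.

```python
# Grouping of evidence codes as listed on the slide.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "indirect": {"IGC", "TAS", "NAS", "IC", "RCA", "ISS", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code):
    """Return the slide's group name for an evidence code."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise KeyError(f"unknown evidence code: {code}")


def without_iea(pairs):
    """Drop annotations whose only support is electronic annotation."""
    return [(go_id, code) for go_id, code in pairs if code != "IEA"]


# (GO:ID, evidence code) pairs from the NDUFAB1 example.
annotations = [
    ("GO:0006633", "TAS"),
    ("GO:0008610", "IEA"),
    ("GO:0005504", "IDA"),
    ("GO:0005739", "IEA"),
]

filtered = without_iea(annotations)
groups = [evidence_group(code) for _, code in annotations]
print(filtered)
print(groups)
```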
Unknown Function vs No GO

• ND: no data
  - Biocurators have tried to add GO but there is no functional data available
  - Previously: “process_unknown”, “function_unknown”, “component_unknown”
  - Now: “biological process”, “molecular function”, “cellular component”
• No annotations (including no “ND”): biocurators have not annotated
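The distinction between "ND" and "no annotations" matters when summarizing a dataset, because the two cases mean different things. A minimal sketch, assuming annotations are held as a mapping from record ID to a list of evidence codes; the record IDs and status labels are hypothetical.

```python
def annotation_status(record_id, evidence_by_record):
    """Classify a record as annotated, ND only, or never annotated."""
    codes = evidence_by_record.get(record_id)
    if not codes:
        # No annotation at all: biocurators have not looked at this record.
        return "not annotated"
    if all(code == "ND" for code in codes):
        # Biocurators looked and found no functional data.
        return "no data (ND)"
    return "annotated"


# Hypothetical records for illustration.
evidence_by_record = {
    "rec1": ["IDA", "IEA"],
    "rec2": ["ND", "ND", "ND"],
}

statuses = {
    record_id: annotation_status(record_id, evidence_by_record)
    for record_id in ["rec1", "rec2", "rec3"]
}
print(statuses)
```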
2. Finding GO for Farm Animals
GO Browsers

• QuickGO Browser (EBI GOA Project)
  - http://www.ebi.ac.uk/ego/
  - Can search by GO Term or by UniProt ID
  - Includes IEA annotations
• AmiGO Browser (GO Consortium Project)
  - http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
  - Can search by GO Term or by UniProt ID
  - Does not include IEA annotations
Getting GO

• http://www.ebi.ac.uk/GOA/downloads.html (includes farm animals)
• http://www.geneontology.org/GO.current.annotations.shtml#filter
• http://www.agbase.msstate.edu/
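Downloaded annotation files are typically tab-delimited gene association files. The following is a minimal sketch of reading one, assuming the GAF 2.x column layout (object ID in column 2, symbol in column 3, GO:ID in column 5, evidence code in column 7, aspect in column 9) and comment lines starting with "!". The sample lines are built in memory for illustration, with unused columns left empty; they are not copied from a real download.

```python
import csv
import io

# 0-based column positions, assumed from the GAF 2.x layout.
COL_OBJECT_ID, COL_SYMBOL, COL_GO_ID, COL_EVIDENCE, COL_ASPECT = 1, 2, 4, 6, 8


def read_gaf(handle):
    """Yield one dict per annotation row, skipping comments and short rows."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("!"):
            continue
        if len(fields) <= COL_ASPECT:
            continue
        rows.append({
            "object_id": fields[COL_OBJECT_ID],
            "symbol": fields[COL_SYMBOL],
            "go_id": fields[COL_GO_ID],
            "evidence": fields[COL_EVIDENCE],
            "aspect": fields[COL_ASPECT],
        })
    return rows


def sample_line(object_id, symbol, go_id, evidence, aspect):
    """Build an illustrative 15-column row; unused columns stay empty."""
    fields = [""] * 15
    fields[0] = "UniProtKB"
    fields[COL_OBJECT_ID] = object_id
    fields[COL_SYMBOL] = symbol
    fields[COL_GO_ID] = go_id
    fields[COL_EVIDENCE] = evidence
    fields[COL_ASPECT] = aspect
    return "\t".join(fields)


SAMPLE = "\n".join([
    "!gaf-version: 2.0",
    sample_line("P52505", "NDUFAB1", "GO:0005739", "IEA", "C"),
    sample_line("P52505", "NDUFAB1", "GO:0005504", "IDA", "F"),
]) + "\n"

rows = read_gaf(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```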
3. Adding GO to your dataset
GO analysis of array data

• Probe data is linked to gene product data
  - gene, cDNA, EST IDs
• For some arrays, gene product data has corresponding GO data
  - available from vendor (updated?)
• Not all gene products will have GO annotation
  - will not be included in modeling
• Need to get the maximum amount of GO data to do biological modeling
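Because gene products without GO drop out of the modeling, it helps to measure how much of an array actually has GO before starting. A minimal sketch, assuming two lookup tables (probe to gene product, gene product to GO:IDs); all probe and product IDs below are hypothetical.

```python
def split_by_go(probe_to_product, product_to_go):
    """Separate probes whose gene product has GO from those without."""
    with_go, without_go = [], []
    for probe, product in sorted(probe_to_product.items()):
        if product_to_go.get(product):
            with_go.append(probe)
        else:
            without_go.append(probe)
    return with_go, without_go


# Hypothetical array data for illustration.
probe_to_product = {
    "probe_1": "prodA",
    "probe_2": "prodB",
    "probe_3": "prodC",
    "probe_4": "prodD",
}
product_to_go = {
    "prodA": ["GO:0005739"],
    "prodB": [],
    "prodC": ["GO:0006633", "GO:0005504"],
}

with_go, without_go = split_by_go(probe_to_product, product_to_go)
coverage = len(with_go) / len(probe_to_product)
print(f"GO coverage: {coverage:.0%}")
print("no GO:", without_go)
```

Probes in `without_go` are the candidates for a secondary source of GO annotation.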
Example: Netaffx
Secondary source of GO annotation
GORetriever
+ many more
GORetriever
GORetriever Results
• save as text file for GOSlimViewer
But what about IDs not supported by GORetriever?
GOanna
GOanna Results
query IDs are hyperlinked to BLAST data (files must be in the same directory)
• If there is a good alignment* to a protein with GO → transfer GO to your record
• If there is not a good alignment or the record doesn’t have GO → literature
*WHAT IS A GOOD ALIGNMENT?
• good alignment → add to GO summary file (tab-delimited text file containing ID, GO:ID, aspect)
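A GO summary file of this kind can be written with a few lines of code. This is a minimal sketch, assuming one row per annotation with the columns in the order listed (ID, GO:ID, aspect) and the aspect written as the single letters P, F, or C; that letter encoding is an assumption, and the rows reuse the NDUFAB1 example.

```python
import csv
import io

# (ID, GO:ID, aspect) rows taken from the NDUFAB1 example.
SUMMARY_ROWS = [
    ("P52505", "GO:0006633", "P"),
    ("P52505", "GO:0005504", "F"),
    ("P52505", "GO:0005759", "C"),
]

VALID_ASPECTS = {"P", "F", "C"}


def write_summary(rows, handle):
    """Write tab-delimited ID, GO:ID, aspect rows to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for record_id, go_id, aspect in rows:
        if aspect not in VALID_ASPECTS:
            raise ValueError(f"unexpected aspect: {aspect!r}")
        writer.writerow([record_id, go_id, aspect])


# Written to an in-memory buffer here; a real file handle works the same way.
buffer = io.StringIO()
write_summary(SUMMARY_ROWS, buffer)
summary_text = buffer.getvalue()
print(summary_text, end="")
```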
Contact AgBase to request GO annotation of specific gene products.
GOSlimViewer: summarizing results
GOSlimViewer results
[Pie chart of biological process categories; legend:]
response to stimulus
amino acid and derivative metabolic process
transport
behavior
cell differentiation
metabolic process
regulation of biological process
cell communication
nucleobase, nucleoside, nucleotide and nucleic acid metabolic process
cell death
cell motility
macromolecule metabolic process
multicellular organismal development
catabolic process
biological_process
?? “process unknown”, “function unknown”, “component unknown”
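A summary like this amounts to counting how many annotations fall into each high-level category and reporting relative proportions. A minimal sketch, assuming each gene product has already been mapped to category names; the gene product names and their assignments are hypothetical, and the category names are taken from the legend.

```python
from collections import Counter

# Hypothetical assignments of gene products to summary categories.
CATEGORIES_BY_PRODUCT = {
    "geneA": ["transport", "metabolic process"],
    "geneB": ["transport"],
    "geneC": ["cell death"],
    "geneD": ["metabolic process", "transport"],
}


def category_counts(categories_by_product):
    """Count how many gene products fall into each category."""
    counts = Counter()
    for categories in categories_by_product.values():
        # A set avoids double-counting a product listed twice in one category.
        counts.update(set(categories))
    return counts


def proportions(counts):
    """Convert raw counts to fractions of all category assignments."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


counts = category_counts(CATEGORIES_BY_PRODUCT)
shares = proportions(counts)
for category, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category}: {share:.0%}")
```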
Looking at function, not genes
Pie Graphs – relative proportions
[Pie graphs for two samples: B-cells and Stroma. Categories shown: apoptosis, immune response, cell-cell signaling]
GOModeler: quantitative, hypothesis-driven modeling.
Coming soon (contact AgBase)
McCarthy et al. “AgBase: a functional genomics resource for agriculture.” BMC Genomics. 2006 Sep 8;7:229.
4. GO based tools for biological modeling
http://www.geneontology.org/
However…
• many of these tools do not support farm animal species
• the tools have different computing requirements
• it may be difficult to determine how up-to-date the GO annotations are
Need to evaluate tools for your system.
Evaluating GO tools
Some criteria for evaluating GO tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products have no GO?
6. Does it report both over- and under-represented GO groups, and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
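On criterion 6, one common way to evaluate over- and under-representation is a hypergeometric test. The sketch below illustrates that general technique only; it is not the method of any particular tool named here, and the function names and counts are illustrative.

```python
from math import comb


def upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k term-annotated products in a
    list of n, drawn without replacement from N products of which K carry
    the term. Small values suggest over-representation."""
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def lower_tail(k, n, K, N):
    """P(X <= k): small values suggest under-representation."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return hits / comb(N, n)


# Illustrative counts: 10 products on the array, 3 carry the GO term,
# and all 3 appear in a list of 3 differentially expressed products.
p_over = upper_tail(k=3, n=3, K=3, N=10)
p_under = lower_tail(k=0, n=3, K=3, N=10)
print(p_over)
print(p_under)
```

In practice many GO groups are tested at once, so a tool should also say how it corrects for multiple testing.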
5. Using GO for biological modeling
Using GO for biological modeling:
• hypothesis generating
• hypothesis driven
