Getting GO: how to get GO
for functional modeling
Iowa State Workshop
11 June 2009
1. GO for your species
• GOProfiler: summarizing the available GO
2. GO browsers
• QuickGO from EBI
• AmiGO from the GO Consortium
3. gene association files
4. getting GO for your dataset
5. adding more GO
6. requesting GO
7. GO based tools for functional modeling
1. GO for your species
GOProfiler
GOProfiler gives you an overview of the GO annotation that exists for the species you are interested in.
• Number of proteins is based upon UniProtKB records for these species.
• Species with only IEA annotations do not have an active GO annotation project; their GO is provided automatically by the EBI GOA Project.
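As a rough illustration of the kind of overview described above, the sketch below counts annotations per evidence code and flags a set that contains only IEA annotations. It assumes the annotations are available as tab-delimited gene association lines with the evidence code in the seventh column. The function names and the example lines are invented for illustration; this is not GOProfiler's own code or output.

```python
from collections import Counter

def evidence_summary(ga_lines):
    """Count annotations per evidence code (7th tab-delimited column)."""
    counts = Counter()
    for line in ga_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and '!' comment/header lines
        columns = line.rstrip("\n").split("\t")
        counts[columns[6]] += 1
    return counts

def only_iea(counts):
    """True when every counted annotation carries the IEA evidence code."""
    return bool(counts) and set(counts) == {"IEA"}

def example_line(accession, go_id, evidence):
    """Build a 15-column line; every value here is made up for the example."""
    return "\t".join([
        "UniProtKB", accession, accession, "", go_id, "REF:example",
        evidence, "", "P", "", "", "protein", "taxon:0", "20090611", "Example",
    ])

electronic_only = evidence_summary([
    example_line("ACC1", "GO:0006810", "IEA"),
    example_line("ACC2", "GO:0008152", "IEA"),
])
mixed = evidence_summary([
    example_line("ACC1", "GO:0006810", "IEA"),
    example_line("ACC2", "GO:0008152", "IDA"),
])
print(dict(electronic_only), only_iea(electronic_only))
print(dict(mixed), only_iea(mixed))
```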
2. GO Browsers
Use GO Browsers for:
• searching for GO terms
• searching for gene product annotation
• filtering sets of annotations and downloading results
• creating/using GO slims

GO Browsers

QuickGO Browser (EBI GOA Project)
• http://www.ebi.ac.uk/ego/
• Can search by GO Term or by UniProt ID
• Includes IEA annotations

AmiGO Browser (GO Consortium Project)
• http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
• Can search by GO Term or by UniProt ID
• Does not include IEA annotations

More information about these tools is available from the online workshop resources.
3. gene association files
The gene association (ga) file
• standard file format used to capture GO annotation data
• tab-delimited file containing 15* fields of information:
  • information about the gene product (database, accession, name, symbol, synonyms, species)
  • information about the function: GO ID, ontology, reference, evidence, qualifiers, context (with/from)
  • data about the functional annotation: date, annotator
* 2 additional fields will soon be added to capture information about isoforms and other ontologies.

[Figure: example gene association file (additional column added to this example), labeled to show the gene product information, the function information, and the metadata (when & who).]
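To make the 15-field layout concrete, here is a minimal sketch of reading one line of a gene association file into named fields. The field names are this sketch's own labels for the 15 tab-delimited columns of the format, and the example line is invented (the accession, symbol, reference, and taxon are placeholders, not real records).

```python
# Labels for the 15 tab-delimited columns, in file order.
GA_FIELDS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
]

def parse_ga_line(line):
    """Return a dict for one annotation line, or None for comments/blanks."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("!"):
        return None  # '!' marks header and comment lines
    values = line.split("\t")
    if len(values) < len(GA_FIELDS):
        raise ValueError(f"expected at least 15 columns, got {len(values)}")
    # zip stops at 15 names, so any extra trailing columns are ignored here.
    return dict(zip(GA_FIELDS, values))

# Invented example line: every value is a placeholder for illustration.
EXAMPLE_LINE = "\t".join([
    "UniProtKB", "ACC0001", "EXMP1", "", "GO:0005524", "REF:example",
    "IDA", "", "F", "Example protein", "EX1", "protein",
    "taxon:0", "20090611", "ExampleGroup",
])

record = parse_ga_line(EXAMPLE_LINE)
print(record["db_object_id"], record["go_id"], record["evidence_code"])
```

Returning a plain dict keeps the sketch short; a real pipeline might prefer a typed record and would also need to handle the two extra columns mentioned above once they are in use.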
Gene association files

GO Consortium ga files
• many organism specific files
• also includes EBI GOA files

EBI GOA ga files
• UniProt file contains GO annotation for all species represented in UniProtKB

AgBase ga files
• organism specific files
• AgBase GOC file: submitted to GO Consortium & EBI GOA
• AgBase Community file: GO annotations not yet submitted or not supported
• all files are quality checked
4. Finding GO for your dataset
The AgBase GO annotation tools can
be used separately or can be
combined to rapidly provide an
annotation file for functional
modeling tools.
GORetriever
• Allows you to get GO annotations for a specific set of gene products.
• Accepts a text file of UniProt accessions or IDs or gi numbers.
• Returns GO annotations, a list of accessions that had no GO, and a GO Summary file.
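The lookup described above can be sketched as a simple filter over gene association lines: collect the annotations for the accessions you ask for, and report the accessions with no GO. This is only an illustration of the idea, assuming the accession is in the 2nd column, the GO ID in the 5th, and the evidence code in the 7th. The function name and all example data are invented, and it does not reproduce GORetriever's actual implementation or output files.

```python
def retrieve_go(accessions, ga_lines):
    """Return (annotations by accession, accessions with no GO)."""
    wanted = set(accessions)
    found = {}
    for line in ga_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        columns = line.rstrip("\n").split("\t")
        accession, go_id, evidence = columns[1], columns[4], columns[6]
        if accession in wanted:
            found.setdefault(accession, []).append((go_id, evidence))
    # Preserve the input order for the "no GO" list.
    missing = [acc for acc in accessions if acc not in found]
    return found, missing

def example_line(accession, go_id, evidence):
    """Build a 15-column line; every value here is made up for the example."""
    return "\t".join([
        "UniProtKB", accession, accession, "", go_id, "REF:example",
        evidence, "", "P", "", "", "protein", "taxon:0", "20090611", "Example",
    ])

GA_LINES = [
    example_line("ACC1", "GO:0006810", "IEA"),
    example_line("ACC1", "GO:0008152", "IDA"),
    example_line("ACC2", "GO:0050896", "ISS"),
]

annotations, no_go = retrieve_go(["ACC1", "ACC3"], GA_LINES)
print(annotations)
print(no_go)
```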
GORetriever Results
• save as text file
• for GOSlimViewer
But what about IDs not supported by GORetriever?
5. Adding GO to your dataset
GORetriever:
• only returns existing GO
• only accepts limited accession types

GOanna does a BLAST search against existing GO annotated products.
• allows you to quickly transfer GO to gene products where they have similar sequences (ISS)
• accepts FASTA files

GOanna
GOanna Results
Query IDs are hyperlinked to BLAST data (files must be in the same directory).
1. Manually inspect alignments and delete any lines where there is not a good
alignment*.
2. Add this additional annotation to the annotations from GORetriever.
*WHAT IS A GOOD ALIGNMENT?
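The slides leave open what counts as a good alignment, and manual inspection remains the recommended step. As a first-pass aid only, the sketch below screens hits by E-value, percent identity, and alignment length. It assumes the hits are available in the standard 12-column BLAST tabular layout, which may differ from the file GOanna produces. The cutoffs are arbitrary placeholders rather than workshop recommendations, and the hit lines are invented.

```python
def parse_blast_hit(line):
    """Parse one line of 12-column BLAST tabular output into a dict."""
    fields = line.rstrip("\n").split("\t")
    return {
        "query": fields[0],
        "subject": fields[1],
        "percent_identity": float(fields[2]),
        "alignment_length": int(fields[3]),
        "evalue": float(fields[10]),
        "bit_score": float(fields[11]),
    }

def is_good_alignment(hit, max_evalue=1e-10, min_identity=70.0, min_length=50):
    """Apply placeholder cutoffs; choose values that suit your own data."""
    return (
        hit["evalue"] <= max_evalue
        and hit["percent_identity"] >= min_identity
        and hit["alignment_length"] >= min_length
    )

# Invented hits: query, subject, %identity, length, mismatches, gap opens,
# query start/end, subject start/end, E-value, bit score.
HIT_LINES = [
    "query1\tSUBJ1\t92.5\t210\t15\t0\t1\t210\t5\t214\t1e-100\t380",
    "query2\tSUBJ2\t35.0\t80\t50\t2\t1\t80\t10\t89\t0.001\t45.2",
]

kept = [hit for hit in map(parse_blast_hit, HIT_LINES) if is_good_alignment(hit)]
print([hit["query"] for hit in kept])
```

A filter like this only narrows the list to inspect; it does not replace looking at the alignments themselves.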
GOanna2ga
New to AgBase: an online script to convert your GOanna file to gene association file format.
• Allows you to add manually checked GOanna annotations to a GORetriever
file.
• Link is available from the workshop resources.
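To show what such a conversion involves, here is a hedged sketch that writes a checked sequence-similarity hit as a 15-column gene association line with the ISS evidence code, then merges it with lines already retrieved. It is not the AgBase script. The function names, default database and group names, the reference value, and the assumption that the matched sequence is a UniProtKB accession recorded in the with/from column are all illustrative choices.

```python
def iss_ga_line(query_id, subject_accession, go_id, aspect, reference,
                taxon, date, db="MyDB", assigned_by="MyGroup"):
    """Build one 15-column ga line transferring go_id to query_id by ISS."""
    columns = [
        db, query_id, query_id,            # database, accession, symbol
        "",                                # qualifier (none)
        go_id, reference, "ISS",           # function, reference, evidence
        "UniProtKB:" + subject_accession,  # with/from: the similar sequence
        aspect,                            # ontology: P, F or C
        "", "", "protein",                 # name, synonyms, object type
        taxon, date, assigned_by,          # species, when, who
    ]
    return "\t".join(columns)

def merge_annotations(retrieved_lines, transferred_lines):
    """Combine two lists of ga lines, dropping exact duplicates in order."""
    merged, seen = [], set()
    for line in list(retrieved_lines) + list(transferred_lines):
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return merged

# Invented values throughout.
new_line = iss_ga_line("query1", "SUBJ1", "GO:0006810", "P",
                       "REF:example", "taxon:0", "20090611")
combined = merge_annotations([new_line], [new_line])
print(new_line.split("\t"))
```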
6. Requesting GO
7. GO based tools for biological modeling
GOSlimViewer: summarizing results
[Figure: GOSlimViewer summary of biological process categories: response to stimulus; amino acid and derivative metabolic process; transport; behavior; cell differentiation; metabolic process; regulation of biological process; cell communication; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; cell death; cell motility; macromolecule metabolic process; multicellular organismal development; catabolic process; biological_process. Additional labels on the slide: “process unknown”, “function unknown”, “component unknown”.]
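A summary of this kind amounts to counting gene products per broad category. The sketch below assumes each annotation has already been mapped to a slim term (a real tool does this by walking up the ontology, which is not shown) and uses a small hand-written lookup table. The function name, the lookup table, and the accessions are invented for illustration and do not reflect how GOSlimViewer is implemented.

```python
from collections import Counter

# Hypothetical lookup from a slim term ID to its category name.
SLIM_NAMES = {
    "GO:0006810": "transport",
    "GO:0008152": "metabolic process",
    "GO:0050896": "response to stimulus",
}

def slim_summary(annotations, slim_names=SLIM_NAMES):
    """Count distinct gene products per slim category.

    annotations: iterable of (accession, go_id) pairs.
    """
    seen = set()
    counts = Counter()
    for accession, go_id in annotations:
        category = slim_names.get(go_id, "unmapped")
        if (accession, category) not in seen:
            # Count each gene product once per category.
            seen.add((accession, category))
            counts[category] += 1
    return counts

ANNOTATIONS = [
    ("ACC1", "GO:0006810"),
    ("ACC1", "GO:0006810"),  # duplicate, counted once
    ("ACC2", "GO:0006810"),
    ("ACC2", "GO:0008152"),
    ("ACC3", "GO:9999999"),  # not in the lookup table
]

summary = slim_summary(ANNOTATIONS)
for category, count in summary.most_common():
    print(category, count)
```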
http://www.geneontology.org/
However...
• many of these tools do not support agricultural species
• the tools have different computing requirements

Tools for GO analysis of gene expression/microarray data
A list of these tools that can be used for agricultural species is available on the workshop website, under the link to the expression analysis tools at the GO Consortium website.
Evaluating GO tools
Some criteria for evaluating GO tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products have no GO?
6. Does it report both over- and under-represented GO groups, and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
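For criterion 6, it helps to know what an over- or under-representation test looks like. One common choice (tools differ, so check each tool's documentation) is the hypergeometric test: given N annotated gene products in the background, K of which carry a GO term, how surprising is it to see k carriers among the n gene products in your list? The sketch below uses only the Python standard library (`math.comb`, Python 3.8 or later). The function names and the numbers are illustrative.

```python
from math import comb

def over_representation_p(k, n, K, N):
    """P(X >= k): chance of seeing k or more term carriers in a list of n,
    drawn from a background of N gene products with K carriers."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def under_representation_p(k, n, K, N):
    """P(X <= k): chance of seeing k or fewer term carriers in a list of n."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1)) / total

# Illustrative numbers: background of 10 gene products, 5 carry the term,
# and all 5 gene products in the list of interest carry it.
p_over = over_representation_p(k=5, n=5, K=5, N=10)
p_under = under_representation_p(k=0, n=5, K=5, N=10)
print(p_over, p_under)
```

When many GO terms are tested at once, the resulting p-values also need a multiple-testing correction, which is another point worth checking when comparing tools.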