Transcript Annotations

Introduction
This presentation is designed to show the features of four ‘third-party’
GO analysis tools. These tools and others listed on
http://www.geneontology.org/GO.tools.shtml#micro can be used in
proteomics studies to view GO terms associated with a list of proteins
obtained from high-throughput experiments and their statistical significance
compared with a reference set of proteins.*
Each presentation was prepared by the developers of the tools, using
for the analysis a list of human cardiovascular-related protein accessions
(or in the case of Blast2GO, the equivalent bovine protein sequences).
*All of these tools were created outside of the GO Consortium. The article's authors do not intend
to recommend any tool, merely to demonstrate how GO analysis of proteome sets could be performed using
some of these tools. We advise researchers to try several different tools to find one that suits their needs.
Contents
Blast2GO: Slide 4
FatiGO: Slide 13
Onto-Express: Slide 20
Ontologizer: Slide 27
Accession list I: Slide 35
Accession list II: Slide 36
Blast2GO in Babelomics
http://babelomics.bioinfo.cipf.es
Functional Annotation: First, a BLAST step obtains
homologous sequences for the query sequences.
Second, the actual GO annotation applies the
Blast2GO method, which transfers the most
confident and appropriate GO annotations to the novel
sequences. Statistical charts help to understand
and interpret the annotation results.
Visualization: This step allows the users to get an
overall idea of the assigned GO annotations of the
sequence dataset making use of GO's graph structure.
Bioinformatics Department
Centro de Investigación Príncipe Felipe (CIPF)
Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M. & Robles,
M. (2005). Blast2GO: A universal tool for annotation, visualization and
analysis in functional genomics research. Bioinformatics 21: 3674-3676
Functional Annotation with Blast2GO
Annotation is the process of assigning functional categories to genes or gene products. In
Blast2GO this assignment is performed for each sequence based on the information available for
the homologous sequences retrieved by BLAST. Blast2GO annotation proceeds through a two-step
strategy:
1. All GO terms for the BLAST hit sequences are collected
For the first step, BLAST results are parsed and the identifiers of the BLAST hits are
used to query the Gene Ontology database to recover associated functional terms. The
evidence code of each annotation is also recovered. The evidence codes indicate how the
functional assignment in the Gene Ontology database was obtained.
2. GO terms are selected from this original pool to extract the most reliable annotation
Once all this information is gathered, an annotation score is computed for each {GO,Query
Sequence} pair. Only the most specific GO term within a branch of the GO is assigned to the
query sequence, and this assignment is dependent on the 'annotation score', the threshold for
which is preset by the user. The annotation score is computed as:
Annotation score{GO, Seq} = (max.sim * ECw) + ((#GO - 1) * GOw)
where:
• max.sim is the maximal similarity between the query and the hit sequences that have the
given GO annotation
• ECw is the weight given to the evidence code of the original annotation. Blast2GO has defined
values for these weights, which can also be modified by the user. In general, ECw = 1 for
experimental evidence codes and ECw < 1 for non-experimental evidence codes.
• #GO is the number of annotated child terms
• GOw is the weight given to the contribution of annotated child terms to a given term
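The annotation score above can be sketched in a few lines of Python. This is only an illustration, not Blast2GO's own code: the function name, the example weights and the threshold are hypothetical, and the term "#GO-1 * GOw" is assumed to mean (#GO - 1) * GOw.

```python
def annotation_score(max_sim, ec_weight, n_go, go_weight):
    """Annotation score for one {GO, Seq} pair.

    max_sim   -- highest similarity among hits carrying the GO term
    ec_weight -- ECw, weight of the evidence code
    n_go      -- #GO, as defined on the slide
    go_weight -- GOw
    """
    # Assumption: "#GO-1 * GOw" is read as (#GO - 1) * GOw.
    return max_sim * ec_weight + (n_go - 1) * go_weight


# Made-up inputs: 80% similarity, a non-experimental evidence code
# weighted 0.5, #GO = 3 and GOw = 5.
score = annotation_score(80.0, 0.5, 3, 5.0)
THRESHOLD = 45.0  # hypothetical user-set cut-off
print(score, score >= THRESHOLD)  # 50.0 True
```

With these invented numbers the GO term would be kept, because its score reaches the user-set threshold.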
The BLAST Step (1/2)
In this tab you can see the
current status of your job;
for big datasets, come back
later to retrieve the results.
Upload your sequence file in FASTA
format, choose the appropriate BLAST
parameters and database (blastp for
protein sequences) and press RUN
The homology search is the first and most time-consuming step when attempting to transfer
functional information from similar sequences to uncharacterized sequence data. This simple
tool lets you perform high-throughput BLAST searches against several protein
databases, keep processes running until they finish while monitoring their status, and
save the generated alignments as an XML file. These XML files can then be used as input data
for the Blast2GO annotation method.
The BLAST Step (2/2)
Save your results
as an XML file.
Open the results with this link
The Annotation Step
Upload and parse your
BLAST results in NCBI's
XML format applying
several filters
Annotation rule parameters:
• e-value cut-off as a minimal quality criterion
• annotation rule cut-off (coverage vs. exactness)
• GOWeight (more general vs. more specific terms)
• minimal alignment length allowed for function
transfer
Evidence code weights can be set
to increase or decrease the influence of
different kinds of annotation
evidence, e.g. automatically
generated source annotations
Start the annotation
assignment
The Blast2GO web tool generates a multitude of statistical charts that help you
understand the underlying dataset and better interpret the generated
annotation results
The result table to browse and export the
generated annotations

A chart showing the e-value distribution
of the BLAST results

A chart showing the source
databases from which the transferred
GO terms originate

review
browse
export
A chart showing how many GO terms
were assigned to how many sequences
A chart showing the distribution of the different
evidence codes throughout the GO terms per
BLAST hit


A chart showing the distribution of the
different evidence codes throughout the
GO terms per sequence


A chart showing the most frequent GO
terms throughout the dataset
A chart showing the distribution of the different
species from which the BLAST hits originate

A chart showing the success of the
annotation process, giving the number of
successfully ‘BLASTed’, GO-mapped and
annotated sequences
A chart showing the number of sequences
annotated at a certain GO level and category


A chart showing the distribution of BLAST
sequence similarities

Saving and exporting results
Blast2GO annotations are
exported in a tabular format:
SeqId<tab>GOterm<tab>SeqDesc
Browse the generated
annotations in the
result table
Open and save the results in a
tabular format for further use
in the GO-Graph-Viewer, or
download the data in Blast2GO
project format for direct import
into Blast2GO
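A tabular export of the form SeqId<tab>GOterm<tab>SeqDesc is straightforward to read back for downstream work. The sketch below groups GO terms by sequence ID; the function name and the three example rows are invented for illustration, and the real export may contain additional columns or header lines.

```python
import io
from collections import defaultdict


def read_annotations(handle):
    """Group GO terms by sequence ID from SeqId<tab>GOterm<tab>SeqDesc rows."""
    terms_by_seq = defaultdict(set)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        seq_id, go_term = fields[0], fields[1]
        terms_by_seq[seq_id].add(go_term)
    return dict(terms_by_seq)


# Invented example rows standing in for an exported file.
example = io.StringIO(
    "seq1\tGO:0006936\texample description A\n"
    "seq1\tGO:0007049\texample description A\n"
    "\n"
    "seq2\tGO:0007507\texample description B\n"
)
annotations = read_annotations(example)
for seq_id in sorted(annotations):
    print(seq_id, sorted(annotations[seq_id]))
```

To read a saved export instead, pass an open file handle in place of the `io.StringIO` object.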
Visualization: The GO-Graph-Viewer
The DAG viewer tool generates joint Gene Ontology graphs (DAGs) to create overviews of
the functional context of groups of sequences. Interactive graph visualization allows the
navigation of the large and unwieldy graphs that are often generated when biologically exploring
large sets of sequence annotations. Zoom and graph navigation are provided through the DAG
viewer Java Web Start tool.
Save parts of your graphs
as high-resolution images to
better communicate your
results
Upload your Blast2GO
generated annotations
Start the interactive
graph visualization tool
with Java Web Start
Define graph filtering
parameters for more dense
and informative graphs
FatiGO
Functional enrichment analysis
Bioinformatics Department
Centro de Investigación Príncipe Felipe (CIPF)
http://www.fatigo.org
http://www.babelomics.org
Al-Shahrour, F., et al. (2005), Babelomics: a suite of web-tools for functional annotation and
analysis of groups of genes in high-throughput experiments, Nucleic Acids Research, 33, W460-W464
Al-Shahrour, F., et al. (2004), FatiGO: a web tool for finding significant associations of
Gene Ontology terms with groups of genes, Bioinformatics, 20, 578-580
Select your
organism
*Several types of identifier are acceptable,
such as UniProtKB, Ensembl IDs, HGNC
symbols, RefSeq, Entrez Gene etc.
Enter your list or file
of genes/proteins*
In this example, list #1 is a list of BHF-UCL
annotated cardiovascular-related proteins
(see Slide 35) and list #2 is the “Rest of genome”
Select the
database(s) you
want to query
Click options to
filter the database
(optional)
Filter Tool
Use the level of the DAG
and the evidence code
as filtering criteria
Select subsets of
annotations based on
keywords and on the
size of the gene module
Babelomics allows sub-selection of the gene annotations on which gene modules are based, in
order to test hypotheses in a more focused and sensitive manner.
• Removing modules whose testing is unnecessary from the analysis increases
the power of the tests in the multiple-testing adjustment step.
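The gain in power from filtering can be illustrated with a Bonferroni-style threshold. This is a minimal sketch: the slides do not say which adjustment Babelomics applies, so the choice of Bonferroni, the function names and all numbers are assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni adjustment."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if the raw p-value survives the adjustment for n_tests tests."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


p = 0.0004  # invented raw p-value for one gene module
print(is_significant(p, 0.05, 1000))  # False: 1000 modules tested
print(is_significant(p, 0.05, 100))   # True: filtered down to 100 modules
```

The same raw p-value passes once fewer modules are tested, which is the effect the filter tool aims for.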

Results of GO analysis
Level 3 contains less granular terms.
Level 9 contains more granular terms.
The number of annotated
proteins per GO level is
displayed
Low p-value =
more significant
The proteins from your
query set that are
annotated to each GO
term are listed
FatiGO returns a list of GO terms which are over-represented in the list of interest, in this case the BHF-UCL list.
For Biological Process terms at level 3 of the ontology, the terms that are over-represented in the
BHF-UCL list include muscle contraction, cell cycle and anatomical structure development.
Best p-value
FatiGO shows terms deeper in the ontology, at level 6, which are over-represented in the BHF-UCL list (but
not necessarily significantly – compare p-values) such as regulation of progression through cell cycle, heart
development and cholesterol absorption. These are all processes you would expect cardiovascular-related
proteins to be involved in.
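How a p-value for over-representation of one GO term in list #1 versus list #2 might be obtained can be sketched with a one-sided Fisher's exact test on a 2x2 table. The slides do not name the test FatiGO applies, so the test choice, the function name and the counts below are assumptions for illustration only.

```python
from math import comb


def fisher_over_representation(a, b, c, d):
    """One-sided Fisher's exact test for a 2x2 table.

    a -- list #1 proteins annotated to the term
    b -- list #1 proteins not annotated to the term
    c -- list #2 proteins annotated to the term
    d -- list #2 proteins not annotated to the term
    Returns P(X >= a) under the hypergeometric null.
    """
    total = a + b + c + d
    with_term = a + c
    list1_size = a + b
    upper = min(with_term, list1_size)
    numerator = sum(
        comb(with_term, k) * comb(total - with_term, list1_size - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, list1_size)


# Invented counts: 3 of 4 proteins in list #1 carry the term,
# against 1 of 4 in list #2.
p_value = fisher_over_representation(3, 1, 1, 3)
print(round(p_value, 4))  # 0.2429
```

A small p-value would indicate that the term is over-represented in list #1; in practice the p-values for all tested terms are then adjusted for multiple testing.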
GO-Graph-Viewer Tool
You can upload your FatiGO results to the interactive graph visualization tool
The DAG viewer tool allows visualization of the significant GO terms as a GO graph.
The GO term names are displayed together with the annotation score.
Onto-Express Features at a Glance
http://vortex.cs.wayne.edu/projects.htm#Onto-Express
Purvesh Khatri
Sorin Draghici
Intelligent Systems and Bioinformatics Lab
Department of Computer Science
Wayne State University
Input interface
Select type of
IDs in input file
Choose a statistical
distribution from:
1. hypergeometric
2. binomial
3. chi-square
Select
organism
Choose from
more than 300
microarrays.
If an array of choice
is not available, use
your own reference.
Choose a correction for
multiple hypotheses from:
1. Bonferroni, 2. FDR,
3. Holm, 4. Sidak

Supported input types are GenBank accession numbers, UniGene
cluster IDs, Entrez Gene IDs, gene symbols, Affymetrix probe IDs,
and any of the IDs used in the GO database.
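The four corrections for multiple hypotheses listed on this slide can be sketched as adjusted p-value calculations. This is not Onto-Express code: the function names are hypothetical, "FDR" is assumed here to mean the Benjamini-Hochberg procedure, and the input p-values are invented.

```python
def bonferroni(pvals):
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def sidak(pvals):
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals):
    """Step-down Holm adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment (assumed meaning of 'FDR'), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # invented raw p-values
for name, method in [("Bonferroni", bonferroni), ("Sidak", sidak),
                     ("Holm", holm), ("FDR (BH)", benjamini_hochberg)]:
    print(name, [round(p, 4) for p in method(raw)])
```

Bonferroni and Sidak are the most conservative, Holm is never less powerful than Bonferroni, and the FDR adjustment is typically the most permissive.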
Results – Flat view
Results – tree view
• Choose a level to expand the GO tree and click the “Expand” button.
• Only the GO terms with at least one input gene are displayed in the tree.
Results – chromosome view
• Chromosome information is supported for human, mouse and rat. It
displays the number of genes on each chromosome and their positions.
• Clicking on “NCBI Genome view” links out to NCBI Mapviewer.
Results – single gene view

Selecting “show in gene view” in the tree view displays the annotations
for the selected gene in the GO hierarchy in the single gene view.
References
• Purvesh Khatri, Sorin Draghici, G. Charles Ostermeier, Stephen A.
Krawetz. Profiling Gene Expression Using Onto-Express. Genomics,
79(2):266-270, February 2002.
• Sorin Draghici, Purvesh Khatri, Rui P. Martins, G. Charles Ostermeier
and Stephen A. Krawetz. Global functional profiling of gene expression.
Genomics 81(2):98-104, February 2003.
• Purvesh Khatri and Sorin Draghici. Ontological analysis of gene
expression data: current tools, limitations, and open problems.
Bioinformatics, 21(18):3587-95, September 2005.
• http://vortex.cs.wayne.edu/projects.htm.
Ontologizer
Ontologizer Open Source Team
http://compbio.charite.de/ontologizer/
located at
Institute for Medical Genetics
Charité Universitätsmedizin Berlin
Grossmann S., Bauer S., Robinson P.N., Vingron M. Improved detection of overrepresentation of
Gene Ontology annotations with parent child analysis. Bioinformatics. 2007 Nov 15;23(22):3024-31.
Robinson P.N., Wollstein A., Böhme U., Beattie B. Ontologizing gene-expression microarray data:
characterizing clusters with Gene Ontology. Bioinformatics. 2004 Apr 12;20(6):979-81.
Ontologizer – Setting up a Project
Inputs: • Ontology, defines the GO structure
• Annotations, map genes to GO terms
There are several predefined
entries for various settings…
…or you may specify
the fields manually.
Ontologizer – Editing Sets of Identifiers
Annotated identifiers are
highlighted on the fly.
Mouse hovering reveals
direct annotations.
No annotation for this one
The induced graph of these
terms can be displayed.
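The induced graph of a set of terms consists of those terms plus all of their ancestors, with the edges that connect them. A minimal sketch on a toy ontology follows; the term names A to E and the parent map are invented and do not correspond to real GO terms.

```python
def induced_graph(terms, parents):
    """Collect the given terms, all their ancestors, and the edges between them.

    parents maps each term to a list of its direct parent terms.
    """
    nodes, edges = set(), set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in nodes:
            continue  # already visited via another path in the DAG
        nodes.add(term)
        for parent in parents.get(term, ()):
            edges.add((term, parent))
            stack.append(parent)
    return nodes, edges


# Toy DAG: D has two parents (B and C), both of which descend from root A.
TOY_PARENTS = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}

nodes, edges = induced_graph({"D"}, TOY_PARENTS)
print(sorted(nodes))  # ['A', 'B', 'C', 'D']
print(sorted(edges))
```

Term E is left out because it is neither in the input set nor an ancestor of D.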
Ontologizer – Overview
Of interest here are two lists of
identifiers – study and population.*
Choose analysis
method; parent-child
takes account of the
ontology structure,
term-for-term treats
each term
independently.
But multiple projects may
reside in the workspace.
*In this example the study list is a list of BHF-UCL annotated cardiovascular-related proteins (see Slide 35)
and the population list is a random list of human UniProtKB accessions.
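The difference between the two analysis methods can be sketched on a toy example. Term-for-term tests each term against the whole population, while parent-child tests it only against the genes annotated to its parent. This is a simplified illustration with a single parent term; the gene names, sets and function names are invented, and Ontologizer's handling of terms with several parents is not shown.

```python
from math import comb


def hypergeom_tail(pop_size, pop_hits, study_size, study_hits):
    """P(X >= study_hits) when study_size genes are drawn from pop_size,
    of which pop_hits are annotated to the term."""
    upper = min(pop_hits, study_size)
    numerator = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return numerator / comb(pop_size, study_size)


def term_for_term(term_genes, study, population):
    return hypergeom_tail(len(population), len(term_genes & population),
                          len(study), len(term_genes & study))


def parent_child(term_genes, parent_genes, study, population):
    # Condition on the parent: only genes annotated to it are considered.
    return term_for_term(term_genes, study & parent_genes,
                         population & parent_genes)


population = {f"g{i}" for i in range(1, 11)}  # 10 invented genes
parent_genes = {"g1", "g2", "g3", "g4"}       # annotated to the parent term
term_genes = {"g1", "g2", "g3"}               # annotated to the child term
study = {"g1", "g2", "g3", "g4"}

print(round(term_for_term(term_genes, study, population), 4))  # 0.0333
print(parent_child(term_genes, parent_genes, study, population))  # 1.0
```

In this toy case the child term looks enriched when tested on its own, but not once the enrichment of its parent is taken into account.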
Ontologizer – Results
A list of terms is displayed.
The shading indicates
significance – darker
shading is more significant.
Click on a term to display
its position in the
ontology, definition and
the proteins annotated to
it and its parents.
Ontologizer – Graphical View of Results
Yellow = Molecular Function
Pink = Cellular Component
Green = Biological Process
The term highlighted in the
table will also be highlighted
red in the graph.
Ontologizer – What Else?
• Can be easily invoked from the Web.
• Input files can be located remotely.
• Several multiple-testing correction procedures
are supported.
• Results can be filtered and stored in tabular as
well as graphical form.
• A command line version is available.
Acknowledgments
The authors wish to thank the developers of the tools for preparing
these presentations, as follows:
• Blast2GO
Stefan Götz
• FatiGO
Fatima Al-Shahrour
• Onto-Express
Sorin Draghici and Purvesh Khatri
• Ontologizer
Sebastian Bauer and Peter Robinson
List of human UniProtKB accessions used in FatiGO, Onto-Express
and Ontologizer analyses
O00273
P04180
P12643
P35226
P55290
Q8N726
O60543
P05231
P12829
P36897
P61812
Q8TBM5
O75955
P05976
P12830
P37173
P84022
Q92673
O95477
P06727
P13501
P38936
Q00534
Q96AB3
P00519
P06741
P16519
P40337
Q00872
Q96N67
P01127
P06858
P17947
P42684
Q01449
Q9BQE4
P01137
P07203
P18510
P42771
Q13485
Q9H172
P01375
P08590
P22301
P42772
Q14114
Q9H1R3
P01584
P09493
P24385
P42773
Q14896
Q9H221
P02647
P09958
P25098
P45379
Q15796
Q9H222
P02649
P10253
P25103
P45844
Q16665
Q9HC96
P02652
P10636
P29120
P46527
Q5JRA6
Q9UKX2
P02655
P10916
P30279
P49918
Q6PGN9
Q9UNQ0
P02656
P11597
P30281
P50150
Q6Q788
Q9UPY8
P04114
P11802
P34947
P55273
Q86Y82
Q9Y5C1
Q9Y623
List of bovine UniProtKB accessions used in Blast2GO analysis
A0JNJ5
P09428
Q06599
Q2KIW4
Q4GZT4
A1A3Z1
P11151
Q08DE0
Q2KJB3
Q4TTZ1
A4FUX1
P13789
Q0P5D3
Q2KJD8
Q4ZJV8
A4FUZ9
P15497
Q0VC16
Q2KJD8
Q4ZJV9
A4IFM7
P18341
Q0VC37
Q2TBI0
Q58D48
A5PJI9
P19034
Q0VD56
Q32KX0
Q5E9I5
A5PKM2
P19035
Q1HE26
Q32KX7
Q5KR49
A6QLS3
P21146
Q1RMM7
Q32KY4
Q6R8F2
A6QP89
P21214
Q1W668
Q32PJ1
Q9BE40
A7MBB9
P26892
Q24JY8
Q32PJ2
Q9BE41
O46680
P43249
Q28193
Q3B7N0
Q9GLR0
O77482
P43480
Q29RJ9
Q3MHH5
Q9GLR1
O97919
P81644
Q29RV0
Q3SYR3
Q9MYM4
P00435
P85100
Q2KI22
Q3SZE5
Q9XTA5
P05363
Q03247
Q2KI76
Q3SZE5