Transcript Slide 1

Fission Yeast Computing Workshop
Exercise 5: Looking for overreprsented GO terms
in a gene set using Onto-Express
GO annotations can be used to obtain functional data about gene sets e.g. from
gene expression experiments.
There are several tools available to perform this sort of analysis, all developed by
groups outside of the GO Consortium. They work in a similar way: a full gene
set is uploaded, with a subset of all ‘interesting’ genes, usually those that have
been up- or down-regulated in an expression experiment. The tool then
determines which GO categories have been enriched for ‘interesting’ genes, and
provides some sort of statistical measure to guard against GO categories that
appear by chance alone. For a full list of these tools, see:
http://www.geneontology.org/GO.tools.annotation.shtml.
This tutorial will be using one such tool, Onto-Express, one of a package of
microarray tools, Onto-Tools, developed at the Intelligent Systems and
Bioinformatics Laboratory, Wayne State University (http://vortex.cs.wayne.edu/).
First you need to obtain an Onto-Express login. Go to:
http://vortex.cs.wayne.edu/ontoexpress/servlet/UserInfo
and fill in the details. Your password will be emailed
straight to you. Once your password has arrived, go to to
the login page:
http://vortex.cs.wayne.edu/ontoexpress/
fill in your login details and click ‘Submit’. You will see a
security pop-up, choose ‘Grant Always’. A second pop-up
will appear:
Choose Onto-Express and click Run.
-1-
Fission Yeast Computing Workshop
Note: You will need to leave the original
browser window open for the whole of your
session.
We have provided some test microarray data for this tutorial; to
download it, go to:
Tip: to download the files, right-click and
choose ‘Save Link As’. Remember where you
http://www.ebi.ac.uk/~jane/pombe/
saved it to!
and download both files (sp_changed.txt and sp_total.txt).
The input file format is a simple text file with either
accession numbers, cluster identifiers or probe identifiers
(but not a combination), each listed on a separate line.
Return to your input window, which should look like this:
Click the ‘Input File’ button and browse for the
file sp_changed.txt that you’ve just downloaded.
This file contains a list of genes that were underor over-expressed in the experiment.
If you used a commercial chip in your
experiment, you can choose this from the
Reference Array list rather than uploading
your own reference file.
Now choose ‘schizosaccharomyces pombe’ from
the Organism menu, and for Input Type choose
‘sanger genedb’; this chooses the format of the
gene list.
From the Reference Array menu choose ‘My own
array’ and then click the Reference file button to
browse for the file sp_total.txt. This file contains
a list of all of the genes on your chip. Leave all
the other setting as they are and click ‘Submit’.
-2-
Fission Yeast Computing Workshop
After a minute or so, a results page will
appear:
From the Display options in the top right, choose
Display ‘Biological Process’ and Sort by ‘Total’.
Total is the number of genes associated with the
GO term.
Q: Which GO biological process term has the most
genes associated with it? Is the over-representation
of this term statistically significant?
Hint: To switch between which ontology is
Q: Which molecular function and cellular
shown in the flat view, use the Display options.
component GO categories have the most genes
associated with them?
Click the ‘Tree View’ tab. The number in bold following the term is the number of genesfrom
the ‘interesting’ subset that are annotated to this term, and its child terms, in the same way as
AmiGO. Open the nodes ‘biological process’ -> ‘development’ -> ‘morphogenesis’ ->
‘cellular morphogenesis’ -> ‘extablishment and/or maintenance of cell polarity’.
You will see a list of genes annotated directly to ‘extablishment and/or maintenance of cell
polarity’. Clicking on the gene names gives you the option of viewing the gene details. Now
click the ‘Syncronized View’ tab. This shows you a flat view of all open nodes in the tree
view.
Q: Which is the only GO category that has a significantly different number of genes
associated than you would expect? Why do you think this is?
Q: What processes, functions and cellular components seem to be associated with these
microarray data?
-3-