Using Gene Ontology - Center for Genomic Sciences
Using Gene Ontology
Models and Tests
Mark Reimers, NCI
Outline
What we might gain by using annotations
Models for group effects
Enrichment of selected genes
Chi-square and Fisher test
Group scores
Overlap in hierarchical annotations
Why Use Annotations
Goal: How to identify biological processes or biochemical
pathways that are changed by treatment
Common procedure: select ‘changed’ genes, and look
for members of known function
Problem: moderate changes in many genes
simultaneously will escape detection
New approach: start with a vocabulary of known GO
categories or pathways, and look for coherent changes
Variations: look for chromosome locations, or protein
domains, that are common among many genes that are
changed
Statistical Methods
How likely is it that the set of ‘significant’ genes will include as many from the category as you see?
Two-way table:
                Category    Others
On list                8       112
Not on list           42    12,500
Fisher Exact test: handles small categories better
How to deal with multiple categories?
GoMiner: Leverages the Gene Ontology
(Zeeberg, et al., Genome Biology 4: R28, 2002)
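The two-way table on this slide can be tested with a one-sided Fisher exact test, which is the upper tail of a hypergeometric distribution. The sketch below is a minimal standard-library illustration, not GoMiner's implementation; the function name `fisher_upper_tail` and its argument names are assumptions made for this example.

```python
from math import comb

def fisher_upper_tail(cat_on, other_on, cat_off, other_off):
    """One-sided Fisher exact p-value for a 2x2 table.

    Returns P(overlap >= cat_on) when the gene list is drawn at random
    from all genes (hypergeometric null). Argument names are illustrative.
    """
    total = cat_on + other_on + cat_off + other_off
    cat_size = cat_on + cat_off      # genes annotated to the category
    list_size = cat_on + other_on    # genes on the 'significant' list
    k_max = min(cat_size, list_size)
    # Exact integer arithmetic for the tail, converted to float once at the end.
    tail = sum(
        comb(cat_size, k) * comb(total - cat_size, list_size - k)
        for k in range(cat_on, k_max + 1)
    )
    return tail / comb(total, list_size)

# Counts from the slide: 8 category genes on the list, 112 others on the list,
# 42 category genes off the list, 12,500 others off the list.
p_value = fisher_upper_tail(8, 112, 42, 12500)
expected_overlap = (8 + 112) * (8 + 42) / (8 + 112 + 42 + 12500)
print(f"expected overlap by chance: {expected_overlap:.2f}, observed: 8")
print(f"one-sided Fisher p-value: {p_value:.3g}")
```

Because the tail is summed with exact integers, the calculation stays reliable for the small categories where a chi-square approximation is weakest.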
P-values for Tests
About 3,000 GO biological process
categories
Most overlap with some others
p-values for categories are not
independent
Permutation test of all categories
simultaneously in parallel
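One way to test all categories simultaneously is a min-p permutation scheme: recompute every category's p-value on each permuted gene list and compare each observed p-value with the smallest p-value seen in each permutation. The slide does not say what is permuted, so the sketch below assumes the list membership of genes is permuted (permuting sample labels would need the raw expression data). The toy categories, gene identifiers, and function names are all hypothetical.

```python
import random
from math import comb

def upper_tail(overlap, cat_size, list_size, total):
    """Hypergeometric P(X >= overlap): one-sided Fisher exact p-value."""
    k_max = min(cat_size, list_size)
    tail = sum(
        comb(cat_size, k) * comb(total - cat_size, list_size - k)
        for k in range(overlap, k_max + 1)
    )
    return tail / comb(total, list_size)

def min_p_adjusted(categories, selected, all_genes, n_perm=200, seed=0):
    """Adjust category p-values using the permutation null of the minimum p-value.

    Overlapping categories are handled automatically, because every
    permutation scores all categories on the same random gene list.
    """
    rng = random.Random(seed)
    total, list_size = len(all_genes), len(selected)

    def category_pvalues(gene_list):
        chosen = set(gene_list)
        return {
            name: upper_tail(len(members & chosen), len(members), list_size, total)
            for name, members in categories.items()
        }

    observed = category_pvalues(selected)
    null_min = [
        min(category_pvalues(rng.sample(all_genes, list_size)).values())
        for _ in range(n_perm)
    ]
    return {
        name: (1 + sum(m <= p for m in null_min)) / (1 + n_perm)
        for name, p in observed.items()
    }

# Toy data (hypothetical): 200 genes, three categories, two of which overlap.
all_genes = list(range(200))
categories = {
    "A": set(range(0, 20)),
    "B": set(range(10, 40)),
    "C": set(range(100, 130)),
}
selected = list(range(0, 12)) + [150, 151, 152]
adjusted = min_p_adjusted(categories, selected, all_genes)
for name in sorted(adjusted):
    print(name, round(adjusted[name], 4))
```

The adjusted values control the chance of any false positive across the whole family of categories without assuming their p-values are independent.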
Gene Set Enrichment Analysis
Ignore for the moment the ‘meaning’ of the p-value: consider it just as a ranking of S/N (between-group difference relative to within-group variation)
If we select a set of genes ‘at random’, then the ranking of S/N ratios should be random, i.e. a sample from a uniform distribution
Adapt standard Kolmogorov-Smirnov (K-S) test of distribution
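The rank-based idea can be sketched as a one-sample K-S test of a gene set's ranks against a uniform distribution. This is a simplified illustration rather than the published procedure: it uses the asymptotic Kolmogorov distribution for the p-value and treats ranks as continuous, both of which are approximations. The function name and example ranks are hypothetical.

```python
import math

def ks_uniform(ranks, n_genes):
    """K-S statistic and approximate p-value for ranks versus Uniform(0, 1).

    ranks: 1-based positions of the gene set in the S/N ordering.
    n_genes: total number of ranked genes.
    """
    # Map each rank to the midpoint of its cell in (0, 1).
    u = sorted((r - 0.5) / n_genes for r in ranks)
    m = len(u)
    # Largest gap between the empirical CDF and the uniform CDF.
    d = max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(u))
    # Small-sample correction applied before using the asymptotic distribution.
    lam = (math.sqrt(m) + 0.12 + 0.11 / math.sqrt(m)) * d
    if lam < 0.2:
        # The alternating series converges slowly here; the p-value is ~1.
        return d, 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(2.0 * series, 0.0), 1.0)

# A set concentrated at the top of a 1000-gene ranking.
d_top, p_top = ks_uniform(range(1, 21), 1000)
# A set spread evenly through the ranking.
d_even, p_even = ks_uniform(range(25, 1000, 50), 1000)
print(f"top-ranked set: D={d_top:.3f}, p={p_top:.3g}")
print(f"evenly spread set: D={d_even:.3f}, p={p_even:.3g}")
```

A set whose genes cluster at one end of the ranking gives a large D, while an evenly spread set gives a D close to its minimum possible value.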
Continuous Tests
Model: all genes in group contribute
roughly equally to effect
Test: z_G = \sum_{g \in G} s_g for each group G
Compare z to permutation distribution
More sensitive under model assumptions
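The group score can be sketched as a sum of per-gene scores s_g over the group G, compared with the sums obtained from random gene sets of the same size. The slide does not specify the permutation scheme, so drawing random gene sets is an assumption here, as are the toy scores, gene names, and the function name `group_score_test`.

```python
import random

def group_score_test(scores, group, n_perm=1000, seed=0):
    """Sum per-gene scores over a group and compare with random gene sets.

    scores: dict mapping gene -> per-gene score s_g.
    group: genes in the group G.
    Returns (z_G, two-sided permutation p-value).
    """
    rng = random.Random(seed)
    genes = list(scores)
    size = len(group)
    z_obs = sum(scores[g] for g in group)
    # Expected sum for a random set of this size, used to centre the comparison.
    centre = size * sum(scores.values()) / len(genes)
    null = [
        sum(scores[g] for g in rng.sample(genes, size))
        for _ in range(n_perm)
    ]
    extreme = sum(abs(z - centre) >= abs(z_obs - centre) for z in null)
    return z_obs, (1 + extreme) / (1 + n_perm)

# Toy scores (hypothetical): background genes spread over about [-1, 1];
# the 15 group genes each have a moderate score of 0.6, so none is the
# most extreme gene on its own.
scores = {f"g{i}": ((i * 37) % 100 - 49.5) / 50 for i in range(100)}
group = [f"g{i}" for i in range(15)]
for g in group:
    scores[g] = 0.6

z_obs, p_perm = group_score_test(scores, group)
print(f"z_G = {z_obs:.2f}, permutation p = {p_perm:.4f}")
```

In this toy example no single group gene would stand out in a per-gene selection, yet the summed score is far from what random sets produce, which is the situation the equal-contribution model is meant to detect.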