Using Gene Ontology - Center for Genomic Sciences


Using Gene Ontology
Models and Tests
Mark Reimers, NCI
Outline






What we might gain by using annotations
Models for group effects
Enrichment of selected genes
Chi-square and Fisher test
Group scores
Overlap in hierarchical annotations
Why Use Annotations





Goal: How to identify biological processes or biochemical pathways that are changed by treatment
Common procedure: select ‘changed’ genes, and look for members of known function
Problem: moderate changes in many genes simultaneously will escape detection
New approach: start with a vocabulary of known GO categories or pathways, and look for coherent changes
Variations: look for chromosome locations, or protein domains, that are common among many genes that are changed
Statistical Methods



How likely is it that the set of ‘significant’ genes will include as many from the category as you see?
Two-way table:

              Category   Others
On list              8      112
Not on list         42   12,500

Fisher Exact test: handles small categories better than chi-square
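The enrichment question above can be sketched as a one-sided Fisher exact test, which is the upper tail of a hypergeometric distribution. This is a minimal sketch using only the standard library. The function name `fisher_enrichment_p` is hypothetical, and the assignment of the slide's counts to cells (8 category genes on the list, 112 others on the list, 42 category genes off the list, 12,500 others off the list) is inferred from the slide layout.

```python
from math import comb

def fisher_enrichment_p(cat_on, other_on, cat_off, other_off):
    """One-sided Fisher exact p-value: the chance that a random list of the
    same size contains at least `cat_on` genes from the category."""
    n_list = cat_on + other_on            # genes on the 'significant' list
    n_cat = cat_on + cat_off              # genes annotated to the category
    n_total = n_list + cat_off + other_off
    upper = min(n_list, n_cat)            # largest possible overlap
    # Sum hypergeometric terms for overlaps at least as large as observed.
    tail = sum(
        comb(n_cat, i) * comb(n_total - n_cat, n_list - i)
        for i in range(cat_on, upper + 1)
    )
    return tail / comb(n_total, n_list)

# Counts read off the two-way table (cell assignment is an assumption).
p = fisher_enrichment_p(8, 112, 42, 12500)
print(f"one-sided p = {p:.3g}")
```

With these counts only about half a category gene would be expected on the list by chance, so 8 is a strong excess. Exact integer arithmetic is used here because the table is small; a statistics library would normally be used instead.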
How to deal with multiple categories?
GoMiner: Leverages the Gene Ontology
(Zeeberg et al., Genome Biology 4: R28, 2003)
P-values for Tests




About 3,000 GO biological process categories
Most overlap with some others
p-values for categories are not independent
Permutation test of all categories simultaneously in parallel
Gene Set Enrichment Analysis

Ignore for the moment the ‘meaning’ of the p-value: consider it just as a ranking of S/N
  between-group difference relative to within-group
If we select a set of genes ‘at random’, then the ranking of S/N ratios should be random
  i.e. a sample from a uniform distribution
Adapt standard (K-S) test of distribution
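The idea that a random gene set should have uniformly distributed S/N ranks can be sketched with a plain one-sample Kolmogorov-Smirnov statistic. This is an illustration under assumptions: the function names are hypothetical, the p-value is estimated by drawing random gene sets of the same size rather than from the Kolmogorov distribution, and the slide's "adapted" test may differ from this unweighted version.

```python
import random

def ks_uniform_stat(u):
    """One-sample K-S distance between values in [0, 1] and the uniform law."""
    u = sorted(u)
    n = len(u)
    # Compare the empirical CDF just after and just before each point.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))

def gene_set_ks(ranks, n_genes, n_random=2000, seed=0):
    """ranks: 1-based S/N ranks of the set's genes among n_genes genes.
    Returns the K-S statistic and a Monte Carlo p-value."""
    rng = random.Random(seed)
    scale = lambda rs: [(r - 0.5) / n_genes for r in rs]  # map ranks into (0, 1)
    d_obs = ks_uniform_stat(scale(ranks))
    hits = 0
    for _ in range(n_random):
        random_ranks = rng.sample(range(1, n_genes + 1), len(ranks))
        hits += ks_uniform_stat(scale(random_ranks)) >= d_obs
    return d_obs, (hits + 1) / (n_random + 1)

# Made-up example: a 20-gene set occupying the top 20 of 1,000 ranks.
d, p = gene_set_ks(list(range(1, 21)), n_genes=1000)
print(d, p)
```

A set whose genes are spread evenly through the ranking gives a small statistic and a large p-value; a set concentrated at either end gives a statistic near 1.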
Continuous Tests


Model: all genes in group contribute roughly equally to effect
Test: $z_G = \sum_{g \in G} s_g$ for each group G



Compare z to permutation distribution
More sensitive under model assumptions
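A group score of this kind can be sketched as follows. Several things here are assumptions rather than the slide's stated method: the per-gene score s_g is taken to be a signal-to-noise ratio (difference of group means over the sum of group standard deviations), the permutation shuffles sample labels, the test is two-sided, and all function names are hypothetical.

```python
import random
from statistics import mean, stdev

def gene_score(values, labels):
    """Assumed s_g: between-group difference relative to within-group spread."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))

def group_score_test(expr, labels, group, n_perm=1000, seed=0):
    """expr: dict mapping gene -> per-sample values; group: genes in G.
    Returns z_G (the sum of s_g over G) and a permutation p-value."""
    rng = random.Random(seed)

    def z(lab):
        return sum(gene_score(expr[g], lab) for g in group)

    z_obs = z(labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)                    # permute sample labels
        hits += abs(z(perm)) >= abs(z_obs)   # two-sided comparison
    return z_obs, (hits + 1) / (n_perm + 1)

# Made-up data: 10 genes, 6 treated and 6 control samples, modest shift.
rng = random.Random(1)
labels = [1] * 6 + [0] * 6
expr = {f"g{i}": [rng.gauss(0.8 * lab, 1.0) for lab in labels] for i in range(10)}
print(group_score_test(expr, labels, list(expr)))
```

Shuffling sample labels keeps the correlation between genes intact in every permutation, which is the reason to compare z against a permutation distribution instead of treating the s_g as independent.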