One-sided Fisher's exact test

Gene Expression Data Analysis
Lab Session
CAD course
Jian Li
01.28.2011
Gene expression signatures
• Will be loosely defined here to mean a set
of genes that are functionally associated
with each other in some way.
• When using expression profiling to define
genes, a gene expression signature
consists of two things:
– A set of genes going “up” (relative to
something).
– A set of genes going “down” (relative to
something).
Gene expression profiling of IGF-I-stimulated MCF-7 cells
(1) One combined signature
Five oncogenic pathway signatures in human cancers: MYC, Ras, E2F3, b-cat, Src
(2) 5 signatures
(3,4) compare
• Course webpage
Excel functions/features you will need for the computational exercise
TTEST
TTEST(array1,array2,tails,type)
• array1 is the first data set.
• array2 is the second data set.
• tails specifies the number of distribution tails (use “2”).
• type is the kind of t-test to perform (use “2”).
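For readers working in R rather than Excel, the call below is a rough equivalent of TTEST with tails = 2 and type = 2 (two-tailed, two-sample, equal variance). The expression values are made up for illustration and are not from the course data.

```r
# Hypothetical expression values for one gene in two groups of samples.
control <- c(5.1, 4.9, 5.3, 5.0)
treated <- c(6.2, 6.0, 6.5, 6.1)

# Two-sided test (tails = 2) assuming equal variances (type = 2).
res <- t.test(control, treated,
              alternative = "two.sided",
              var.equal = TRUE)

print(res$p.value)
```

With var.equal = TRUE the test uses n1 + n2 - 2 degrees of freedom, which is the pooled-variance t-test.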
AVERAGE
AVERAGE(number1, number2, ...)
• Number1, number2, ... are 1 to 30 numeric arguments for which you want the average.
• The arguments must either be numbers or be names, arrays, or references that contain numbers.
Data > Filter > AutoFilter
• Arrows appear to the right of the column labels.
• Filtered items appear in blue.
• Complex criteria: rows that contain values within a specific range (e.g. p<0.01).
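The same kind of range filter can be sketched in R. The data frame, gene labels, and p-values below are hypothetical placeholders.

```r
# Hypothetical table of per-gene p-values.
results <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  p_value = c(0.2, 0.004, 0.03, 0.0009),
  stringsAsFactors = FALSE
)

# Keep only the rows that meet the criterion p < 0.01.
significant <- results[results$p_value < 0.01, ]
print(significant)
```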
MATCH
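MATCH(lookup_value,lookup_array,match_type)
• lookup_value: the value you are looking for.
• lookup_array: the range of cells to search.
• match_type: should be 0 for our purposes.
(Don’t forget the $ in the lookup_array reference, so the range stays fixed when the formula is copied.)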
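R's match() behaves much like MATCH with match_type 0: it returns the position of the first exact match, or NA where Excel would show #N/A. The two gene lists below are invented for illustration.

```r
# Hypothetical gene lists standing in for two signatures.
sig_a <- c("MYC", "E2F3", "SRC", "CCND1")
sig_b <- c("SRC", "KRAS", "MYC")

# Position of each Sig A gene within Sig B (NA if absent).
pos <- match(sig_a, sig_b)
print(pos)

# Genes found in both lists.
shared <- sig_a[!is.na(pos)]
print(shared)
```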
COUNT
COUNT(range)
• range: cells to count
• Only numbers in a range are counted. Empty cells, logical values, text, or error values in the array or reference are ignored.
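Because COUNT ignores error values, applying it to a column of MATCH results gives the number of successful matches. A minimal R sketch of the same idea, using a made-up vector of match positions where NA plays the role of #N/A:

```r
# Hypothetical match results: a position where a gene was found, NA otherwise.
match_positions <- c(3, NA, 1, NA, 7)

# Count only the entries that hold a number.
n_matched <- sum(!is.na(match_positions))
print(n_matched)
```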
Compare two signatures

Sig A: 1162

Sig B: 119
Genes on both platforms: 11079
Genes shared by both gene signatures: 44
one-sided Fisher's exact test
R function for one-sided Fisher's exact test
dhyper
• Example:
– 100 balls
– 10 of the balls are red
– I grab 20 balls
– Five of my 20 balls are red
• Is five red balls significantly more than I would expect by chance?
> m<-10        #number of red balls
> n<-90        #number of other balls (total pop-m)
> k<-20        #number of balls selected
> x<-0:k       #vector of successes
> 1-sum(dhyper(x,m,n,k)[1:5])    #elements 1:5 are x=0..4, so this is P(X>=5)
[1] 0.02546455
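The same upper-tail probability can also be obtained from phyper(), which avoids summing individual dhyper() terms and subtracting from 1. This is offered as an alternative sketch, not as the method used in the course; the variable names observed, p_sum, and p_tail are introduced here for illustration.

```r
m <- 10        # number of red balls
n <- 90        # number of other balls
k <- 20        # number of balls selected
observed <- 5  # red balls among those selected

# Slide approach: 1 - P(X <= observed - 1)
p_sum <- 1 - sum(dhyper(0:(observed - 1), m, n, k))

# Direct upper tail: P(X > observed - 1) = P(X >= observed)
p_tail <- phyper(observed - 1, m, n, k, lower.tail = FALSE)

print(c(p_sum, p_tail))
```

Both values should agree with the 0.02546455 shown above. For very small p-values the lower.tail = FALSE form is usually the safer choice, because subtracting a number close to 1 from 1 loses precision.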
R function for one-sided Fisher's exact test
dhyper


Sig A: 1162
Sig B: 119
Genes on both platforms: 11079
Genes shared by both gene signatures: 44
> m<-119          #number of Sig B genes
> n<-11079-119    #number of other genes
> k<-1162         #number of Sig A genes
> x<-0:k          #vector of successes
> 1-sum(dhyper(x,m,n,k)[1:44])    #elements 1:44 are x=0..43, so this is P(X>=44)
[1] 1.265654e-14
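As a cross-check, the same one-sided test can be sketched with R's built-in fisher.test() on a 2x2 table built from the four counts above (Sig A 1162, Sig B 119, 11079 genes on both platforms, 44 shared). The table layout and variable names are choices made for this sketch.

```r
total  <- 11079  # genes on both platforms
sig_a  <- 1162   # Sig A genes
sig_b  <- 119    # Sig B genes
shared <- 44     # genes in both signatures

# Rows: in Sig A / not in Sig A. Columns: in Sig B / not in Sig B.
tab <- matrix(
  c(shared,         sig_b - shared,
    sig_a - shared, total - sig_a - sig_b + shared),
  nrow = 2,
  dimnames = list(SigA = c("yes", "no"), SigB = c("yes", "no"))
)

# One-sided test for more overlap than expected by chance.
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Equivalent hypergeometric upper tail, P(X >= 44).
p_hyper <- phyper(shared - 1, sig_b, total - sig_b, sig_a,
                  lower.tail = FALSE)

print(c(p_fisher, p_hyper))
```

The two p-values should match each other, and both should be of the same tiny order as the dhyper result above; small differences from that result are expected because 1 - sum(...) loses precision near machine epsilon.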
GSEA (rank-based) enrichment analysis
All the genes in the dataset are used here
Subramanian, Aravind et al. (2005) Proc. Natl. Acad. Sci. USA 102, 15545-15550
• Start from the top of the Ranked list.
• Add points to “Random walk” for each gene you find in S.
• Remove points from “Random walk” for each gene not in S.
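The random walk described in the bullets above can be sketched in a few lines of R. This is a simplified, unweighted version with equal step sizes, chosen here so the walk returns to zero at the end of the list; the published GSEA method weights the steps by each gene's correlation with the phenotype and assigns a P value by permutation, neither of which is shown. The function name running_sum_es and the toy gene labels are hypothetical.

```r
# Unweighted running-sum enrichment score for a ranked gene list.
running_sum_es <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked_genes) - n_hit
  # Add points for genes in S, remove points for genes not in S.
  step <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  walk <- cumsum(step)
  # Enrichment score: the largest deviation of the walk from zero.
  list(walk = walk, es = walk[which.max(abs(walk))])
}

ranked <- paste0("g", 1:10)   # toy ranked list, g1 is the top gene
S <- c("g1", "g2", "g4")      # toy gene set S

res <- running_sum_es(ranked, S)
print(round(res$walk, 3))
print(res$es)
```

In this toy case the genes in S sit near the top of the ranked list, so the walk peaks early and the score is positive.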
GSEA (rank-based) enrichment analysis
Figure labels: step 1; step 2; assign nominal P value; status/result
Rank-based enrichment analysis
Figure labels: rank-ordered genes from dataset X; locations of genes from set Y
• Rank-based approaches use all of the genes from one of the datasets to determine enrichment (they do not make a “cut”).