Working with enriched gene sets in R

Peter Svensson
Micheline Giphart-Gassler
Harry Vrieling
P-values of genes
• Starting with a vector of p-values from
– t.test(irradiated, control)
– wilcox.test(irradiated, control)
– lm(formula, data)
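A minimal sketch of how such a vector of p-values could be produced, one test per gene. The expression matrices, their sizes and the shifted genes are simulated assumptions, not the data from the talk.

```r
# Simulated expression matrices: rows are genes, columns are samples.
# All names, sizes and effect sizes are illustrative assumptions.
set.seed(1)
n_genes <- 200
control    <- matrix(rnorm(n_genes * 4), nrow = n_genes)
irradiated <- matrix(rnorm(n_genes * 4), nrow = n_genes)
irradiated[1:20, ] <- irradiated[1:20, ] + 2   # shift the first 20 genes

# One two-sample t-test per gene; keep only the p-value.
pvals_t <- sapply(seq_len(n_genes), function(i)
  t.test(irradiated[i, ], control[i, ])$p.value)

# The same comparison with the rank-based Wilcoxon test.
pvals_w <- sapply(seq_len(n_genes), function(i)
  wilcox.test(irradiated[i, ], control[i, ])$p.value)

# The same comparison as a linear model with a two-level group factor.
group <- factor(rep(c("control", "irradiated"), each = 4))
pvals_lm <- sapply(seq_len(n_genes), function(i) {
  y <- c(control[i, ], irradiated[i, ])
  summary(lm(y ~ group))$coefficients["groupirradiated", "Pr(>|t|)"]
})
```

Each result is a numeric vector with one p-value per gene, which is the starting point for the following steps.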
Distribution of p-values
• two-tailed
• one-tailed
• Proportion of unchanged genes, π0
library(qvalue)
• (Storey & Tibshirani 2001)
qvalue(pvals)$pi0
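To illustrate what π0 measures, here is a base-R sketch of a simple fixed-threshold estimate: p-values of unchanged genes are uniform, so the density of p-values above a cut-off lambda approximates π0. This is a simplification and not necessarily what qvalue() computes by default. The simulated mixture and lambda = 0.5 are assumptions.

```r
set.seed(2)
# Assumed mixture: 800 unchanged genes (uniform p-values)
# and 200 changed genes (p-values concentrated near 0).
pvals <- c(runif(800), rbeta(200, 0.2, 5))

lambda <- 0.5   # assumed cut-off; above it almost all p-values are null

# Fraction of p-values above lambda, rescaled to the full [0, 1] interval.
pi0_hat <- min(1, mean(pvals > lambda) / (1 - lambda))
pi0_hat
```

By construction the estimate should come out at roughly 0.8 for this simulated mixture.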
Annotation
• Annotation of the genes is available from Bioconductor
– MetaData for commercial arrays
– AnnBuilder for homemade arrays
– UniGene name, code, symbol, Entrez Gene, GO terms, KEGG pathways, PubMed IDs...
Gene Set Enrichment Analysis
• Mootha et al., Nat Genet. 2003, 34:267
• Use gene sets defined by GO terms, KEGG terms, names containing 'kinase', or genes that cluster together
• Make a vector with one value per gene (N genes in total, G of them in the group)
– all not in group: -sqrt(G/(N-G))
– all in group: sqrt((N-G)/G)
Running sum
• The sum of the values in the vector will be 0
• Plot the running sum over the genes ordered by p-value
• In the plot, the peak is at p = 0.1
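The running sum can be sketched in a few lines of base R. The p-values and the gene-set membership are simulated assumptions. The step values are the two quantities given for the vector above.

```r
set.seed(3)
N <- 1000                      # genes on the array (assumed)
G <- 50                        # genes in the gene set (assumed)
pvals  <- runif(N)             # simulated p-values, one per gene
in_set <- rep(FALSE, N)
in_set[sample(N, G)] <- TRUE   # random membership, for illustration only

# Order genes from smallest to largest p-value and assign the step values.
ord  <- order(pvals)
step <- ifelse(in_set[ord], sqrt((N - G) / G), -sqrt(G / (N - G)))

# Running sum along the ordered gene list; it returns to 0 at the end.
running_sum <- cumsum(step)

# Height of the peak and the p-value at which it occurs.
es     <- max(running_sum)
peak_p <- pvals[ord][which.max(running_sum)]

# plot(pvals[ord], running_sum, type = "l")   # uncomment to draw the curve
```

The G positive steps and the N-G negative steps cancel exactly, which is why the curve ends at 0 and only its peak carries information.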
GSEA
• The enrichment score can be used to determine the importance of a gene set.
• Permutation technique to get significance.
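A sketch of a permutation test for the enrichment score, taken here as the maximum of the running sum. For brevity it shuffles gene-set membership along the ordered gene list. This is a simplification: permuting the sample labels and recomputing the per-gene statistics is the more usual scheme. The function name enrichment_score and all data are assumptions.

```r
# Enrichment score: maximum of the running sum over an ordered gene list.
# in_set_sorted is a logical vector, TRUE where the gene is in the set.
enrichment_score <- function(in_set_sorted) {
  N <- length(in_set_sorted)
  G <- sum(in_set_sorted)
  step <- ifelse(in_set_sorted, sqrt((N - G) / G), -sqrt(G / (N - G)))
  max(cumsum(step))
}

set.seed(4)
N <- 500
in_set_sorted <- rep(FALSE, N)
# Assumed example: 15 set members at the very top of the list, 15 scattered.
in_set_sorted[c(1:15, sample(16:N, 15))] <- TRUE

observed <- enrichment_score(in_set_sorted)

# Null distribution: shuffle the membership labels and recompute the score.
null_es <- replicate(1000, enrichment_score(sample(in_set_sorted)))

# Permutation p-value with the usual +1 correction.
perm_p <- (sum(null_es >= observed) + 1) / (length(null_es) + 1)
perm_p
```

The +1 in numerator and denominator keeps the p-value strictly above 0, so its smallest possible value is set by the number of permutations.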
Hypergeometric probability
• Used in dChip and DAVID.
• Input is
– # genes in the gene set (n), # genes on array (n+m)
– # selected genes in the gene set (x), # selected genes (N)
• dhyper() gives the density
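A small worked example with made-up counts. Note that the argument names of R's dhyper(x, m, n, k) differ from the symbols above: its second argument is the gene-set size (n above), its third the number of genes outside the set (m above), and its fourth the number of selected genes (N above).

```r
# Made-up counts, for illustration only.
n_set  <- 40     # genes in the gene set              (n above)
m_rest <- 9960   # genes on the array outside the set (m above)
N_sel  <- 500    # selected genes                     (N above)
x      <- 8      # selected genes that are in the set (x above)

# Probability of exactly x set members among the selected genes.
p_exact <- dhyper(x, n_set, m_rest, N_sel)

# Probability of x or more set members (upper tail), the usual
# enrichment p-value; phyper(q, ..., lower.tail = FALSE) is P(X > q).
p_enrich <- phyper(x - 1, n_set, m_rest, N_sel, lower.tail = FALSE)

c(density = p_exact, enrichment = p_enrich)
```

With these counts about 2 set members would be expected by chance (40 × 500 / 10000), so observing 8 gives a small upper-tail probability.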
Selecting genes
• Have to set a threshold, p0, for the p-values: genes with p < p0 are selected
• p0 = 0.001 is not informative
• p0 = 0.1
• at the maximum of the peak
• dissect(pvals)
– (BMC Bioinformatics, to appear)
• Will get a p-value
• Tested 4000 GO terms, so correction for multiple testing is needed
p.adjust(pvals, "fdr")
• Look at significant terms, p<0.001
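A sketch of the correction step on simulated per-term p-values. The 4000 terms match the number tested above. The ten strongly enriched terms are an assumption added for illustration.

```r
set.seed(5)
# Simulated p-values for 4000 GO terms: 3990 null terms and
# 10 assumed strongly enriched terms at the end of the vector.
term_pvals <- c(runif(3990), rep(1e-8, 10))

# Benjamini-Hochberg adjustment; "fdr" is an alias for "BH" in p.adjust().
adj <- p.adjust(term_pvals, "fdr")

# Terms that remain significant after correction.
significant <- which(adj < 0.001)
```

Adjusted values are never smaller than the raw p-values, so a term that passes p < 0.001 after correction also passed it before.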
Cisplatin data
• Mouse embryonic stem cells exposed to various doses (low, medium and high). Harvested at 0<t<24
• Low doses, early time points
– Few genes changed
– Few pathways changed
• Indications of what will come
Preprocessing
• For internal use at www.medgencentre.nl/pla
• Not updated
• Code for working with widgets, defining MIAME-compliant object, AffyBatch (exprSet), doing tests, building linear models, correlation tests, GSEA
• Updating together with Agata Meglicz. It will be improved soon.
Demonstration
cdf = "hgu133a"
source("gsea.R")
gsea()
dissectGUI()