Finding Transcription Factor Motifs
Adapted from a lab created by Prof. Terry Speed
Cell Cycle Data Set
Spellman et al. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
Synchronized populations of yeast cells using three independent methods
(alpha factor arrest, elutriation, arrest of a cdc15 temperature-sensitive mutant).
Extracted RNA, then ran microarray experiments to determine expression of
~6000 genes over 18 time points.
See http://cellcycle-www.stanford.edu
Outline
Read the cell cycle data into R.
Cluster cell cycle data using hierarchical clustering.
Visualize cell cycle clusters.
Find motifs in these clusters and visualize them using sequence logos.
Experimental Data
783 genes involved in the yeast cell cycle
Expression levels measured for 18 time points
Read the data into R:
> dat <- read.table("ccdata.txt", header=TRUE, sep="\t")
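The later slides use row.names(dat) as gene names, which suggests a file whose header row has one fewer field than the data rows, so that read.table() takes the first column as row names. The sketch below illustrates that behaviour on a tiny made-up file. The layout of ccdata.txt itself is an assumption here, and the column names and values are invented for illustration.

```r
# Write a tiny tab-separated file. The header has 3 fields and each data row
# has 4, so read.table() uses the first column as row names.
# (Made-up values; the real ccdata.txt layout is assumed, not confirmed.)
tmp <- tempfile(fileext = ".txt")
writeLines(c("t1\tt2\tt3",
             "YAL001C\t0.1\t-0.2\t0.3",
             "YAL002W\tNA\t0.5\t-0.1"), tmp)

toy <- read.table(tmp, header = TRUE, sep = "\t")

dim(toy)        # genes x time points
row.names(toy)  # gene names taken from the first column
sum(is.na(toy)) # missing measurements are read as NA
```

Checking dim() and the number of NA values right after reading is a cheap way to confirm the file was parsed as intended before clustering.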
Hierarchical Clustering
> distMat <- dist(dat)
> clustObj <- hclust(distMat)
> plot(clustObj)
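The calls above rely on defaults: dist() computes Euclidean distances between rows (genes) and hclust() uses complete linkage. A minimal sketch on simulated data, with the defaults written out explicitly, is shown below. The matrix toy is a stand-in for dat, not the real data set.

```r
# Simulated 20 genes x 18 time points, standing in for `dat`.
set.seed(1)
toy <- matrix(rnorm(20 * 18), nrow = 20,
              dimnames = list(paste0("gene", 1:20), NULL))

# Default call, as on the slide.
distMat  <- dist(toy)
clustObj <- hclust(distMat)

# The same analysis with the default arguments spelled out.
clustObj2 <- hclust(dist(toy, method = "euclidean"), method = "complete")

# Both trees record the same sequence of merges.
identical(clustObj$merge, clustObj2$merge)
```

Other distance measures or linkage methods can be swapped in through the same two method arguments, and the resulting dendrogram may differ substantially.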
Create Gene Expression Clusters
Let's cut the dendrogram into 16 clusters:
> cutObj <- cutree(clustObj, k=16)
> print(table(cutObj))
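To see what cutree() returns, here is a small sketch on simulated data (the matrix toy and k = 4 are illustrative choices, not the lab's data or settings). The result is an integer vector of cluster labels, named by gene, which is why it can be used directly to index the rows of the data.

```r
# Simulated 40 genes x 18 time points.
set.seed(1)
toy <- matrix(rnorm(40 * 18), nrow = 40,
              dimnames = list(paste0("gene", 1:40), NULL))

# Cut the tree into k groups; each gene gets a label from 1 to k.
cutObj <- cutree(hclust(dist(toy)), k = 4)

head(cutObj)             # named integer vector: gene -> cluster label
sizes <- table(cutObj)   # number of genes per cluster
sizes

# Genes belonging to cluster 1.
genes1 <- names(cutObj)[cutObj == 1]
```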
Write out the gene names in each cluster into a text file:
for (i in 1:16) {
  cluster.genes <- row.names(dat)[cutObj == i]
  fileName <- paste("cluster", i, ".txt", sep="")
  write(cluster.genes, fileName)
}
What Do These Clusters Look Like?
Let's plot the first 8 clusters:
par(mfrow=c(2,4))
for (i in 1:8) {
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cutObj == i,])
  plot(expr.prof[1,],
       ylim=range(expr.prof, na.rm=TRUE), type="l",
       xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}
What Do These Clusters Look Like?
The remaining 8 clusters:
par(mfrow=c(2,4))
for (i in 9:16) {
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cutObj == i,])
  plot(expr.prof[1,],
       ylim=range(expr.prof, na.rm=TRUE), type="l",
       xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}
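As an alternative to plot() followed by apply(..., lines), a single matplot() call can draw every profile in a cluster. The sketch below uses simulated profiles and also overlays the cluster mean, which is an addition for illustration and not part of the original lab.

```r
# 5 simulated genes x 18 time points, standing in for one cluster.
set.seed(2)
expr.prof <- matrix(rnorm(5 * 18), nrow = 5)

# matplot() draws one line per column, so transpose to get one line per gene.
matplot(t(expr.prof), type = "l", lty = 1, col = "grey40",
        xlab = "Time", ylab = "Expression", main = "Cluster (toy data)")

# Overlay the mean profile across genes at each time point.
meanProf <- colMeans(expr.prof, na.rm = TRUE)
lines(meanProf, lwd = 2)
```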
Picking Clusters for TF Motifs
> barplot(table(cutObj), main="Cluster Sizes", xlab="Cluster", ylab="Number of Genes")
We want to select a cluster with a reasonably large number of genes to look for
upstream TF binding site motifs.
Co-expression  Co-regulation.
Hence we look to the promoter regions to see if we can elucidate common
regular expression patterns.
Statistically over-represented patterns are potential transcription binding sites.
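One simple way to make "over-represented" concrete is to count how many promoter sequences in a cluster contain a given k-mer and compare that with a background set. The sketch below does this on made-up sequences and scores the difference with a hypergeometric test. The sequences, the helper names kmerPresence and hits, and the choice of test are all assumptions for illustration; the motif-finding tools listed later use their own, more elaborate statistics.

```r
# For each k-mer, count the number of sequences that contain it at least once.
kmerPresence <- function(seqs, k) {
  perSeq <- lapply(seqs, function(s) {
    n <- nchar(s)
    if (n < k) return(character(0))
    starts <- 1:(n - k + 1)
    unique(substring(s, starts, starts + k - 1))
  })
  tab <- table(unlist(perSeq))
  setNames(as.integer(tab), names(tab))
}

# Look up a k-mer count, returning 0 when the k-mer never occurs.
hits <- function(counts, motif) {
  if (motif %in% names(counts)) counts[[motif]] else 0L
}

# Made-up promoter fragments: ACGCGT is planted in every "cluster" sequence.
cluster    <- c("TTACGCGTAA", "GGACGCGTCC", "ACGCGTTTTT")
background <- c("TTTTAAAACC", "GGCCTTAAGG", "CATCATCATC", "ACGCGTAAAA")

motif <- "ACGCGT"
nClusterHit <- hits(kmerPresence(cluster, 6), motif)
nBgHit      <- hits(kmerPresence(background, 6), motif)

# Hypergeometric tail: probability of seeing at least nClusterHit
# motif-containing sequences when drawing length(cluster) sequences
# from the pooled set.
nWith    <- nClusterHit + nBgHit
nWithout <- length(cluster) + length(background) - nWith
pval <- phyper(nClusterHit - 1, nWith, nWithout, length(cluster),
               lower.tail = FALSE)
pval
```

With real promoters the same count would be repeated for every k-mer, so the resulting p-values would need a multiple-testing correction before being interpreted.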
Extracting Promoter Sequences
Promoter sequence retrieval can be performed using RSAT (Regulatory Sequence Analysis Tools):
http://rsat.ulb.ac.be/rsat/genome-scale-dna-pattern_form.cgi
TF Motif Finding Tools
MEME
http://meme.sdsc.edu/meme/meme.html
BioProspector
http://ai.stanford.edu/~xsliu/BioProspector/
Improbizer
http://www.cse.ucsc.edu/~kent/improbizer/improbizer.html
Verbumculus
http://wwwdbl.dei.unipd.it/cgi-bin/verb/family.cgi
OligoAnalysis
http://embnet.cifn.unam.mx/~jvanheld/rsa-tools/oligo-analysis_form.cgi
Mobydick
http://genome.ucsf.edu/mobydick/
TF Motif Finding Tools
MDScan
http://ai.stanford.edu/~xsliu/MDscan/
Weeder
http://159.149.109.16:8080/weederWeb/index2.html
Gibbs Motif Sampler
http://bayesweb.wadsworth.org/gibbs/gibbs.html
AlignACE
http://atlas.med.harvard.edu/cgi-bin/alignace.pl
CONSENSUS
http://bifrost.wustl.edu/consensus/html/Html/interface.html
Making Sequence Logos
WebLogo
http://weblogo.berkeley.edu/logo.cgi
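A sequence logo stacks the four bases at each motif position, with the total stack height given by the information content at that position and each letter's share given by its frequency. The sketch below computes those quantities for a few made-up aligned sites. It assumes a uniform background and applies no small-sample correction, so values from a logo tool may differ somewhat.

```r
# Made-up aligned binding sites, all of the same length.
sites <- c("ACGCGT", "ACGCGT", "ACGCGA", "TCGCGT")
bases <- c("A", "C", "G", "T")

# Character matrix: one row per site, one column per motif position.
chars <- do.call(rbind, strsplit(sites, ""))

# Position frequency matrix: rows are bases, columns are positions.
freq <- sapply(seq_len(ncol(chars)), function(j) {
  as.numeric(table(factor(chars[, j], levels = bases))) / nrow(chars)
})
rownames(freq) <- bases

# Shannon entropy per position (in bits), treating 0 * log2(0) as 0.
entropy <- apply(freq, 2, function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
})

# Information content: 2 bits is the maximum for a 4-letter alphabet.
info <- 2 - entropy

# Letter heights: each base's frequency scaled by the position's information.
heights <- sweep(freq, 2, info, "*")
round(heights, 2)
```

Fully conserved positions reach the maximum of 2 bits, while positions with mixed bases get shorter stacks.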