poster_CSHL_2007


Systematic exploration of cis-regulation using a generic computational framework
Olivier Elemento*, Noam Slonim* (equal contribution) and Saeed Tavazoie
Lewis-Sigler Institute for Integrative Genomics, Princeton University
What is FIRE?
Yeast: 78 co-expression clusters
From k-mers to motifs
(data from Gasch et al, 2000)
FIRE (for Finding Informative Regulatory Elements) is a highly sensitive approach for motif discovery from expression data, based on mutual information. It has the following characteristics:
• applicable to any type of expression data,
• obviates assumptions and parameter tuning often required by existing methods,
• simultaneously finds DNA and RNA motifs and explores their functional relationships,
• scales well to mammalian genomes,
• characterizes motif interactions and co-localizations,
• highly sensitive, with very few false positive predictions, if any,
• highlights the biological role of predicted motifs, their inter-species conservation, and spatial and orientation biases,
• displays the results in a user-friendly graphical format.
Up-regulated
Cy3/Cy5 log-ratios
PAC
Rpn4
Similarity to ChIP-chip RAP1 motif (Lee et al, 2002)
Yap1
change
Puf3
PAC
RRPE
Down-regulated
Mutual information
Yeast: single microarray
Motif conservation with S. bayanus
Experiment: H2O2 treatment in ΔMsn2/ΔMsn4 background
PUF4
PUF3
MSN2/4
Human gene expression atlas (clustered)
Human: 78 tissues (Su et al, 2004)
Statistical significance
(data from Su et al, 2004)
RAP1
17 motifs in 5’ upstream regions
6 motifs in 3’UTRs
Maximum of 10,000 expression-shuffled
mutual information values
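The comparison between the real mutual information value and the maximum over expression-shuffled values can be sketched as a simple randomization test. This is a minimal illustration, not FIRE's actual code: the function names, the toy data, the use of base-2 logarithms, and the pass rule (real value strictly greater than every shuffled value) are assumptions made for the example.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (bits; base 2 is an assumption) of two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # P(x,y) / (P(x) P(y)) simplifies to count_xy * n / (count_x * count_y)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def shuffle_test(motif_present, expression_labels, n_shuffles=10000, seed=0):
    """Compare the real MI with the maximum MI over label-shuffled data (hypothetical helper)."""
    rng = random.Random(seed)
    real = mutual_information(motif_present, expression_labels)
    shuffled = list(expression_labels)
    max_shuffled = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the motif/expression association
        max_shuffled = max(max_shuffled, mutual_information(motif_present, shuffled))
    return real, max_shuffled, real > max_shuffled

# Toy data (invented): the motif occurs exactly in the genes of cluster 0.
motif = [1] * 20 + [0] * 20
clusters = [0] * 20 + [1] * 20
real, max_shuffled, passed = shuffle_test(motif, clusters, n_shuffles=200)
```

With 10,000 shuffles, a motif that passes has a real value larger than anything seen under the null, which is one way to read the very low false positive counts reported on shuffled clusters.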
RPN4
REB1
0 “motifs” when shuffling the gene
labels of the clustering partition
ELK4
73 motifs in 5’ upstream regions
42 motifs in 3’UTRs
MBP1
HAP4
1129 motifs when applying AlignACE
(with default parameters) to each
cluster independently
Sp1
0 “motifs” when shuffling the gene
labels of the clustering partition
miR-525/miR-526c
FIRE uses mutual information to
discover and characterize motifs
Real mutual information value
XBP1
880 “motifs” when applying AlignACE
to the same shuffled clusters as above
bZIP911
NF-Y
Several 3’UTR motifs match the 5’
extremity of microRNAs
BAS1
CBF1
All 23 motifs are highly conserved
with S. bayanus
E2F1
miR-200b/miR-429
Discrete: cluster index (0, 1, 2) per 5’ upstream region
Mutual Information
I(X;Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x) P(y)}
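As a worked illustration of the mutual information formula, the sketch below estimates I(X;Y) from counts, taking X as motif presence or absence in each gene's upstream region and Y as the gene's cluster index. The variable names and toy data are invented for the example, and the base-2 logarithm is an assumption (the poster does not state the base).

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits between two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # P(x,y) = c/n; P(x,y) / (P(x) P(y)) = c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Toy data (invented): motif present in every gene of cluster 0 and nowhere else.
motif_present = [1, 1, 1, 1, 0, 0, 0, 0]
cluster_index = [0, 0, 0, 0, 1, 1, 2, 2]
informative = mutual_information(motif_present, cluster_index)

# A motif spread evenly across two clusters carries no information about them.
uninformative = mutual_information([0, 1, 0, 1], [0, 0, 1, 1])
```

In the first toy case the cluster index fully determines motif presence, so the mutual information equals the entropy of the presence variable (1 bit); in the second it is 0.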
Co-occurrence
Motifs informative about the phase?
Cluster
index
(Data from Bozdech, Llinás, et al, 2003)
Position bias
P. falciparum: intra-erythrocytic
development cycle
SWI4
5’ upstream region
Log-ratio
5’ upstream region
0h
Time
48h
~ 2,700 periodically expressed genes
5’ upstream region
Continuous
-π
Phase
> 50% of our predicted motifs have a non-random spatial distribution
Statistical significance
Pax2
E2F
CHOP-C/EBPα
TCF11-MafG
The RAP1 binding site has a position and orientation bias
…
+π
-π
Phase
Biological insights
+π
21 motifs in 5’ upstream regions
0 motifs in 3’UTRs
0 “motifs” when shuffling the gene
labels of the phase profile
• Importance of RNA motifs in shaping transcriptomes (~30% of the yeast, worm, human, and Arabidopsis motifs we found are RNA motifs)
71% highly conserved with P. yoelii
• In worm/human/mouse, several RNA motifs match miRNA
targets
• “Cooperation” between DNA and RNA motifs
• Avoidance of joint-presence for certain motifs
DNA replication, p<1e-4
plastid, p<0.01
• Under-representation of certain motifs
ribosome, p<0.001
Bozdech, Llinás, et al, 2003
Practical aspects
TCF11-MafG
PAC and the Msn2/4 binding site tend to avoid being in the same promoters
PAC and RRPE tend to co-localize on the DNA
Unix command line:
perl fire.pl --expfile=human_clusters.txt --exptype=discrete --species=human