Regulatory Genomics
Lecture 2 November 2012
Yitzhak (Tzachi) Pilpel
1
Course requirements
• Attendance and participation
• Two reading assignments
• A final take-home exam based on reading papers
• website
No meeting next week on Nov 15th
2
Expression regulation of genes determines
complex spatio-temporal patterns
3
Monitor expression during cell cycle
[Figure: mRNA expression level (about -2 to 4) against time (0 to 15) over two cell cycles: G1 S G2 M G1 S G2 M]
4
Genes can be clustered based on
time-dependent expression profiles
[Figure: three example clusters, each shown as normalized expression against time-point (1 to 3), alongside a scatter plot with axes including normalized expression at time-point 1 and at time-point 3]
5
The K-means algorithm
• Start with random
positions of centroids.
Iteration = 0
6
K-means
• Start with random
positions of centroids.
• Assign data points to
centroids
Iteration = 1
7
K-means
• Start with random
positions of centroids.
• Assign data points to
centroids.
• Move centroids to
center of assigned
points.
Iteration = 1
8
K-means
• Start with random
positions of centroids.
• Assign data points to
centroids.
• Move centroids to
center of assigned
points.
• Iterate until the cost reaches a minimum.
Iteration = 3
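The steps listed on the K-means slides can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used for the figures: the function names, the toy expression profiles and the choice to pick the starting centroids from among the data points (rather than from arbitrary random positions) are all assumptions made for the example. The cost here is the sum of squared distances from each point to its centroid, and the loop stops when assignments no longer change, which is a local minimum of that cost.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Start with random centroids (assumption: sampled from the data points).
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every data point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the cost cannot drop further
        assignment = new_assignment
        # Move each centroid to the center of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean_point(members)
    cost = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, cost


# Hypothetical normalized expression profiles over three time-points.
profiles = [
    (1.0, 0.0, -1.0), (1.1, 0.1, -0.9), (0.9, -0.1, -1.1),   # falling genes
    (-1.0, 0.0, 1.0), (-1.1, 0.1, 0.9), (-0.9, -0.1, 1.1),   # rising genes
]
centroids, labels, cost = kmeans(profiles, k=2)
print(labels, round(cost, 3))
```

K-means can converge to different local minima from different starting centroids, so in practice it is usually run several times with different seeds and the lowest-cost result is kept.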
9
An expression cluster
1D and 2D clustering of gene
expression data
Hierarchical clustering
How to join sets?
[Figure: six points, a to f, to be joined into sets]
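One way to make the "how to join sets?" question concrete is to compare linkage rules. The sketch below is an assumption-laden illustration: the coordinates given to points a to f are invented, Euclidean distance is assumed, and single, complete and average linkage are three standard rules that the slide itself does not name.

```python
import math
from itertools import combinations


def linkage_distance(set_a, set_b, coords, method):
    """Distance between two sets of points under a given linkage rule."""
    pair_dists = [math.dist(coords[i], coords[j]) for i in set_a for j in set_b]
    if method == "single":      # closest pair of members
        return min(pair_dists)
    if method == "complete":    # farthest pair of members
        return max(pair_dists)
    if method == "average":     # mean over all pairs of members
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(coords, method="average"):
    """Repeatedly join the two closest sets; return the merges in order."""
    clusters = [frozenset([name]) for name in coords]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], coords, method),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b)))
    return merges


# Hypothetical 2D positions for the six points on the slide.
coords = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.5, 2.0),
    "d": (5.0, 5.0), "e": (6.2, 5.0), "f": (5.5, 7.5),
}
merges = agglomerate(coords, method="single")
for left, right in merges:
    print(left, "+", right)
```

The list of merges, read from first to last, is the dendrogram from the leaves to the root. Changing `method` can change the order of the joins, which is exactly why the choice of linkage matters.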
How to measure a distance
between expression profiles?
[Figure: expression of gene x and gene y at time-points t1 to t5, plotted with gene x and gene y as axes]
14
Clustering the data
Try these two applets at home
(requires Java)
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
The common distance metrics
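The slide's table of distance measures did not survive extraction, so the two measures below are an assumption about what is meant: Euclidean distance and a correlation-based distance (1 minus the Pearson correlation) are among the most widely used for expression profiles. The profiles for gene x and gene y over t1 to t5 are invented for illustration.

```python
import math


def euclidean_distance(x, y):
    """Sensitive to the absolute expression level of each gene."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation: sensitive only to the shape of the profile."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


# Hypothetical profiles: gene y follows gene x exactly, at 10 times the level.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [10.0, 20.0, 30.0, 40.0, 50.0]

print(euclidean_distance(gene_x, gene_y))  # large: the levels differ
print(pearson_distance(gene_x, gene_y))    # about 0: the shapes are identical
```

Two genes that rise and fall together but at different magnitudes are far apart in Euclidean terms and close in correlation terms, so the measure should match the biological question.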
16
Promoter
Motifs and
expression
profiles
CGGCCCCGCGGA
CTCCTCCCCCCCTTC
TGGCCAATCA
ATGTACGGGTG
17
ChIP - chromatin immunoprecipitation
• Formaldehyde crosslinks living yeast cells
• Harvest and sonicate; results in DNA fragments (some of which are bound to proteins)
• Chromatin immunoprecipitation
• Reversal of the crosslinks to separate DNA segments from proteins, and fluorescence labeling of each pool (enriched DNA, unenriched DNA) separately
• Hybridization to DNA array of all yeast intergenic sequences
[Figure: inside the yeast nucleus, the TF of interest, carrying an epitope tag, at its binding sites; legend symbols for the epitope tag and the TF of interest]
18
P-value, or confidence level, for each spot in array
The total number of protein-DNA interactions in the location analysis data set, using a range of P-value thresholds:
P-value threshold    Interactions
0.05                 35,365
0.01                 12,040
0.005                8,190
0.001                3,985
A P-value threshold of 0.001 was selected, which minimizes false positives at the expense of gaining false negatives.
19
Genome-wide Distribution of Transcriptional Regulators
• Promoter regions of 2343 of 6270 yeast genes (37%) were bound by 1 or more of the 106 transcriptional regulators (P = 0.001)
• At P = 0.001, significantly more intergenic regions bind 4 or more regulators than expected by chance
• On average, a regulator binds 38 promoter regions
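A rough back-of-the-envelope calculation shows why the strictest threshold is the sensible choice. The sketch below uses only the numbers quoted on these slides, plus one loudly stated assumption: the number of regulator-promoter tests is approximated as 106 regulators times 6270 genes, which is not the exact number of array spots. Under that assumption, the number of interactions expected by chance alone is simply the threshold times the number of tests.

```python
# Observed interaction counts at each P-value threshold (from the slides).
observed = {0.05: 35365, 0.01: 12040, 0.005: 8190, 0.001: 3985}

n_regulators = 106
n_genes = 6270
# Assumption: one test per regulator-gene pair; the real array probes
# intergenic regions, so this is only an order-of-magnitude estimate.
n_tests = n_regulators * n_genes


def expected_by_chance(threshold, tests):
    """Expected number of spots passing the threshold if nothing is bound."""
    return threshold * tests


summary = {
    p: (count, expected_by_chance(p, n_tests)) for p, count in observed.items()
}
for p, (count, chance) in summary.items():
    print(f"P <= {p}: observed {count}, expected by chance ~{chance:.0f}")

# Average number of promoter regions bound per regulator at P = 0.001.
mean_promoters_per_regulator = observed[0.001] / n_regulators
print(round(mean_promoters_per_regulator))
```

Under this crude estimate, most calls at the 0.05 threshold could be explained by chance, while at 0.001 the chance expectation is a small fraction of the observed interactions. The last line reproduces the average of about 38 promoter regions per regulator.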
20
Network Motifs
21
Network Motifs
22
Network Motifs in the Yeast Regulatory Network
-Based on algorithmic analyses performed in Matlab; http://jura.wi.mit.edu/cgibin/young_public/navframe.cgi?s=17&f=networkmotif
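[Figure: network motif diagrams with a legend (protein, gene) and the number of occurrences of each motif: 3, 10, 49, 90, 81, 18]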
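Counting motifs in a regulatory network amounts to searching a directed graph for small wiring patterns. The sketch below is a plain-Python illustration, not the Matlab analysis referenced above. Because the figure did not survive, the choice of motifs is an assumption: autoregulation (a regulator binding its own gene) and the feed-forward loop (X regulates Y, and both regulate Z) are two patterns commonly discussed for such networks. The edge list is hypothetical.

```python
def find_autoregulation(edges):
    """Regulators that bind the promoter of their own gene."""
    return sorted({source for source, target in edges if source == target})


def find_feed_forward_loops(edges):
    """Triples (x, y, z) with edges x->y, y->z and x->z."""
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue
            for z in targets.get(y, ()):
                if z not in (x, y) and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical regulator -> target edges.
edges = [("A", "A"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

autoregulated = find_autoregulation(edges)
feed_forward = find_feed_forward_loops(edges)
print(autoregulated)
print(feed_forward)
```

To decide whether a motif is over-represented, the same counts would be compared against randomized networks with matching numbers of incoming and outgoing edges per node.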
23
The Cell Cycle Transcriptional
Regulatory Network:
Ovals represent regulators,
connected to genes they
regulate
Blue boxes represent sets of
genes bound by a common set
of regulators.
Each box is positioned
according to the time of peak
expression levels for the
genes represented by the box.
Various stages of cell cycle
24
Length of arc defines the period of activity of that regulator
Network of Transcriptional Regulators Binding to
Genes Encoding Other Transcriptional Regulators
25
Network of Transcriptional Regulators Binding to
Genes Encoding Other Transcriptional Regulators
26
Network of Transcriptional Regulators Binding to
Genes Encoding Other Transcriptional Regulators
27
The Central Dogma of Molecular Biology
Expressing the genome
[Figure: diagram with the labels inactive DNA, DNA, RNA, mRNA and protein]
28
Translation consists of initiation,
elongation and termination
[Figure: mRNA drawn from 5’ to 3’ with a STOP codon, showing the pairing of codon and anti-codon]
29
The ribosome attachment site determines
initiation rate
Yeast
E. coli
30
A consensus for S. cerevisiae ribosome
attachment sites?
[Figure: nucleotide frequency (0% to 100%) at each position relative to ATG]
Given a sequence, how good is it as a “ribosomal attachment site”?
This is quantified by a ribosomal attachment site score
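The slides do not say how the ribosomal attachment site score is computed. One standard way to turn a per-position consensus into a score, assumed here purely for illustration, is a position weight matrix with log-odds scoring against a uniform background. The example sites are invented toy sequences, not real yeast sites.

```python
import math


def build_frequency_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-position nucleotide frequencies from aligned, equal-length sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        matrix.append(
            {base: (column.count(base) + pseudocount) / total for base in alphabet}
        )
    return matrix


def log_odds_score(sequence, matrix, background=0.25):
    """Sum over positions of log2(frequency / background); higher is better."""
    return sum(
        math.log2(matrix[pos][base] / background)
        for pos, base in enumerate(sequence)
    )


# Hypothetical aligned sequences immediately upstream of the ATG.
example_sites = ["AAAAAA", "AAAACA", "AAAAAA", "CAAAAA"]
pwm = build_frequency_matrix(example_sites)

good = log_odds_score("AAAAAA", pwm)
bad = log_odds_score("GCGCGC", pwm)
print(round(good, 2), round(bad, 2))
```

Scoring every gene's upstream window this way gives one number per gene, which can then be ranked or correlated with expression.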
31
[Figure: mRNA drawn from 5’ to 3’ as a series of codons: GCG GCG GCG CAG GCG CTG GCG GCG GCG GCG CGC]
32
The sequence adaptation score of proteins in yeast
[Figure: ribosomal attachment site score (0 to 120, from bad score to good score) against gene rank (bad to good), with CRP highlighted]
33
Multiple codons for the same amino
acid
            C1  C2  C3  C4  C5  C6
Serine:     UCU UCC UCA UCG AGC AGU
Cysteine:   UGU UGC
Methionine: AUG
STOP:       UAA, UAG, UGA
34
G  T  R  Y  E  C  Q  A  S  F  D
C1 C1 C1 C1 C1 C1 C1 C1 C1 C1 C1
C2 C2 C2 C2 C2 C2 C2 C2 C2 C2 C2
C1 C1 C2 C1 C1 C2 C1 C1 C2 C1 C1
C2 C2 C2 C2 C1 C1 C1 C1 C1 C1 C1
C1 C1 C1 C1 C1 C1 C1 C2 C2 C2 C2
For a hypothetical protein of 300 amino acids with two codons each, there are 2^300 possible nucleotide sequences.
These variants all code for the same protein, and are thus considered “synonymous”.
Indeed, evolution would easily exchange between them.
35
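The size of the synonymous sequence space is just a product of the number of codons available at each position. The short sketch below checks the 2^300 figure for the hypothetical two-codon protein and, as a comparison, counts the variants of the 11-residue peptide shown above using the codon degeneracy of the standard genetic code. The function name is invented for the example.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (only the residues that appear in the example peptide).
CODON_COUNTS = {
    "G": 4, "T": 4, "R": 6, "Y": 2, "E": 2, "C": 2,
    "Q": 2, "A": 4, "S": 6, "F": 2, "D": 2,
}


def synonymous_variants(peptide, codon_counts):
    """Number of distinct nucleotide sequences encoding the same peptide."""
    return prod(codon_counts[residue] for residue in peptide)


# Hypothetical protein from the slide: 300 residues, two codons each.
toy_protein_variants = 2 ** 300
print(len(str(toy_protein_variants)))  # number of decimal digits

# The 11-residue example peptide with real codon degeneracies.
peptide_variants = synonymous_variants("GTRYECQASFD", CODON_COUNTS)
print(peptide_variants)
```

Even an 11-residue peptide has well over a hundred thousand synonymous encodings, so codon choice leaves a very large space for selection to act on.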
Selection of codons might affect:
Accuracy
Throughput
RNA-structure
Costs
Folding
36
A simple model for translation efficiency
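The tRNA Adaptation Index (tAI)
[Figure: a stretch of mRNA codons (ATC, CCA, AAA, TCG, AAT) read by tRNAs, including a wobble interaction]
$$W_i = \sum_{j=1}^{n_i} (1 - s_{ij})\,\mathrm{tRNA}_{ij}$$
$$w_i = \begin{cases} W_i / W_{\max} & \text{if } W_i \neq 0 \\ w_{\mathrm{mean}} & \text{else} \end{cases}$$
$$\mathrm{tAI}_g = \left( \prod_{k=1}^{l_g} w_{i_k g} \right)^{1/l_g}$$
dos Reis et al. NAR 2004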
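The tAI definition can be followed step by step in code. In the sketch below, the absolute weight W of a codon sums the copy numbers of the tRNAs that read it, each discounted by a wobble penalty s; weights are then scaled by the maximum; and the tAI of a gene is the geometric mean of the weights of its codons. All numbers are invented for illustration, and taking the replacement value for codons with W = 0 as the geometric mean of the nonzero weights is an assumption of this example.

```python
import math


def absolute_weight(trna_copies, penalties):
    """W_i = sum over tRNAs j reading codon i of (1 - s_ij) * tRNA_ij."""
    return sum((1.0 - s) * copies for copies, s in zip(trna_copies, penalties))


def relative_weights(absolute):
    """w_i = W_i / W_max, with zero weights replaced by a mean weight."""
    w_max = max(absolute.values())
    nonzero = [value / w_max for value in absolute.values() if value > 0]
    # Assumption: w_mean is the geometric mean of the nonzero relative weights.
    w_mean = math.exp(sum(math.log(w) for w in nonzero) / len(nonzero))
    return {
        codon: (value / w_max if value > 0 else w_mean)
        for codon, value in absolute.items()
    }


def tai(codons, weights):
    """Geometric mean of the relative weights of a gene's codons."""
    return math.exp(sum(math.log(weights[c]) for c in codons) / len(codons))


# One codon read by two tRNAs: a perfect match (s = 0) with 4 gene copies
# and a wobble match (s = 0.5) with 2 gene copies. Values are hypothetical.
example_w = absolute_weight(trna_copies=[4, 2], penalties=[0.0, 0.5])

# Hypothetical absolute weights for the codons shown on the slide.
W = {"AAA": 8.0, "TCG": 4.0, "AAT": 2.0, "CCA": 0.0, "ATC": 4.0}
w = relative_weights(W)

well_adapted = tai(["AAA", "AAA", "TCG"], w)
poorly_adapted = tai(["AAT", "AAT", "TCG"], w)
print(example_w, round(well_adapted, 3), round(poorly_adapted, 3))
```

A gene built from codons with abundant tRNAs scores close to 1, while a gene that relies on rare or wobble-read codons scores lower.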
37
Supply, demand and charging
38
How does RNA structure influence
translation?
39
Conclusions from synthetic library
• No correlation between CAI and protein expression
• Positive correlation between structure’s energy and expression
• The 5’ window needs to be unfolded for high expression
[Figures: two plots with protein abundance as an axis]
40
A genome-wide method to measure
translation efficiency
(Ingolia et al. Science 2009)
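In ribosome profiling, translation efficiency is commonly summarized per gene as the ratio of ribosome footprint density to mRNA density, both normalized for gene length and sequencing depth. The sketch below shows that ratio and its log2 form; the gene names and density values are hypothetical, and the function names are invented for the example.

```python
import math


def translation_efficiency(footprint_density, mrna_density):
    """Ribosome footprints per unit of mRNA for one gene."""
    return footprint_density / mrna_density


def log2_translation_efficiency(footprint_density, mrna_density):
    """Log2 form, convenient for comparing genes or conditions."""
    return math.log2(translation_efficiency(footprint_density, mrna_density))


# Hypothetical normalized densities: (footprint, mRNA).
genes = {
    "geneA": (200.0, 100.0),  # heavily translated per transcript
    "geneB": (50.0, 100.0),   # same mRNA level, fewer ribosomes
}

te = {name: translation_efficiency(fp, rna) for name, (fp, rna) in genes.items()}
log2_te = {name: log2_translation_efficiency(fp, rna) for name, (fp, rna) in genes.items()}
print(te)
print(log2_te)
```

Two genes with the same mRNA abundance can differ several-fold in this ratio, which is the layer of regulation that mRNA measurements alone cannot see.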
42
Translational response to
starvation
43
The Central Dogma of Molecular Biology
Expressing the genome
[Figure: diagram with the labels inactive DNA, DNA, RNA, mRNA and protein]
44
[Figure: mRNA abundance as the outcome of production and degradation, with four scenarios labeled Option 1 to Option 4]
45
Relationship between gene expression levels and mRNA decay rates across genes.
A study in a human population examined variation across people in mRNA decay rates and steady-state mRNA levels.
It found strong negative or positive correlations between mRNA level and decay rate.
Fast-responding genes show a “discordant” relation, suggesting that increased expression is often accompanied by an increased decay rate.
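A simple way to see why higher expression might be paired with faster decay is the textbook first-order model, assumed here for illustration and not taken from the study: production at a constant rate and degradation proportional to the mRNA level. The steady-state level is then production divided by decay rate, while the response time depends on the decay rate alone. The rate values are hypothetical.

```python
import math


def steady_state(production, decay):
    """Level at which production balances degradation: m* = k / d."""
    return production / decay


def level_at(time, production, decay):
    """mRNA level at a given time after induction from zero."""
    return steady_state(production, decay) * (1.0 - math.exp(-decay * time))


def half_response_time(decay):
    """Time to reach half of the steady-state level: ln(2) / d."""
    return math.log(2) / decay


# Two hypothetical genes with the same steady-state level.
slow_gene = {"production": 1.0, "decay": 0.1}
fast_gene = {"production": 10.0, "decay": 1.0}

for name, gene in (("slow", slow_gene), ("fast", fast_gene)):
    print(
        name,
        steady_state(**gene),
        round(half_response_time(gene["decay"]), 2),
        round(level_at(1.0, **gene), 2),
    )
```

Both genes settle at the same abundance, but the gene with high production and high decay gets there ten times sooner, at the cost of making and destroying more transcripts.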
The various phases are coupled
47
At the hardware level (post-transcription: RNA-binding proteins)
G1    1    1    1    0
G2    1    0    0    1
G3    0    1    1    1
48
At the hardware level (post-transcription: microRNA)
      RISC RISC RISC RISC
G1    1    1    1    0
G2    1    0    0    1
G3    0    1    1    1
49
50
Yang CGFR 16:397, 2005
Computational approaches to
find microRNA genes
• MiRscan (Lim, et al. 2003)
– Scans for hairpin structures conserved in both C. elegans and C. briggsae.
– Uses known microRNA genes (50) as a training set.
51
What is the effect of overexpression of a miR?
52
Non-coding RNAs are often co-targeted with their
own targets for various cellular needs
53
miR-124 similarly decreases the abundance
and translation of its mRNA targets
54
microRNA expression profiles classify
human cancers
[Figure: expression matrix with axes miRs and samples (patients)]
Lu et al. Nature 435: 834, 2005
55
Gene expression is noisy
56
Fluorescence distribution shapes
57
The cell intrinsic and extrinsic contributions to
noise
58
The actual intrinsic and extrinsic sources of noise:
• Extrinsic: variation in copy numbers of molecules among cells
• Intrinsic: stochastic events
[Figure: gene expression diagram (DNA, RNA, protein, Φ) annotated with extrinsic and intrinsic noise sources; labels include regulation by transcription factors, ribosome, RNA polymerase, chromatin remodeling, transcription process, translation process and protein degradation]
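One widely used way to separate the two contributions experimentally is the dual-reporter design, in which two distinguishable reporters driven by identical promoters are measured in the same cell. The slides do not name the method, so treating it as the relevant one here is an assumption. Differences between the two reporters within a cell reflect intrinsic noise, while variation that both reporters share across cells reflects extrinsic noise. The fluorescence values below are invented.

```python
def mean(values):
    return sum(values) / len(values)


def noise_decomposition(reporter_a, reporter_b):
    """Squared intrinsic, extrinsic and total noise from paired measurements.

    Each index is one cell; reporter_a[i] and reporter_b[i] are the two
    reporter levels measured in that cell.
    """
    norm = mean(reporter_a) * mean(reporter_b)
    intrinsic_sq = mean(
        [(a - b) ** 2 for a, b in zip(reporter_a, reporter_b)]
    ) / (2.0 * norm)
    extrinsic_sq = (
        mean([a * b for a, b in zip(reporter_a, reporter_b)]) - norm
    ) / norm
    return intrinsic_sq, extrinsic_sq, intrinsic_sq + extrinsic_sq


# Hypothetical case 1: the reporters always agree within a cell,
# but cells differ from each other (purely extrinsic).
shared = [80.0, 100.0, 120.0]
extrinsic_only = noise_decomposition(shared, shared)

# Hypothetical case 2: the reporters fluctuate independently of each other
# around the same mean (purely intrinsic).
a = [90.0, 110.0, 100.0, 100.0]
b = [100.0, 100.0, 90.0, 110.0]
intrinsic_only = noise_decomposition(a, b)

print(extrinsic_only)
print(intrinsic_only)
```

Real data fall between the two extremes, and the two squared terms add up to the total squared noise.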
59
A theoretical approach
DNA
mRNA
Protein
60
The ratio of transcription to translation should
affect noise
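This claim can be checked with the standard two-stage model of gene expression (mRNA made and degraded at constant rates, protein made from each mRNA and degraded), for which the protein noise has a known closed form. Whether this is the exact model behind the slide is an assumption, and the rate constants below are hypothetical. The key quantity is the average number of proteins made per mRNA lifetime, b: the protein Fano factor (variance over mean) is 1 + b / (1 + η), where η is the ratio of protein to mRNA decay rates.

```python
def two_stage_noise(k_mrna, k_protein, decay_mrna, decay_protein):
    """Mean protein level, Fano factor and squared CV for the two-stage model."""
    mean_mrna = k_mrna / decay_mrna
    mean_protein = mean_mrna * k_protein / decay_protein
    burst = k_protein / decay_mrna      # proteins made per mRNA lifetime
    eta = decay_protein / decay_mrna    # ratio of decay rates
    fano = 1.0 + burst / (1.0 + eta)
    cv_squared = fano / mean_protein
    return mean_protein, fano, cv_squared


# Two hypothetical designs with the same mean protein level:
# many transcripts translated rarely, or few transcripts translated often.
high_transcription = two_stage_noise(
    k_mrna=10.0, k_protein=1.0, decay_mrna=1.0, decay_protein=0.1
)
high_translation = two_stage_noise(
    k_mrna=1.0, k_protein=10.0, decay_mrna=1.0, decay_protein=0.1
)

print(high_transcription)
print(high_translation)
```

With the mean held fixed, shifting the balance from transcription to translation raises the noise several-fold, because each rare mRNA then produces a large burst of protein.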
61
Transcription bursts should affect noise
62
Can noise
be useful?
The native network shows longer and more
duration-diverse competence periods
The native network does better over a
wider range of extracellular [DNA]
The trade-off:
High competence allows
finding solutions, but reduces
growth rate
Questions about noise
• What are the sources of noise?
• How is noise regulated in cells?
• How is it tolerated by biological
systems that need to be noise-free?
• When is noise advantageous, deleterious or
neutral?
66