LAPTh - CNRS


Replication associated strand asymmetries
in mammalian genomes
In silico detection of replication origins
Maxime Huvet
Marie Touchon
Yves d'Aubenton-Carafa
Claude Thermes
Samuel Nicolay
Benjamin Audit
Edward Brodie of Brodie
Alain Arneodo
(CGM, Gif sur Yvette)
(ENS-Lyon)
Supports: CNRS, ACI IMPBio, ANR
« SECOND PARITY RULE »
Long genome sequence fragments tend to show on the same strand: fA = fT and fG = fC
[Figure: A (Mb) versus T (Mb) and G (Mb) versus C (Mb), for Bacteria/Archaebacteria and for human chromosomes]
LARGE SCALE PROPERTIES OF GENOMIC MUTATIONS
Same mutation/repair processes on the 2 DNA strands
Same values of complementary substitution rates
[Diagram: substitutions between A, G, T and C]
At equilibrium: Second Parity Rule (PR2): fA = fT and fG = fC (at large scales)
(Chargaff, 1962; Sueoka, Lobry, 1995)
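As an illustration of how PR2 can be checked on a single strand, here is a minimal Python sketch. The function names and the toy sequence are assumptions made for this example; they do not come from the slides.

```python
from collections import Counter

def base_frequencies(seq):
    """Return the frequencies of A, C, G and T on one strand, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}

def pr2_deviation(seq):
    """Return (fA - fT, fG - fC); both values are close to 0 when PR2 holds."""
    f = base_frequencies(seq)
    return f["A"] - f["T"], f["G"] - f["C"]

# Toy sequence (hypothetical, not real genomic data).
toy = "ATGCATTAGCCGAT"
print(base_frequencies(toy))
print(pr2_deviation(toy))
```

On real data the deviations would be evaluated on long fragments, since PR2 is stated only at large scales.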
What mechanisms cause composition asymmetries ?
REPLICATION : asymmetry of mutation/repair processes between leading and lagging strands
[Diagram: replication origin with the leading and lagging strands, 5’ and 3’ ends marked]
EUBACTERIA: G > C and T > A in the leading strand
SGC = (nG – nC) / (nG + nC) > 0
STA = (nT – nA) / (nT + nA) > 0
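The two skews can be computed directly from base counts. The sketch below is one possible implementation; the function name `skews` and the convention of returning 0 when a denominator is empty are assumptions of this example.

```python
def skews(seq):
    """Return (STA, SGC) for the strand given as `seq`.

    STA = (nT - nA) / (nT + nA) and SGC = (nG - nC) / (nG + nC).
    A skew is set to 0.0 when its denominator is zero (assumed convention).
    """
    s = seq.upper()
    n_a, n_c, n_g, n_t = (s.count(base) for base in "ACGT")
    s_ta = (n_t - n_a) / (n_t + n_a) if (n_t + n_a) else 0.0
    s_gc = (n_g - n_c) / (n_g + n_c) if (n_g + n_c) else 0.0
    return s_ta, s_gc

# Toy fragment with G > C and T > A, as on a eubacterial leading strand.
print(skews("GGGCTTTA"))  # (0.5, 0.5)
```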
Composition asymmetry in procaryotes
[Figure: Bacillus subtilis, SGC = (nG – nC) / (nG + nC) computed in 1 kb windows along the genome (x 10^6 bp), with ORI and TER marked; leading strand: G > C, lagging strand: G < C]
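A windowed profile of this kind can be sketched as follows. The 1 kb window comes from the slide; the function name, the use of non-overlapping windows and the toy sequence are assumptions of this example.

```python
def gc_skew_profile(seq, window=1000):
    """Return [(window_start, SGC), ...] for consecutive non-overlapping windows.

    A trailing fragment shorter than `window` is ignored.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        n_g, n_c = chunk.count("G"), chunk.count("C")
        s_gc = (n_g - n_c) / (n_g + n_c) if (n_g + n_c) else 0.0
        profile.append((start, s_gc))
    return profile

# Toy "genome" (hypothetical): a C-rich half followed by a G-rich half,
# so that SGC changes sign in the middle, as it does at ORI.
toy = "ACCT" * 500 + "AGGT" * 500
for start, s_gc in gc_skew_profile(toy):
    print(start, s_gc)
```

The sign change of SGC between adjacent windows is the feature used to locate ORI and TER on such a profile.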
What mechanisms cause composition asymmetries ?
TRANSCRIPTION : asymmetry of mutation/repair processes between transcribed and non-transcribed strands
[Diagram: RNA polymerase on the DNA, with the transcribed and non-transcribed strands, 5’ and 3’ ends marked]
EUBACTERIA: G > C and T > A on the non-transcribed strand
SGC = (nG – nC) / (nG + nC) > 0
STA = (nT – nA) / (nT + nA) > 0
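Because the skew is defined on one strand, a gene read on the opposite strand contributes with the opposite sign. The sketch below illustrates this with a reverse complement; all names are assumptions of this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def total_skew(seq):
    """Return S = STA + SGC for the strand given as `seq`."""
    s = seq.upper()
    n_a, n_c, n_g, n_t = (s.count(base) for base in "ACGT")
    s_ta = (n_t - n_a) / (n_t + n_a) if (n_t + n_a) else 0.0
    s_gc = (n_g - n_c) / (n_g + n_c) if (n_g + n_c) else 0.0
    return s_ta + s_gc

# Toy non-transcribed strand of a gene (hypothetical), with G > C and T > A.
non_transcribed = "GGGCTTTA"
print(total_skew(non_transcribed))                      # 1.0
print(total_skew(reverse_complement(non_transcribed)))  # -1.0
```

Complementing swaps the counts of A with T and of G with C, so both skews change sign: a (+) gene and a (-) gene give opposite contributions on the reference strand.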
Skew profiles associated with transcription and replication in Eubacteria
S = STA + SGC
[Diagram: replicative skew profile, with S going from (-) to (+) at ORI (lagging strand / leading strand); transcriptional skew profile (transcribed strand / non-transcribed strand); superposition of replication and transcription]
[Figure: S along the Bacillus subtilis genome (Mbp); legend: genes (strand +), genes (strand -), intergenic regions]
STRAND ASYMMETRIES IN EUKARYOTES ?
1. Strand asymmetries associated with transcription
in the human genome
Strand asymmetries associated with transcription in human genes
Introns (126 000), ≈ 12 000 genes (no exons, no repeats), compared with the flanking intergenic sequences
[Figure: STA = (nT – nA) / (nT + nA) and SGC = (nG – nC) / (nG + nC) as a function of the distance (kb, from -40 to 40) to the 5’ and 3’ gene ends; upward jumps at the 5’ end, downward jumps at the 3’ end]
Mean skew associated with transcription: ∆S = STA + SGC ~ 7%
2. Strand asymmetries associated with replication
in the human genome
Skew profiles around human replication origins
[Figure legend: genes (strand +), genes (strand -), intergenic regions]
Superimposition of replication and transcription biases
[Diagram: S profile around an ORI, with genes (strand +), genes (strand -) and intergenic regions]
Transcription: ∆S ~ ± 7%
Replication: ∆S ~ + 14%
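The superimposition of the two biases can be pictured with a toy piecewise-constant profile. This is only a sketch under strong assumptions that are not stated on the slides: the replication jump of 14% is taken to be symmetric around 0 (from -7% to +7% at ORI), each gene adds +7% or -7% depending on its strand, and positions are sampled every 1 kb. The function name and gene coordinates are hypothetical.

```python
def toy_skew_profile(length_kb, ori_kb, genes, rep_jump=0.14, tx_skew=0.07):
    """Return a toy skew profile S(x), one value per kb.

    genes: list of (start_kb, end_kb, strand), strand being "+" or "-".
    Replication: S = -rep_jump/2 left of ORI, +rep_jump/2 right of it (assumed).
    Transcription: +tx_skew inside (+) genes, -tx_skew inside (-) genes.
    """
    profile = []
    for x in range(length_kb):
        s = rep_jump / 2 if x >= ori_kb else -rep_jump / 2
        for start, end, strand in genes:
            if start <= x < end:
                s += tx_skew if strand == "+" else -tx_skew
        profile.append(s)
    return profile

# Hypothetical layout: a (-) gene left of ORI and a (+) gene right of it.
profile = toy_skew_profile(100, 50, [(10, 30, "-"), (60, 80, "+")])
print(profile[0], profile[20], profile[55], profile[70])
```

In this layout the genes reinforce the replication bias; a gene in the opposite orientation would partly cancel it, which is why jumps due only to transcription have to be separated from those due to replication.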
Conservation of skew profiles in mammalian genomes
[Figure: skew profiles in human, mouse, rat and dog]
Conservation of replication origins in mammalian genomes
3. In silico detection of replication origins
in the human genome
Detection of upward jumps associated with replication
Main problem:
• the jumps due only to transcription must be avoided
[Diagram: S profile with genes and ORIs; genes: mean size 30 kb; ORI to ORI: ≈ 100 kb → 1 Mb]
Scale of analysis:
• larger than the typical size of genes
• smaller than the typical size of replicons
⇒ a multi-scale analysis is necessary
Multi scale jump detection using the wavelet transform
[Figure: S and its first derivative after smoothing at scales w = 10 kb, 50 kb, 100 kb and 200 kb; small w: numerous jumps, high precision; large w: few jumps, low precision]
Multi scale jump detection using the wavelet transform
position of transitions (1 kb)
Signal smoothed at large scale (200 kb)
Identification of transitions
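The idea of looking at the derivative of the smoothed signal at several scales can be sketched in pure Python. The slides do not give the analysing wavelet or the thresholds; the first derivative of a Gaussian, the edge handling, the threshold value and the toy signal below are all assumptions of this example.

```python
import math

def gaussian_derivative_kernel(scale):
    """Sampled first derivative of a Gaussian of standard deviation `scale` (in samples)."""
    half = int(4 * scale)
    norm = scale * math.sqrt(2 * math.pi)
    return [-(k / scale**2) * math.exp(-k**2 / (2 * scale**2)) / norm
            for k in range(-half, half + 1)]

def smoothed_derivative(signal, scale):
    """Convolve the signal with the kernel: derivative of the signal smoothed at `scale`.

    Edges are handled by repeating the first and last values.
    """
    kernel = gaussian_derivative_kernel(scale)
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, g in enumerate(kernel):
            idx = min(max(i - (j - half), 0), n - 1)
            acc += g * signal[idx]
        out.append(acc)
    return out

def upward_jumps(signal, scale, threshold):
    """Positions where the smoothed derivative has a local maximum above `threshold`."""
    d = smoothed_derivative(signal, scale)
    return [i for i in range(1, len(d) - 1)
            if d[i] > threshold and d[i] >= d[i - 1] and d[i] > d[i + 1]]

# Toy profile, 1 sample = 1 kb (hypothetical): a replication-like step at 200
# and a short gene-like bump between 100 and 110.
signal = [-0.07] * 200 + [0.07] * 200
for i in range(100, 110):
    signal[i] += 0.07

print(upward_jumps(signal, 2, 0.001))   # small scale: gene edge and step
print(upward_jumps(signal, 20, 0.001))  # large scale: only the step remains
```

With these toy values the small scale reports both the gene edge and the step, while the large scale keeps only the step: numerous jumps at small w, few jumps at large w.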
Asymmetry of the human genome
[Figure: histograms of jump amplitude (%), for upward and downward jumps]
« Factory roof » skew profiles
[Figure: skew profiles S as a function of x (Mb)]
« Factory roofs » around experimentally determined replication origins
[Figure: S as a function of x (kb) around the TOP1 and MCM4 origins]
Conservation of potential origins in mammalian genomes
[Figure: human, mouse and dog]
Model of eucaryotic replicon
Replication termination sites: distributed between fixed adjacent origins
[Diagram: prokaryote (origin O, terminus T) versus eukaryote (fixed origins Ori 1 and Ori 2, termination site T between them); S profile between Ori 1 and Ori 2 at each cycle, after several cycles, and after N cycles]
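This model can be checked numerically. The sketch below assumes that the termination site is spread evenly between the two origins, that the skew is +s_max on the part replicated by the fork leaving Ori 1 and -s_max on the part replicated by the fork leaving Ori 2, and that s_max = 0.07; none of these values or names come from the slides.

```python
def skew_one_cycle(x, termination, s_max):
    """Skew at x for one cycle: +s_max if x lies between Ori 1 and the
    termination site, -s_max if it lies between the termination site and Ori 2."""
    return s_max if x < termination else -s_max

def mean_skew(x, ori1, ori2, s_max=0.07, n_cycles=1000):
    """Average skew at x over n_cycles, with termination sites spread evenly
    between ori1 and ori2 (assumed distribution)."""
    total = 0.0
    for c in range(n_cycles):
        termination = ori1 + (c + 0.5) * (ori2 - ori1) / n_cycles
        total += skew_one_cycle(x, termination, s_max)
    return total / n_cycles

# Positions in kb between two hypothetical origins at 0 and 1000 kb.
for x in (0, 250, 500, 750, 1000):
    print(x, round(mean_skew(x, 0, 1000), 4))
```

Under these assumptions the average is S(x) = s_max (1 - 2 (x - Ori 1) / (Ori 2 - Ori 1)). It decreases linearly from +s_max at Ori 1 to -s_max at Ori 2 and jumps back up at the next origin, which gives the « factory roof » shape.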
Detection of factory roofs using the wavelet transform
[Figure: factory roof profile and the analysing wavelets]
759 « factory roofs » spanning ~ 40% of the human genome
ASYMMETRY OF HUMAN GENOME
[Figure labels: factory roofs = 40 %; factory roofs < 1%]
EUCARYOTIC REPLICON MODEL
[Diagram: transcriptional and replicative skew profiles between ORIs, with (-) and (+) regions, transcribed and non-transcribed strands; superposition of transcription and replication]
Comparison with replication timing data
[Figure: replication timing (early to late) versus position on human chromosome 6 (Mbp), with ori positions marked]
Woodfine et al., Cell Cycle (2005)
GENE ORGANISATION IN HUMAN CHROMOSOMES
Organisation of transcription around predicted replication origins
Co-orientation of transcription and replication
Model of mammalian chromatin organization
[Diagram: open chromatin along the genomic DNA, with ORIs and the S profile]
Replication origins are situated at the center of open chromatin regions
Conclusions
• Existence of replication-coupled strand asymmetries in the human genome
• Replication origins correspond to large transitions of skew profiles
• These transitions are conserved in mammalian genomes
• Detection of more than one thousand putative origins active in germ-line cells
• « Factory roof » profiles: regularly distributed termination sites
• Essential role of replication in the organisation of gene order and expression