Substitution - UMR CNRS 5558 Laboratoire de Biométrie et Biologie

Variations of neutral substitution
patterns along mammalian
genomes
Julien Meunier, Laurent Duret
Laboratoire de Biométrie et Biologie Evolutive
CNRS - Université Lyon 1
Evolution : mutation, selection, drift
• Mutation (in an individual): base modification, replication error, deletion, insertion, etc., counteracted by DNA repair
– soma: no transmission to the offspring
– germline: transmission to the offspring (polymorphism)
• Population (N): fixation of the allele (substitution) or loss of the allele
Neutral substitutions
• Substitutions that do not affect the fitness of
the organism (≈ no effect on the phenotype)
• Patterns of neutral substitutions vary along
the genome
– Variations of mutation patterns
– Variations of fixation probability (biased gene
conversion)
Why study neutral substitutions ?
• Understanding the forces responsible for
variations of neutral substitutions along
chromosomes is useful :
– to be able to detect the hallmarks of selection
within genomes (e.g. phylogenetic footprinting)
– to reveal molecular mechanisms involved in
genome functioning (e.g. replication, repair,
transcription, etc.)
Homology-dependent
methylation in primates
repetitive DNA
Meunier et al. PNAS 2005
Transposable elements
• Transposable elements (TEs):
– Ubiquitous in eukaryotes
– 45% of the human genome
• Deleterious effects
– Insertions within coding regions, interference
with regulatory elements
– Chromosomal rearrangements by ectopic
recombination
Mechanisms of defense against TEs
• DNA methylation
– Decrease expression (plants, fungi, animals)
– Suppress recombination (Ascobolus)
– Inactivate TEs by mutation (Neurospora)
• Plants, Neurospora :
– Targeting of methylation specifically to repeated seq.
• Mammals:
– Methylation of cytosines in CpG dinucleotides
– Both TEs and non-repeated sequences (including
exons) are methylated
Methylation as a mechanism of
defense against TEs in mammals ?
• Specific patterns of methylation in TEs vs. unique
DNA
– Lees-Murdock et al. (2003): timing of methylation and
demethylation during mouse development
– Yates et al. (1999): B1 repetitive elements act as methylation
center signals in mice
• Rabinowicz et al. (2003): levels of methylation are
similar in TEs and in exons
– method qualitative rather than quantitative ?
– germline vs. somatic pattern of methylation ?
Methylation as a mechanism of
defense against TEs in mammals ?
• Is the pattern of methylation different in TEs vs. unique
DNA ?
• Indirect approach: Analysis of patterns of substitutions at
CpG and non-CpG sites in TEs
• Cytosine methylation => increase in the rate of C->T mutations
(x 10)
Cytosine deamination
[Figure: deamination converts cytosine (C:G pair) into uracil (U:G mismatch); repair restores C:G]
Methylated cytosine deamination
[Figure: deamination converts methylated cytosine (C:G pair) into thymine (T:G mismatch); repair yields either C:G or T:A]
Substitution patterns in the
hominidae lineage (1)
• Human, chimp, baboon triple genomic alignments:
– 14.3 Mb (introns and intergenic regions)
– 36 loci from 12 autosomes
• Substitutions inferred by parsimony
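[Figure: tree with baboon (A), chimp (A) and human (G): an A->G substitution is inferred in the human lineage; human/chimp divergence about 6 Myr]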
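The parsimony rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming baboon is used as the outgroup and that sites with gaps or three different bases are skipped; the function name `infer_substitution` is hypothetical and is not taken from the original analysis.

```python
def infer_substitution(human: str, chimp: str, baboon: str):
    """Infer a lineage-specific substitution at one aligned site.

    Returns (lineage, ancestral_base, derived_base), or None when the
    site is uninformative. Baboon is assumed to be the outgroup.
    """
    if not {human, chimp, baboon} <= set("ACGT"):
        return None  # gap or ambiguous base
    if human == chimp:
        return None  # no change in the human or chimp lineage
    if chimp == baboon:
        return ("human", baboon, human)  # human differs: change in human
    if human == baboon:
        return ("chimp", baboon, chimp)  # chimp differs: change in chimp
    return None  # three different bases: parsimony cannot decide


def infer_all(human_seq: str, chimp_seq: str, baboon_seq: str):
    """Apply the rule to every column of a triple alignment."""
    results = []
    for position, (h, c, b) in enumerate(zip(human_seq, chimp_seq, baboon_seq)):
        inferred = infer_substitution(h, c, b)
        if inferred is not None:
            results.append((position,) + inferred)
    return results
```

With the slide's example (baboon A, chimp A, human G), the sketch reports an A->G substitution in the human lineage.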
Use parsimony with parsimony
• Parsimony can be erroneous when patterns
of substitutions are biased, even for short
evolutionary distances (Eyre-Walker 1998)
• CpG dinucleotides in mammals:
– methylation of C
– hot spot of C->T transitions
– C->T ≈ 10 times T->C
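• Non-CpG example (observed: human TTC, chimp TCC, baboon TTC)
– ancestor TTC, one T->C in the chimp lineage: p1 = 5 × 10^-3
– ancestor TCC, two C->T (human and baboon lineages): p2 = 0.1 × 10^-3
– p1 ≈ 50 p2
• CpG example (observed: human TTG, chimp TCG, baboon TTG)
– ancestor TTG, one T->C in the chimp lineage: p1 = 5 × 10^-3
– ancestor TCG, two C->T (human and baboon lineages): p2 = 12.5 × 10^-3
– p1 ≈ 1/2 p2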
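The comparison of the two scenarios can be checked numerically. The sketch below only reuses the probabilities quoted on this slide; normalizing over the two scenarios (as if no other history were possible) is an assumption made for illustration, and `compare_scenarios` is a hypothetical helper name.

```python
def compare_scenarios(p_one_event: float, p_two_events: float):
    """Compare the parsimonious scenario (one substitution) with the
    alternative (two parallel substitutions).

    Returns (ratio, share), where share is the weight of the parsimonious
    scenario if these two scenarios were the only possible histories.
    """
    ratio = p_one_event / p_two_events
    share = p_one_event / (p_one_event + p_two_events)
    return ratio, share


# Probabilities quoted on the slide
non_cpg = compare_scenarios(5e-3, 0.1e-3)   # one T->C vs. two C->T, non-CpG
cpg = compare_scenarios(5e-3, 12.5e-3)      # one T->C vs. two C->T, CpG

print(f"non-CpG: ratio {non_cpg[0]:.1f}, parsimonious share {non_cpg[1]:.2f}")
print(f"CpG:     ratio {cpg[0]:.1f}, parsimonious share {cpg[1]:.2f}")
```

At the CpG site the parsimonious scenario carries less than half of the weight, which is why parsimony has to be used with caution there.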
Substitution patterns in the
hominidae lineage
[Figure: non-CpG site: human TTC, chimp TCC, baboon TTC, ancestor TTC (T->C in chimp); CpG site: human TCG, chimp TTG, baboon TCG, ancestor TCG (C->T in chimp)]
• non-CpG sites: no C in 5’, no G in 3’
• CpG sites: sites predicted by parsimony to be a CpG in the
human/chimp last common ancestor
• other sites: ambiguous => ignored
• simulations => correct estimates of substitution rates
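The site classification above could be implemented along these lines. This is a sketch under stated assumptions: the "no C in 5', no G in 3'" test is applied to all three species, the ancestral state is the parsimony state of the three-taxon tree, and the names `ancestral_base` and `classify_site` are hypothetical.

```python
def ancestral_base(human: str, chimp: str, baboon: str):
    """Parsimony state at the human/chimp ancestor (None if undecidable)."""
    if human == chimp or human == baboon:
        return human
    if chimp == baboon:
        return chimp
    return None


def classify_site(human: str, chimp: str, baboon: str, i: int) -> str:
    """Classify column i of a triple alignment as 'non-CpG', 'CpG' or 'ambiguous'."""
    if i <= 0 or i >= len(human) - 1:
        return "ambiguous"  # a neighbor is missing at the alignment edges
    columns = list(zip(human, chimp, baboon))
    five_prime, three_prime = columns[i - 1], columns[i + 1]
    # non-CpG: no C in 5' and no G in 3', in any of the three species
    if "C" not in five_prime and "G" not in three_prime:
        return "non-CpG"
    # CpG: the inferred ancestor has C followed by G (site is the C),
    # or G preceded by C (site is the G of the dinucleotide)
    before, here, after = (ancestral_base(*columns[j]) for j in (i - 1, i, i + 1))
    if (here == "C" and after == "G") or (here == "G" and before == "C"):
        return "CpG"
    return "ambiguous"
```

Applied to the two examples of the previous slide, the middle site of TTC/TCC/TTC is classified as non-CpG and the middle site of TCG/TTG/TCG as CpG.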
Substitution patterns in the
hominidae lineage
• pooling together complementary substitutions, e.g. A->G and
T->C
• Substitution rates: number of X->Y subst. / number of X in
the ancestral sequence
– 4 transversion rates: A->T; C->G; A->C; C->A
– 2 transition rates: A->G; G->A
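The rate definition above (number of X->Y substitutions divided by the number of X in the ancestral sequence, with complementary changes pooled) can be sketched as follows. The six class names follow the slide; the function names and the toy input are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# The six pooled classes listed on the slide
CLASSES = ("A->T", "C->G", "A->C", "C->A", "A->G", "G->A")


def canonical(ancestral: str, derived: str) -> str:
    """Map a substitution to its pooled class, e.g. T->C becomes A->G."""
    name = f"{ancestral}->{derived}"
    if name in CLASSES:
        return name
    return f"{COMPLEMENT[ancestral]}->{COMPLEMENT[derived]}"


def substitution_rates(substitutions, ancestral_seq: str) -> dict:
    """Rate per class = pooled count / number of ancestral X plus its complement."""
    base_counts = Counter(ancestral_seq)
    class_counts = Counter(canonical(x, y) for x, y in substitutions)
    rates = {}
    for name in CLASSES:
        x = name[0]
        denominator = base_counts[x] + base_counts[COMPLEMENT[x]]
        rates[name] = class_counts[name] / denominator if denominator else float("nan")
    return rates


# Toy example: 2 copies of each base in the ancestor, three substitutions
toy_rates = substitution_rates([("A", "G"), ("T", "C"), ("C", "T")], "AATTGGCC")
```

In the toy example, A->G and T->C are pooled into one class (2 events over 4 ancestral A or T), and C->T is counted as G->A (1 event over 4 ancestral G or C).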
Dataset
• Human/chimp/baboon triple alignments of
orthologous genomic regions
• Introns and intergenic regions (excluding
CpG islands)
• Total: 14 Mb, 40% TE
• Substitutions in human and chimp lineages
– 54,371 substitutions at non-CpG sites (0.6%)
– 8,618 substitutions at CpG sites (5.8%)
Substitution rates at non-CpG sites
Substitution rates at CpG sites
Substitution rates in TEs and
non-repetitive DNA
• Non-CpG sites
– Rates of substitution are similar in TEs and unique DNA
• CpG sites :
– Rates of transversion are similar in TEs and unique DNA
– The rate of transition is about 40% higher in TEs than in
unique DNA
• => higher rate of methylation in TEs vs. non-repetitive DNA
Length and divergence of TEs
ancestral active element
present day
copies
Impact of TE length and divergence on methylation ?
CpG transition rate and TE size
CpG transition rate and TE divergence
Divergence = distance to the
consensus sequence of the TE
subfamily (approx. ancestral
sequence)
CpG transition rate, TE divergence
and size
[Figure panel: LINEs + LTR]
CpG transition rate, TE
divergence and size
• CpG methylation increases with TE size
• CpG methylation decreases when TE
divergence increases
• => homology-dependent methylation of TEs
CpG transition rate and TE
divergence: the case of SINEs
Alus are protected from methylation by
a sperm binding protein (Chesnokov &
Schmid 1995)
Hypothesis: as the Alu elements
diverge, their ability to bind this protein
decreases => germline methylation
increases
Other repeated sequences ?
• Analysis of CpG substitution rates in
retropseudogenes
– single-copy retropseudogenes
– multi-copy retropseudogenes (n ≥ 10)
• Rate of CpG transition higher in multi-copy than
in single-copy retropseudogenes
• => General process of homology-dependent
methylation of repeated sequences
Mechanism of homology dependent
methylation: RNA interference ?
• Most TEs are defective: no
autonomous transcription ...
• ... but many TEs are inserted within genes
(introns or UTRs), in both orientations:
[Figure: TE copies inserted within gene A and gene B (DNA); their transcripts (RNA) form dsRNA, which yields siRNA]
Mechanism of homology dependent
methylation: RNA interference ?
• DNA methylation induced by short interfering RNAs:
– well established in plants
– recently discovered in mammals (Kawasaki & Taira 2004; Morris et
al. 2004)
• Many TEs are located in transcription units, in both
orientations
• => siRNA induced DNA methylation is a priori expected to
affect TEs !
Part 2
Recombination and the evolution
of GC-content
Isochore organization of
mammalian genomes
• Large scale variations of GC-content along chromosomes
– Affect both coding and non-coding regions (introns, intergenic
regions) (Bernardi et al. 1985 ...)
• Correlations with other genomic features:
Feature              GC-poor   GC-rich
Gene density         low       high
Intron length        long      short
Repeats              LINEs     SINEs
Replication timing   late      early
Recombination rate   low       high
What drives the evolution of GC-content ??
Correlations between recombination rate and
GC content
• Human: cross-over / genomic GC (Kong et al. 2002):
– r = 0.4 (N = 957 seq., 3 Mb)
• Yeast: recombination / GC3 (Birdsell, 2002)
– r = 0.4 (N = 6,143 genes)
• Nematode: cross-over / GC introns (Marais et al. 2001):
– r = 0.3 (N = 10,486 genes)
• Drosophila : cross-over / GC introns (Marais et al. 2001):
– r = 0.2 (N = 7,337 genes)
Correlations are highly significant (p < 0.0001) ...
but (very) weak !
Recombination is a poor predictor of GC content (and vice versa)!
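Why such correlations are called weak becomes clearer once they are squared: in a simple linear regression, the share of variance explained is r². The sketch below applies this to the coefficients quoted on this slide; the dictionary name is hypothetical.

```python
# Correlation coefficients quoted on the slide
correlations = {
    "human": 0.4,
    "yeast": 0.4,
    "nematode": 0.3,
    "Drosophila": 0.2,
}

# In a simple linear regression, explained variance is the squared correlation
variance_explained = {species: r ** 2 for species, r in correlations.items()}

for species, r2 in variance_explained.items():
    print(f"{species}: r = {correlations[species]:.1f}, variance explained = {r2:.0%}")
```

Even the strongest of these correlations accounts for only about 16% of the variance.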
Evolution of GC content and of recombination
rate: different time scales ?
• GC-content:
– results from the average substitution pattern over long evolutionary
times
• Recombination rate:
– measured in extant populations
– may change rapidly:
• global variations between species (e.g. human vs. mouse)
• variations within populations (and according to sex)
• local variations (e.g. due to translocations)
Evolution of GC content and of recombination
rate: different time scales ?
• To test the hypothesis that recombination affects the
evolution of GC-content, it is necessary to use estimates of
recombination rate and substitution patterns measured on
similar time scales
• Analysis of the recent pattern of substitution in primates
• Human, chimp, baboon triple genomic alignments:
– 14.3 Mb (introns and intergenic regions)
– 36 loci from 12 autosomes
GC-content expected at
equilibrium (GC*)
• Equilibrium GC-content : the GC content that sequences
would reach if the pattern of substitution remains constant
over time = the future of GC-content
• Inferred from the rates of substitutions observed in
human/chimp lineages
• Arndt et al. (2002): model of DNA sequence evolution
with neighbor-dependent mutation (CpG)
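The idea of an equilibrium GC-content can be illustrated with a deliberately simplified two-rate model. This is not the Arndt et al. (2002) model used in the talk (it ignores the neighbor-dependent CpG effect); `u` (AT->GC rate per AT site) and `v` (GC->AT rate per GC site) are hypothetical parameters, and their values below are made up.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """GC* for a two-rate model: gains u*(1 - GC) balance losses v*GC."""
    return u / (u + v)


def gc_trajectory(gc_start: float, u: float, v: float, steps: int):
    """Iterate GC(t+1) = GC(t) + u*(1 - GC(t)) - v*GC(t)."""
    gc = gc_start
    trajectory = [gc]
    for _ in range(steps):
        gc = gc + u * (1.0 - gc) - v * gc
        trajectory.append(gc)
    return trajectory


# Made-up rates per time step, chosen only for illustration
u, v = 0.010, 0.015
gc_star = equilibrium_gc(u, v)                 # 0.4
path = gc_trajectory(0.60, u, v, steps=2000)   # starts GC-rich, decays toward GC*
```

Under this sketch a sequence that starts at 60% GC drifts toward GC* = 40% if the substitution pattern stays constant, which is the sense in which GC* describes "the future of GC-content".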
GC-content expected at
equilibrium and recombination
[Figure: equilibrium GC-content (GC*, 0.32 to 0.48) vs. cross-over rate (cM/Mb, 0 to 4); N = 33, R2 = 61%, p < 0.0001]
Meunier & Duret Mol. Biol. Evol. 2004
GC-content <-> Recombination
• Strong correlation: suggests direct causal
relationship
• GC-rich sequences promote recombination ?
– Gerton et al. (2000), Blat et al. (2002), Petes & Merker (2002)
• Recombination promotes AT->GC substitutions ?
GC-content and recombination
[Figure: present GC-content (0.32 to 0.52) vs. cross-over rate (cM/Mb, 0 to 4); N = 33, R2 = 21%, p = 0.007]
Recombination and GC-content
• Recombination drives the evolution of GC-content
• Molecular mechanisms ?
Biased gene conversion (BGC)
Molecular events of meiotic recombination
[Figure: heteroduplex DNA formed during meiotic recombination (crossover or non-crossover) carries a T/G mismatch; DNA mismatch repair resolves it either to T:A (G->A) or to C:G (T->C)]
If DNA mismatch repair is biased (i.e. probability of repair is not 50% in favor of
each base) => BGC
The dynamics of the fixation process for one locus under BGC is identical to that
under directional selection (Nagylaki 1983)
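Since the fixation dynamics under BGC are stated to be identical to those under directional selection, the classical diffusion approximation for fixation probability (Kimura) can serve as an illustration. This is a sketch under assumptions: a diploid population with effective size equal to N, a new allele at frequency 1/(2N), and a hypothetical conversion coefficient `b` playing the role of the selection coefficient; the values used are made up.

```python
import math


def fixation_probability(n: int, b: float) -> float:
    """Fixation probability of a new allele with advantage b (Kimura's
    diffusion approximation, initial frequency 1/(2N), Ne assumed equal to N).

    b > 0 models an allele favored by conversion (e.g. GC), b < 0 a
    disfavored one (e.g. AT), b = 0 the neutral case.
    """
    if b == 0:
        return 1.0 / (2 * n)
    # (1 - exp(-2b)) / (1 - exp(-4Nb)), written with expm1 for accuracy
    return math.expm1(-2.0 * b) / math.expm1(-4.0 * n * b)


b = 1e-4  # made-up conversion coefficient
for n in (100, 10_000):
    neutral = fixation_probability(n, 0.0)
    favored = fixation_probability(n, b) / neutral
    disfavored = fixation_probability(n, -b) / neutral
    print(f"N = {n}: favored x{favored:.2f}, disfavored x{disfavored:.2f} vs. neutral")
```

With these made-up values the bias is almost invisible at N = 100 and pronounced at N = 10,000: like selection, BGC is only effective when the product of N and b is not small.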
Biased gene conversion (BGC)
Lamb (1986), Brown & Jiricny (1988), Holmquist (1992), Eyre-Walker (1993), Eyre-Walker
& Hurst (2001), Galtier et al. (2001), Birdsell (2002)
• Parameters affecting BGC:
– DNA repair bias
– recombination rate
– length of heteroduplex
– effective population size
• Biased repair toward GC has been observed experimentally after transfection of
mismatched DNA fragments into cultured cells:
– mammals, Xenopus (Brown 1987, Bill 1998): strong bias (5.5 to 1) (possibly an adaptation
to the hypermutability of cytosines) (Fryxell and Zuckerkandl, 2000)
– yeast: moderate bias (1.48 to 1) (Birdsell, 2002)
• Analysis of silent polymorphism: GC alleles have a higher probability of fixation
than AT alleles (Eyre-Walker 1999, Galtier et al. 2001)
• Correlation between recombination and GC-content
Recombination and GC-content
• Recombination drives the evolution of GC-content
• What drives the evolution of recombination ?
Length of chromosome arms and
recombination rate
[Figure: average cross-over rate (cM/Mb, 0.8 to 2.4) vs. length of chromosome arm (Mb, 0 to 160) in human; 5 acrocentric and 18 metacentric chromosomes => 41 chromosomal arms; R2 = 72%, p < 0.0001]
Chromosome length and GC-content in the chicken genome
[Figure: GC-content (40% to 52%) vs. chromosome size (Mb, log scale, 0.1 to 1000) in the first draft chicken genome assembly (March 1, 2004); R2 = 81% (excluding Z and W); chromosomes 22, Z and W are labeled]
What drives the evolution of GC-content ?
• Recombination
– global factors:
• length of chromosome arms
• position relative to the centromere and telomeres
• ...
– local factors:
• hot spots of recombination
• ...
• Population size
– like selection, the efficacy of biased gene conversion
depends on population size
Acknowledgments
• Julien Meunier
• Dominique Mouchiroud
• Adel Khelifi
• Vincent Navratil