Transition bias and substitution models
Transition Bias and Substitution models
Xuhua Xia
http://dambe.bio.uottawa.ca
Transitions and Transversions
Purines: A, G
Pyrimidines: C, T
Transition: the substitution of a
purine for a purine or a pyrimidine
for a pyrimidine. Symbolized by s.
Transversion: the substitution of a
purine for a pyrimidine or vice
versa. Symbolized by v.
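The two definitions above can be turned into a small classifier. This is a minimal sketch; the function name `classify_substitution` is hypothetical, and treating U as equivalent to T is an assumption made so the same function works for RNA.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify a change from base a to base b.

    Returns 'transition' (s), 'transversion' (v), or 'none' if a == b.
    """
    # Normalise case and treat RNA uracil as thymine (assumption).
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a nucleotide: {base!r}")
    if a == b:
        return "none"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine): transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("A", "C"))
```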
What is transition bias?
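Transition bias refers to the degree to which the s/v ratio deviates from the expected 1/2. The expectation is 1/2 because, if all substitutions were equally likely, each nucleotide could undergo one transition but two transversions. The observed s/v ratio is almost always much larger than 1/2.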
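The expected ratio of 1/2 can be checked by enumerating all possible substitutions among the four nucleotides. The sketch below assumes every substitution is equally likely; the variable names are illustrative only.

```python
from itertools import permutations

PURINES = {"A", "G"}


def is_transition(a: str, b: str) -> bool:
    """True if a and b belong to the same class (both purines or both pyrimidines)."""
    return (a in PURINES) == (b in PURINES)


# All 12 ordered pairs of distinct nucleotides, i.e. all possible substitutions.
changes = list(permutations("ACGT", 2))
n_s = sum(is_transition(a, b) for a, b in changes)
n_v = len(changes) - n_s
expected_ratio = n_s / n_v

print(n_s, n_v, expected_ratio)
```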
Transition Bias is Ubiquitous. Why?
• For both invertebrate and vertebrate genes: $s_{obs}/v_{obs} \gg 1/2$
• What causes transition bias?
– Mutation bias
– Selection bias
$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s}{\mu_v} \cdot \frac{P_s}{P_v}$$
$\mu_s/\mu_v$: mutation bias
$P_s/P_v$: selection bias in fixation probability (protein-coding genes, RNA genes)
Mitochondrial Genetic Code
Codon  Amino acid   Codon  Amino acid   Codon  Amino acid   Codon  Amino acid
UUU    Phe          UCU    Ser          UAU    Tyr          UGU    Cys
UUC    Phe          UCC    Ser          UAC    Tyr          UGC    Cys
UUA    Leu          UCA    Ser          UAA    Stop         UGA    Trp
UUG    Leu          UCG    Ser          UAG    Stop         UGG    Trp
CUU    Leu          CCU    Pro          CAU    His          CGU    Arg
CUC    Leu          CCC    Pro          CAC    His          CGC    Arg
CUA    Leu          CCA    Pro          CAA    Gln          CGA    Arg
CUG    Leu          CCG    Pro          CAG    Gln          CGG    Arg
AUU    Ile          ACU    Thr          AAU    Asn          AGU    Ser
AUC    Ile          ACC    Thr          AAC    Asn          AGC    Ser
AUA    Met          ACA    Thr          AAA    Lys          AGA    Stop
AUG    Met          ACG    Thr          AAG    Lys          AGG    Stop
GUU    Val          GCU    Ala          GAU    Asp          GGU    Gly
GUC    Val          GCC    Ala          GAC    Asp          GGC    Gly
GUA    Val          GCA    Ala          GAA    Glu          GGA    Gly
GUG    Val          GCG    Ala          GAG    Glu          GGG    Gly
• Synonymous and nonsynonymous
• Degeneracy:
– Non-degenerate
– Two-fold degenerate
– Four-fold degenerate
• Transitions are synonymous and transversions are nonsynonymous at two-fold degenerate sites.
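The last bullet can be checked directly against the mitochondrial genetic code listed on this slide. The sketch below encodes that table as a 64-character string in U, C, A, G order (one-letter amino acid codes, `*` for Stop); the function names are hypothetical, and only third codon positions are examined.

```python
BASES = "UCAG"
# Mitochondrial code from the table above, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
PURINES = {"A", "G"}


def third_position_fold(codon: str) -> int:
    """Number of bases at codon position 3 that leave the amino acid unchanged."""
    return sum(CODE[codon[:2] + b] == CODE[codon] for b in BASES)


def two_fold_rule_holds() -> bool:
    """Check that at every two-fold third position, a change is synonymous
    exactly when it is a transition."""
    for codon in CODE:
        if third_position_fold(codon) != 2:
            continue
        for b in BASES:
            if b == codon[2]:
                continue
            transition = (b in PURINES) == (codon[2] in PURINES)
            synonymous = CODE[codon[:2] + b] == CODE[codon]
            if transition != synonymous:
                return False
    return True


print(third_position_fold("GGA"), third_position_fold("AAU"), two_fold_rule_holds())
```

In this code every third position turns out to be either two-fold or four-fold degenerate, which is why the rule is so clean here; other genetic codes would need the same check rerun with their own table.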
RNA secondary structure
Seq1: CACGA
      |||||
      GUGCU
Seq2: CAUGA   (C→U transition: the new U pairs with G)
      |||||
      GUGCU
Seq2: CGCGA   (A→G transition: the new G pairs with U)
      |||||
      GUGCU
A G/U pair, although not as strong as an A/U or C/G pair, generally does not disrupt RNA secondary structure (and occurs frequently in RNA secondary structure).
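The stem examples above can be checked with a small pairing test. This is a sketch under the assumption that a stem counts as intact when every position forms a Watson-Crick or G/U pair; the two transversion variants at the end are hypothetical and not taken from the slide.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(x: str, y: str) -> bool:
    """True for Watson-Crick pairs and for the weaker G/U pair."""
    return (x, y) in CANONICAL or (x, y) in WOBBLE


def stem_intact(strand: str, partner: str) -> bool:
    """True if every base of strand pairs with the aligned base of partner."""
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length")
    return all(can_pair(x, y) for x, y in zip(strand, partner))


PARTNER = "GUGCU"  # written so that position i pairs with position i, as on the slide

print(stem_intact("CACGA", PARTNER))  # Seq1
print(stem_intact("CAUGA", PARTNER))  # Seq2, C->U transition
print(stem_intact("CGCGA", PARTNER))  # Seq2, A->G transition
print(stem_intact("CAAGA", PARTNER))  # hypothetical C->A transversion
print(stem_intact("CCCGA", PARTNER))  # hypothetical A->C transversion
```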
Causes of transition bias
$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s}{\mu_v} \cdot \frac{P_s}{P_v}$$
"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the stage of science, whatever the matter may be."
Lord Kelvin: "Electrical Units of Measurement" (1883-05-03), Popular Lectures and Addresses, vol. 1
At Four-fold Degenerate Sites
At four-fold degenerate sites, all
nucleotide substitutions are
synonymous and subject to roughly
the same selection pressure (similar
fixation probabilities)
Example: the glycine codons GGA, GGC, GGG and GGT differ only at the third position, a four-fold degenerate site.

S1 aa   Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold     4   2   2       2   2   4   4   4       2
S1      GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2      GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
Change   s                           s   v
S2 aa               Glu                     Gly Trp

$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v} \approx \frac{\mu_s}{\mu_v} \approx 2$$
At Nondegenerate Sites
At nondegenerate sites, all
nucleotide substitutions are
nonsynonymous and subject to
roughly the same selection pressure
(similar fixation probabilities)
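Example: in the glycine codons GGA, GGC, GGG and GGT, the first and second positions are nondegenerate sites.

S1 aa   Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
S1      GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2      GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
Change               s                       v
S2 aa               Glu                     Gly Trp

$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v} \approx \frac{\mu_s}{\mu_v} \approx 2$$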
At Two-fold Degenerate Sites
At two-fold degenerate sites, all
transitional substitutions are
synonymous, and all transversional
substitutions are nonsynonymous
S1 aa   Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold     4   2   2       2   2   4   4   4       2
S1      GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2      GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
Change       s           s   s                   v
S2 aa               Glu                     Gly Trp
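$$\frac{s_{obs}}{v_{obs}} = \frac{\mu_s P_s}{\mu_v P_v} \approx 2\,\frac{P_s}{P_v} \approx 80$$
With a mutation ratio $\mu_s/\mu_v \approx 2$ (the s/v ratio at four-fold degenerate and nondegenerate sites), this gives $P_s/P_v \approx 80/2 = 40$: at two-fold degenerate sites a transition is about 40 times as likely to become fixed as a transversion.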
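The s and v marks on the three preceding slides can be reproduced by tallying the differences between S1 and S2 by degeneracy class. The sketch below reuses the mitochondrial code from the earlier table; the function names are hypothetical, the degeneracy of a site is taken from the S1 codon (an assumption that is safe here because no codon pair differs at more than one position), and the values 2 and 80 are the ratios quoted on the slides, not estimates from this short alignment.

```python
from collections import Counter

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
PURINES = {"A", "G"}


def fold(codon: str, pos: int) -> int:
    """Number of bases at position pos (0, 1 or 2) that keep the amino acid.

    1 = nondegenerate, 2 = two-fold, 4 = four-fold degenerate.
    """
    return sum(CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon] for b in BASES)


def tally(s1: str, s2: str) -> Counter:
    """Count transitions ('s') and transversions ('v') per degeneracy class."""
    counts = Counter()
    for c1, c2 in zip(s1.split(), s2.split()):
        for pos in range(3):
            if c1[pos] != c2[pos]:
                same_class = (c1[pos] in PURINES) == (c2[pos] in PURINES)
                counts[(fold(c1, pos), "s" if same_class else "v")] += 1
    return counts


S1 = "GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU"
S2 = "GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG"
counts = tally(S1, S2)
for (degeneracy, kind), n in sorted(counts.items()):
    print(degeneracy, kind, n)

# Ratio of fixation probabilities implied by the slide values (80 and 2).
ps_over_pv = 80 / 2
print(ps_over_pv)
```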
[Figure: a 2-fold degenerate site, illustrated with His and Gln codons]
Methylation and deamination
H3C-Donor + Acceptor → Donor + H3C-Acceptor (catalysed by a methyltransferase)
Methylation and DNA Repair in E. coli
• DNA alphabet: ACGT
• RNA alphabet: ACGU
• DNA duplication and the Watson-Crick pairing rule: A-T, C-G

[Figure: E. coli DNA with methyl groups (H3C) at GATC sites, two mismatched positions (?), and the mismatch repair proteins mutH, mutS and mutL]
3’--CTAG----CTAGGTAT----C-----C--CTAG-----------5’
    ||||    ||||||||    ?     ?  ||||
5’--GATC----GATCCATA----U-----T--GATC-----...   3’
Methylation-Modification System
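[Figure: restriction-modification system of Brevibacterium albidum. Through transcription and translation, the bacterial genome produces a methylase and a restriction enzyme. The methylase marks the host recognition site (TGGC*CA / AC*CGGT). The restriction enzyme cuts the unmethylated site in dsDNA phage entering through the bacterial membrane: ----TGG|CCA------ACC|GGT---]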
CpG-Specific DNA Methylation
• Mammalian DNA methyltransferase 1 (DNMT1): CpG → mCpG
• Domains (amino acid positions):
– NlsD, NLS-containing domain: 1-343
– RFDD, replication foci-directing domain: 350-609
– ZnD, Zn-binding domain: 613-748
– PBD, polybromo domain: 746-1110
– CatD, the catalytic domain: 1124-1620
Fatemi, M., A. Hermann, S. Pradhan and A. Jeltsch, 2001 J Mol Biol 309: 1189-99.
CpG-Specific DNA Methylation
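[Figure: three CpG sites; H3C marks a methyl group on the cytosine]
5’ATGCGA-------CCGA--------ACGGC--TAA 3’
  ||||||       ||||        |||||
3’TACGCT-------GGCT--------TGCCG--ATT 5’
Left: fully methylated (H3C on both strands). Middle: hemi-methylated (H3C on one strand). Right: unmethylated.
Note: 5’CG3’ = CpG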
Methylation and Gene Regulation
• Proteins with a methyl-CpG binding domain (MBD)
– MBD1, MBD2, and MBD3
– MeCP2
• Deacetylase: an enzyme that removes an acetyl group
• Histone deacetylases deacetylate lysyl residues in histones (the half-life of an acetyl group is ~10 min). Acetylation removes a positive charge on the lysine amino group and promotes nucleosome melting (and gene expression). Deacetylation tends to decrease or turn off gene expression.
[Figure: an MBD protein bound to mCpG, with a histone deacetylase; condensed DNA with repressed transcription]
Wade, P. A., and A. P. Wolffe, 2001 Nat Struct Biol 8: 575-7.
Lysine demethylation
Methylation and Mutation
[Figure: chemical structures showing methylation of cytosine to methylcytosine, followed by spontaneous deamination]
Cytosine is converted to thymine.
Vertebrate mitochondrion
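[Figure: replication of the vertebrate mitochondrial genome, showing the parental H and L strands, the daughter H and L strands, and the replication origins OH and OL]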
Spontaneous deamination
Base             Deamination product   Product pairs with
Adenine          Hypoxanthine          C
Guanine          Xanthine              C
Cytosine         Uracil                A
Methylcytosine   Thymine               A
Each reaction takes up H2O and releases NH3.
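The pairing partners listed for the deamination products determine which substitution results after replication. The following sketch traces that logic under the assumption that the lesion is not repaired before it is copied; the dictionary and function names are hypothetical.

```python
# Deamination products and their pairing partners, as listed on the slide.
DEAMINATION_PRODUCT = {
    "A": "hypoxanthine",
    "G": "xanthine",
    "C": "uracil",
    "5mC": "thymine",
}
PAIRS_WITH = {"hypoxanthine": "C", "xanthine": "C", "uracil": "A", "thymine": "A"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def mutation_after_deamination(base: str) -> tuple:
    """Return (original base, base at that site after replication, change type).

    Assumes no repair: the deaminated base templates its pairing partner,
    and that partner in turn templates its own complement.
    """
    original = "C" if base == "5mC" else base
    partner = PAIRS_WITH[DEAMINATION_PRODUCT[base]]
    new_base = COMPLEMENT[partner]
    if new_base == original:
        kind = "none"
    elif (new_base in PURINES) == (original in PURINES):
        kind = "transition"
    else:
        kind = "transversion"
    return original, new_base, kind


for b in ("A", "G", "C", "5mC"):
    print(b, mutation_after_deamination(b))
```

Under these assumptions none of the four deaminations yields a transversion, which is consistent with deamination contributing to transition bias.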
Transversion can erase transitions
Transitions can erase transitions, and transversions can erase transversions.
However, a transversion can erase many transitions occurring before it, and
subsequent transitions cannot erase the transversion:
AACGCTTGACG
AACGCTTAACG
AACGCTTGACG
AACGCTTCACG
AACGCTTGACG
AACGCTTTACG
AACGCTTAACG
AACGCTTGACG
AACGCTTTACG
Although a transition could also erase 2n transversions occurring before it, this is rare because transversions are generally much rarer than transitions. As a result, transitions tend to be missed in counting much more frequently than transversions.
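The erasing effect can be illustrated by comparing the substitutions that actually occurred at a site with the single change visible between its first and last states. This is a minimal sketch; the function names and the example histories are hypothetical.

```python
PURINES = {"A", "G"}


def classify(a: str, b: str) -> str:
    """Return 'none', 'transition' or 'transversion' for a change from a to b."""
    if a == b:
        return "none"
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"


def net_change(history: str):
    """history lists the successive states of one site, e.g. 'GAGT'.

    Returns (the actual steps, the change observed between first and last state).
    """
    steps = [classify(a, b) for a, b in zip(history, history[1:])]
    return steps, classify(history[0], history[-1])


print(net_change("GAG"))   # two transitions, nothing observed
print(net_change("GAGT"))  # two transitions then a transversion: one transversion observed
print(net_change("GTC"))   # a transversion then a transition: still a transversion
```

The last case shows why a later transition cannot hide a transversion: each transversion switches the site between purine and pyrimidine, and a transition never switches it back.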
Summary
• Selection: Transitions are tolerated more than transversions by natural selection because
– they are more likely to be synonymous in protein-coding sequences than transversions
– they are less likely to disrupt RNA secondary structure than transversions.
• Mutation: Transitional mutations occur more frequently than transversional mutations because
– misincorporation during DNA replication occurs more frequently between two purines or between two pyrimidines than between a purine and a pyrimidine
– a purine is more likely to mutate chemically to another purine than to a pyrimidine (e.g., through spontaneous deamination). The same holds for pyrimidines.
• Bias in counting: Transitions tend to be missed in counting much more frequently than transversions (which necessitates the substitution models).
Nucleotide Substitutions
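[Figure: two observed sequences diverging from a common ancestor; the substitutions along the two lineages are labelled single, multiple, coincidental, parallel, convergent and back]
Observed sequences:
ATACTCAGGTTAAGCT
ACAATCCGGTTAAGCT
Other sequences shown in the figure:
ACACTCGGATTAGGCT
AGACTCGGATTAGGCT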
Actual number of changes during the evolution of the two daughter sequences: 12
Observed number of differences between the two daughter sequences: 3.
Correcting for multiple substitutions aims to estimate the true number of changes, i.e., 12.
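As a hedged illustration of what such a correction looks like, the sketch below counts the observed differences and applies the standard JC69 distance formula, d = -(3/4) ln(1 - 4p/3). Two assumptions are made: JC69 is chosen only because it is the simplest model, and the two sequences used are the pair from the figure that differ at exactly 3 sites. With only 16 sites the corrected value is a rough estimate and is not expected to recover the 12 changes of the figure.

```python
import math


def p_distance(seq1: str, seq2: str):
    """Return (number of differing sites, proportion of differing sites)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs, diffs / len(seq1)


def jc69_distance(p: float) -> float:
    """JC69-corrected number of substitutions per site; defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


obs1 = "ATACTCAGGTTAAGCT"
obs2 = "ACAATCCGGTTAAGCT"
n_diff, p = p_distance(obs1, obs2)
d = jc69_distance(p)
print(n_diff, p, round(d, 4))
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as sequences diverge, which is the sense in which the model corrects for multiple hits.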
From WHL
Substitution models and phylogenetics
• A substitution model describes the evolutionary process so that multiple hits can be corrected for.
• A phylogenetic reconstruction method implicitly or explicitly assumes a substitution model.
• A phylogenetic method that assumes a wrong substitution model will typically produce wrong trees.
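The diagonal of a transition probability matrix is subject to the constraint that each row sums up to 1.

Unrestricted: no equilibrium $\pi_i$
$$
\begin{array}{c|cccc}
  & A & G & C & T \\ \hline
A & \cdot & a_1 & a_2 & a_3 \\
G & a_7 & \cdot & a_4 & a_5 \\
C & a_8 & a_9 & \cdot & a_6 \\
T & a_{10} & a_{11} & a_{12} & \cdot
\end{array}
$$

GTR
$$
\begin{array}{c|cccc}
  & A & G & C & T \\ \hline
A & \cdot & a_1\pi_G & a_2\pi_C & a_3\pi_T \\
G & a_1\pi_A & \cdot & a_4\pi_C & a_5\pi_T \\
C & a_2\pi_A & a_4\pi_G & \cdot & a_6\pi_T \\
T & a_3\pi_A & a_5\pi_G & a_6\pi_C & \cdot
\end{array}
$$

JC69
$\pi_i = 0.25$
$a_i = c$

K80
$\pi_i = 0.25$
$a_1 = a_6 = a_7 = a_{12} = \alpha$
$a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$

F81/TN84
$\pi_A, \pi_C, \pi_G, \pi_T$
$a_i = c$

HKY85
$\pi_A, \pi_C, \pi_G, \pi_T$
$a_1 = a_6 = a_7 = a_{12} = \alpha$
$a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$

TN93
$\pi_A, \pi_C, \pi_G, \pi_T$
$a_1 = a_7 = \alpha_1$
$a_6 = a_{12} = \alpha_2$
$a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$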
The TN93 model as an example
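Rows and columns in the order T, C, A, G:
$$
Q = \begin{pmatrix}
\cdot & \alpha_1\pi_C & \beta\pi_A & \beta\pi_G \\
\alpha_1\pi_T & \cdot & \beta\pi_A & \beta\pi_G \\
\beta\pi_T & \beta\pi_C & \cdot & \alpha_2\pi_G \\
\beta\pi_T & \beta\pi_C & \alpha_2\pi_A & \cdot
\end{pmatrix}
$$
$\pi_T, \pi_C, \pi_A, \pi_G$ - frequency parameters
$\alpha_1, \alpha_2, \beta$ - rate ratio parameters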
In addition to the assumptions illustrated above, the model also assumes that the frequency and rate ratio parameters do not change over time, i.e., the substitution process is stationary.
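A TN93 rate matrix can be assembled from its frequency and rate ratio parameters in a few lines. The sketch below is one possible construction, not the slide's own code: it assumes the T, C, A, G ordering, one rate for T↔C transitions (`alpha1`), one for A↔G transitions (`alpha2`) and one for all transversions (`beta`), and it sets each diagonal entry so that the row of the rate matrix sums to zero (the rows of the corresponding transition probability matrix would sum to 1). The function name and the parameter values are hypothetical.

```python
ORDER = "TCAG"


def tn93_rate_matrix(pi, alpha1, alpha2, beta):
    """Build a 4x4 TN93 rate matrix as a list of rows in T, C, A, G order.

    pi: dict mapping each base to its equilibrium frequency.
    alpha1: rate parameter for T<->C, alpha2: for A<->G, beta: for transversions.
    """
    q = []
    for i in ORDER:
        row = []
        for j in ORDER:
            if i == j:
                row.append(0.0)  # placeholder, filled in below
            elif {i, j} == {"T", "C"}:
                row.append(alpha1 * pi[j])
            elif {i, j} == {"A", "G"}:
                row.append(alpha2 * pi[j])
            else:
                row.append(beta * pi[j])
        # Diagonal: minus the sum of the off-diagonal rates in this row.
        row[ORDER.index(i)] = -sum(row)
        q.append(row)
    return q


pi = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}  # illustrative frequencies
q = tn93_rate_matrix(pi, alpha1=2.0, alpha2=3.0, beta=1.0)
for base, row in zip(ORDER, q):
    print(base, [round(x, 3) for x in row])
```

Setting `alpha1 == alpha2` gives an HKY85-style matrix, adding equal frequencies gives K80, and setting all three rate parameters equal with equal frequencies gives JC69.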
Substitution Models
• There are three types of substitution models in molecular
evolution
– Nucleotide-based
– Amino acid-based
– Codon-based
• Substitution models are characterized by two categories of
parameters: the frequency parameters and the rate ratio
parameters, and different models differ by their assumptions
concerning these two categories of parameters.