Measuring genetic change
Level 3 Molecular Evolution and
Bioinformatics
Jim Provan
Page and Holmes: Section 5.2
Types of substitution
[Figure: six types of substitution, with the number of changes and the resulting observed differences]
Single: 1 change, 1 difference
Multiple: 2 changes, 1 difference
Coincidental: 2 changes, 1 difference
Parallel: 2 changes, no difference
Convergent: 3 changes, no difference
Back: 2 changes, no difference
Types of substitution (continued)
Multiple substitutions can greatly obscure actual
evolutionary history, particularly where there have
been many mutations, i.e. over long evolutionary
time scales
The final three examples (parallel, convergent and back
substitution) have serious implications for the
inference of evolutionary history:
Similarity inherited from an ancestor is called homology
Independently acquired similarity is called homoplasy
All tree-building methods rely on sufficient levels of
homology
Types of substitution (continued)
[Figure: the four bases: purines A and G, pyrimidines C and T]
Substitutions that
exchange a purine for
another purine or a
pyrimidine for another
pyrimidine are called
transitions
Substitutions that
exchange a purine for a
pyrimidine or vice-versa
are called transversions
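The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_substitution` is hypothetical, and the sketch assumes unambiguous DNA bases (A, C, G, T) only.

```python
# Purines and pyrimidines among the four DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(base_from, base_to):
    """Return 'transition', 'transversion' or 'none' for a pair of DNA bases.

    Hypothetical helper for illustration; ambiguity codes are not handled.
    """
    a, b = base_from.upper(), base_to.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("expected one of A, C, G, T")
    if a == b:
        return "none"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine): transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` is a transition and `classify_substitution("A", "C")` is a transversion.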
Measuring evolutionary change
Simplest measure is to
count number of
different sites
Poor measure:
Some sites may undergo
repeated substitutions
As sequences diverge,
measure becomes less
accurate
Saturation occurs: most sites changing
have changed before
[Figure: base pair differences (0 to 120) against time since divergence (0 to 25 Myr)]
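As a minimal sketch of the simplest measure, the following Python counts the differing sites between two aligned sequences and expresses the count as a proportion. The function names are hypothetical, and the sketch assumes the sequences are already aligned, of equal length, and free of gaps.

```python
def count_differences(seq1, seq2):
    """Count the sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq1, seq2) if x != y)


def p_distance(seq1, seq2):
    """Proportion of sites that differ (uncorrected for multiple hits)."""
    if not seq1:
        raise ValueError("sequences must not be empty")
    return count_differences(seq1, seq2) / len(seq1)
```

Because a site that has changed twice still counts as at most one difference, this proportion underestimates the true number of substitutions as sequences diverge.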
Correction of observed sequence
differences
[Figure: sequence difference plotted against time, showing the expected difference, the observed difference and the ‘correction’ between them]
A general framework of sequence
evolution models
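\[
P_t = \begin{pmatrix}
p_{AA} & p_{AC} & p_{AG} & p_{AT} \\
p_{CA} & p_{CC} & p_{CG} & p_{CT} \\
p_{GA} & p_{GC} & p_{GG} & p_{GT} \\
p_{TA} & p_{TC} & p_{TG} & p_{TT}
\end{pmatrix}
\qquad
p_{ii} = 1 - \sum_{j \neq i} p_{ij}
\qquad
f = [f_A \; f_C \; f_G \; f_T]
\]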
The Jukes-Cantor (JC) model
Assumes that all four bases have equal frequencies
and that all substitutions are equally likely
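\[
P_t = \begin{pmatrix}
- & \alpha & \alpha & \alpha \\
\alpha & - & \alpha & \alpha \\
\alpha & \alpha & - & \alpha \\
\alpha & \alpha & \alpha & -
\end{pmatrix}
\qquad
f = [\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}]
\]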
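The slide does not give a distance formula, but the standard Jukes-Cantor correction that follows from this model is d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. A minimal sketch, with a hypothetical function name:

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site).

    p is the observed proportion of differing sites. The correction is
    undefined at p >= 0.75, the saturation level expected for two random
    sequences with equal base frequencies.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as p approaches 0.75.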
Kimura’s 2 parameter model (K2P)
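Takes into account different
frequencies of transitions
vs. transversions
\[
P_t = \begin{pmatrix}
- & \beta & \alpha & \beta \\
\beta & - & \beta & \alpha \\
\alpha & \beta & - & \beta \\
\beta & \alpha & \beta & -
\end{pmatrix}
\qquad
f = [\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}]
\]
Transitions (α)
Transversions (β)
[Figure: plot with vertical axis 0 to 100 and horizontal axis 0 to 25]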
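The slide does not state a distance formula for K2P. The standard Kimura two-parameter distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. The sketch below uses a hypothetical function name and assumes aligned, gap-free sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    P, Q = transitions / n, transversions / n
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Counting transitions and transversions separately is what lets the correction treat the two classes differently.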
Felsenstein (1981) (F81)
Takes into account
differences in base
composition
Percentage (G + C) can
range from 25% to 75%
F81 model allows the
frequencies of the four
nucleotides to be different
Does not allow base composition
to vary between genes/species
C G T
Pt =
A
-
A C
G T
-
A C G
T
-
f = [A C G T]
Hasegawa, Kishino and Yano (1985)
(HKY85)
Essentially merges the K2P and F81 models to allow
transitions and transversions to occur at different
rates as well as allowing base frequencies to vary
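\[
P_t = \begin{pmatrix}
- & \beta\pi_C & \alpha\pi_G & \beta\pi_T \\
\beta\pi_A & - & \beta\pi_G & \alpha\pi_T \\
\alpha\pi_A & \beta\pi_C & - & \beta\pi_T \\
\beta\pi_A & \alpha\pi_C & \beta\pi_G & -
\end{pmatrix}
\qquad
f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]
\]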
General reversible model (REV)
Most general model - each substitution has its own
probability
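\[
P_t = \begin{pmatrix}
- & \pi_C a & \pi_G b & \pi_T c \\
\pi_A a & - & \pi_G d & \pi_T e \\
\pi_A b & \pi_C d & - & \pi_T f \\
\pi_A c & \pi_C e & \pi_G f & -
\end{pmatrix}
\qquad
f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]
\]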
By constraining a-f it is possible to generate all the
other models
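The nesting of the models can be illustrated with a short Python sketch. It assumes the REV layout with a to f as the A-C, A-G, A-T, C-G, C-T and G-T parameters (so b and e are the transitions), bases ordered A, C, G, T, and each diagonal entry set so that its row sums to 1. All function names are hypothetical. With equal base frequencies, the JC and K2P entries come out scaled by the constant factor 1/4.

```python
def rev_matrix(freqs, a, b, c, d, e, f):
    """Build a 4x4 REV matrix (rows and columns ordered A, C, G, T).

    Parameters must be small enough that every diagonal entry stays
    non-negative; this is not checked here.
    """
    pA, pC, pG, pT = freqs
    m = [
        [0.0, pC * a, pG * b, pT * c],
        [pA * a, 0.0, pG * d, pT * e],
        [pA * b, pC * d, 0.0, pT * f],
        [pA * c, pC * e, pG * f, 0.0],
    ]
    for i in range(4):
        # Diagonal: one minus the sum of the off-diagonal entries in the row.
        m[i][i] = 1.0 - sum(m[i])
    return m


EQUAL = (0.25, 0.25, 0.25, 0.25)


def jc(alpha):
    """JC: equal frequencies, a single parameter for every substitution."""
    return rev_matrix(EQUAL, alpha, alpha, alpha, alpha, alpha, alpha)


def k2p(alpha, beta):
    """K2P: equal frequencies, alpha for transitions, beta for transversions."""
    return rev_matrix(EQUAL, beta, alpha, beta, beta, alpha, beta)


def f81(freqs):
    """F81: unequal frequencies, no transition/transversion bias."""
    return rev_matrix(freqs, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)


def hky85(freqs, alpha, beta):
    """HKY85: unequal frequencies plus transition/transversion bias."""
    return rev_matrix(freqs, beta, alpha, beta, beta, alpha, beta)
```

For instance, `k2p(0.4, 0.1)` puts 0.1 in the A-G cell (a transition) and 0.025 in the A-C cell (a transversion).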
Comparing the models
[Figure: relationships between the models]
JC: πA = πC = πG = πT; α = β
K2P: πA = πC = πG = πT; α ≠ β
F81: πA ≠ πC ≠ πG ≠ πT; α = β
HKY85: πA ≠ πC ≠ πG ≠ πT; α ≠ β
REV: πA ≠ πC ≠ πG ≠ πT; a, b, c, d, e, f
Allow transition/transversion bias: JC → K2P, F81 → HKY85
Allow base frequencies to vary: JC → F81, K2P → HKY85
Comparing the models (continued)
[Figure: substitutions among A, C, G and T, comparing the observed pattern with the JC, K2P and HKY85 models]
Assumptions: independence
Assumes that change at
one site has no effect
on other sites
Good example is in RNA
stem-loop structures
Substitution may result in
mismatched bases and
decreased stem stability
Compensatory change
may occur to restore
Watson-Crick base pairing
[Figure: RNA stem-loop in three states: a fully paired stem, a stem with a mismatched base pair after a substitution, and a stem with Watson-Crick pairing restored by a compensatory change]
Assumptions: base composition
Assumption that base
composition is at
equilibrium and that it is
similar across all taxa
studied
In example opposite,
trees inferred using
models which do not
allow for this will not
group Thermus and
Deinococcus
Taxon         %G+C
Aquifex       64.0
Thermotoga    63.7
Thermus       63.2
Deinococcus   55.5
Others        53.9
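Percentages like those in the table can be computed directly from a sequence. A minimal sketch, with a hypothetical function name, assuming a non-empty sequence of unambiguous bases:

```python
def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)
```

Comparing this value across the taxa in a data set is a quick way to check whether base composition is similar enough for a model that assumes a single set of base frequencies.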
Assumptions: variation in substitution
rate across sites
Not all sites are equally
likely to undergo a
substitution
Functional constraints:
Pseudogenes have lost all
function and can evolve
freely
Substitutions at fourfold
degenerate sites do not change
the amino acid sequence of
proteins
Non-degenerate sites are
highly constrained
[Figure: substitution rate (substitutions / site / 10^9 years, axis 0 to 4)]
Assumptions: variation in substitution
rate across sites (continued)
[Figure: DNA divergence (0 to 0.7) against divergence time (0 to 250 Myr) for two sequences: A, 0.5% / Myr + 20% constraint; B, 2% / Myr + 50% constraint]
The more rapidly evolving sequence (B) shows most divergence
initially but soon saturates
At longer divergence times, sequence A actually appears to be
the more rapidly evolving
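The slide does not say how the two curves were generated. The sketch below reproduces the qualitative behaviour under assumptions that are entirely mine: constrained sites never change, the remaining sites follow the Jukes-Cantor model, and the quoted rate is the pairwise divergence rate of the unconstrained sites. The function name is hypothetical.

```python
import math


def observed_divergence(rate_per_myr, constrained_fraction, time_myr):
    """Expected observed proportion of differing sites after time_myr.

    Assumptions (not from the slide): constrained sites are invariant and
    free sites evolve under Jukes-Cantor, so divergence plateaus at
    0.75 * (1 - constrained_fraction).
    """
    free = 1.0 - constrained_fraction
    d = rate_per_myr * time_myr  # substitutions per free site
    return free * 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Sequence A: 0.5% / Myr with 20% of sites constrained.
# Sequence B: 2% / Myr with 50% of sites constrained.
for t in (10, 50, 250):
    a = observed_divergence(0.005, 0.2, t)
    b = observed_divergence(0.02, 0.5, t)
    print(f"{t:>3} Myr  A={a:.3f}  B={b:.3f}")
```

Under these assumptions B is ahead at short divergence times, but it plateaus near 0.375 while A continues towards 0.6, so A overtakes B at longer times.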