Transcript Nov3_06
Testing the Neutral Mutation Hypothesis
The neutral theory predicts that
polymorphism within species is correlated positively
with fixed differences between species,
i.e., genes that exhibit many interspecific differences will
also have high levels of intraspecific polymorphism.
McDonald-Kreitman Test
Neutral Prediction:
(nonsynonymous fixed / synonymous fixed) = (nonsynonymous polymorphism / synonymous polymorphism)
                     Nonsynonymous   Synonymous   % nonsynonymous
Fixed Differences    21              26           45%
Polymorphisms        2               36           5.3%
G6PDH from D. melanogaster and D. simulans. Eanes et al. 1993
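The comparison in the table above can be framed as a 2x2 contingency test. The sketch below is only an illustration, not the procedure used in the cited study: it assumes a two-sided Fisher's exact test (the original analysis may have used a different statistic), and the function name `fisher_exact_two_sided` is hypothetical. The counts are the G6PDH values from the table.

```python
from math import comb

def fisher_exact_two_sided(ns_fixed, s_fixed, ns_poly, s_poly):
    """Two-sided Fisher's exact p-value for a 2x2 McDonald-Kreitman table."""
    n_fixed = ns_fixed + s_fixed      # row total: fixed differences
    n_poly = ns_poly + s_poly         # row total: polymorphisms
    n_nonsyn = ns_fixed + ns_poly     # column total: nonsynonymous changes
    total = n_fixed + n_poly
    denom = comb(total, n_nonsyn)

    def prob(k):
        # Hypergeometric probability that k of the nonsynonymous changes are fixed
        return comb(n_fixed, k) * comb(n_poly, n_nonsyn - k) / denom

    k_min = max(0, n_nonsyn - n_poly)
    k_max = min(n_fixed, n_nonsyn)
    p_obs = prob(ns_fixed)
    # Sum every table that is no more probable than the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# G6PDH counts from the table above
fixed_ratio = 21 / 26   # nonsynonymous / synonymous, fixed differences
poly_ratio = 2 / 36     # nonsynonymous / synonymous, polymorphisms
p_value = fisher_exact_two_sided(21, 26, 2, 36)
print(f"fixed ratio = {fixed_ratio:.3f}, polymorphism ratio = {poly_ratio:.3f}")
print(f"two-sided p = {p_value:.2g}")
```

A small p-value indicates that the two ratios differ more than expected under the neutral prediction.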
If most nonsynonymous substitutions are adaptive, then
they will increase in frequency and be fixed more rapidly
than neutral alleles.
[Figure: allele frequency (0 to 1.0) versus time for an advantageous allele and a neutral allele]
As a result, they spend less time in a polymorphic state
and therefore contribute less to within-species polymorphism.
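To make the timing argument concrete, the sketch below iterates the standard deterministic recursion for genic selection, p' = p(1 + s) / (1 + sp), and counts the generations an advantageous allele needs to rise from frequency 1/(2N) to 1 - 1/(2N). The model, the selection coefficient `s`, and the population size `N` are illustrative assumptions that do not come from the lecture. The comparison value of 4N generations is the classical expected time to fixation of a new neutral allele, given that it does fix.

```python
def generations_to_sweep(s, N):
    """Generations for a deterministic rise from 1/(2N) to 1 - 1/(2N)."""
    p = 1.0 / (2 * N)
    target = 1.0 - 1.0 / (2 * N)
    t = 0
    while p < target:
        # One generation of genic selection with advantage s
        p = p * (1 + s) / (1 + s * p)
        t += 1
    return t

N = 1000   # hypothetical diploid population size
s = 0.01   # hypothetical selective advantage
sweep = generations_to_sweep(s, N)
neutral = 4 * N   # expected fixation time of a neutral allele that fixes
print(f"advantageous allele: about {sweep} generations")
print(f"neutral allele:      about {neutral} generations")
```

The deterministic recursion ignores drift at very low and very high frequencies, so it should be read as a rough comparison only.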
Another example (N = 6-12 alleles per species for
the coding region).
                     Nonsynonymous   Synonymous   % nonsynonymous
Fixed Differences    7               17           29%
Polymorphisms        2               42           4.5%
Adh from D. melanogaster, D. simulans, and D. yakuba
McDonald and Kreitman 1991
Coalescent Process
[Figure: gene tree with coalescence intervals t2, t3, t4, t5]
tm is the time for coalescence from m to m-1 sequences.
Coalescent Process
[Figure: gene tree with eight sequences labeled a through h]
The genealogy of n sequences has 2(n-1) branches:
n are external branches and n-2 are internal.
How long will the coalescence process take?
Simplest case: If we pick two gene copies at random, the probability that
the second descends from the same parental copy as the first is 1 / (2N). This is the
probability that two alleles coalesce in the previous generation.
It follows that 1 - 1 / (2N) is the probability that two sequences
were derived from different sequences in the preceding generation.
Therefore, the probability that 2 sequences were derived from the same
ancestor 2 generations ago (grandparent), and not 1 generation ago, is [1 - 1 / (2N)] x 1 / (2N).
It can be shown that the probability that two sequences were
derived from the same ancestor t generations ago is:
[1 - 1 / (2N)]^(t-1) x 1 / (2N) ~ (1 / (2N)) e^(-t / (2N))
Because N is in the denominator, the probability depends on population size.
Consider the probability of common ancestry for:
Generations ago   Prob(N=5)   Prob(N=10)
1                 0.400       0.200
2                 0.320       0.182
3                 0.256       0.162
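The formula for common ancestry t generations ago can be evaluated directly. The sketch below is a minimal illustration that follows the expression as written in the text, with 2N gene copies in a diploid population of size N. The function names and the population sizes (100 and 1000) are hypothetical choices for this example.

```python
import math

def prob_coalesce_exact(t, N):
    """Probability that two copies first share an ancestor exactly t generations ago."""
    x = 1.0 / (2 * N)
    # No coalescence for t - 1 generations, then coalescence
    return (1 - x) ** (t - 1) * x

def prob_coalesce_approx(t, N):
    """Exponential approximation, accurate when 2N is large."""
    x = 1.0 / (2 * N)
    return x * math.exp(-t * x)

for N in (100, 1000):
    for t in (1, 10, 100):
        exact = prob_coalesce_exact(t, N)
        approx = prob_coalesce_approx(t, N)
        print(f"N={N:5d} t={t:4d} exact={exact:.6f} approx={approx:.6f}")
```

Because the waiting time is geometric with success probability 1 / (2N), its mean is 2N generations, which matches the pairwise coalescence time quoted in the lecture.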
It can be shown that the average time back to common ancestry
of a pair of genes in a diploid population is 2Ne generations, and the average
time back to common ancestry of all gene copies is 4Ne generations.
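Both averages follow from one sum. While k lineages remain, the expected wait until the next coalescence is 4N / (k(k-1)) generations, so the expected time for a sample of n copies is 4N(1 - 1/n). The sketch below assumes the standard neutral coalescent, writes N for the effective size Ne, and uses a hypothetical function name.

```python
def expected_time_to_mrca(n, N):
    """Expected generations until n sampled gene copies share one ancestor."""
    # With k lineages, the mean waiting time to the next coalescence
    # is 4N / (k (k - 1)) generations.
    return sum(4 * N / (k * (k - 1)) for k in range(2, n + 1))

N = 1000  # hypothetical effective population size
for n in (2, 10, 100, 2 * N):
    print(f"n={n:5d}: {expected_time_to_mrca(n, N):8.1f} generations")
```

For n = 2 the sum gives 2N, and as n grows toward all 2N copies it approaches 4N.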
[Figure: gene trees for a large population and a small population]
Coalescence with no mutation
The average degree of relationship increases with time.
All of the gene copies in a
population can be traced back
to a single ancestral gene.
A population will eventually become monomorphic
for one allele or another, with this probability
determined by initial allele frequencies.
Coalescence with mutation
If each lineage experiences m mutations per generation,
then the number of base pair differences between them will be
#dif = 2 m t_ca, where t_ca is the time back to their common ancestor.
If the average time to coalescence is 2N for two randomly chosen
gene copies, then #dif = 2 m (2N).
Therefore, expect the average number of base pair differences
between gene copies to be greater in a larger population.
Total length of branches of gene tree
I + L = J
External branches + Internal branches = Total time length
Now consider mutation among branches
during the coalescent process.
i + e = Total number of mutations in gene tree
(i = mutations on internal branches, e = mutations on external branches)
In theory: total number of mutations equals the
number of segregating sites (K)
Testing for Selective Neutrality
Tajima’s Test (1989):
D = (P - K/a) / sqrt[V(P - K/a)]
The denominator is a normalizing factor.
Rationale: Using the difference in estimates of polymorphism
to detect deviation from neutrality.
P and K are differentially influenced by the
frequency of alleles.
Few alleles at intermediate frequency: P > K/a
Many low frequency, variable alleles: P < K/a
D = 0 neutral prediction
D > 0 balancing selection
D < 0 directional selection
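A minimal sketch of the calculation, assuming the usual definitions from Tajima (1989): P is the mean number of pairwise differences, K is the number of segregating sites, and a is the sum of 1/i for i from 1 to n-1, where n is the number of sequences. The variance constants are the standard ones from that paper. The function name and the toy alignments are hypothetical and serve only to show the sign of D.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for a list of aligned sequences of equal length."""
    n = len(seqs)
    # K: number of segregating (variable) sites
    K = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or K == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    # P: mean number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    P = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # Constants for the variance of P - K/a
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * K + e2 * K * (K - 1)
    return (P - K / a1) / sqrt(variance)

# Toy data: every variant is carried by a single sequence (low frequency)
rare = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT", "AAAAA"]
# Toy data: two haplotypes at equal frequency (intermediate frequency)
balanced = ["AAAAA", "AAAAA", "AAAAA", "TTTTT", "TTTTT", "TTTTT"]
d_rare = tajimas_d(rare)
d_balanced = tajimas_d(balanced)
print(f"rare variants:     D = {d_rare:.2f}")
print(f"balanced variants: D = {d_balanced:.2f}")
```

The first alignment gives a negative D and the second a positive D, matching the two rows of the P versus K/a comparison.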
Fu and Li’s Test (1993):
D = [i - (a - 1) e] / sqrt(V[i - (a - 1) e])
Using the difference in the number of mutations on interior (i) and
exterior (e) branches of the gene tree to detect deviation from neutrality.
Rationale: In a neutral gene tree, the number of mutations expected on
interior branches is (a - 1) times the number expected on exterior branches.
Few alleles at intermediate frequency: i > (a - 1) e
Many low frequency, variable alleles: i < (a - 1) e
D = 0 neutral prediction
D > 0 balancing selection
D < 0 directional selection
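The neutral expectation behind this test can be checked by simulation. The sketch below assumes the standard neutral coalescent with time measured in units of 2N generations, and it tracks branch lengths only; mutations fall on branches in proportion to their length. Under these assumptions the expected total external length is 2 and the expected total internal length is 2(a - 1), where a is the sum of 1/i for i from 1 to n-1. The function name, sample size, replicate count, and random seed are arbitrary choices for illustration.

```python
import random

def simulate_branch_lengths(n, rng):
    """Return (external, internal) total branch lengths for one random gene tree."""
    # Each lineage is tracked by the number of sampled sequences below it
    leaf_counts = [1] * n
    external = 0.0
    internal = 0.0
    while len(leaf_counts) > 1:
        k = len(leaf_counts)
        # Waiting time until any of the k(k-1)/2 pairs coalesces
        wait = rng.expovariate(k * (k - 1) / 2)
        n_ext = sum(1 for c in leaf_counts if c == 1)
        external += wait * n_ext
        internal += wait * (k - n_ext)
        # Merge two lineages chosen at random
        i, j = rng.sample(range(k), 2)
        merged = leaf_counts[i] + leaf_counts[j]
        leaf_counts = [c for idx, c in enumerate(leaf_counts) if idx not in (i, j)]
        leaf_counts.append(merged)
    return external, internal

n = 8
reps = 20000
rng = random.Random(1)
results = [simulate_branch_lengths(n, rng) for _ in range(reps)]
ext_mean = sum(r[0] for r in results) / reps
int_mean = sum(r[1] for r in results) / reps
a = sum(1 / i for i in range(1, n))
print(f"mean external length = {ext_mean:.3f} (expected 2)")
print(f"mean internal length = {int_mean:.3f} (expected {2 * (a - 1):.3f})")
```

Selection that distorts the tree shape shifts mutations toward internal or external branches, which is the signal the test looks for.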
Gene genealogies under no selection, positive
selection, balancing selection, and background selection.
No Selection: 7 neutral mutations accumulate since
the time of the last common ancestor.
D = 0
Consider the Effects of Selection on Neutral Sites Linked to a Selected Site
Positive Selection: neutral variation at linked sites will
be eliminated (swept away) as the advantageous allele
is quickly fixed in the population. This process is
also called hitch-hiking.
D < 0
Balancing Selection: neutral variation at linked sites
accumulates during the long period of time that both
allele lineages are maintained.
D > 0
Background Selection: gene lineages become extinct not
only by chance but also because they are linked to
deleterious mutations, which eliminate some gene copies.
D < 0
Problem: Background selection and hitchhiking are
contrasting processes that lead to the same pattern.
How to differentiate?
Dramatic examples of reduced polymorphism = hitchhiking.
Less dramatic examples = background selection.