Transcript Slide 1

Topic 14. Lecture 20. Positive, negative, and balancing selection in
natural populations.
We considered the five factors of Microevolution - mutation, selection, mode of
reproduction, population structure, and genetic drift - separately. Of course, in nature they
act together, and now we need to understand how this happens. Natural selection is the key
factor of Microevolution, as far as adaptive evolution of phenotypes is concerned. Thus, the
presentation is structured around different modes of selection. Still, other factors are also
important.
In particular, we need to understand the relationship between selection - systematic
differences of fitnesses, and genetic drift - random differences of fitnesses. Which one
prevails? There is a simple answer to this question.
Strength of selection acting on a particular pair of alleles A and a (there can be no
selection on a single allele!) is characterized by the coefficient of selection, or selective
advantage of A over a: s = 1 - wa/wA.
Strength of genetic drift is characterized by effective population size Ne, the size of an
equivalent Wright-Fisher population (with some exceptions, Ne is approximately the same
for all loci in the genome).
The key fact:
If, at some locus, Nes >> 1, selection rules. In particular, the most fit allele will be, eventually,
fixed, and will never be lost after this. In contrast, if, at some locus, Nes << 1, random drift
rules. In particular, evolution will be reversible in this case - no allele will be fixed forever.
Also, effective population size determines the level of genetic variation at selectively neutral
loci. At such a locus, virtual heterozygosity H = 4Nem, where m is the mutation rate at this
locus. Thus, knowledge of m makes it possible to estimate Ne for natural populations from
easily observable levels of genetic heterogeneity. Some estimates of Ne in nature are:
humans - 10,000 (not today!)
whales - 35,000
fruit flies - 1,000,000
worms - 100,000 - 1,000,000
marine invertebrates - 1,000,000
ciliates - 10,000,000 (or even more)
bacteria - 10,000,000 (only!)
Effective sizes of most natural populations are much smaller than their actual head counts,
because of high variation in the number of offspring per individual. Thus, the minimal
strength of selection that still matters must be at least ~10^-6 - and in some populations only
much stronger selection can affect evolution.
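As a rough illustration of the Nes rule, the sketch below takes the Ne estimates listed above and reports, for each, the selection coefficient at which Nes = 1. The helper names, the dictionary, and the factor of 10 standing in for ">>" and "<<" are assumptions made here for illustration, not part of the lecture.

```python
# Illustrative sketch: where does N_e * s = 1 fall for the N_e estimates
# quoted in the lecture? All names below are hypothetical helpers.

NE_ESTIMATES = {
    "humans": 1e4,
    "whales": 3.5e4,
    "fruit flies": 1e6,
    "worms (lower estimate)": 1e5,
    "marine invertebrates": 1e6,
    "ciliates": 1e7,
    "bacteria": 1e7,
}


def drift_threshold(ne: float) -> float:
    """Return the s at which N_e * s = 1 (selection and drift comparable)."""
    return 1.0 / ne


def regime(ne: float, s: float, margin: float = 10.0) -> str:
    """Classify a locus; `margin` is an assumed stand-in for '>>' and '<<'."""
    x = ne * abs(s)
    if x >= margin:
        return "selection rules"
    if x <= 1.0 / margin:
        return "drift rules"
    return "intermediate"


if __name__ == "__main__":
    for species, ne in NE_ESTIMATES.items():
        print(f"{species:25s} Ne = {ne:12,.0f}   s at Ne*s = 1: {drift_threshold(ne):.1e}")
```

Since the rule asks for Nes >> 1, selection that reliably matters has to be roughly an order of magnitude stronger than the threshold printed for each population.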
Let us now consider:
1) Selection that promotes changes (positive)
2) Selection that prevents changes (negative and balancing)
3) Weak or absent selection
1) Selection that promotes changes
a) the complete story of one allele replacement:
Replacement of old, inferior alleles by new, superior alleles, driven by positive selection, is
the most important process in Microevolution, responsible for evolution of adaptations. We
already considered the last phase of this process. However, the whole story consists of 3
phases, the first two of which are affected by stochasticity, due to mutation and to genetic
drift, respectively:
i) a beneficial mutation appears - population has to wait for this to happen.
ii) the mutation survives unavoidable initial genetic drift.
iii) the mutation takes over: d[A]/dt = s[A](1-[A]).
Quantitative analysis of an allele replacement (rough, but fair):
i) a beneficial mutation appears - population has to wait. For how long?
A particular mutation appears once every 1/(Nm) generations, where N is the number of breeding
individuals. Thus, a typical waiting time may be 100 generations for a simple nucleotide
substitution (if m = 10^-8, N = 10^6), or 100,000 generations for a 3-nucleotide insertion (why 3?),
or forever for a complex event (evolution is mutation-limited to a large extent).
ii) the mutation survives unavoidable initial genetic drift. With what probability?
A mutation will survive with probability ~2s, after which the number of its carriers will be
lifted up to ~1/(2s). Thus, if a typical strength of selection for a new, beneficial allele is 0.01,
only ~1/50 of new beneficial mutations are not lost initially. In other words, the waiting time
must be multiplied by ~50 (maybe more). This initial drift takes ~1/(2s) generations, after which
a mutation is either lost or out of danger.
iii) the mutation takes over: d[A]/dt = s[A](1-[A]). How fast?
Deterministic propagation of the mutation - after the number of its carriers becomes larger
than ~100 - goes fast: it takes ~10/s generations for [A] to grow from ~0.0001 to 0.9999.
Although our equation was derived for asexuals, sex, even with diploid selection, does not
change much, as long as wAA > wAa > waa.
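The three phases can be put into numbers with a short sketch. It is only a rough illustration under the approximations quoted above (one occurrence per 1/(Nm) generations, survival probability ~2s, logistic takeover); the function names and the default start and end frequencies are choices made here, not part of the lecture. For the last phase the closed-form logistic solution gives about 18/s generations for [A] to go from 0.0001 to 0.9999, the same order of magnitude as the ~10/s quoted.

```python
import math


def waiting_time(n: float, m: float) -> float:
    """Phase i: expected generations between occurrences of a given mutation."""
    return 1.0 / (n * m)


def survival_probability(s: float) -> float:
    """Phase ii: rough chance (~2s) that a new beneficial mutation escapes early loss."""
    return 2.0 * s


def logit(p: float) -> float:
    """Log-odds of a frequency p."""
    return math.log(p / (1.0 - p))


def takeover_time(s: float, p0: float = 1e-4, p1: float = 0.9999) -> float:
    """Phase iii: generations for [A] to rise from p0 to p1 under
    d[A]/dt = s[A](1-[A]); the log-odds of [A] grow linearly at rate s."""
    return (logit(p1) - logit(p0)) / s


def wait_for_successful_mutation(n: float, m: float, s: float) -> float:
    """Phases i and ii combined: expected wait until an occurrence that survives drift."""
    return waiting_time(n, m) / survival_probability(s)


if __name__ == "__main__":
    n, m, s = 1e6, 1e-8, 0.01
    print("wait per occurrence:", waiting_time(n, m))
    print("wait for a surviving occurrence:", wait_for_successful_mutation(n, m, s))
    print("takeover time:", round(takeover_time(s)))
```

With N = 10^6, m = 10^-8 and s = 0.01, and taking the ~2s approximation literally, this gives a 100-generation wait per occurrence, roughly 5,000 generations until an occurrence that escapes drift, and under 2,000 generations for the takeover.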
ii) more on the unavoidable initial genetic drift
Drift is a fair process. Thus, after a mutation appears, the expected number of neutral
mutants is always 1 - so that if the mutation was not lost after some generations (with
probability, say, 1%), the expected number of mutants is 100. During a short time when this
branching process matters, selective advantage of the new beneficial allele does not matter
much. If a beneficial mutation is lost, the population has to wait for its next occurrence.
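The statement that drift is fair can be checked on a simple branching process. The sketch below assumes that each neutral mutant leaves a Poisson-distributed number of copies with mean 1 (an assumption made here for illustration; the lecture does not specify the offspring distribution) and iterates the generating function to get the probability that the lineage is still present after t generations. Because the unconditional expectation stays at 1, the expected number of copies given survival is simply 1 divided by that probability.

```python
import math


def extinction_probabilities(generations: int, mean_offspring: float = 1.0) -> list:
    """q[t] = probability that the lineage started by one mutant is extinct
    by generation t, assuming Poisson offspring numbers (sketch assumption)."""
    q = [0.0]  # a lineage that has just appeared is not extinct
    for _ in range(generations):
        # Poisson probability generating function: f(x) = exp(mean * (x - 1))
        q.append(math.exp(mean_offspring * (q[-1] - 1.0)))
    return q


def survival_probability_after(generations: int) -> float:
    """Probability that a neutral mutant lineage is still present after t generations."""
    return 1.0 - extinction_probabilities(generations)[-1]


def expected_copies_given_survival(generations: int) -> float:
    """Neutral case: unconditional expectation is 1, so the expectation
    conditional on survival is 1 / P(survival)."""
    return 1.0 / survival_probability_after(generations)


if __name__ == "__main__":
    for t in (1, 10, 50, 200):
        print(t, survival_probability_after(t), expected_copies_given_survival(t))
```

Under these assumptions the survival probability falls to roughly 1% after about 200 generations, at which point the surviving lineages carry on the order of 100 copies.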
iii) more on how a mutation takes over after becoming frequent
Is dominance/recessivity important for Darwinian evolution? Not much, at least as a rule. If
the beneficial allele is partially recessive, its initial expansion is retarded, and if it is partially
dominant, the final stage of its expansion is retarded. And without dominance (wAa^2 = wAA x
waa), sex changes nothing.
Still, this is not yet a complete story of an adaptive allele replacement. Indeed, a
replacement affects genetic variation at other loci, due to a phenomenon known as hitchhiking. In asexuals, when a unique beneficial mutation reaches fixation, it "accidentally"
drives to fixation all those variants that happened to be in the genotype in which it occurred.
In this respect, sex makes a substantial difference, and limits, due to recombination, the
impact of hitch-hiking to only a relatively small region of the chromosome. A replacement
driven by positive selection produces a region of very low variation, flanked by regions with
some high-frequency derived alleles.
A beneficial mutation (red) in a population with many segregating neutral (green) and slightly deleterious (blue) variants.
Half-way towards fixation, the beneficial mutation carries with it the close-by variants.
Some of these variants become detached, due to crossing-over, by the time of the fixation.
This process of removal of genetic variation close to the site of an advantageous allele
replacement is called selective sweep. At the boundaries of the region of a sweep, initially
rare variants can reach high frequencies, but not fixation.
The size of the segment of a chromosome that is swept of genetic variation due to an
adaptive allele replacement increases with s, the coefficient of selection in favor of a new
allele, and declines as Ne and r, the probability of recombination between two nucleotides,
increase. If selection is strong, the allele replacement occurs fast, and a larger segment of
the genome will be swept. If the population is infinite and/or recombination is extremely fast,
the effect of hitch-hiking would disappear.
b) Overlapping selection-driven allele replacements:
Under reasonable strength of selection, a selection-driven allele replacement can take from
~100 to ~10,000 generations. Thus, successive replacements in a population would not
overlap in time, if there is, on average, much less than 1 of them per 100-10,000 generations.
It seems that some populations accumulate beneficial alleles at faster rates. If so, different
selection-driven allele replacements must overlap in time. How can this happen?
Here, the role of sex is critical: in a population of a realistic size, overlapping adaptive allele
replacements can happen only with sex.
Asexual population: beneficial alleles at
different loci that emerged in different
individuals compete with each other. A
replacement sweeps the whole genome!
Sexual population: beneficial alleles at
different loci that emerged in different
individuals can find their way into the
same genotype, due to recombination.
Still, even with sex, overlapping adaptive allele replacements may signify a problem for the population.
Indeed, they imply that fitness landscape changes fast and, if this happens, can the population survive?
When a population follows a rapidly moving fitness peak, it lags behind it substantially - and
without a big lag there would be no overlapping adaptive replacements. The reduction of the
mean population fitness relative to the optimal fitness that corresponds to the top of the
fitness peak is called the lag genetic load: L = (wmax - W)/wmax.
Haldane's dilemma: If a population has to follow a fitness peak that moves too rapidly, and,
thus, tries to accumulate too many adaptive allele replacements per unit time, it may suffer
from too high a lag load and go extinct. Remember that, in order to sustain the population
with L = 0.8, individuals of the optimal genotype must produce, on average, at least 5 (or 10
with sex) offspring.
Suppose that, at a given moment, 100 adaptive allele replacements are occurring. Thus, an
average individual lags behind the optimum by 50 alleles. Would such an individual be
viable?
Is Haldane's dilemma real? If positive selection favors new, advantageous alleles
independently, the fitness of an individual with k such alleles, each with advantage s, is
(1+s)^k ~ e^(ks). Then, the average individual may have fitness that is way below the fitness of
the optimal genotype and the lag load would be too high if many replacements occur at the
same time. However, epistatic selection can abolish this problem: if selection is soft and
50% of the population is left to reproduce, the per individual number of good alleles
increases by ~1 standard deviation of the number of beneficial alleles per individual - which
can be a lot. We do not know what kind of selection - hard and exponential or soft and
epistatic - is responsible for adaptive evolution. Depending on the answer, the rate of
adaptive evolution is either limited or not limited by the lag load.
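To get a feeling for the numbers behind Haldane's dilemma, here is a minimal sketch of the hard, exponential case. It treats the fitness of an individual lagging by k alleles as a stand-in for the mean population fitness W, which is a simplification made here; the function names and the example values of s are likewise illustrative assumptions.

```python
def required_offspring(load: float, sexual: bool = False) -> float:
    """Offspring the optimal genotype must produce so that the population
    replaces itself: 1 / (1 - L), doubled with sex (two parents per offspring)."""
    per_parent = 1.0 / (1.0 - load)
    return 2.0 * per_parent if sexual else per_parent


def lag_load_multiplicative(k_missing: int, s: float) -> float:
    """Load of an individual that lacks k beneficial alleles, when each allele
    independently multiplies fitness by (1 + s): L = 1 - (1 + s)^(-k)."""
    return 1.0 - (1.0 + s) ** (-k_missing)


if __name__ == "__main__":
    print("offspring needed at L = 0.8:", required_offspring(0.8),
          "asexual,", required_offspring(0.8, sexual=True), "sexual")
    for s in (0.01, 0.05):
        load = lag_load_multiplicative(50, s)
        print(f"lagging by 50 alleles, s = {s}: L = {load:.3f}, "
              f"offspring needed = {required_offspring(load):.1f}")
```

Under independent selection, lagging by 50 alleles costs about 39% of fitness when s = 0.01 but over 90% when s = 0.05, which is why the answer depends so strongly on the mode of selection.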
c) Selection-driven allele replacements in spatially structured populations
Replacement of an old, inferior allele with the new, beneficial allele can be substantially
slowed down by spatial structure of the population. Propagation of a beneficial allele follows
"traveling wave" dynamics, with the velocity of propagation 2(ms)^(1/2), where m is the rate of
(localized) migration and s is selective advantage of the new allele. Occasional long-distance
leaps of some individuals can speed up this process substantially.
Without epistasis, waves of propagation of different alleles travel approximately independently
within sexual populations.
Global Spread of Chloroquine-Resistant Strains of Plasmodium falciparum.
Microevolutionary theory obviously needs to take into account spatial structure when the
spread of an advantageous genotype is considered. However, it does not radically alter the
outcome of evolution under strong selection, with the only exception of speciation.
2) Selection that prevents changes
Two forms of selection prevent changes - negative selection and balancing selection.
Selection that prevents changes is much less important, from the point of view of
evolutionary biology, than positive selection. The human eye and the peacock's tail evolved due to
positive selection! Still, selection that prevents changes cannot be ignored completely,
because negative selection is the most common form of selection in natural populations. In
other words, a vast majority of mutations that lead to a substantial change of the phenotype
are deleterious.
One more example of negative selection: deleterious mutations in human rhodopsin.
ADRP: autosomal dominant retinitis pigmentosa.
ARRP: autosomal recessive retinitis pigmentosa.
CSNB: congenital stationary night blindness.
Under negative selection, population suffers from genetic load, which can be referred to as
mutation load (abolish mutation, and this load would disappear).
Let us consider the simplest case of one locus with two alleles, A and a, assuming asexual
reproduction or sex with selection in the haploid phase. Fitnesses of alleles A and a are 1
and 1-s, respectively. Deleterious mutations A -> a occur with rate m. Mutation and selection
lead to the following changes in allele frequencies (assuming that a is rare, because s >> m)
[a] --(mutation)--> [a] + m --(selection)--> ([a] + m)(1-s) = [a]_(t+1)
At equilibrium:
([a] + m)(1-s) = [a]
[a] + m - s[a] - sm = [a]
Ignoring term -sm (a product of two small numbers), we obtain
[a]eq = m/s.
What is the value of genetic load under mutation-selection equilibrium?
L = 1 - W/wmax; wmax = 1; W = 1[A]eq + (1-s)[a]eq; [a]eq = m/s.
Thus, L = 1 - 1(1-m/s) - (1-s)m/s = m
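The equilibrium and the load can also be checked numerically. The sketch below simply iterates the mutation-then-selection recursion written above (without renormalizing by mean fitness, which is acceptable only while a is rare) and then computes L = 1 - W; the function names and parameter values are illustrative assumptions.

```python
def iterate_balance(m: float, s: float, generations: int = 5000, a0: float = 0.0) -> float:
    """Apply mutation, then selection, each generation:
    [a]_(t+1) = ([a]_t + m)(1 - s). Valid only while allele a stays rare."""
    a = a0
    for _ in range(generations):
        a = (a + m) * (1.0 - s)
    return a


def mutation_load(a: float, s: float) -> float:
    """L = 1 - W with wmax = 1 and W = 1*(1 - [a]) + (1 - s)*[a]."""
    mean_fitness = (1.0 - a) + (1.0 - s) * a
    return 1.0 - mean_fitness


if __name__ == "__main__":
    m = 1e-5
    for s, gens in ((0.01, 5000), (0.001, 20000)):
        a_eq = iterate_balance(m, s, generations=gens)
        print(f"s = {s}: [a]eq = {a_eq:.3e} (m/s = {m / s:.3e}), "
              f"L = {mutation_load(a_eq, s):.3e} (m = {m:.1e})")
```

The equilibrium frequency changes tenfold between the two values of s, while the load stays close to m in both cases; the small discrepancy is the ignored -sm term.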
This remarkable fact, L = m, is known as the Haldane-Muller principle: mutation load is equal to
mutation rate, and does not depend on the strength of selection against mutations ("one
mutation - one genetic death"). Of course, this is true only if selection removes mutations
one-by-one. What other situations are possible?
1) Recessive mutations at one locus. Consider two alleles at one locus of sexual diploids,
with fitnesses 1 (AA), 1-hs (Aa), and 1-s (aa), where h characterizes dominance of the
deleterious allele a. When a is recessive, mutation load is two times lower: if deleterious
alleles are removed only as homozygotes, each genetic death removes two alleles.
2) Truncation or similar epistatic selection against mutations at many loci. Exponential
selection removes mutations, in a sense, one-by-one, but under truncation one genetic
death can remove many mutations, reducing the mutation load (only with sex).
Both recessivity and truncation are forms of synergistic epistasis between different
deleterious alleles; when present together, mutations reinforce deleterious effects of each
other. How important this phenomenon is in nature remains a matter of debate.
In addition to negative selection, changes of the population can also be prevented by
balancing selection, which, however, keeps the population variable. One form of balancing
selection is the direct dependence of fitnesses of genotypes on allele frequencies, with rare
genotypes having an advantage. Interactions of selection with Mendelian segregation lead
to another curious form of balancing selection, due to advantage of heterozygotes.
Consider a population of sexual diploids with two alleles, A and a, at one locus. Fitnesses of
the 3 possible genotypes, AA, Aa, and aa, are wAA, wAa, and waa, respectively. If wAA < wAa >
waa (advantage of heterozygotes), selection protects variation.
Indeed, due to Hardy-Weinberg law, a rare allele
is mostly exposed to selection in heterozygous
state and, thus, advantage of heterozygotes
leads to a higher fitness of rare alleles.
Frequencies of the 3 possible genotypes, AA,
Aa, and aa are [A]^2, 2[A][a], and [a]^2. If, for
example, A is rare, [A]^2 is small relative to
2[A][a], so that rare A will be mostly present in
heterozygotes.
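That advantage of heterozygotes protects variation can be seen by iterating the standard one-locus recursion for a random-mating diploid population. This is a deterministic sketch with illustrative fitness values chosen here (wAA = 0.9, wAa = 1, waa = 0.8); the function names are hypothetical, and the closed-form equilibrium is the textbook result obtained by equating the marginal fitnesses of the two alleles.

```python
def next_freq(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Frequency of A after one generation of selection, with genotype
    frequencies [A]^2, 2[A][a], [a]^2 before selection (Hardy-Weinberg)."""
    q = 1.0 - p
    mean_w = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_w


def equilibrium(w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Internal equilibrium [A], where the marginal fitnesses of A and a are equal."""
    return (w_Aa - w_aa) / (2.0 * w_Aa - w_AA - w_aa)


def run(p0: float, generations: int, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Iterate selection for a number of generations starting from [A] = p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, w_AA, w_Aa, w_aa)
    return p


if __name__ == "__main__":
    w = (0.9, 1.0, 0.8)
    print("predicted equilibrium:", equilibrium(*w))
    print("starting with A rare:  ", run(0.001, 2000, *w))
    print("starting with A common:", run(0.999, 2000, *w))
```

Whichever allele starts out rare, it increases, and both runs converge to the same intermediate frequency.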
A few examples of balancing selection are known, but this mode of selection is rare.
3) Weak or absent selection
The simplest case of strict selective neutrality is particularly important, because there are
many neutral nucleotide sites, at least in large genomes. In this case
1. Equilibrium virtual heterozygosity is H = 4Nem in a diploid population. Derivation is easy,
but we will not consider it.
2. Rate of evolution, the per generation frequency of allele replacements, equals the
mutation rate m.
The probability of occurrence of a new mutation is mN per generation (say, 0.001). A new
mutation then will be fixed with probability 1/N, because it has the same probability of
eventually taking over the population as any other allele (selection is absent). This gives us
m allele replacements per generation.
If selection is not totally absent (s ≠ 0), it can be regarded as "weak" if Nes < 20. In this case
the superior allele is not fixed permanently. In the simplest case of symmetric mutation, the
rate of evolution and the level of variation are maximal when selection is absent, and decline
very rapidly when selection gets stronger.
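Both limits used in this lecture, fixation probability 1/N for a neutral mutation and ~2s for a beneficial one, come out of a single diffusion formula due to Kimura. The formula is not derived in the lecture and is brought in here as an assumption: it describes one new mutant in a haploid Wright-Fisher population of N individuals, with Ne = N and small |s|. The function names are hypothetical.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Diffusion approximation u = (1 - exp(-2s)) / (1 - exp(-2Ns)) for a single
    new mutant; assumes a haploid Wright-Fisher population with N_e = N."""
    if s == 0.0:
        return 1.0 / n  # neutral limit
    exponent = -2.0 * n * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: avoid float overflow, u is effectively 0
    # expm1(x) = exp(x) - 1, accurate for small x; the two minus signs cancel
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_rate(n: int, s: float) -> float:
    """Rate of allele replacement relative to the neutral rate m: N * u(s)."""
    return n * fixation_probability(n, s)


if __name__ == "__main__":
    n = 10_000
    for s in (-1e-3, -1e-4, -1e-5, 0.0, 1e-5, 1e-4, 1e-3):
        print(f"N*s = {n * s:7.2f}   rate relative to neutral = {relative_rate(n, s):.3e}")
```

Under this approximation a slightly deleterious mutation with Ns = -10 is replaced at a rate many orders of magnitude below the neutral rate, while for Ns >> 1 the fixation probability approaches 2s regardless of N.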
Locus A with two alleles A1 and A2, symmetric mutation with rate m, such that 4Nem << 1, so
that most of the time either allele is fixed. Rate of evolution is the frequency of switches
between A1 and A2 fixations.
Detecting natural selection
We reviewed the very basics of the direct theory of Microevolution, which tells us how all its
factors, working together, affect genetic variation within populations. However, this theory is
useful only if we know the actual parameters of factors of Microevolution. This can be
accomplished either by direct measurements, for example of the mutation rate (by parent-offspring comparisons), or through inverse theory of Microevolution, which infers, from
patterns in genetic variation and allele replacement, the parameters of these factors.
We already saw how this works for measuring genetic drift: theory predicts that without
selection H = 4Nem, thus, if we know m and can measure H, we can recover Ne (which is
almost impossible to observe directly).
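In code, this inverse step is a one-liner. The sketch below uses illustrative round numbers chosen here (H = 0.001 and m = 2.5 x 10^-8 per nucleotide per generation); they are not measurements given in the lecture, and were picked only because they reproduce the order of magnitude quoted earlier for humans.

```python
def expected_heterozygosity(ne: float, m: float) -> float:
    """Direct theory: H = 4 * Ne * m at a selectively neutral locus."""
    return 4.0 * ne * m


def effective_size(h: float, m: float) -> float:
    """Inverse theory: recover Ne from observed H and a known mutation rate m."""
    return h / (4.0 * m)


if __name__ == "__main__":
    h_observed = 0.001      # assumed per-nucleotide heterozygosity (illustrative)
    mutation_rate = 2.5e-8  # assumed per-nucleotide mutation rate (illustrative)
    ne = effective_size(h_observed, mutation_rate)
    print("estimated Ne:", round(ne))
    print("check, H from Ne:", expected_heterozygosity(ne, mutation_rate))
```

The estimate is only as good as the assumption that the sites used are truly neutral and that m is known.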
Now we will consider the key issue of measuring natural selection. Indeed, measuring
fitnesses directly is very difficult (it is essentially impossible to measure fitness of a
multicellular organism with an error less than 1-3%), and the results obtained in the
laboratory cannot be applied to wild populations. Thus, indirect methods based on inverse
theory are crucial.
1) Detecting negative selection
This is a relatively easy task - because negative selection is very common. Negative
selection affects evolving sequences in two ways:
1) it reduces the probability of fixation of a mutation with s < 0
2) it reduces the time until elimination of a mutation with s < 0
As a result, negative selection leaves two kinds of footprints:
1) reduced rate of evolution and the level of within-population variation
Reduced relative to what? - to the rate of evolution at selectively neutral sites. According to
the fundamental theorem of neutral evolution, neutral sites evolve at the mutation rate (this
is intuitively obvious). Practically, negative selection is detected by comparing the amount
of interspecies divergence or within-population polymorphism to that at plausibly neutral
sequence sites.
Can we detect negative selection at individual sites or only at sequence segments? This
depends on the depth of the alignment.
Alignment of orthologous regulatory regions of 4 mammals. A transcription factor-binding
site with low divergence is marked by blue. If the alignment includes only a few sequences,
we can only detect substantial segments with reduced divergence rates (never call them
mutation rates!) - for example, using Hidden Markov Model technique.
A typical segment of an alignment of orthologous proteins from different species. Here the
number of sequences makes it possible to detect negative selection even at individual sites.
Data on within-population variation usually allow us only to detect negative selection in
wide classes of sites, for example to show that non-synonymous coding sites are under
stronger selection than synonymous sites. However, with high H making inferences about
individual sites may become possible. We badly need 100 genotypes of Ciona savignyi.
2) An excess of rare alleles
Distribution of allele (nucleotide) frequencies in Arabidopsis thaliana. PLoS Biology 3, 1289-1299, 2005.
At non-synonymous sites an excess of rare alleles, relative to the neutral expectation, is
higher. Of course, here we cannot make inferences about individual sites.
However, we can make inferences about the strength of negative selection because only alleles
with small s are observed as rare polymorphisms.
In contrast, reduced rate of evolution tells us very little about the strength of selection:
s = -0.001 is enough to stop evolution.
2) Detecting positive selection
This is a difficult and important problem - because positive selection is rare, relative to
negative selection (this was proposed in 1935 by Ivan Schmalhausen) and because positive
selection is the only driving force of adaptive evolution.
Positive selection affects evolving sequences in two ways:
1) it increases the probability of fixation of a mutation with s > 0
2) it reduces the time until fixation of a mutation with s > 0
The footprint of positive selection looks rather different depending on its age.
1) Positive selection accomplished a long time ago - interspecies comparisons
In contrast to negative selection, positive selection accelerates evolution (not the rate of
evolution!). Thus, it makes sites or segments evolve faster than neutrally. As a result, we
can detect positive selection only from comparing relatively close species, such that the
number of accepted substitutions between them per neutral site, Kneu, is ~1-3. Ancient
actions of positive selection, that occurred more than 1/m generations ago (m is the per
nucleotide mutation rate) could never be detected.
So, if we have a large number of close enough sequences, even individual sites where K >
Kneu (Kneu is measured for sites that are probably under no selection) can be detected. This
approach works well for pathogens, with multiple moderately different strains.
Distribution of amino acid replacements along the Neisseria gonorrhoeae transmembrane
porin sequence. Each dot represents one replacement. Obviously, sequence segments
exposed outside the cell evolve much faster, probably due to positive selection. Molecular
Biology and Evolution 17, 423-436, 2000.
Positive selection in HIV-1 protease, detected on samples from 40,000 patients. For each
codon site, the ratio of the rate of the most common allele replacement over the neutral rate
is shown (Journal of Virology 78, 3722-3732, 2004).
However, there are two problems with this approach:
1) Positive selection can act only within one clade, with negative selection acting at the
same site in the rest of the phylogeny. Then, overall K will be low at the site.
2) There may be not enough species to measure K for individual sites. If so, all probably
important sites are treated together, and their average per site number of changes, Kimp, is
calculated. Trouble is, sites under positive selection are generally scattered between numerous sites
under negative selection, leading to Kimp < Kneu. Only very rarely, there are long enough
segments with a majority of sites under positive selection.
Positive selection acting in one clade,
on a sparse phylogenetic tree.
Sophisticated statistical methods can be used to analyze such data - but, in my opinion,
they reliably detect positive selection only if it affects a substantial fraction of sites, so that
Kimp > Kneu, at least within a large clade - and this is generally very rare. Most of the "important" sites are,
most of the time, under negative, and not positive selection.
A clever idea of McDonald and Kreitman can offer some help. They realized that the
condition Kimp > Kneu (or Kimp/Kneu > 1) can be relaxed. If negative selection is strong,
"important" sites under it will not be polymorphic in the population. Sites under positive
selection also make only minimal contribution to polymorphism (because polymorphism in
the course of an allele replacement is very short-lived). Thus, instead of asking for
Kimp/Kneu > 1
as a signature of positive selection it is enough to ask for
Kimp/Kneu > Himp/Hneu
Himp/Hneu can be as low as 0.2-0.3 (due to a large fraction of sites under negative selection
among "important" sites), so this is a much less stringent condition.
One problem with this approach is that slightly deleterious variants with -s ~ 1/Ne can
segregate within the population, but are only rarely fixed, and thus inflate Himp/Hneu. A
possible way of dealing with this problem is to ignore rare variants.
Some applications of the McDonald-Kreitman test to Drosophila species suggest that as many
as 50% of allele replacements in fly evolution were driven by positive selection, because
Kimp/Kneu = 2Himp/Hneu
In contrast, in mammals Kimp/Kneu < Himp/Hneu, suggesting no positive selection. The reasons
for such contrast are unclear. Anyway, MK test could never establish identities of individual
sites under positive selection.
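The arithmetic behind the "50%" figure can be sketched as follows. The estimator alpha = 1 - (Himp/Hneu)/(Kimp/Kneu) is a common way of turning the MK comparison into a fraction of replacements driven by positive selection; it is not stated in the lecture and is used here as an assumption, and the input numbers are purely illustrative.

```python
def mk_ratios(k_imp: float, k_neu: float, h_imp: float, h_neu: float) -> tuple:
    """Return (Kimp/Kneu, Himp/Hneu): divergence and polymorphism at
    'important' sites, each scaled by the value at plausibly neutral sites."""
    return k_imp / k_neu, h_imp / h_neu


def adaptive_fraction(k_imp: float, k_neu: float, h_imp: float, h_neu: float) -> float:
    """alpha = 1 - (Himp/Hneu) / (Kimp/Kneu). Positive values suggest that some
    replacements were driven by positive selection; negative values give no such evidence."""
    k_ratio, h_ratio = mk_ratios(k_imp, k_neu, h_imp, h_neu)
    return 1.0 - h_ratio / k_ratio


if __name__ == "__main__":
    # Illustrative case with Kimp/Kneu = 2 * Himp/Hneu, as described for flies.
    print("fly-like case:   ", adaptive_fraction(k_imp=0.5, k_neu=1.0, h_imp=0.25, h_neu=1.0))
    # Illustrative case with Kimp/Kneu < Himp/Hneu, as described for mammals.
    print("mammal-like case:", adaptive_fraction(k_imp=0.2, k_neu=1.0, h_imp=0.3, h_neu=1.0))
```

When Kimp/Kneu is twice Himp/Hneu the estimate is 0.5, which is where the 50% comes from; slightly deleterious polymorphisms inflate Himp/Hneu and so bias the estimate downwards.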
2) Positive selection accomplished recently - within-population variation
A recent allele replacement driven by positive selection produces a region of very low
variation, flanked by regions with some high-frequency derived alleles. Such a scar of an
allele replacement is due to an effect called hitch-hiking, and it remains visible for << 1/Ne
generations, where Ne is the effective population size per nucleotide mutation rate.
A beneficial mutation (red) in a population with many segregating neutral (green) and slightly deleterious (blue) variants.
Half-way towards fixation, the beneficial mutation carries with it the close-by variants.
Some of these variants become detached, due to crossing-over, by the time of the fixation.
There are several definite known cases of recently accomplished selective sweeps.
Reduced levels of genetic variation around the site of recent positive selection-driven allele
replacement (selective sweep) in human populations from Africa (a), Europe (b), and East
Asia (c) (Nature Genetics 39, 218 - 225, 2007).
3) Ongoing positive selection - within-population variation
One must be lucky to study the right population at the right time. Still, there are some
definite cases of ongoing allele replacements driven by strong positive selection. One of
them is parallel acquisition of the ability of adults to digest milk (due to persistent expression
of lactase) in Africans and non-Africans. These ongoing sweeps left clear-cut signatures.
(a) Kenyan and Tanzanian C-14010 lactase-persistent (red) and non-persistent G-14010
(blue) homozygosity tracts. (b) European and Asian T-13910 lactase-persistent (green) and
C-13910 non-persistent (orange) homozygosity tracts. Positions are relative to the start
codon of lactase locus (Nature Genetics 39, 31 - 40, 2006).
4) A different approach - detecting positive selection by bursts of substitutions
Suppose that at a codon site fitness landscape was suddenly changed. The new optimal
amino acid may not be reachable from the old one by a single nucleotide substitution. Then,
a clump of two or even three non-synonymous substitutions may follow. Such clumps were
observed in evolution of mammals and HIV-1 (PNAS 103, 19396-19401, 2006).
Clumping of nonsynonymous substitutions is the strongest in conservative regions of
proteins, where the 1:1 situations occur only in ~20% of codons. Indeed - if an important
amino acid is replaced, this must be beneficial. This approach reveals a number of slowly evolving sites that occasionally undergo positive selection.
Amino acid sites inferred to be under
positive selection in HIV-1 gp120. Left:
rapidly evolving sites previously
inferred to be under positive selection.
Right: conservative sites with strongly
clumped substitutions.
3) Detecting balancing selection
Balancing selection, which requires changing fitness landscapes, favors rare alleles. It
prevents fixations and losses of the alleles involved, leading to durable polymorphisms.
In the extreme case this can lead to transspecies polymorphisms, persisting from the time of
species divergence. This is the case for the csd (complementary sex determination) locus in
bees. Females must be heterozygous at this locus, and homozygotes develop into sterile males,
causing strong selection against common alleles (Genome Res. 16, 1366-1375, 2006).
Quiz:
Suppose that there is a genome segment with low genetic variation within a population. This
can be due to either negative selection or a recent selective sweep within this segment.
What additional data can be used to distinguish between these two explanations?