Transcript Mar22_24

Classical Population Genetics
Genetic Variation at one Locus with 2 Alleles
Source: Theory of Population Genetics and Evolutionary Ecology,
Jonathan Roughgarden, Prentice Hall, Upper Saddle River, NJ,
1996 reprint of 1979 edition, Part One, pp. 17-100
Consider a population with two alleles, A and a.
Possible Genotypes: AA, Aa, and aa
Suppose that we have a population of size N (usually a
large number). Distribution of genotypes:
NAA = Number of AA Homozygotes
NAa = Number of Heterozygotes
Naa = Number of aa Homozygotes
N = NAA + NAa + Naa
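Two important frequencies for us to consider:
Genotype Frequencies:
$D = N_{AA}/N$
$H = N_{Aa}/N$
$R = N_{aa}/N$
Gene Frequencies:
$p = \frac{2N_{AA} + N_{Aa}}{2N} = D + \frac{1}{2}H$
$q = \frac{2N_{aa} + N_{Aa}}{2N} = R + \frac{1}{2}H$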
These are important relationships; be sure that you
understand them.
Hardy – Weinberg Law:
If we assume no external forces or processes, then
within one generation
$D \to p^2$
$H \to 2pq$
$R \to q^2$
and these frequencies remain stable for all future
generations.
What assumptions are being made?
1. Individuals of different genotypes do not differ
in fertility.
2. Random union of gametes.
3. All individuals, regardless of genotype, have
an equal likelihood of survival from gamete to
adulthood.
An example to illustrate what is being said by the law:
Suppose an aquarium owner purchases a variety of
fish with two alleles that determine their fin color.
A = red fin
Note: Aa = purple fin
a = blue fin
In the shipment the owner receives, 75% of the fish
have red fins (AA), 25% have blue fins (aa), and none have
purple fins (Aa). What will be the eventual distribution of
fin colors in the aquarium?
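After one generation, with $p_0 = 3/4$ and $q_0 = 1/4$:
$D = p_0^2 = 9/16$
$H = 2p_0q_0 = 6/16 = 3/8$
$R = q_0^2 = 1/16$
By the Hardy-Weinberg law these frequencies then persist, so eventually
9/16 of the fish have red fins, 3/8 have purple fins, and 1/16 have blue fins.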
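A minimal Python sketch of the aquarium arithmetic, using exact fractions so the results match the hand calculation. The helper name `hardy_weinberg` is ours, not the source's; it assumes random union of gametes, as in the law above.

```python
from fractions import Fraction


def hardy_weinberg(p):
    """Genotype frequencies (D, H, R) produced by random union of gametes."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


# Shipment: 75% red (AA), 0% purple (Aa), 25% blue (aa).
D0, H0, R0 = Fraction(3, 4), Fraction(0), Fraction(1, 4)
p0 = D0 + H0 / 2  # gene frequency of A: p = D + H/2

D1, H1, R1 = hardy_weinberg(p0)
print(D1, H1, R1)  # 9/16 3/8 1/16

# The next generation has the same gene frequency, so nothing changes.
p1 = D1 + H1 / 2
print(p1 == p0, hardy_weinberg(p1) == (D1, H1, R1))  # True True
```

The second print is the "remain stable for all future generations" part of the law: once the genotype frequencies are $p^2$, $2pq$, $q^2$, the gene frequency is unchanged, so the genotype frequencies are too.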
Proof of the Law:
Because of random union of gametes:
prob(AA) = $p \cdot p = p^2$
prob(Aa) = prob(aA) = $p \cdot q$
or prob(heterozygote) = $2pq$
prob(aa) = $q \cdot q = q^2$
Note: gamete frequencies at the start are p and q.
At this point we use the third assumption: equal proportions of
gametes survive, mate, and the zygotes survive until the adult
stage to produce gametes for the next generation.
Thus, $D = p^2$
$H = 2pq$
$R = q^2$
Gametes are haploid, so information about the genotype distribution
of the previous diploid population is lost.
What is missing?
1. Natural selection
2. Differential fertility and/or survival
3. Mutation
4. Immigration from other populations
5. Genetic drift
Are any assumptions unnecessary?
1. Random mating also produces the same
results; it is just slightly more complex to show
than in the random-union case.
2. The requirement of distinct generations is not
necessary. However, this assumption makes
the algebra easier.
3. If there is a different distribution of genotypes
among the sexes, the stable position does not
emerge for two generations (assuming that all
other assumptions hold – in particular the
survival one)
Enter Natural Selection:
Survival Rates:
lAA, lAa, laa
Fertility Rates:
mAA, mAa, maa
WAA = lAA*mAA, WAa = lAa*mAa, Waa = laa*maa
Going back to slides 2 and 3, we can derive the
number of gametes in the population at time t + 1:
# from AA adults = $2 W_{AA} p_t^2 N_t$
# from Aa adults = $2 W_{Aa} \cdot 2 p_t q_t N_t$
# from aa adults = $2 W_{aa} q_t^2 N_t$
The total population size at time t + 1 is one half the
sum of these three quantities:
$N_{t+1} = (W_{AA} p_t^2 + 2 W_{Aa} p_t q_t + W_{aa} q_t^2) N_t$
An equation such as this is called a difference equation.
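The gamete bookkeeping above can be checked numerically. This Python sketch uses hypothetical fitness values (W_AA = 1.2, W_Aa = 1.0, W_aa = 0.6) and a starting population of 1,000 with p = 1/2; none of these numbers come from the source. It confirms that half the gamete total equals the closed-form N(t+1).

```python
def gamete_counts(N_t, p_t, W_AA, W_Aa, W_aa):
    """Gametes contributed by each genotype class at time t (slide formulas)."""
    q_t = 1 - p_t
    from_AA = 2 * W_AA * p_t**2 * N_t
    from_Aa = 2 * W_Aa * 2 * p_t * q_t * N_t
    from_aa = 2 * W_aa * q_t**2 * N_t
    return from_AA, from_Aa, from_aa


def next_size(N_t, p_t, W_AA, W_Aa, W_aa):
    """Closed form: N_{t+1} = (W_AA p^2 + 2 W_Aa p q + W_aa q^2) N_t."""
    q_t = 1 - p_t
    return (W_AA * p_t**2 + 2 * W_Aa * p_t * q_t + W_aa * q_t**2) * N_t


# Hypothetical values, chosen only for illustration.
counts = gamete_counts(1000, 0.5, 1.2, 1.0, 0.6)
half_total = sum(counts) / 2
print(counts, half_total)                   # roughly (600, 1000, 300) and 950
print(next_size(1000, 0.5, 1.2, 1.0, 0.6))  # roughly 950
```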
The peppered moth (Biston betularia) is an example of a “fast”
evolutionary change (< 40 years). It was caused by industrial
pollution in the area of Birmingham, England. Before the
pollution, the majority coloration of these moths (light) was
difficult to see against the lichen on the trees growing in the
area. After the pollution, the bark became black and the lichen
died, so the light-colored insects became easy prey. As a
result, “selection pressure” favored the dark-colored moths.
The difference equation for the population size leads to
these two absolutely essential difference equations for
the gene frequencies:
$$p_{t+1} = \frac{(p_t W_{AA} + q_t W_{Aa})\, p_t}{p_t^2 W_{AA} + 2 p_t q_t W_{Aa} + q_t^2 W_{aa}}$$
$$q_{t+1} = 1 - p_{t+1}$$
So what?
These equations coupled with the difference equation
for the population size allow us to assign different
fertility and survival rates to the existing three
genotypes and model how the gene pool and
population size change as a result.
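As a sketch of the "So what?", the two recursions can be iterated together. The fitness values below (W_AA = 1.2, W_Aa = 1.1, W_aa = 0.9), the starting size N0 = 100, and p0 = 0.1 are hypothetical; the function names are ours.

```python
def step(N, p, W_AA, W_Aa, W_aa):
    """Advance one generation using the difference equations for N and p."""
    q = 1 - p
    w_bar = W_AA * p**2 + 2 * W_Aa * p * q + W_aa * q**2  # N_{t+1} / N_t
    p_next = (p * W_AA + q * W_Aa) * p / w_bar
    return w_bar * N, p_next


def simulate(N0, p0, W_AA, W_Aa, W_aa, generations):
    """Return [(t, N_t, p_t)] for t = 0..generations."""
    history = [(0, N0, p0)]
    N, p = N0, p0
    for t in range(1, generations + 1):
        N, p = step(N, p, W_AA, W_Aa, W_aa)
        history.append((t, N, p))
    return history


history = simulate(100, 0.1, 1.2, 1.1, 0.9, 30)
for t, N, p in history[::10]:
    print(f"t={t:2d}  N={N:8.1f}  p={p:.3f}")
```

With these made-up values the population first shrinks (mean fitness is about 0.939 while aa is the commonest genotype) and then grows once A has spread far enough, which is the kind of joint change in gene pool and population size the equations let us model.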
Question: Is this absolutely the way things will turn out?
One last notational adjustment to make matters a little
simpler.
We will work to eliminate the preponderance of W’s
from the equation by multiplying them by a suitable
constant. We “normalize” by selecting one of the W’s
to be 1. Say WAA=1. Then we must divide the
remaining two W’s by WAA. Thus,
wAA = 1 (=WAA/WAA)
wAa = WAa/WAA
waa = Waa/WAA
Note that we denoted these normalized values with a
small, italicized w.
And, FINALLY, we define the selectivity coefficients:
sAA = 1 – wAA
sAa = 1 – wAa
saa = 1 – waa
Notice that, in general, these measure selection against a genotype. That means
that a value of 0 is best (no selection against that genotype), and a positive
value reduces that genotype's contribution to the gene pool.
Example:
mAA = 100, lAA = ¾
mAa = 50, lAa = ½
maa = 25, laa = 1/5
WAA = (100)(3/4) = 75
WAa = (50)(1/2) = 25
Waa = (25)(1/5) = 5
wAA = 75/75 = 1
wAa = 25/75 = 1/3
waa = 5/75 = 1/15
sAA = 0
sAa = 2/3
saa = 14/15
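The arithmetic of this example can be reproduced with exact fractions. This is a small Python sketch; the dictionary layout and names are ours.

```python
from fractions import Fraction as F

GENOTYPES = ("AA", "Aa", "aa")

# Fertility (m) and survival (l) rates from the example.
m = {"AA": F(100), "Aa": F(50), "aa": F(25)}
l = {"AA": F(3, 4), "Aa": F(1, 2), "aa": F(1, 5)}

# Absolute fitness W = l * m for each genotype.
W = {g: l[g] * m[g] for g in GENOTYPES}

# Normalize so that w_AA = 1, then take selection coefficients s = 1 - w.
w = {g: W[g] / W["AA"] for g in GENOTYPES}
s = {g: 1 - w[g] for g in GENOTYPES}

for g in GENOTYPES:
    print(g, W[g], w[g], s[g])
# AA 75 1 0
# Aa 25 1/3 2/3
# aa 5 1/15 14/15
```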
With all of these substitutions we finally have an
expression for $p_{t+1}$ that is “manageable”:
$$p_{t+1} = \frac{p_t (p_t + q_t w_{Aa})}{p_t^2 + 2 p_t q_t w_{Aa} + q_t^2 w_{aa}}$$
or:
$$p_{t+1} = \frac{p_t (1 - s_{Aa} q_t)}{1 - q_t (2 s_{Aa} p_t + s_{aa} q_t)}$$
The simulations that follow all used the first form of the
difference equation.
We will consider:
1. Selection against a dominant allele
2. Selection against a recessive allele
3. Heterozygote superiority
Writing a program to implement this model is quite a straightforward
process. This program is written in the functional programming
language used in the Derive® Computer Algebra System. Here wdd, wdr,
and wrr stand for wAA, wAa, and waa (d = dominant allele, r = recessive allele).
dp(p, q, wdd, wdr, wrr) ≔
    p·(p·wdd + q·wdr) / (p^2·wdd + 2·p·q·wdr + q^2·wrr)
HWApprox(p, wdd, wdr, wrr, n, q, i, pp, pn, qp, qn, hw) ≔
  Prog
    q ≔ 1 - p
    i ≔ 0
    pp ≔ p
    qp ≔ q
    pn ≔ p
    qn ≔ q
    hw ≔ []
    Loop
      If i > n
         RETURN hw
      hw ≔ APPEND(hw, [[i, pn]])
      pn ≔ dp(pp, qp, wdd, wdr, wrr)
      qn ≔ 1 - pn
      pp ≔ pn
      qp ≔ qn
      i ≔ i + 1
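For readers without Derive, here is a possible Python translation of dp and HWApprox. It is a sketch, not the author's code: it keeps the source's parameter names wdd, wdr, wrr, drops the Derive idiom of passing local variables as extra parameters, and returns the same list of [generation, p] pairs for generations 0 through n.

```python
def dp(p, q, wdd, wdr, wrr):
    """One generation of selection: p_{t+1} from p_t, q_t and normalized fitnesses."""
    return p * (p * wdd + q * wdr) / (p**2 * wdd + 2 * p * q * wdr + q**2 * wrr)


def hw_approx(p, wdd, wdr, wrr, n):
    """Return [[i, p_i]] for i = 0..n, mirroring the Derive HWApprox loop."""
    hw = []
    pn = p
    for i in range(n + 1):
        hw.append([i, pn])
        pn = dp(pn, 1 - pn, wdd, wdr, wrr)
    return hw


# Parameters of the dominant-allele scenario: p0 = .9, wAA = wAa = .8, waa = 1.
path = hw_approx(0.9, 0.8, 0.8, 1.0, 100)
print(path[0], path[50], path[100])
```

A Python `for` loop replaces the Derive `Loop` with its `If i > n` exit test; both produce n + 1 points.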
Selection Against the Dominant Allele
p0 = .9
wAA = .8
wAa = .8
waa = 1
Note that even though the recessive allele made up only 10% of
the gene pool, in approximately 70 generations it makes up the
entire gene pool.
Selection Against the Recessive Allele
p0 = .1 wAA = 1 wAa = 1 waa = .8
The end result is expected, but there is a qualitative difference. In
the former case the decline of the majority gene started slowly and
then accelerated. Here the initial decline is rapid and then the rate
slows down.
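The qualitative difference shows up if we track how much the majority allele loses each generation. This Python sketch (our own helper names) reruns both scenarios with the source's parameters and compares the per-generation drop early and later on.

```python
def next_p(p, wAA, wAa, waa):
    """Selection recursion for the frequency of A (first form of the difference equation)."""
    q = 1 - p
    return p * (p * wAA + q * wAa) / (p * p * wAA + 2 * p * q * wAa + q * q * waa)


def trajectory(p0, wAA, wAa, waa, generations):
    ps = [p0]
    for _ in range(generations):
        ps.append(next_p(ps[-1], wAA, wAa, waa))
    return ps


T = 60
dom = trajectory(0.9, 0.8, 0.8, 1.0, T)  # majority allele A is selected against
rec = trajectory(0.1, 1.0, 1.0, 0.8, T)  # majority allele a is selected against

# Per-generation loss of the majority allele in each case.
dom_drop = [dom[t] - dom[t + 1] for t in range(T)]  # p_t - p_{t+1}
rec_drop = [rec[t + 1] - rec[t] for t in range(T)]  # q_t - q_{t+1}

for t in (0, 30, 59):
    print(t, round(dom_drop[t], 4), round(rec_drop[t], 4))
```

In the first generation the dominant case loses only about 0.002 while the recessive case loses about 0.019; later the order reverses, because a rare recessive allele is hidden from selection in heterozygotes.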
Selection in Favor of Heterozygote
Selection against the recessive homozygote is four times that against the dominant homozygote
p0 = .9; .5
wAA = .9
wAa = 1
waa = .6
Note that in each of the cases (in fact, in all cases except p0 = 0 or 1),
the dominant allele will eventually make up 80% of the gene pool
and the recessive will make up 20%. This result is called a stable
equilibrium.
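A quick numerical check of the claim, assuming the same recursion: the sketch below (hypothetical helper `run`) starts from the source's values p0 = .9 and .5, plus p0 = .1 for good measure, and iterates 200 generations.

```python
def next_p(p, wAA, wAa, waa):
    q = 1 - p
    return p * (p * wAA + q * wAa) / (p * p * wAA + 2 * p * q * wAa + q * q * waa)


def run(p0, wAA, wAa, waa, generations):
    """Frequency of A after the given number of generations of selection."""
    p = p0
    for _ in range(generations):
        p = next_p(p, wAA, wAa, waa)
    return p


finals = {p0: run(p0, 0.9, 1.0, 0.6, 200) for p0 in (0.9, 0.5, 0.1)}
for p0, p in finals.items():
    print(f"p0 = {p0}: p after 200 generations = {p:.6f}")
```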
Finally a Highly Unusual Result
Selection against the Heterozygote
p0 = .55; .5; .45
wAA = 1
wAa = .8
waa = 1
Notice that if both alleles start out at 50% of the gene pool,
then that percentage will persist. However, if the percentage
moves away from 50%, the majority allele will take over the entire gene
pool and the other will become extinct. Thus 50% is called an
unstable equilibrium.
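The same kind of sketch shows the unstable equilibrium. With wAA = 1, wAa = .8, waa = 1, starting just above or below one half sends A to fixation or loss, while exactly one half stays put (up to floating-point rounding, which the instability slowly amplifies). Helper names are ours.

```python
def next_p(p, wAA, wAa, waa):
    q = 1 - p
    return p * (p * wAA + q * wAa) / (p * p * wAA + 2 * p * q * wAa + q * q * waa)


def run(p0, wAA, wAa, waa, generations):
    """Frequency of A after the given number of generations of selection."""
    p = p0
    for _ in range(generations):
        p = next_p(p, wAA, wAa, waa)
    return p


for p0 in (0.55, 0.5, 0.45):
    print(p0, run(p0, 1.0, 0.8, 1.0, 100))
```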
Of the four scenarios that we considered, three
resulted in the elimination of one of the alleles.
Only the case of selection in favor of the
heterozygote resulted in a mixed gene pool.
Thus, in the presence of natural selection (we will
see later in this lecture what a powerful force this
can be), this is the only case where genetic
variation is maintained. Such maintained variation is called a polymorphism.
The other cases fix on one or the other of the alleles.
For heterozygote superiority, can we determine what this stable equilibrium will be?
More notation (Mathematicians love it!!)
$\hat p$ = equilibrium value for the frequency of A alleles
What do we mean by equilibrium?
When equilibrium is achieved, the frequency of the
alleles stays stable:
$p_{t+1} = p_t = \hat p$ for all $t >$ some $t_0$
And of course,
$q_{t+1} = 1 - p_{t+1} = 1 - p_t = q_t$
In the heterozygote-superiority simulation this happens around generation
50. So, $t_0 \approx 50$.
Let’s see if we can predict $\hat p$. Recall that we are looking for an equilibrium with $\hat p \neq 0, 1$.
We start with the definition of equilibrium:
pt+1 = pt
Earlier we saw that in the presence of natural selection,
$$p_{t+1} = \frac{(p_t w_{AA} + q_t w_{Aa})\, p_t}{p_t^2 w_{AA} + 2 p_t q_t w_{Aa} + q_t^2 w_{aa}}$$
Since $p_t \neq 0$, this means that
$$p_t w_{AA} + q_t w_{Aa} = p_t^2 w_{AA} + 2 p_t q_t w_{Aa} + q_t^2 w_{aa}$$
for all $t > t_0$. Or, at the equilibrium value $\hat p$,
$$\hat p\, w_{AA} + (1 - \hat p)\, w_{Aa} = \hat p^2 w_{AA} + 2 \hat p (1 - \hat p)\, w_{Aa} + (1 - \hat p)^2 w_{aa}$$
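Some simple, but messy, algebra gives us the
following result:
$$\hat p = \frac{w_{Aa} - w_{aa}}{(w_{Aa} - w_{AA}) + (w_{Aa} - w_{aa})}$$
or, when the fitnesses are normalized so that $w_{Aa} = 1$ (as in the example below),
$$\hat p = \frac{s_{aa}}{s_{AA} + s_{aa}}$$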
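One way to fill in the "simple, but messy, algebra" (our reconstruction, not the source's): write $\hat q = 1 - \hat p$, move everything in the equilibrium condition to one side, and factor.

```latex
\begin{aligned}
0 &= \hat p\, w_{AA} + \hat q\, w_{Aa}
     - \left(\hat p^2 w_{AA} + 2\hat p\hat q\, w_{Aa} + \hat q^2 w_{aa}\right) \\
  &= \hat p\hat q\, w_{AA} + \hat q(\hat q - \hat p)\, w_{Aa} - \hat q^2 w_{aa} \\
  &= \hat q\left[\hat p\,(w_{AA} - w_{Aa}) + \hat q\,(w_{Aa} - w_{aa})\right]
\end{aligned}
```

The second line uses $\hat p - \hat p^2 = \hat p\hat q$ and $1 - 2\hat p = \hat q - \hat p$. Because $\hat p \neq 1$, $\hat q \neq 0$, so the bracket must vanish: $\hat p\,(w_{AA} - w_{Aa}) + (1 - \hat p)(w_{Aa} - w_{aa}) = 0$, which solves to $\hat p = (w_{Aa} - w_{aa}) / [(w_{Aa} - w_{AA}) + (w_{Aa} - w_{aa})]$. With $w_{Aa} = 1$, the differences $w_{Aa} - w_{aa}$ and $w_{Aa} - w_{AA}$ are just $s_{aa}$ and $s_{AA}$.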
In our example:
wAA = .9
wAa = 1
waa = .6
$$\hat p = \frac{1 - .6}{(1 - .9) + (1 - .6)} = \frac{.4}{.1 + .4} = \frac{.4}{.5} = .8$$
$$\hat q = 1 - \hat p = 1 - .8 = .2$$
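The formula is easy to check against the recursion: at $\hat p$, one generation of selection should leave the frequency unchanged. A minimal Python sketch (helper names are ours):

```python
def equilibrium_p(wAA, wAa, waa):
    """Interior equilibrium from the formula; meaningful only if it lies in (0, 1)."""
    return (wAa - waa) / ((wAa - wAA) + (wAa - waa))


def next_p(p, wAA, wAa, waa):
    q = 1 - p
    return p * (p * wAA + q * wAa) / (p * p * wAA + 2 * p * q * wAa + q * q * waa)


p_hat = equilibrium_p(0.9, 1.0, 0.6)
print(p_hat, next_p(p_hat, 0.9, 1.0, 0.6))  # both approximately 0.8

# Selection-coefficient form, valid when wAa = 1: s_aa / (s_AA + s_aa).
s_AA, s_aa = 1 - 0.9, 1 - 0.6
print(s_aa / (s_AA + s_aa))  # approximately 0.8
```

The same formula applied to the heterozygote-inferiority case (wAA = 1, wAa = .8, waa = 1) gives 1/2, the unstable equilibrium found there; the formula by itself does not say whether an equilibrium is stable.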
Experimental evidence:
ST and CH are names of blocks of genes in Drosophila pseudoobscura. Because of a chromosomal feature called an inversion, the genes
in each block are held together and function as two alleles at a single locus.
Figure: solid line, simulated path; dashed lines, 95% confidence limits; vertical bars, experimental results.
The results correctly predicted the
equilibrium and the dynamics of
the approach to equilibrium.
But, what about mutation?
Ordinarily it works this way: A mutates to a at rate u, and a mutates back to A at rate v (per generation).
We are going to “stack the deck” in favor of mutation by ignoring back mutation,
i.e. we assume: v = 0
In the absence of any selection our difference equation becomes
$p_{t+1} = (1 - u)\, p_t$
This is just the difference equation for exponential decay.
Look at the time axis! This process is much slower than our
simulations of natural selection, which took anywhere from 1 generation
(pure Hardy-Weinberg) to about 15,000 generations to drop from p = .9
to p = .1.
To actually calculate the predicted time to move from $p_0$ to $p_t$, begin with:
$p_1 = (1 - u)\, p_0$
$p_2 = (1 - u)\, p_1 = (1 - u)^2 p_0$
$p_3 = (1 - u)\, p_2 = (1 - u)^3 p_0$
$\vdots$
$p_t = (1 - u)\, p_{t-1} = (1 - u)^t p_0$
Rearrange the bottom line as:
$p_t / p_0 = (1 - u)^t$
Take the log of both sides and solve for t. This yields
$\log(p_t / p_0) = t \log(1 - u)$
$$t = \frac{\log(p_t / p_0)}{\log(1 - u)}$$
Let’s calculate the time to move from p0 = .9 to pt = .1 for
the first curve on the graph shown two slides previously,
i.e. $u = 10^{-5} = .00001$:
$$t = \frac{\log(.1/.9)}{\log(1 - 10^{-5})} = \frac{\log(.11111)}{\log(.99999)} \approx 219{,}721 \text{ generations}$$
Mathematical note: Since this quantity is the quotient of two
logarithms, logarithms to any base give the same numerical
result; i.e., we can use either the log10 or the ln button on our
calculator, or even log2 if we care to.
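Both routes to the answer, the closed form and brute-force iteration of $p_{t+1} = (1 - u)p_t$, fit in a few lines of Python. This is a sketch with our own function names; `math.log1p(-u)` is used for $\log(1 - u)$ because it stays accurate when u is tiny.

```python
import math


def generations_closed_form(p0, pt, u):
    """t = log(pt / p0) / log(1 - u), from pt = (1 - u)**t * p0."""
    return math.log(pt / p0) / math.log1p(-u)


def generations_by_iteration(p0, pt, u):
    """Count generations of p <- (1 - u) p until p first falls to pt or below."""
    p, t = p0, 0
    while p > pt:
        p *= 1 - u
        t += 1
    return t


u = 1e-5
t = generations_closed_form(0.9, 0.1, u)
print(round(t))                               # 219721
print(generations_by_iteration(0.9, 0.1, u))  # 219722, the first whole generation past t

# The base of the logarithm cancels in the quotient.
t10 = math.log10(0.1 / 0.9) / math.log10(1 - u)
print(abs(t10 - t) < 1e-3)                    # True
```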
Extra Credit Project: Use a spreadsheet or write a
computer program to generate the graphs that were shown
two slides previously.
In general, mutation has little effect if selection is at work.
If selection is virtually neutral, say s < .001, then mutation
can have an effect, but it is slow.
However, recurrent mutation cannot be totally ignored:
• Recurrent mutation tends to maintain a supply of
genetic variation for selection to act upon.
• Even if selection is tending to eliminate one allele,
recurrent mutation tends to maintain its presence in
the gene pool. Thus, if the environment changes to a
situation that is more favorable to the allele that was
being selected against, that allele is still available.
• Mutation is the ultimate source of genetic variation.
Sometimes mutation may oppose selection.
Suppose selection is against A, and A is dominant:
wAA = 1 – s
wAa = 1 – s
waa = 1
However, also assume v > 0, i.e. there is recurrent mutation of a
to A at rate v. Then it can be shown that, approximately,
$\hat p \approx v/s$
On the other hand, if A is recessive:
wAA = 1 – s
wAa = 1
waa = 1
we have, approximately,
$\hat p \approx \sqrt{v/s}$
If A is recessive, mutation maintains a much higher
frequency than if it is dominant.
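A rough numerical check of the two balance points, as a Python sketch. It assumes a life cycle of selection followed by mutation each generation, uses the hypothetical values s = 0.1 and v = 1e-5, starts with no A alleles, and compares the long-run frequency with the standard approximations v/s and √(v/s). Function names are ours.

```python
import math


def select(p, wAA, wAa, waa):
    """Selection step: frequency of A among the next generation's gametes."""
    q = 1 - p
    w_bar = p * p * wAA + 2 * p * q * wAa + q * q * waa
    return p * (p * wAA + q * wAa) / w_bar


def mutate(p, v):
    """Recurrent mutation a -> A at rate v; back mutation A -> a is ignored."""
    return p + v * (1 - p)


def long_run_p(wAA, wAa, waa, v, generations=50_000):
    """Start with no A alleles and alternate selection then mutation."""
    p = 0.0
    for _ in range(generations):
        p = mutate(select(p, wAA, wAa, waa), v)
    return p


s, v = 0.1, 1e-5  # hypothetical values
p_dominant = long_run_p(1 - s, 1 - s, 1.0, v)  # A dominant: wAA = wAa = 1 - s
p_recessive = long_run_p(1 - s, 1.0, 1.0, v)   # A recessive: only wAA = 1 - s
print(p_dominant, v / s)               # close to 1e-4
print(p_recessive, math.sqrt(v / s))   # close to 1e-2
```

With these values the recessive case holds A at a frequency about 100 times higher, because a rare recessive allele sits mostly in heterozygotes where selection cannot see it.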
Genetic Drift
So far every model we have considered has been a deterministic
model, i.e. everything is set in motion on a predetermined path.
Chance has been ignored.
But, chance does play a role!
In the sea urchin model, gametes can wash out to sea.
Some types of individuals may produce more offspring than others.
Survival rates may vary.
A theory involving chance is called a stochastic theory.
Instead of getting a single number, we get a distribution over
several possible states.
Two sources of chance occurrences:
1. Changing environment
2. Internal to the population: these would occur even in a fixed
environment
“Genetic Drift” refers to all chance events internal to the population.
Suppose we start with a large population and p = ½.
From the gamete pool, draw 4 gametes (a small sample).
The sample could be 2 & 2 relative to the alleles,
but it could also be 3 & 1, 1 & 3, 0 & 4, or 4 & 0.
Suppose 3 & 1 is the distribution in our sample; then p has moved
from ½ to ¾ without any selective pressure. This is called “sampling
error.”
NOTE: Sampling error is more likely to occur as the population size
decreases.
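The chances of each outcome are binomial. This minimal Python sketch with exact fractions (function names are ours) gives the distribution for a sample of 4 gametes and shows how a larger sample makes a big jump in p much less likely.

```python
from fractions import Fraction
from math import comb


def sample_distribution(n, p):
    """P(k copies of A among n gametes drawn at random), k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def prob_shift_at_least(n, p, shift):
    """Probability that the sample frequency k/n differs from p by at least `shift`."""
    dist = sample_distribution(n, p)
    return sum(prob for k, prob in dist.items() if abs(Fraction(k, n) - p) >= shift)


half = Fraction(1, 2)
print(sample_distribution(4, half))
# {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4), 4: Fraction(1, 16)}

for n in (4, 40, 400):
    print(n, float(prob_shift_at_least(n, half, Fraction(1, 4))))
```

With 4 gametes, p moves by at least ¼ (to ¾, ¼, or beyond) with probability 5/8; with 40 gametes that probability is already below 1%.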
Experimental evidence of Genetic Drift
Kerr and Wright (1954) sampled a population of Drosophila
melanogaster heterozygotes. They constructed 96 groups of 4
males and 4 females. At each generation they randomly extracted 4
males and 4 females from each group's offspring to form the next
generation, and so on. Their results follow.
Note the “U” shape of the later histograms of the frequency
distributions. This is characteristic of this type of situation:
more and more lines become fixed for one allele or the other.
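A small Wright-Fisher simulation reproduces the "U" shape. It is a sketch loosely modeled on the Kerr and Wright design: 96 lines of 8 flies, hence 16 gene copies per line, all starting at p = ½. It assumes pure random sampling of gene copies with no selection, which simplifies the real experiment; the function name, random seed, and number of generations are our own choices.

```python
import random


def drift_histograms(n_lines=96, n_genes=16, generations=20, seed=1954):
    """Wright-Fisher drift: each generation, every line's n_genes gene copies are
    drawn at random using the previous generation's allele frequency.
    Returns one histogram per generation: hist[k] = number of lines with k copies of A."""
    rng = random.Random(seed)
    counts = [n_genes // 2] * n_lines  # all lines start at p = 1/2
    histograms = []
    for _ in range(generations):
        counts = [
            sum(rng.random() < c / n_genes for _ in range(n_genes)) for c in counts
        ]
        hist = [0] * (n_genes + 1)
        for c in counts:
            hist[c] += 1
        histograms.append(hist)
    return histograms


hists = drift_histograms()
for gen in (1, 5, 10, 20):
    h = hists[gen - 1]
    print(f"gen {gen:2d}: lost={h[0]:2d} fixed={h[-1]:2d} unfixed={sum(h[1:-1]):2d}")
```

Once a line reaches 0 or 16 copies it can never leave, so the two end bins only fill up over time; printing whole histograms for the later generations shows the lines piling up at both ends.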