statgen4_old

Genetic Drift
 In small, reproductively isolated populations,
special circumstances exist that can produce rapid
changes in gene frequencies totally independent of
mutation, recombination, and natural
selection. These changes are due solely to chance
factors. The smaller the population, the more
susceptible it is to such random changes. This
phenomenon is known as genetic drift.
Neutral alleles
 Controversial when first proposed (Kimura 1968)
 Incontrovertible in 2001:
   • DNA sequence polymorphisms are abundant
   • In Eukarya, most of the genome is noncoding
   • Most sequence polymorphism lies in noncoding regions
   • Most sequence polymorphisms appear selectively neutral
       - Not useful for studies of genetic adaptation
       - Ideal for detection of population substructure and phylogenetic relationships
Genetic Drift
 For example, when both parents are heterozygous (Aa) for a trait, we would expect 1/4 of their children to be homozygous recessive (aa). By chance, however, a particular couple might not have any children with this genotype (as shown in the Punnett square on the slide).

 Unless other families have an unpredictably large number of
homozygous (aa) children for this trait, the population's gene pool
frequencies will change in the direction of having fewer recessive alleles.
Genetic Drift
The net effect of genetic drift on a small population's gene pool can be
rapid evolution, as illustrated in the hypothetical inheritance patterns
shown below. Note that the red trait dramatically increases from
generation to generation. It is important to remember that this can
occur independent of natural selection or any other evolutionary
mechanism.
Inbreeding in finite populations
 Assume a population size of N, therefore 2N alleles
in population. Imagine eggs and sperm released
randomly into environment (e.g. sea)
 What is the probability of 2 gametes drawn
randomly having the same allele?
[Figure: the 2N alleles of generation 0 giving rise to generation 1]
Probability = 1/(2N)
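Inbreeding in finite populations
 Therefore, after 1 generation the level of inbreeding is $F_1 = \frac{1}{2N}$
 After t generations the probability is
$$F_t = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}$$
Why?
[Figure: drawing two alleles for generation t from generation t-1]
 With probability 1/(2N), the 2nd allele picked is a copy of the same parental allele as the first.
 With probability 1 - 1/(2N), it is a copy of a different parental allele, and the two are then identical by descent with probability $F_{t-1}$.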
More generally
$$F_t = 1 - \left(1 - \frac{1}{2N}\right)^t$$
Genetic Drift
Hartl and Clark p.286
Ft = Probability that any 2 alleles drawn randomly from the
population are identical by descent
Genetic drift and heterozygosity
 Genetic drift will lead to a gradual loss of genetic diversity
 Follow an individual locus and the gene frequency will drift until one allele becomes fixed
Genetic drift Mathematical Model
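[Figure: a population of 2N alleles passing to the next generation]
Genetic diversity: $H_t = \left(1 - \frac{1}{2N}\right) H_{t-1}$
Probability of identity: $F_t = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}$
Average time to fixation: 4N generations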
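The two recurrences can be checked numerically. The sketch below is only an illustration: the function names, N = 50, the 200 generations and the starting values F_0 = 0 and H_0 = 0.5 are assumptions, not values from the slides. It iterates both recurrences and compares F with the closed form 1 - (1 - 1/(2N))^t.

```python
def identity_by_descent(n_individuals, generations, f0=0.0):
    """Iterate F_t = 1/(2N) + (1 - 1/(2N)) * F_{t-1}; returns [F_0, ..., F_t]."""
    two_n = 2 * n_individuals
    values = [f0]
    for _ in range(generations):
        values.append(1 / two_n + (1 - 1 / two_n) * values[-1])
    return values


def heterozygosity(n_individuals, generations, h0):
    """Iterate H_t = (1 - 1/(2N)) * H_{t-1}; returns [H_0, ..., H_t]."""
    two_n = 2 * n_individuals
    values = [h0]
    for _ in range(generations):
        values.append((1 - 1 / two_n) * values[-1])
    return values


# Hypothetical example values (not from the slides).
N, T = 50, 200
F = identity_by_descent(N, T)
H = heterozygosity(N, T, h0=0.5)
closed_form = 1 - (1 - 1 / (2 * N)) ** T

print("F after", T, "generations:", F[-1], "closed form:", closed_form)
print("H after", T, "generations:", H[-1])
```

With F_0 = 0 the two quantities stay linked by H_t = H_0 (1 - F_t), which is one way to see that drift removes diversity at the same rate at which identity by descent builds up.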
Several populations
 Genetic drift will make initially identical populations different
 Eventually, each population will be fixed for one allele, and different populations for different alleles
 If there are very many populations, the proportion of populations fixed for each allele will correspond to the initial frequency of that allele
 Small populations will diverge more rapidly
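These statements can be illustrated with a small simulation. The sketch below assumes a Wright-Fisher sampling scheme (each generation, 2N alleles are drawn at random from the previous generation); the function names, the initial frequency 0.3, the numbers of replicate populations and the seed are hypothetical choices for illustration.

```python
import random


def drift_until_fixed(two_n, p0, rng):
    """Resample 2N alleles each generation until allele A is lost or fixed.

    Returns (fixed, generations): whether A was fixed, and how long it took.
    """
    count = round(p0 * two_n)
    generations = 0
    while 0 < count < two_n:
        p = count / two_n
        # Each of the 2N new alleles is a copy of A with probability p.
        count = sum(rng.random() < p for _ in range(two_n))
        generations += 1
    return count == two_n, generations


def summarize(two_n, p0, n_populations, seed=1):
    """Fraction of replicate populations fixed for A, and mean time to fixation or loss."""
    rng = random.Random(seed)
    runs = [drift_until_fixed(two_n, p0, rng) for _ in range(n_populations)]
    fraction_fixed = sum(fixed for fixed, _ in runs) / n_populations
    mean_generations = sum(g for _, g in runs) / n_populations
    return fraction_fixed, mean_generations


for two_n in (10, 40):
    print(two_n, summarize(two_n, p0=0.3, n_populations=1000))
```

The fraction of populations fixed for A should come out close to the initial frequency for both sizes, while the smaller populations reach fixation or loss in fewer generations.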
Importance of genetic drift
 Two causes for allele substitutions:
 Selection -> adaptive evolution
 Genetic drift ->non-adaptive evolution
 Most populations are geographically structured
 All populations are finite in size
 All genetic variation is subject to genetic drift but
not necessarily to selection
 Genetic drift as a null hypothesis against which
evidence for selection has to be tested
Effective population size
 Effective population size < census size (in most cases)
 The effective population size is the size of an "ideal" population having the genetic properties of the studied population
 The effective population size is determined by
   • Large variation in the number of offspring
   • Overlapping generations
   • Fluctuations in population size: $\frac{1}{N_e} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{N_i}$ (harmonic mean)
   • Unequal numbers of males and females contributing to reproduction: $N_e = \frac{4 N_f N_m}{N_f + N_m}$
Effective population size
 What is an ideal population like?
 (Remember: each parent has an equal and independent chance of being the parent of each descendant allele.)
 This is approximated by a Poisson distribution of reproductive success.
 (Reproductive success = number of offspring per parent, or per parental allele.)
Effective population size
 For diploids:
$$N_e = \frac{4N - 2}{V + 2}$$
 where V is the variance in reproductive success among diploid individuals. (Note that the variance among individuals in reproductive success with a Poisson distribution is 2 in a steady-state population, so that $N_e = N - \frac{1}{2}$.) Note also that $N_e$ can be as great as 2N - 1, if there is no variance in reproductive success. This fact is often used in animal and plant breeding to slow the loss of genetic material.
Effective population size
 Population size in natural populations does not remain constant
 $N_e$ with population size fluctuations is approximately the harmonic mean of N over time:
$$\frac{1}{N_e} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{N_i}$$
 The harmonic mean is very sensitive to small values.
 ($N_e \ll \bar{N}$) if N is variable
 With unequal numbers of breeding females and males: $N_e = \frac{4 N_f N_m}{N_f + N_m}$
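The three formulas for the effective population size given in this section are easy to evaluate side by side. The sketch below is a minimal illustration; the function names and all example numbers (100 individuals, a 90:10 sex ratio, one generation with only 10 individuals) are hypothetical and not taken from the slides.

```python
def ne_from_offspring_variance(n, variance):
    """N_e = (4N - 2) / (V + 2), with V the variance in reproductive success."""
    return (4 * n - 2) / (variance + 2)


def ne_fluctuating(sizes):
    """Harmonic mean of the census sizes N_1, ..., N_n over n generations."""
    return len(sizes) / sum(1 / n for n in sizes)


def ne_sex_ratio(n_females, n_males):
    """N_e = 4 * Nf * Nm / (Nf + Nm) for unequal numbers of breeding females and males."""
    return 4 * n_females * n_males / (n_females + n_males)


# Hypothetical example values.
print(ne_from_offspring_variance(100, 2))      # Poisson-like variance: N - 1/2
print(ne_from_offspring_variance(100, 0))      # no variance: 2N - 1
print(ne_fluctuating([1000, 1000, 10, 1000]))  # one small generation dominates
print(ne_sex_ratio(90, 10))                    # skewed sex ratio
```

The fluctuating-size example shows why the harmonic mean matters: a single generation of 10 individuals pulls the effective size far below the arithmetic mean of the four census sizes.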
Backwards: the coalescent approach
Simplification: 0, 1 or 2 offspring
Coalesce: have the same parent
Probability to coalesce: 1/N
Probability not to coalesce: 1 - 1/N
Probability not to coalesce for t generations: $(1 - 1/N)^t$
Average time to coalesce for 2 genes: N generations
For the whole population: 2N generations
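One way to see where N and 2N come from is to add up expected waiting times. The sketch below rests on an assumption that the slide does not state: while k lineages remain, some pair coalesces with probability about C(k, 2)/N per generation, and mergers of more than two lineages at once are ignored. The function name and the example value N = 100 are hypothetical.

```python
from math import comb


def expected_coalescence_time(n_genes, sample_size):
    """Expected generations until `sample_size` lineages share one common ancestor.

    While k lineages remain, the expected wait for the next coalescence is
    N / C(k, 2) generations (assumed approximation); the waits are summed
    from k = sample_size down to k = 2.
    """
    return sum(n_genes / comb(k, 2) for k in range(2, sample_size + 1))


N = 100  # hypothetical number of genes in the population
print(expected_coalescence_time(N, 2))   # two genes
print(expected_coalescence_time(N, N))   # all genes in the population
```

Under this approximation the sum equals 2N(1 - 1/n) for a sample of n genes, which is N for two genes and approaches 2N when the whole population is followed.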
Founder effects
 Another important small population effect is the founder effect or founder principle. This occurs when a small number of people have many descendants surviving after a number of generations. The result for a population is often high frequencies of specific genetic traits inherited from the few common ancestors who first had them.
A new population emerges from a
relatively small group of people.
Population      # of founders   # of generations   Current size
Costa Rica      4,000           12                 2,500,000
Finland         500             80-100             5,000,000
Hutterites      80              14                 36,000
Japan           1,000           80-100             120,000,000
Iceland         25,000          40                 300,000
Newfoundland    25,000          16                 500,000
Quebec          2,500           12-16              6,000,000
Sardinia        500             400                1,660,000
Founder effect example
 In the Lake Maracaibo region of northwest
Venezuela, for instance, there is an extremely high
frequency of a severe genetically inherited
degenerative nerve disorder known as Huntington's
disease. Approximately 150 people in the area
during the 1990s had this fatal condition and more
than 1,000 others were at high risk for developing
it.
 All of the Lake Maracaibo region Huntington's
victims trace their ancestry to one woman who
moved into the area a little over a century ago. She
had an unusually large number of descendants and
was therefore the "founder" of this population with
its unpleasant genetically inherited trait.
Founder effect example
It is also possible to find the results of the
founder effect even though the original
ancestors are unknown. For example,
South and Central American Indians were
nearly 100% type O for the ABO blood
system. Since nothing in nature seems to
strongly select for or against this trait, it is
likely that most of these people are
descended from a small band of closely
related "founders" who also shared this
blood type. They migrated into the region
from the north, mostly by the end of the
last Ice Age.
Bottleneck
In some species, there have been periods of
dramatic ecological crisis caused by changes
in natural selection, during which most
individuals died without passing on their
genes. The few survivors of these
evolutionary "bottlenecks" then were
reproductively very successful, resulting in
large populations in subsequent
generations. The consequence of this
bottleneck effect is the dramatic reduction in
genetic diversity of a species since most
variability is lost at the time of the bottleneck.
Migration
 A cline is a gradual change in allele frequency along a
geographic gradient
 Ecotypes are genetically distinct forms that are consistently
found in certain habitats.
 Changes in allele frequency can be mapped across geographical
or linguistic regions.
 Allele frequency differences between current populations can
be correlated to certain historical events.
 In contrast to selection and genetic drift, gene flow homogenizes allele frequencies
 Genetic diversity is restored if immigrants carry new alleles or
alleles which are rare in the population
Patterns of geographic variation
 Sympatric, parapatric and allopatric variants
 Subspecies are recognizable geographic variants within a species (their delimitation is often debated)
 A hybrid zone is a region where genetically different parapatric species or populations interbreed.
 Character displacement: sympatric populations of two
species differ more than allopatric populations
Allele distributions can reflect
historical events

Creutzfeldt-Jakob disease (CJD) is caused by a
mutation in the prion protein
 70% of families with CJD share the same allele
 Families from Libya, Tunisia, Italy, Chile and Spain
share a common haplotype.
 These populations were expelled from Spain in the
Middle Ages.
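Genetic drift and mutation
No mutation:
$$F_t = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}$$
[Figure: drawing two alleles for generation t from generation t-1, with probabilities 1/(2N) and 1 - 1/(2N)]
With mutation at rate m per allele per generation:
$$F_t = \frac{1}{2N}(1 - m)^2 + \left(1 - \frac{1}{2N}\right)(1 - m)^2 F_{t-1}$$
Probability of neither of 2 alleles being mutated is $(1 - m)^2$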
Equilibrium Between Mutation
and Drift
 Run the recurrence equation over and over
and eventually it will settle down to an
equilibrium
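[Figure: drawing two alleles for generation t from generation t-1, with probabilities 1/(2N) and 1 - 1/(2N); label: probability of picking 2nd allele and it not being mutated]
$$F_t = \frac{1}{2N}(1 - m)^2 + \left(1 - \frac{1}{2N}\right)(1 - m)^2 F_{t-1}$$
$$\hat{F} = F_t = F_{t-1} \approx \frac{1}{1 + 4Nm}$$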
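Running the recurrence "over and over" is easy to do numerically. The sketch below is a minimal illustration with hypothetical values (N = 100, m = 0.001, 20000 generations, starting from F = 0); the function name is also an assumption. It iterates the recurrence with mutation and prints the result next to the approximation 1/(1 + 4Nm).

```python
def equilibrium_identity(n_individuals, mutation_rate, generations=20000):
    """Iterate F_t = [1/(2N) + (1 - 1/(2N)) * F_{t-1}] * (1 - m)^2 starting from F = 0."""
    two_n = 2 * n_individuals
    not_mutated = (1 - mutation_rate) ** 2  # neither of the 2 alleles mutated
    f = 0.0
    for _ in range(generations):
        f = (1 / two_n + (1 - 1 / two_n) * f) * not_mutated
    return f


# Hypothetical example values.
N, m = 100, 0.001
print("iterated:     ", equilibrium_identity(N, m))
print("approximation:", 1 / (1 + 4 * N * m))
```

The two numbers agree closely but not exactly, because 1/(1 + 4Nm) drops terms that are negligible only when m is small and N is large.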
Migration and drift
 Migrants will also act to counter genetic drift
within a population
 Introduce new alleles into population
m = immigration rate (probability of an allele drawn at random being from a migrant)
$$\hat{F} \approx \frac{1}{1 + 4Nm}$$
[Figure: a population of N individuals receiving migrants at rate m]
Gene flow
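[Figure: a population of 2N alleles receiving immigrant alleles at rate m]
Probability of identity: $F_t = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}\right] (1 - m)^2$
(assumes that the immigrants are different from each other)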
[Figure: two panels, N = 100 with m = 0 and N = 100 with m = 0.01; vertical axis from 0 to 0.9, horizontal axis from 0 to 200]
(Mutation has the same effect)
[Figure: a population of 2N alleles with mutation at rate μ]
Probability of identity: $F_t = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}\right] (1 - \mu)^2$
Assumes that each mutation creates a new allele: Infinite allele model
Mutation also retards the loss of genetic variability due to genetic drift
Equilibrium
 After a long time (the longer the larger the
population) there will be an equilibrium
between genetic drift, gene flow and
mutation
$$F_t = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}\right] (1 - m - \mu)^2$$
 F and H will not change any more (if everything remains constant!)
Mutation – drift equilibrium: $\hat{F} \approx \frac{1}{4N\mu + 1}$
Migration – drift equilibrium: $\hat{F} \approx \frac{1}{4Nm + 1}$
Migration Models – Island Model One Way Migration
m is the probability that a randomly
chosen allele is a migrant
Change of allele frequency with one way migration
•A is fixed on the island
•a arrives from the mainland at rate m =0.01
$p = P(A)$ in generation 1 on the island
$p' = P(A)$ in generation 2 on the island
$$p' = (1 - m)\,p + m\,p^*$$
$$p' - p^* = (1 - m)(p - p^*)$$
$$p_t - p^* = (1 - m)^t (p_0 - p^*)$$
where $p^*$ is the allele frequency among the migrants (the mainland frequency).
Mutation and Migration
$m = 10^{-4}$
$$p_t = p_0 (1 - m)^t$$
$m = 10^{-2}$
$$p_t - p^* = (1 - m)^t (p_0 - p^*)$$
Estimation of m
10 generations
Present allele frequency in “island” = pt =
0.484
Allele frequency in mainland p* = 0.507
Initial allele frequency in island = p0 = 0.474
m = proportion of migrant alleles each
generation
$$0.484 - 0.507 = (1 - m)^{10} (0.474 - 0.507)$$
$$(1 - m)^{10} = \frac{0.484 - 0.507}{0.474 - 0.507}$$
$$1 - m = \sqrt[10]{\frac{0.484 - 0.507}{0.474 - 0.507}} \approx 0.9645$$
$$m \approx 0.035$$
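The same estimate can be reproduced in a few lines. The sketch below solves $p_t - p^* = (1 - m)^t (p_0 - p^*)$ for m using the allele frequencies given on the slide; the function names are hypothetical, and the second function is only there to check the estimate by plugging it back into the model.

```python
def estimate_migration_rate(p0, pt, p_mainland, generations):
    """Solve p_t - p* = (1 - m)^t * (p_0 - p*) for the migration rate m."""
    ratio = (pt - p_mainland) / (p0 - p_mainland)
    return 1 - ratio ** (1 / generations)


def island_frequency(p0, p_mainland, m, generations):
    """Allele frequency on the island after t generations of one-way migration."""
    return p_mainland + (1 - m) ** generations * (p0 - p_mainland)


# Values from the slide: p0 = 0.474, pt = 0.484, p* = 0.507, 10 generations.
m = estimate_migration_rate(0.474, 0.484, 0.507, 10)
print("estimated m:", m)
print("check, p after 10 generations:", island_frequency(0.474, 0.507, m, 10))
```

The ratio inside the root must be positive for this to work, which means the island frequency has to move toward the mainland frequency without overshooting it.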
Island Model of Migration
Many large subpopulations
Average allele frequency $\bar{p}$ = frequency in migrants
The change in frequency in the subpopulations:
$$p_t = \bar{p} + (1 - m)^t (p_0 - \bar{p})$$
Example with m = 0.10 and $\bar{p} = \frac{0.2 + 0.8}{2} = 0.5$:
$p_0 = 0.2$: $p_{10} = 0.5 + (1 - 0.1)^{10} (0.2 - 0.5) = 0.395$
$p_0 = 0.8$: $p_{10} = 0.5 + (1 - 0.1)^{10} (0.8 - 0.5) = 0.605$
Change in allele frequency over time
[Figure: five subpopulations with initial allele frequencies 1, 0.75, 0.5, 0.25 and 0]
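The island model can also be iterated generation by generation instead of using the closed form. The sketch below assumes equal-sized subpopulations whose migrants come from the pooled average; the function name is hypothetical, and using m = 0.10 for the five-subpopulation case is an assumption, since the slide does not state the rate used for that figure.

```python
def island_model(frequencies, m, generations):
    """Each generation, a fraction m of every subpopulation is replaced by
    migrants carrying the average allele frequency of all subpopulations."""
    freqs = list(frequencies)
    for _ in range(generations):
        p_bar = sum(freqs) / len(freqs)
        freqs = [(1 - m) * p + m * p_bar for p in freqs]
    return freqs


# Two subpopulations from the worked example (m = 0.10, 10 generations).
print([round(p, 3) for p in island_model([0.2, 0.8], 0.10, 10)])

# Five subpopulations starting at 1, 0.75, 0.5, 0.25 and 0 (m = 0.10 assumed).
for t in (0, 10, 25, 50):
    print(t, [round(p, 3) for p in island_model([1, 0.75, 0.5, 0.25, 0], 0.10, t)])
```

All subpopulations converge on the average frequency, and the distance to it shrinks by a factor (1 - m) each generation.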
The Effect on the Fixation Index
$$\hat{F} \approx \frac{1}{1 + 4Nm}$$
•total number of alleles = 2N
•m = proportion of alleles replaced by alleles from migrants
•2Nm = number of migrant alleles in any generation
Assumptions:
•N is large
•m is small
•m/N ≪ 1
Effect of Migration on Genetic
Divergence
$$\hat{F} \approx \frac{1}{1 + 4Nm}$$
Extreme case: Nm = 0, $\hat{F}$ = 1
•Nm = 0.25, one migrant every fourth generation: $\hat{F}$ = 0.50
•Nm = 0.5, one migrant every second generation: $\hat{F}$ = 0.33
•Nm = 1, one migrant every generation: $\hat{F}$ = 0.20
•Nm = 2, two migrants every generation: $\hat{F}$ = 0.11
The equilibrium frequency for the fixation index decreases as the number of migrants increases
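The values listed above follow directly from the formula. The sketch below tabulates $\hat{F}$ for the same values of Nm and also inverts the relation; the function names are hypothetical, and the inverse is only the algebraic rearrangement of the same approximation.

```python
def equilibrium_f(nm):
    """Equilibrium fixation index, F = 1 / (1 + 4Nm)."""
    return 1 / (1 + 4 * nm)


def migrants_for_f(f):
    """Number of migrants per generation (Nm) implied by an equilibrium F."""
    return (1 / f - 1) / 4


for nm in (0, 0.25, 0.5, 1, 2):
    print(f"Nm = {nm}: F = {equilibrium_f(nm):.2f}")
```

Even one migrant per generation already holds $\hat{F}$ at 0.20, regardless of the population size.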
Quantitative Genetic Variation
 The loci involved in quantitative traits are often unknown, and their individual effects on the phenotype are too slight for variation to be measured by counting alleles
 Quantitative genetic variation is measured statistically: one estimates the variance of the trait in a population
 $V_P = V_G + V_E \; (+ V_{G \times E} + 2\,\mathrm{cov}(G,E))$
 The variance has the advantage of being additive: the variances attributable to several factors can be added to obtain the total variance.
Heritability
 Proportion of genetic variance in the phenotypic
variance
$h^2 = \frac{V_G}{V_G + V_E}$
 Heritability can be measured from correlations
between parents and offspring or studies of relatives
 Very different degree of heritability in different traits
 Artificial selection can be carried out to see if there is
a genetic basis for the observed variation
 Heritability can also be estimated from the rapidity
of change in a trait under artificial selection
Genetic distance
 Genetic variation among populations can be
measured as genetic distance
 Example: Nei’s genetic distance D



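Probability of identity of two genes: $J = \sum_i p_i^2$, computed within population 1 ($J_1$), within population 2 ($J_2$) and between the two populations ($J_{12} = \sum_i p_{1i}\,p_{2i}$)
Index of genetic identity: $I = \frac{J_{12}}{\sqrt{J_1 J_2}}$
Genetic distance: $D = -\ln(I)$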
 There are many other estimators for genetic
diversity, some defined particularly for certain types
of genetic markers
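For a single locus, Nei's distance can be computed directly from the allele frequencies of the two populations. The sketch below is a minimal illustration: the function name and the example frequencies are hypothetical, and a real analysis would average J_1, J_2 and J_12 over many loci before taking the ratio.

```python
from math import log, sqrt


def nei_distance(freqs_1, freqs_2):
    """Nei's genetic distance D = -ln(J12 / sqrt(J1 * J2)) for one locus.

    freqs_1 and freqs_2 list the frequencies of the same alleles, in the same
    order, in populations 1 and 2. D is undefined if no allele is shared.
    """
    j1 = sum(p * p for p in freqs_1)
    j2 = sum(q * q for q in freqs_2)
    j12 = sum(p * q for p, q in zip(freqs_1, freqs_2))
    identity = j12 / sqrt(j1 * j2)
    return -log(identity)


# Hypothetical allele frequencies at one locus with two alleles.
print(nei_distance([0.5, 0.5], [0.5, 0.5]))  # identical populations
print(nei_distance([0.5, 0.5], [0.9, 0.1]))  # diverged populations
```

Identical frequencies give I = 1 and therefore D = 0; the distance grows as the two frequency vectors diverge.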
Collared lemmings on small
islands (1)
Microsatellite data (4 loci):
Average He = 0.83
Kent region FST = 0.047
(Ehrich et al. 2001)
Collared lemmings on small islands
(2)
 In isolated small populations, variation is lost
by genetic drift
At equilibrium: $E(H) = \frac{4N\mu}{4N\mu + 1}$
Observed H = 0.83, so $4N\mu \approx 4.9$
Average μ for microsatellites: $5 \times 10^{-4}$, giving N = 2450
With a 10× higher μ ($5 \times 10^{-3}$): N = 245
The island populations are probably not
isolated. Population size (or μ) would have to be
much larger than observed in order to maintain
the observed level of genetic diversity.
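The arithmetic behind these population sizes can be retraced as follows. The sketch inverts the equilibrium expression E(H) = 4Nμ / (4Nμ + 1) for the observed H = 0.83 and then divides by 4μ; the function names are hypothetical, and the rounding of 4Nμ to 4.9 follows the slide.

```python
def four_n_mu_from_heterozygosity(h):
    """Invert H = 4Nmu / (4Nmu + 1) to get 4Nmu = H / (1 - H)."""
    return h / (1 - h)


def population_size(four_n_mu, mutation_rate):
    """N implied by a given value of 4Nmu and a mutation rate mu."""
    return four_n_mu / (4 * mutation_rate)


theta = round(four_n_mu_from_heterozygosity(0.83), 1)  # rounded as on the slide
print("4N*mu:", theta)
for mu in (5e-4, 5e-3):
    print("mu =", mu, "-> N =", population_size(theta, mu))
```

Using the unrounded value of 4Nμ gives slightly smaller population sizes, which does not change the conclusion drawn on the slide.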
Collared lemmings on small islands
(3)
Isolated small populations diverge under the effect of genetic drift
$F_t = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1}\right] (1 - m)^2$
$F_{ST} = 0.047$
$Nm \approx 4.4$ immigrants (at equilibrium)
In order to maintain high genetic
diversity in small island
populations, some migrants have
to cross the ice every year.
Alternatively the islands have to
be recolonized relatively often
after local extinctions
Several demes: differentiation
 A large population is divided into local demes
 Genetic drift will occur in each deme and make allele frequencies
diverge
 The probability of IBD in each deme will increase
 After t generations on average $F = 1 - \left(1 - \frac{1}{2N}\right)^t$
 When in all demes all individuals are descended from one ancestor, F = 1
 $F_{ST}$ is used as a measure of population differentiation
Two estimates:
$$F_{ST} = \frac{Q_w - Q_b}{1 - Q_b}$$
$$\hat{F}_{ST} = \frac{\hat{\sigma}_B^2}{\hat{\sigma}_T^2}$$
Zygosity in the infinite alleles
model
•Any number of distributions of allele
frequency can result in the same homozygosity
•Effective number of alleles
The number of equally frequent alleles that would be required to produce the same homozygosity as that actually observed
$$n_e = \frac{1}{\sum_i p_i^2} \qquad k_e = \frac{1}{\sum_i q_i^2}$$
Population 1:
p = 0.7, q = 0.1, r = 0.1, s = 0.1
ke = 1.92
Population 2:
p = 0.5, q = 0.3, r = 0.2
ke = 2.63
Population 3:
p = 0.6, q = 0.4
ke = 1.92
Population 4:
p = 0.5, q = 0.5
ke = 2.00
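The four values can be recomputed from the listed allele frequencies. The sketch below takes the effective number of alleles to be the reciprocal of the sum of squared allele frequencies, which is the reading that reproduces the numbers on the slide; the function and variable names are hypothetical.

```python
def effective_number_of_alleles(frequencies):
    """Reciprocal of the expected homozygosity, 1 / sum(p_i^2)."""
    return 1 / sum(p * p for p in frequencies)


# Allele frequencies from the four example populations.
populations = {
    "Population 1": [0.7, 0.1, 0.1, 0.1],
    "Population 2": [0.5, 0.3, 0.2],
    "Population 3": [0.6, 0.4],
    "Population 4": [0.5, 0.5],
}

for name, freqs in populations.items():
    print(name, round(effective_number_of_alleles(freqs), 2))
```

Populations 1 and 3 have different numbers of alleles but the same homozygosity, so they share the same effective number of alleles.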
Selective neutrality and the infinite alleles model
Amino acid polymorphisms
Are amino acid polymorphisms selectively neutral?
$$\hat{F} = \frac{1}{1 + 4N\mu} = \sum_{i=1}^{n} p_i^2$$
$$n_e = \frac{1}{\sum_{i=1}^{n} p_i^2} = 4N\mu + 1$$
1. Examine allozyme polymorphisms
•Plug in observed homozygosities
•Estimate Nm (here m denotes the mutation rate)
m approximately $10^{-6}$
N = 1,040,000
•Reasonable?
2. Compare the observed distribution of heterozygosity with the expected distribution
•Observed heterozygosity of 74 genes (shaded bars)
•Expected heterozygosity (open bars)
•Are these different?
Summary
 Genetic drift: In a finite population allele frequencies fluctuate at
random and eventually one allele will be fixed
 After 4N generations all individuals descend from one ancestor
 Genetic diversity is lost more rapidly in small populations
 Inbreeding reduces the number of heterozygotes
 Inbred individuals can have lower fitness: inbreeding depression
 The genetic composition of isolated populations diverges under the
effect of genetic drift
 Gene flow homogenizes allele frequencies among populations
 After a long time, the genetic variability in a population reaches an
equilibrium level: mutation – immigration – drift equilibrium