Lecture 13: Population Structure
February 24, 2014
Last Time
Effective population size calculations
Historical importance of drift: shifting balance or noise?
Population structure
Today
The F-Statistics
Sample calculations of FST
Defining populations on genetic criteria
Wahlund Effect
Hartl and Clark 1997
Trapped mice will always be homozygous even though HE = 0.5
What happens if you remove the cats and
the mice begin randomly mating?
F-Coefficients
Quantification of the structure of genetic variation in populations: population structure
Partition variation into the Total Population (T), Subpopulations (S), and Individuals (I)
[Diagram: nested levels T (total population) and S (subpopulations)]
F-Coefficients and Deviations from Expected Heterozygosity
Recall the fixation index from inbreeding lectures and lab:
$F = 1 - \frac{H_O}{H_E}$
Rearranging:
$H_O = H_E(1 - F)$
Within a subpopulation:
$H_I = H_S(1 - F_{IS})$
FIS: deviation from H-W proportions in subpopulation
$H_S = H_T(1 - F_{ST})$
FST: genetic differentiation among subpopulations
$H_I = H_T(1 - F_{IT})$
FIT: deviation from H-W proportions in the total population
F-Coefficients
Combine different sources of reduction in expected heterozygosity into one equation:
$(1 - F_{IT}) = (1 - F_{ST})(1 - F_{IS})$
1 - FIT: overall deviation from H-W expectations
1 - FST: deviation due to subpopulation differentiation
1 - FIS: deviation due to inbreeding within populations
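Substituting one heterozygosity relation into another shows why the product form holds. A short derivation, using only $H_I = H_S(1 - F_{IS})$, $H_S = H_T(1 - F_{ST})$ and $H_I = H_T(1 - F_{IT})$, and assuming $H_T > 0$ so that we can divide by it:

```latex
\begin{aligned}
H_I &= H_S(1 - F_{IS}) = H_T(1 - F_{ST})(1 - F_{IS}) \\
H_I &= H_T(1 - F_{IT}) \\
\Rightarrow\quad 1 - F_{IT} &= (1 - F_{ST})(1 - F_{IS})
\end{aligned}
```

So the overall reduction in heterozygosity relative to $H_T$ is the product of the reduction due to subdivision and the reduction due to inbreeding within subpopulations.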
F-Coefficients and IBD
View F-statistics as probabilities of identity by descent (IBD) for different samples:
$(1 - F_{IT}) = (1 - F_{ST})(1 - F_{IS})$
FIT: overall probability of IBD
FST: probability of IBD for 2 individuals in a subpopulation
FIS: probability of IBD within an individual
F-Statistics Can Measure Departures from Expected Heterozygosity Due to Wahlund Effect
$F_{ST} = \frac{H_T - H_S}{H_T}$
$F_{IS} = \frac{H_S - H_I}{H_S}$
$F_{IT} = \frac{H_T - H_I}{H_T}$
where
HT is the average expected heterozygosity in the total population
HS is the average expected heterozygosity in subpopulations
HI is observed heterozygosity within a subpopulation
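To see how these three definitions behave on data, here is a minimal Python sketch. It assumes a single biallelic locus, genotype counts (A1A1, A1A2, A2A2) for each subpopulation, equal weighting of subpopulations, and HT computed from the mean allele frequency. The function names are hypothetical, not taken from any package.

```python
import math


def allele_freq(n11, n12, n22):
    """Frequency of allele A1 from genotype counts (A1A1, A1A2, A2A2)."""
    return (2 * n11 + n12) / (2 * (n11 + n12 + n22))


def _ratio(num, den):
    # An F-statistic is undefined (NaN here) when its reference heterozygosity is 0
    return num / den if den > 0 else math.nan


def f_statistics(subpops):
    """Return (F_IS, F_ST, F_IT) at one biallelic locus.

    subpops: list of (n11, n12, n22) genotype counts, one tuple per subpopulation;
    subpopulations are weighted equally.
    """
    k = len(subpops)
    # H_I: observed heterozygosity, averaged over subpopulations
    h_i = sum(n12 / (n11 + n12 + n22) for n11, n12, n22 in subpops) / k
    # H_S: expected heterozygosity 2pq within each subpopulation, averaged
    ps = [allele_freq(*counts) for counts in subpops]
    h_s = sum(2 * p * (1 - p) for p in ps) / k
    # H_T: expected heterozygosity from the mean allele frequency (merged population)
    p_bar = sum(ps) / k
    h_t = 2 * p_bar * (1 - p_bar)
    return _ratio(h_s - h_i, h_s), _ratio(h_t - h_s, h_t), _ratio(h_t - h_i, h_t)
```

For two hypothetical subpopulations each fixed for a different allele, like the trapped mice, `f_statistics([(10, 0, 0), (0, 0, 10)])` gives FST = FIT = 1: no heterozygotes are observed even though HT = 0.5. FIS comes back as NaN there because HS = 0.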
Calculating FST
Recessive allele for flower color
B2B2 = white; B1B1 and B1B2 = dark pink
Subpopulation 1 (White: 10, Dark: 10):
F(white) = 10/20 = 0.5
F(B2)1 = q1 = √0.5 = 0.707
p1 = 1 - 0.707 = 0.293
Subpopulation 2 (White: 2, Dark: 18):
F(white) = 2/20 = 0.1
F(B2)2 = q2 = √0.1 = 0.32
p2 = 1 - 0.32 = 0.68
Calculating FST
Calculate Average HE of Subpopulations (HS)
For 2 subpopulations:
HS = Σ2piqi/2 = (2(0.707)(0.293) + 2(0.32)(0.68))/2
HS= 0.425
Calculate Average HE for Merged Subpopulations (HT):
F(white) = 12/40 = 0.3
q = √0.3 = 0.55; p = 0.45
HT = 2pq = 2(0.55)(0.45)
HT = 0.495
Bottom Line:
FST = (HT - HS)/HT = (0.495 - 0.425)/0.495 = 0.14
14% of the total variation in flower color alleles is due to variation among populations
AND
Expected heterozygosity increases when subpopulations are merged (Wahlund Effect): HT exceeds HS by 0.07, which is 14% of HT
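The worked example can be reproduced in a few lines of Python. This sketch follows the slides' steps: each q is estimated as the square root of the white (recessive) phenotype frequency, which assumes Hardy-Weinberg proportions, and the merged population's q is re-estimated from the pooled white frequency. Because the slides round intermediate values (e.g., q2 = 0.32), unrounded arithmetic gives slightly different numbers: HS ≈ 0.423 and FST ≈ 0.146. The helper name is hypothetical.

```python
from math import sqrt


def recessive_q(n_recessive, n_total):
    """Recessive allele frequency as the square root of the recessive phenotype
    frequency (assumes Hardy-Weinberg proportions)."""
    return sqrt(n_recessive / n_total)


# (white, total) plant counts for the two subpopulations in the example
subpops = [(10, 20), (2, 20)]

# H_S: mean of 2pq over subpopulations
qs = [recessive_q(w, n) for w, n in subpops]
h_s = sum(2 * q * (1 - q) for q in qs) / len(qs)

# H_T: as in the example, re-estimate q from the pooled white frequency (12/40)
white = sum(w for w, _ in subpops)
total = sum(n for _, n in subpops)
q_t = recessive_q(white, total)
h_t = 2 * q_t * (1 - q_t)

f_st = (h_t - h_s) / h_t
print(f"HS = {h_s:.3f}, HT = {h_t:.3f}, FST = {f_st:.3f}")
```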
Nei's Gene Diversity: GST
Nei's generalization of FST to multiple, multiallelic loci
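$G_{ST} = \frac{D_{ST}}{H_T}, \quad D_{ST} = H_T - H_S$
where HS is the mean HE of m subpopulations, calculated for n alleles, with p_ij the frequency of allele j in subpopulation i:
$H_S = \frac{1}{m}\sum_{i=1}^{m}\left(1 - \sum_{j=1}^{n} p_{ij}^2\right)$
$H_T = 1 - \sum_{j=1}^{n} \bar{p}_j^2$
where $\bar{p}_j$ is the mean frequency of allele j over all subpopulations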
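A minimal Python sketch of GST, assuming allele frequencies are supplied per subpopulation (each row summing to 1) and subpopulations are weighted equally. For several loci, this sketch averages HS and HT across loci before taking the ratio, which is one common way to combine loci. The function names are hypothetical.

```python
def hs_ht(freqs):
    """Return (H_S, H_T) for one locus.

    freqs[i][j] is the frequency p_ij of allele j in subpopulation i (rows sum to 1).
    """
    m = len(freqs)
    n_alleles = len(freqs[0])
    # H_S: mean over subpopulations of 1 - sum_j p_ij^2
    h_s = sum(1 - sum(p * p for p in row) for row in freqs) / m
    # p_bar_j: mean frequency of allele j over all subpopulations
    p_bar = [sum(row[j] for row in freqs) / m for j in range(n_alleles)]
    h_t = 1 - sum(p * p for p in p_bar)
    return h_s, h_t


def nei_gst(freqs):
    """G_ST = D_ST / H_T with D_ST = H_T - H_S, for a single locus."""
    h_s, h_t = hs_ht(freqs)
    return (h_t - h_s) / h_t


def nei_gst_multilocus(loci):
    """G_ST over several loci: average H_S and H_T across loci, then take the ratio."""
    pairs = [hs_ht(freqs) for freqs in loci]
    h_s = sum(hs for hs, _ in pairs) / len(pairs)
    h_t = sum(ht for _, ht in pairs) / len(pairs)
    return (h_t - h_s) / h_t
```

Two subpopulations fixed for different alleles give GST = 1, while subpopulations with identical allele frequencies give GST = 0.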
Unbiased Estimate of FST
Weir and Cockerham's (1984) Theta
Compensates for sampling error, which can cause large biases in
FST or GST (e.g., if sample represents different proportions of
populations)
Calculated in terms of correlation coefficients
Calculated by FSTAT software:
http://www2.unil.ch/popgen/softwares/fstat.htm
Goudet, J. 1995. FSTAT (Version 1.2): A computer program to calculate F-statistics. Journal of Heredity 86(6): 485-486.
Often simply referred to as FST in the literature
Weir, B.S. and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38: 1358-1370.
Linanthus parryae population structure
Annual plant in the Mojave Desert is a classic example of migration vs. drift
Allele for blue flower color is recessive
Use F-statistics to partition variation among regions,
subpopulations, and individuals
FST can be calculated for any hierarchy:
FRT: Variation due to differentiation of regions
FSR: Variation due to differentiation among
subpopulations within regions
Schemske and Bierzychudek 2007 Evolution
Linanthus parryae population structure
$H_S = \frac{1}{30}\sum_{i=1}^{30}\left(1 - \sum_{m} p_{im}^2\right)$
$H_R = \frac{1}{\sum_r N_r}\sum_{r=1}^{3} N_r\left(1 - \sum_{m} p_{rm}^2\right)$
$H_T = 1 - \sum_{m} \bar{p}_m^2$
$F_{SR} = \frac{H_R - H_S}{H_R} = \frac{0.1589 - 0.1424}{0.1589} = 0.1036$
$F_{RT} = \frac{H_T - H_R}{H_T} = \frac{0.2371 - 0.1589}{0.2371} = 0.3299$
$F_{ST} = \frac{H_T - H_S}{H_T} = \frac{0.2371 - 0.1424}{0.2371} = 0.3993$
Hartl and Clark 2007
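The same ratio is applied at each level of the hierarchy. A short Python sketch plugging in the rounded heterozygosities above; because those inputs are rounded, the results match the slide's values to about three decimal places. The function name is hypothetical.

```python
def hierarchical_f(h_s, h_r, h_t):
    """F_SR, F_RT and F_ST from the mean expected heterozygosity within
    subpopulations (H_S), within regions (H_R) and in the total population (H_T)."""
    f_sr = (h_r - h_s) / h_r  # subpopulations within regions
    f_rt = (h_t - h_r) / h_t  # regions within the total
    f_st = (h_t - h_s) / h_t  # subpopulations within the total
    return f_sr, f_rt, f_st


f_sr, f_rt, f_st = hierarchical_f(h_s=0.1424, h_r=0.1589, h_t=0.2371)
print(round(f_sr, 4), round(f_rt, 4), round(f_st, 4))
# The levels combine multiplicatively: (1 - F_SR)(1 - F_RT) equals 1 - F_ST
print((1 - f_sr) * (1 - f_rt), 1 - f_st)
```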
FST as Variance Partitioning
Think of FST as proportion of genetic variation partitioned among
populations
$F_{ST} = \frac{V(q)}{\bar{p}\bar{q}}$
where V(q) is the variance of q across subpopulations
The denominator is the maximum variance that could occur among subpopulations (reached when every subpopulation is fixed for one allele)
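The variance form and the heterozygosity form of FST agree when HT is computed from the mean allele frequency, because HT - HS = 2[mean(q²) - q̄²] = 2V(q), with V(q) the population variance (dividing by the number of subpopulations) and subpopulations weighted equally. A minimal Python check of that equivalence; the function names are hypothetical.

```python
from statistics import fmean, pvariance


def fst_from_variance(qs):
    """F_ST = V(q) / (p_bar * q_bar) for equally weighted subpopulations."""
    q_bar = fmean(qs)
    return pvariance(qs) / (q_bar * (1 - q_bar))


def fst_from_heterozygosity(qs):
    """F_ST = (H_T - H_S) / H_T, with H_T computed from the mean allele frequency."""
    q_bar = fmean(qs)
    h_t = 2 * q_bar * (1 - q_bar)
    h_s = fmean([2 * q * (1 - q) for q in qs])
    return (h_t - h_s) / h_t
```

With q = 0.7 and 0.3 in two subpopulations, both functions give V(q)/(p̄q̄) = 0.04/0.25 = 0.16; with q = 1 and 0 (each subpopulation fixed), V(q) reaches its maximum p̄q̄ and FST = 1.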
Analysis of Molecular Variance (AMOVA)
Analogous to Analysis of Variance (ANOVA)
Use pairwise genetic distances as ‘response’
Test significance using permutations
Partition genetic diversity into different
hierarchical levels, including regions,
subpopulations, individuals
Many types of marker data can be used
Method of choice for dominant markers, sequence, and SNP data
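A minimal one-level AMOVA sketch (populations only, no regional level) showing how pairwise distances serve as the response and how significance comes from permuting population labels. It assumes haplotypes coded as 0/1 vectors, so the squared Euclidean distance equals the number of differing sites, and uses the standard sums of squared deviations with an among-population variance component scaled by n_c. All names are hypothetical; dedicated AMOVA software handles the full hierarchy.

```python
import random
from itertools import combinations


def sq_dist(h1, h2):
    """Squared Euclidean distance between 0/1 haplotype vectors
    (equal to the number of sites at which they differ)."""
    return sum(a != b for a, b in zip(h1, h2))


def ssd(haps):
    """Sum of squared deviations: (1/n) times the sum of squared distances over unordered pairs."""
    return sum(sq_dist(a, b) for a, b in combinations(haps, 2)) / len(haps)


def phi_st(haps, labels):
    """Phi_ST = among-population variance / (among + within)."""
    groups = {}
    for hap, lab in zip(haps, labels):
        groups.setdefault(lab, []).append(hap)
    n_total, n_pops = len(haps), len(groups)
    ssd_within = sum(ssd(g) for g in groups.values())
    ssd_among = ssd(haps) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # n_c: average population size used to scale the among-population component
    n_c = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (n_pops - 1)
    var_within = msd_within
    var_among = (msd_among - msd_within) / n_c
    return var_among / (var_among + var_within)


def amova_test(haps, labels, n_perm=999, seed=1):
    """Observed Phi_ST and a permutation p-value from shuffling population labels."""
    rng = random.Random(seed)
    observed = phi_st(haps, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(haps, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling labels keeps the population sizes fixed, so the permutation distribution reflects only the assignment of haplotypes to populations. Φ_ST can be negative when haplotypes differ less between populations than within them.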
Phi Statistics from AMOVA
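$\Phi_{CT} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from a region relative to pairs drawn from the whole population (FRT)
$\Phi_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from a region (FSR)
$\Phi_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from the whole population (FST)
where σa², σb², and σc² are the variance components among regions, among subpopulations within regions, and within subpopulations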
http://www.bioss.ac.uk/smart/unix/mamova/slides/frames.htm
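Given the variance components from an AMOVA, the three Φ statistics follow directly, and they combine multiplicatively just like the F-statistics: (1 - Φ_ST) = (1 - Φ_SC)(1 - Φ_CT). A minimal Python sketch; the function name is hypothetical, and in practice the variance components would come from AMOVA software.

```python
def phi_statistics(var_a, var_b, var_c):
    """Phi_CT, Phi_SC, Phi_ST from AMOVA variance components:
    var_a among regions, var_b among subpopulations within regions,
    var_c within subpopulations."""
    total = var_a + var_b + var_c
    phi_ct = var_a / total             # regions relative to the whole population
    phi_sc = var_b / (var_b + var_c)   # subpopulations relative to their region
    phi_st = (var_a + var_b) / total   # subpopulations relative to the whole population
    return phi_ct, phi_sc, phi_st
```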
What if you don’t know how your
samples are organized into populations
(i.e., you don’t know how many source
populations you have)?
What if reference samples aren’t from
a single population? What if they are
offspring from parents coming from
different source populations
(admixture)?
What’s a population anyway?
Defining populations on genetic criteria
Assume subpopulations are at Hardy-Weinberg equilibrium and linkage equilibrium
Probabilistically ‘assign’ individuals to
populations to minimize departures
from equilibrium
Can allow for admixture (individuals
with different proportions of each
population) and geographic information
Bayesian approach using a Markov chain Monte Carlo (MCMC) method to explore parameter space
Implemented in STRUCTURE program:
http://pritch.bsd.uchicago.edu/structure.html
Londo and Schaal 2007 Mol Ecol 16:4523
Example: Taita Thrush data*
Three main sampling locations in Kenya
Low migration rates (radio-tagging study)
155 individuals, genotyped at 7 microsatellite loci
Slide courtesy of Jonathan Pritchard
Estimating K
Structure is run separately at different values of K. The
program computes a statistic that measures the fit of each
value of K (sort of a penalized likelihood); this can be
used to help select K.
Taita thrush data:
Assumed value of K | Posterior probability of K
1 | ~0
2 | ~0
3 | 0.993
4 | 0.007
5 | 0.00005
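A minimal sketch of how per-K values of ln Pr(X | K) can be turned into posterior probabilities like those in the table, assuming a uniform prior over the K values tested and that the fit statistic for each K is an estimate of ln Pr(X | K). The function name is hypothetical; this is not STRUCTURE's own code.

```python
import math


def posterior_k(ln_prob):
    """Posterior Pr(K | X) from estimated ln Pr(X | K), assuming a uniform prior on K.

    ln_prob: dict mapping K -> ln Pr(X | K).
    """
    # Subtract the maximum before exponentiating (log-sum-exp) to avoid underflow
    top = max(ln_prob.values())
    weights = {k: math.exp(v - top) for k, v in ln_prob.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

Because the likelihoods enter on a log scale, a difference of a few log units between two values of K already makes the posterior strongly favor one of them.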
Another method for inference of K
The ΔK method of Evanno et al. (2005, Mol. Ecol. 14: 2611-2620):
Eckert, Population Structure, 5-Aug-2008
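Evanno et al. define ΔK as the absolute second-order rate of change of L(K) = ln P(D), |L(K+1) - 2L(K) + L(K-1)|, divided by the standard deviation of L(K) across replicate runs. A minimal Python sketch, assuming replicate ln P(D) values have already been collected for each K; this version takes the second difference of the mean L(K) across runs (one common reading of the formula). The function name is hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(runs):
    """Delta K from replicate STRUCTURE runs.

    runs: dict mapping K -> list of ln P(D) values, one per replicate run
    (at least two runs per K, and they must not all be identical, or stdev is 0/undefined).
    Returns K -> Delta K for every K whose neighbours K-1 and K+1 were also run,
    so the smallest and largest K tested get no value.
    """
    mean_l = {k: mean(v) for k, v in runs.items()}
    delta = {}
    for k in sorted(runs):
        if k - 1 in runs and k + 1 in runs:
            # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, using mean L across runs
            second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
            delta[k] = second_diff / stdev(runs[k])
    return delta
```

The K with the largest ΔK marks the sharpest change in how L(K) increases, which is the value this method suggests.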
Inferred population structure
[Figure: STRUCTURE bar plot, individuals grouped by region: Africans, Europeans, MidEast, Cent/S Asia, Asia, Oceania, America]
Each individual is a thin vertical line that is partitioned into K colored segments according to its membership coefficients in K clusters.
Rosenberg et al. 2002 Science 298: 2381-2385
Inferred population structure – regions
Rosenberg et al. 2002 Science 298: 2381-2385