Transcript ppt

Lecture 13: Population Structure
February 24, 2014
Last Time
Effective population size calculations
Historical importance of drift: shifting
balance or noise?
Population structure
Today
The F-Statistics
Sample calculations of FST
Defining populations on genetic criteria
Wahlund Effect
Hartl and Clark 1997
Trapped mice will always be homozygous even though HE = 0.5
What happens if you remove the cats and
the mice begin randomly mating?
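The mouse question can be put in numbers. This is a minimal sketch that assumes two equally sized groups of mice fixed for alternate alleles; the slide only states that trapped mice are homozygous while HE = 0.5, so the allele frequencies and the helper name `wahlund` are illustrative assumptions.

```python
def expected_het(q):
    """Expected heterozygosity 2pq at a biallelic locus."""
    return 2 * q * (1 - q)

def wahlund(q_subpops):
    """Return (mean within-subpopulation HE, HE of the merged population).

    Assumes equally sized subpopulations (an assumption, not from the slide).
    """
    h_s = sum(expected_het(q) for q in q_subpops) / len(q_subpops)
    q_bar = sum(q_subpops) / len(q_subpops)
    h_t = expected_het(q_bar)
    return h_s, h_t

# Hypothetical: one group fixed for allele 1, the other fixed for allele 2.
h_s, h_t = wahlund([0.0, 1.0])
print(h_s, h_t)
```

Under these assumptions no heterozygotes are expected within either group, while one generation of random mating in the merged population would be expected to produce heterozygotes at frequency 0.5.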
F-Coefficients
 Quantification of the structure of genetic variation in
populations: population structure
 Partition variation to the Total Population (T), Subpopulations
(S), and Individuals (I)
F-Coefficients and Deviations from Expected
Heterozygosity
 Recall the fixation index from inbreeding lectures and lab:
$F = 1 - \dfrac{H_O}{H_E}$
 Rearranging:
$H_O = H_E (1 - F)$
 Within a subpopulation:
$H_I = H_S (1 - F_{IS})$
 FIS: deviation from H-W proportions in subpopulation
F-Coefficients and Deviations from Expected
Heterozygosity
$H_I = H_S (1 - F_{IS})$
 FIS: deviation from H-W proportions in subpopulation
$H_S = H_T (1 - F_{ST})$
 FST: genetic differentiation among subpopulations
$H_I = H_T (1 - F_{IT})$
 FIT: deviation from H-W proportions in the total population
F-Coefficients
 Combine different sources of reduction in expected
heterozygosity into one equation:
$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$
FIT: overall deviation from H-W expectations
FST: deviation due to subpopulation differentiation
FIS: deviation due to inbreeding within populations
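One way to see why the coefficients combine multiplicatively is to substitute the three heterozygosity relations from the preceding slide, HI = HS(1 - FIS), HS = HT(1 - FST), and HI = HT(1 - FIT). The short derivation below uses only those definitions:

```latex
\begin{align*}
(1 - F_{ST})(1 - F_{IS}) &= \frac{H_S}{H_T} \cdot \frac{H_I}{H_S} \\
                         &= \frac{H_I}{H_T} \\
                         &= 1 - F_{IT}
\end{align*}
```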
F-Coefficients and IBD
 View F-statistics as probability of Identity by Descent for
different samples
$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$
FIT: overall probability of IBD
FST: probability of IBD for 2 individuals in a subpopulation
FIS: probability of IBD within an individual
F-Statistics Can Measure Departures from Expected
Heterozygosity Due to Wahlund Effect
$F_{ST} = \dfrac{H_T - H_S}{H_T}$
$F_{IS} = \dfrac{H_S - H_I}{H_S}$
$F_{IT} = \dfrac{H_T - H_I}{H_T}$
where
HT is the average expected heterozygosity in the total population
HS is the average expected heterozygosity in subpopulations
HI is observed heterozygosity within a subpopulation
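The three ratios can be computed directly from the heterozygosities. The sketch below is illustrative only: the function name `f_statistics` and the input values are hypothetical and do not come from the lecture.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from HI, HS and HT as defined on the slide."""
    if h_s <= 0 or h_t <= 0:
        raise ValueError("HS and HT must be positive")
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it

# Hypothetical heterozygosities, chosen only for illustration.
f_is, f_st, f_it = f_statistics(h_i=0.30, h_s=0.40, h_t=0.50)
print(f_is, f_st, f_it)

# The multiplicative relation 1 - FIT = (1 - FST)(1 - FIS) should hold.
print(abs((1 - f_it) - (1 - f_st) * (1 - f_is)) < 1e-12)
```

With these made-up inputs, FIS is about 0.25, FST about 0.20, and FIT about 0.40.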
Calculating FST
Recessive allele for flower color
B2B2 = white; B1B1 and B1B2 = dark pink
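Assuming Hardy-Weinberg proportions within each subpopulation, F(white) = q², so q is the square root of the white frequency.
Subpopulation 1 (White: 10, Dark: 10):
F(white) = 10/20 = 0.5
F(B2)1 = $q_1 = \sqrt{0.5} = 0.707$
$p_1 = 1 - 0.707 = 0.293$
Subpopulation 2 (White: 2, Dark: 18):
F(white) = 2/20 = 0.1
F(B2)2 = $q_2 = \sqrt{0.1} = 0.32$
$p_2 = 1 - 0.32 = 0.68$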
Calculating FST
Calculate Average HE of Subpopulations (HS)
For 2 subpopulations:
$H_S = \sum 2 p_i q_i / 2 = \dfrac{2(0.707)(0.293) + 2(0.32)(0.68)}{2} = 0.425$
Calculate Average HE for Merged Subpopulations (HT):
F(white) = 12/40 = 0.3
$q = \sqrt{0.3} = 0.55$; $p = 0.45$
$H_T = 2pq = 2(0.55)(0.45) = 0.495$
Bottom Line:
$F_{ST} = \dfrac{H_T - H_S}{H_T} = \dfrac{0.495 - 0.425}{0.495} = 0.14$
 14% of the total variation in flower color alleles is due to variation among populations
AND
 Expected heterozygosity in the merged population exceeds the subpopulation average by 14% of HT (Wahlund Effect)
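The flower-color calculation can be reproduced in a few lines. This sketch follows the slide's steps (q estimated as the square root of the white-phenotype frequency, HT taken from the pooled sample); the function names are hypothetical. Because it does not round intermediate values, it gives FST of roughly 0.146 rather than the slide's 0.14.

```python
import math

def recessive_allele_freq(n_recessive, n_total):
    """Estimate q from the recessive phenotype frequency, assuming q**2 = frequency (H-W)."""
    return math.sqrt(n_recessive / n_total)

def expected_het(q):
    """Expected heterozygosity 2pq at a biallelic locus."""
    return 2 * (1 - q) * q

# (white plants, total plants) per subpopulation, taken from the slide.
subpops = [(10, 20), (2, 20)]

# HS: average expected heterozygosity over subpopulations.
h_s = sum(expected_het(recessive_allele_freq(w, n)) for w, n in subpops) / len(subpops)

# HT: expected heterozygosity of the merged sample (12 white out of 40).
white_total = sum(w for w, _ in subpops)
n_total = sum(n for _, n in subpops)
h_t = expected_het(recessive_allele_freq(white_total, n_total))

f_st = (h_t - h_s) / h_t
print(f"HS = {h_s:.3f}, HT = {h_t:.3f}, FST = {f_st:.3f}")
```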
Nei's Gene Diversity: GST
Nei's generalization of FST to multiple, multiallelic loci
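$G_{ST} = \dfrac{D_{ST}}{H_T}$, with $D_{ST} = H_T - H_S$
Where HS is mean HE of m subpopulations, calculated for n alleles with frequency $p_{ij}$ (allele j in subpopulation i):
$H_S = \dfrac{1}{m} \sum_{i=1}^{m} \left(1 - \sum_{j=1}^{n} p_{ij}^2\right)$
$H_T = 1 - \sum_{j=1}^{n} \bar{P}_j^2$
Where $\bar{P}_j$ is mean allele frequency of allele j over all subpopulations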
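For a single multiallelic locus, the GST definitions above translate into a short function. This is a sketch under stated assumptions: subpopulations are weighted equally, sampling bias is not corrected, and the name `nei_gst` and the example frequencies are hypothetical.

```python
def nei_gst(freqs):
    """GST for one locus.

    freqs[i][j] is the frequency of allele j in subpopulation i;
    each row is assumed to sum to 1.
    """
    m = len(freqs)          # number of subpopulations
    n = len(freqs[0])       # number of alleles
    # HS: mean within-subpopulation gene diversity.
    h_s = sum(1 - sum(p * p for p in row) for row in freqs) / m
    # HT: gene diversity computed from mean allele frequencies.
    mean_freqs = [sum(row[j] for row in freqs) / m for j in range(n)]
    h_t = 1 - sum(p * p for p in mean_freqs)
    if h_t == 0:
        raise ValueError("locus is monomorphic; GST is undefined")
    return (h_t - h_s) / h_t

# Hypothetical two-allele example with strongly divergent subpopulations.
print(nei_gst([[0.2, 0.8], [0.8, 0.2]]))
```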
Unbiased Estimate of FST
 Weir and Cockerham's (1984) Theta
 Compensates for sampling error, which can cause large biases in
FST or GST (e.g., if sample represents different proportions of
populations)
 Calculated in terms of correlation coefficients
Calculated by FSTAT software:
http://www2.unil.ch/popgen/softwares/fstat.htm
Goudet, J. (1995). "FSTAT (Version 1.2): A computer program to calculate F-statistics." Journal of Heredity 86(6): 485-486.
Often simply referred to as FST in the literature
Weir, B.S. and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population
structure. Evolution 38:1358-1370.
Linanthus parryae population structure
Annual plant in Mojave desert is classic example of
migration vs drift
Allele for blue flower color is recessive
Use F-statistics to partition variation among regions,
subpopulations, and individuals
FST can be calculated for any hierarchy:
FRT: Variation due to differentiation of regions
FSR: Variation due to differentiation among
subpopulations within regions
Schemske and Bierzychudek 2007 Evolution
Linanthus parryae population structure
$H_S = \dfrac{1}{30} \sum_{i=1}^{30} \left(1 - \sum_{m} p_{im}^2\right)$
$H_R = \dfrac{1}{\sum_r N_r} \sum_{r=1}^{3} N_r \left(1 - \sum_{m} p_{rm}^2\right)$
$H_T = 1 - \sum_{m} p_m^2$
$F_{SR} = \dfrac{H_R - H_S}{H_R} = \dfrac{0.1589 - 0.1424}{0.1589} = 0.1036$
$F_{RT} = \dfrac{H_T - H_R}{H_T} = \dfrac{0.2371 - 0.1589}{0.2371} = 0.3299$
$F_{ST} = \dfrac{H_T - H_S}{H_T} = \dfrac{0.2371 - 0.1424}{0.2371} = 0.3993$
Hartl and Clark 2007
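The hierarchical coefficients follow from the three heterozygosities reported on the slide (HS = 0.1424, HR = 0.1589, HT = 0.2371). The sketch below recomputes them; the function name `hierarchical_f` is hypothetical. Because the inputs are already rounded to four decimals, the results can differ from the slide's values in the last digit.

```python
def hierarchical_f(h_s, h_r, h_t):
    """Return (F_SR, F_RT, F_ST) for a subpopulation / region / total hierarchy."""
    f_sr = (h_r - h_s) / h_r   # subpopulations within regions
    f_rt = (h_t - h_r) / h_t   # regions within the total
    f_st = (h_t - h_s) / h_t   # subpopulations within the total
    return f_sr, f_rt, f_st

f_sr, f_rt, f_st = hierarchical_f(h_s=0.1424, h_r=0.1589, h_t=0.2371)
print(f"FSR = {f_sr:.4f}, FRT = {f_rt:.4f}, FST = {f_st:.4f}")

# The levels combine multiplicatively: 1 - FST = (1 - FSR)(1 - FRT).
print(abs((1 - f_st) - (1 - f_sr) * (1 - f_rt)) < 1e-12)
```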
FST as Variance Partitioning
 Think of FST as proportion of genetic variation partitioned among
populations
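$F_{ST} = \dfrac{V(q)}{\bar{p}\,\bar{q}}$
where
V(q) is variance of q across subpopulations, and $\bar{p}$ and $\bar{q}$ are the mean allele frequencies across subpopulations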
 Denominator is maximum amount of variance that could occur
among subpopulations
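The variance form and the heterozygosity form of FST give the same number when HT is computed from the mean allele frequency and subpopulations are weighted equally. The sketch below checks this for hypothetical frequencies; both function names are illustrative.

```python
def fst_from_variance(qs):
    """FST as V(q) / (p_bar * q_bar), using the population variance of q."""
    q_bar = sum(qs) / len(qs)
    var_q = sum((q - q_bar) ** 2 for q in qs) / len(qs)
    return var_q / (q_bar * (1 - q_bar))

def fst_from_heterozygosity(qs):
    """FST as (HT - HS) / HT, with HT based on the mean allele frequency."""
    q_bar = sum(qs) / len(qs)
    h_s = sum(2 * q * (1 - q) for q in qs) / len(qs)
    h_t = 2 * q_bar * (1 - q_bar)
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two subpopulations.
qs = [0.2, 0.8]
print(fst_from_variance(qs), fst_from_heterozygosity(qs))
```

The agreement follows because HT - HS equals 2V(q) under these assumptions, and HT equals twice the product of the mean allele frequencies.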
Analysis of Molecular Variance (AMOVA)
Analogous to Analysis of Variance (ANOVA)
 Use pairwise genetic distances as ‘response’
 Test significance using permutations
Partition genetic diversity into different
hierarchical levels, including regions,
subpopulations, individuals
Many types of marker data can be used
 Method of choice for dominant markers, sequence,
and SNP
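The permutation idea can be illustrated with a much simpler statistic than a full AMOVA. The sketch below is not AMOVA: it computes a two-population FST from 0/1 allele copies and shuffles population labels to build a null distribution. All names and data are hypothetical.

```python
import random

def fst_two_pops(pop_a, pop_b):
    """FST = V(q) / (p_bar * q_bar) for two equally weighted samples of 0/1 allele copies."""
    q_a = sum(pop_a) / len(pop_a)
    q_b = sum(pop_b) / len(pop_b)
    q_bar = (q_a + q_b) / 2
    if q_bar in (0.0, 1.0):
        return 0.0  # no variation at all, so no differentiation
    var_q = ((q_a - q_bar) ** 2 + (q_b - q_bar) ** 2) / 2
    return var_q / (q_bar * (1 - q_bar))

def permutation_p_value(pop_a, pop_b, n_perm=999, seed=0):
    """Return (observed FST, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = fst_two_pops(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    n_a = len(pop_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign allele copies to populations at random
        if fst_two_pops(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical samples: allele "1" is common in one population, rare in the other.
pop_a = [1] * 18 + [0] * 2
pop_b = [0] * 18 + [1] * 2
observed, p_value = permutation_p_value(pop_a, pop_b)
print(observed, p_value)
```

A small p-value here means that random relabeling rarely produces differentiation as large as the observed value.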
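Phi Statistics from AMOVA
$\Phi_{CT} = \dfrac{\sigma_a^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from a region relative to pairs drawn from the whole population (FRT)
$\Phi_{SC} = \dfrac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from a region (FSR)
$\Phi_{ST} = \dfrac{\sigma_a^2 + \sigma_b^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from the whole population (FST)
where $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$ are the variance components among regions, among subpopulations within regions, and within subpopulations
http://www.bioss.ac.uk/smart/unix/mamova/slides/frames.htm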
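Given the three AMOVA variance components (among regions, among subpopulations within regions, and within subpopulations), the Phi statistics are simple ratios. This sketch assumes the components have already been estimated; the function name and the numeric values are hypothetical.

```python
def phi_statistics(var_a, var_b, var_c):
    """Return (Phi_CT, Phi_SC, Phi_ST) from AMOVA variance components.

    var_a: among regions
    var_b: among subpopulations within regions
    var_c: within subpopulations
    """
    total = var_a + var_b + var_c
    phi_ct = var_a / total
    phi_sc = var_b / (var_b + var_c)
    phi_st = (var_a + var_b) / total
    return phi_ct, phi_sc, phi_st

# Hypothetical variance components, for illustration only.
phi_ct, phi_sc, phi_st = phi_statistics(var_a=0.5, var_b=0.25, var_c=1.25)
print(phi_ct, phi_sc, phi_st)

# As with F-statistics: 1 - Phi_ST = (1 - Phi_SC)(1 - Phi_CT).
print(abs((1 - phi_st) - (1 - phi_sc) * (1 - phi_ct)) < 1e-12)
```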
What if you don’t know how your
samples are organized into populations
(i.e., you don’t know how many source
populations you have)?
What if reference samples aren’t from
a single population? What if they are
offspring from parents coming from
different source populations
(admixture)?
What’s a population anyway?
Defining populations on genetic criteria
 Assume subpopulations are at Hardy-Weinberg equilibrium and linkage equilibrium
 Probabilistically ‘assign’ individuals to populations to minimize departures from equilibrium
 Can allow for admixture (individuals with different proportions of each population) and geographic information
 Bayesian approach using Markov chain Monte Carlo (MCMC) to explore parameter space
 Implemented in STRUCTURE program:
http://pritch.bsd.uchicago.edu/structure.html
Londo and Schaal 2007 Mol Ecol 16:4523
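The core idea of probabilistic assignment can be shown with a toy calculation. This is not the STRUCTURE algorithm, which estimates allele frequencies and memberships jointly by MCMC. The sketch assumes population allele frequencies are already known and nonzero, a uniform prior over populations, and Hardy-Weinberg and linkage equilibrium within each population. All names and frequencies are hypothetical.

```python
import math

def genotype_log_likelihood(genotype, allele_freqs):
    """Log-likelihood of a multilocus diploid genotype in one population.

    genotype: list of (allele_1, allele_2) tuples, one per locus
    allele_freqs: list of dicts, one per locus, mapping allele -> frequency
    Loci are multiplied together (linkage equilibrium); genotype
    probabilities within a locus follow Hardy-Weinberg proportions.
    """
    log_lik = 0.0
    for (a1, a2), freqs in zip(genotype, allele_freqs):
        p1, p2 = freqs[a1], freqs[a2]
        prob = p1 * p1 if a1 == a2 else 2 * p1 * p2
        log_lik += math.log(prob)
    return log_lik

def assignment_probabilities(genotype, pops):
    """Posterior probability of each population under a uniform prior."""
    log_liks = {name: genotype_log_likelihood(genotype, f) for name, f in pops.items()}
    top = max(log_liks.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(ll - top) for name, ll in log_liks.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical allele frequencies at two loci in two populations.
pops = {
    "A": [{"x": 0.9, "y": 0.1}, {"x": 0.8, "y": 0.2}],
    "B": [{"x": 0.2, "y": 0.8}, {"x": 0.3, "y": 0.7}],
}
individual = [("x", "x"), ("x", "y")]
print(assignment_probabilities(individual, pops))
```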
Example: Taita Thrush data*
 Three main sampling locations in Kenya
 Low migration rates (radio-tagging study)
 155 individuals, genotyped at 7 microsatellite loci
Slide courtesy of Jonathan Pritchard
Estimating K
Structure is run separately at different values of K. The
program computes a statistic that measures the fit of each
value of K (sort of a penalized likelihood); this can be
used to help select K.
Taita thrush data
Assumed value of K    Posterior probability of K
1                     ~0
2                     ~0
3                     0.993
4                     0.007
5                     0.00005
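The slide does not say how the posterior probabilities are obtained. One common approach, assumed here, applies Bayes' rule with a uniform prior on K to the estimated log probability of the data for each K. The sketch below normalizes such values; the function name and the log-probability values are hypothetical and are not the Taita thrush results.

```python
import math

def posterior_of_k(log_probs):
    """Normalize estimated ln P(X|K) values into posterior probabilities.

    log_probs: dict mapping K -> estimated ln P(X|K).
    Assumes a uniform prior over the K values supplied.
    """
    top = max(log_probs.values())  # subtract the maximum to avoid underflow
    weights = {k: math.exp(v - top) for k, v in log_probs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical estimates of ln P(X|K) for K = 1..5.
log_probs = {1: -4356.0, 2: -3983.0, 3: -3982.0, 4: -3983.0, 5: -4006.0}
posterior = posterior_of_k(log_probs)
best_k = max(posterior, key=posterior.get)
print(best_k, posterior)
```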
Another method for inference of K
 The ΔK method of Evanno et al. (2005, Mol. Ecol. 14: 2611-2620):
Eckert, Population Structure, 5-Aug-2008
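Evanno et al.'s statistic is based on the second-order rate of change of the log-likelihood across successive values of K, scaled by the standard deviation of the log-likelihood across replicate runs at that K. The sketch below assumes replicates are combined by taking the mean log-likelihood per K before differencing. That choice, the function name, and the data are illustrative assumptions.

```python
from statistics import mean, stdev

def delta_k(runs):
    """Compute Delta K for each K that has both neighbours K-1 and K+1.

    runs: dict mapping K -> list of ln P(X|K) values from replicate runs
    (at least two replicates per K, with nonzero spread).
    """
    avg = {k: mean(v) for k, v in runs.items()}
    result = {}
    for k in sorted(runs):
        if k - 1 in avg and k + 1 in avg:
            # Absolute second difference of the mean log-likelihood.
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_diff / stdev(runs[k])
    return result

# Hypothetical replicate log-likelihoods for K = 1..5.
runs = {
    1: [-4356.0, -4355.0, -4357.0],
    2: [-4100.0, -4102.0, -4098.0],
    3: [-3982.0, -3983.0, -3981.0],
    4: [-3990.0, -3980.0, -4000.0],
    5: [-4006.0, -3996.0, -4016.0],
}
dk = delta_k(runs)
print(max(dk, key=dk.get), dk)
```

Note that ΔK is undefined for the smallest and largest K tested, so this approach cannot select K = 1.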
Inferred population structure
Figure labels: Africans, Europeans, MidEast, Cent/S Asia, Asia, Oceania, America
Each individual is a thin vertical line that is partitioned into K colored segments
according to its membership coefficients in K clusters.
Rosenberg et al. 2002 Science 298: 2381-2385
Inferred population structure – regions
Rosenberg et al. 2002 Science 298: 2381-2385