GLYPHOSATE RESISTANCE Background / Problem
Lecture 13: Population Structure
October 8, 2012
Last Time
Effective population size calculations
Historical importance of drift: shifting balance or noise?
Population structure
Today
Course feedback
The F-Statistics
Sample calculations of FST
Defining populations on genetic criteria
Midterm Course Evaluations
Based on five responses: It’s not too late to have an impact!
Lectures are generally OK
Labs are valuable, but better organization and more feedback are needed
Difficulty level is OK
Book is awful
F-Coefficients
Quantification of the structure of genetic variation
in populations: population structure
Partition variation to the Total Population (T),
Subpopulations (S), and Individuals (I)
F-Coefficients
Combine different sources of reduction in expected
heterozygosity into one equation:
$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$
FIT: Overall deviation from H-W expectations
FST: Deviation due to subpopulation differentiation
FIS: Deviation due to inbreeding within populations
F-Coefficients and IBD
View F-statistics as probability of Identity by
Descent for different samples
$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$
FIT: Overall probability of IBD
FST: Probability of IBD for 2 individuals in a subpopulation
FIS: Probability of IBD within an individual
F-Statistics Can Measure Departures from
Expected Heterozygosity Due to Wahlund Effect
$F_{ST} = \frac{H_T - H_S}{H_T}$
$F_{IS} = \frac{H_S - H_I}{H_S}$
$F_{IT} = \frac{H_T - H_I}{H_T}$
where
HT is the average expected heterozygosity in the total population
HS is the average expected heterozygosity in subpopulations
HI is observed heterozygosity within a subpopulation
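As a rough illustration of these three ratios, the Python sketch below computes them from heterozygosity values and checks the identity 1 - FIT = (1 - FST)(1 - FIS). The function name and the example heterozygosities are hypothetical, chosen only for illustration.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from observed heterozygosity (h_i),
    mean expected heterozygosity in subpopulations (h_s), and
    expected heterozygosity in the total population (h_t)."""
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Hypothetical heterozygosities, not taken from the lecture
f_is, f_st, f_it = f_statistics(h_i=0.30, h_s=0.40, h_t=0.50)

# The multiplicative identity should hold up to floating-point error
identity_gap = (1 - f_it) - (1 - f_st) * (1 - f_is)
print(f_is, f_st, f_it, identity_gap)
```

The identity holds for any inputs because (HI/HS)(HS/HT) = HI/HT.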
Calculating FST
Recessive allele for flower color
B2B2 = white; B1B1 and B1B2 = dark pink
Subpopulation 1: White: 10, Dark: 10
F(white) = 10/20 = 0.5
$q_1 = F(B2)_1 = \sqrt{0.5} = 0.707$
$p_1 = 1 - 0.707 = 0.293$
Subpopulation 2: White: 2, Dark: 18
F(white) = 2/20 = 0.1
$q_2 = F(B2)_2 = \sqrt{0.1} = 0.32$
$p_2 = 1 - 0.32 = 0.68$
Calculating FST
Calculate Average HE of Subpopulations (HS)
For 2 subpopulations:
$H_S = \sum 2 p_i q_i / 2 = (2(0.707)(0.293) + 2(0.32)(0.68))/2$
HS = 0.425
Calculate Average HE for Merged Subpopulations (HT):
F(white) = 12/40 = 0.3
$q = \sqrt{0.3} = 0.55$; p = 0.45
HT = 2pq = 2(0.55)(0.45)
HT = 0.495
Bottom Line:
$F_{ST} = (H_T - H_S)/H_T = (0.495 - 0.425)/0.495 = 0.14$
14% of the total variation in flower color alleles is due to variation among populations
AND
Expected heterozygosity is increased when subpopulations are merged: HS is 14% lower than HT (Wahlund Effect)
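The flower-color calculation can be retraced in a few lines of Python. This is a minimal sketch that follows the slide's procedure (estimate q as the square root of the white-phenotype frequency, which assumes Hardy-Weinberg proportions, and take HT from the pooled phenotype counts). The function names are hypothetical.

```python
from math import sqrt


def recessive_allele_freq(n_recessive, n_total):
    """Estimate q as sqrt(frequency of the recessive phenotype).
    Assumes Hardy-Weinberg proportions within the group."""
    return sqrt(n_recessive / n_total)


def expected_het(q):
    """Expected heterozygosity 2pq for a two-allele locus."""
    return 2 * q * (1 - q)


# (white, total) counts for the two subpopulations on the slide
counts = [(10, 20), (2, 20)]

q_sub = [recessive_allele_freq(w, n) for w, n in counts]
h_s = sum(expected_het(q) for q in q_sub) / len(q_sub)

# Merge the subpopulations and repeat the calculation on pooled counts
total_white = sum(w for w, _ in counts)
total_n = sum(n for _, n in counts)
h_t = expected_het(recessive_allele_freq(total_white, total_n))

f_st = (h_t - h_s) / h_t
print(round(h_s, 3), round(h_t, 3), round(f_st, 3))
```

Because this sketch carries unrounded allele frequencies through every step, it gives FST of about 0.146, slightly above the 0.14 obtained on the slide from values rounded to two or three decimals.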
Nei's Gene Diversity: GST
Nei's generalization of FST to multiple, multiallelic loci
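$G_{ST} = \frac{D_{ST}}{H_T}$
$D_{ST} = H_T - H_S$
Where HS is mean HE of m subpopulations, calculated for n alleles with frequency $p_{ij}$ (allele j in subpopulation i):
$H_S = \frac{1}{m} \sum_{i=1}^{m} \left(1 - \sum_{j=1}^{n} p_{ij}^2\right)$
$H_T = 1 - \sum_{j=1}^{n} \bar{p}_j^2$
Where $\bar{p}_j$ is the mean frequency of allele j over all subpopulations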
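A minimal Python sketch of Nei's GST for a single locus follows, assuming equal weighting of subpopulations and taking allele frequencies as given (no sampling correction). The function name `nei_gst` is hypothetical; the example frequencies are the rounded p and q values from the flower-color example.

```python
def nei_gst(freqs):
    """freqs[i][j] is the frequency of allele j in subpopulation i.
    Returns (H_S, H_T, G_ST) with subpopulations weighted equally."""
    m = len(freqs)        # number of subpopulations
    n = len(freqs[0])     # number of alleles
    # Mean expected heterozygosity within subpopulations
    h_s = sum(1 - sum(p * p for p in row) for row in freqs) / m
    # Mean frequency of each allele over subpopulations
    p_bar = [sum(row[j] for row in freqs) / m for j in range(n)]
    h_t = 1 - sum(p * p for p in p_bar)
    d_st = h_t - h_s
    return h_s, h_t, d_st / h_t


# Rounded (p, q) pairs for the two flower-color subpopulations
h_s, h_t, g_st = nei_gst([[0.293, 0.707], [0.68, 0.32]])
print(round(h_s, 4), round(h_t, 4), round(g_st, 4))
```

Here HT is built from mean allele frequencies rather than pooled phenotype counts, so the result need not match the 0.14 of the worked flower-color example exactly.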
Unbiased Estimate of FST
Weir and Cockerham's (1984) Theta
Compensates for sampling error, which can cause large
biases in FST or GST (e.g., if sample represents
different proportions of populations)
Calculated in terms of correlation coefficients
Calculated by FSTAT software:
http://www2.unil.ch/popgen/softwares/fstat.htm
Goudet, J. (1995). "FSTAT (Version 1.2): A computer program to
calculate F- statistics." Journal of Heredity 86(6): 485-486.
Often simply referred to as FST in the literature
Weir, B.S. and C.C. Cockerham. 1984. Estimating F-statistics for the analysis
of population structure. Evolution 38:1358-1370.
Linanthus parryae population structure
Annual plant in Mojave desert is classic
example of migration vs drift
Allele for blue flower color is recessive
Use F-statistics to partition variation among
regions, subpopulations, and individuals
FST can be calculated for any hierarchy:
FRT: Variation due to differentiation of regions
FSR: Variation due to differentiation among
subpopulations within regions
Schemske and Bierzychudek 2007 Evolution
Linanthus parryae population structure
$H_S = \frac{1}{30} \sum_{i=1}^{30} \left(1 - \sum_{m} p_{im}^2\right)$
$H_R = \frac{1}{\sum_{r} N_r} \sum_{r=1}^{3} N_r \left(1 - \sum_{m} p_{rm}^2\right)$
$H_T = 1 - \sum_{m} p_m^2$
$F_{SR} = \frac{H_R - H_S}{H_R} = \frac{0.1589 - 0.1424}{0.1589} = 0.1036$
$F_{RT} = \frac{H_T - H_R}{H_T} = \frac{0.2371 - 0.1589}{0.2371} = 0.3299$
$F_{ST} = \frac{H_T - H_S}{H_T} = \frac{0.2371 - 0.1424}{0.2371} = 0.3993$
Hartl and Clark 2007
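The hierarchical ratios can be recomputed from the three reported heterozygosities. The sketch below is only an arithmetic check with a hypothetical function name; it also confirms that the levels combine multiplicatively, 1 - FST = (1 - FSR)(1 - FRT).

```python
def hierarchical_f(h_s, h_r, h_t):
    """Return (F_SR, F_RT, F_ST) from heterozygosity at the
    subpopulation (h_s), region (h_r), and total (h_t) levels."""
    f_sr = (h_r - h_s) / h_r
    f_rt = (h_t - h_r) / h_t
    f_st = (h_t - h_s) / h_t
    return f_sr, f_rt, f_st


# Heterozygosities as reported for Linanthus parryae
f_sr, f_rt, f_st = hierarchical_f(h_s=0.1424, h_r=0.1589, h_t=0.2371)
identity_gap = (1 - f_st) - (1 - f_sr) * (1 - f_rt)
print(round(f_sr, 4), round(f_rt, 4), round(f_st, 4))
```

The recomputed values agree with the reported ones to about three decimal places; the small remaining differences presumably reflect rounding of the heterozygosities to four digits.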
FST as Variance Partitioning
Think of FST as proportion of genetic variation
partitioned among populations
$F_{ST} = \frac{V(q)}{\bar{p}\,\bar{q}}$
where
V(q) is variance of q across
subpopulations
Denominator is maximum amount of variance that could
occur among subpopulations
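A short Python sketch of the variance form follows, assuming equally weighted subpopulations and the population variance (dividing by the number of subpopulations, not by one less). The function name is hypothetical; the inputs are the rounded q values from the flower-color example.

```python
def fst_from_variance(q_values):
    """F_ST = V(q) / (p_bar * q_bar) for one two-allele locus,
    with subpopulations weighted equally."""
    k = len(q_values)
    q_bar = sum(q_values) / k
    # Population variance of q across subpopulations
    var_q = sum((q - q_bar) ** 2 for q in q_values) / k
    return var_q / (q_bar * (1 - q_bar))


# q for the white-flower allele in the two subpopulations
fst = fst_from_variance([0.707, 0.32])
print(round(fst, 4))
```

If every subpopulation were fixed for one allele or the other, V(q) would equal the denominator and the ratio would reach 1, which is why the denominator is described as the maximum possible variance.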
Analysis of Molecular Variance (AMOVA)
Analogous to Analysis of Variance (ANOVA)
Use pairwise genetic distances as ‘response’
Test significance using permutations
Partition genetic diversity into different
hierarchical levels, including regions,
subpopulations, individuals
Many types of marker data can be used
Method of choice for dominant markers, sequence,
and SNP
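The permutation idea can be sketched in Python as follows. This is not AMOVA itself: for brevity it substitutes a simple two-allele FST (variance form, equal weighting) for the AMOVA variance components, and shuffles gene copies among subpopulations to build a null distribution. All names and the example data are hypothetical.

```python
import random


def fst_from_groups(groups):
    """groups: list of lists of allele copies coded 0 or 1."""
    qs = [sum(g) / len(g) for g in groups]
    q_bar = sum(qs) / len(qs)
    if q_bar == 0.0 or q_bar == 1.0:
        return 0.0  # no variation at all
    var_q = sum((q - q_bar) ** 2 for q in qs) / len(qs)
    return var_q / (q_bar * (1 - q_bar))


def permutation_p_value(groups, n_perm=999, seed=1):
    """Shuffle allele copies among groups and count how often the
    permuted statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = fst_from_groups(groups)
    pooled = [a for g in groups for a in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if fst_from_groups(shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so p is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 20 allele copies per subpopulation
group_a = [1] * 18 + [0] * 2
group_b = [1] * 4 + [0] * 16
observed, p_value = permutation_p_value([group_a, group_b])
print(round(observed, 4), p_value)
```

A small p-value indicates that the observed differentiation is rarely matched when subpopulation labels are assigned at random.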
Phi Statistics from AMOVA
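$\Phi_{CT} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from a region relative to pairs drawn from the whole population (FRT)
$\Phi_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from a region (FSR)
$\Phi_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$
Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from the whole population (FST)
where $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$ are the variance components among regions, among subpopulations within regions, and within subpopulations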
http://www.bioss.ac.uk/smart/unix/mamova/slides/frames.htm
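As a small illustration, the Python sketch below computes the three Phi statistics from AMOVA variance components using the standard ratios: among regions over total for CT, among subpopulations over within-region variance for SC, and everything above the within-subpopulation level over total for ST. The function name and the variance components are hypothetical.

```python
def phi_statistics(var_a, var_b, var_c):
    """var_a: among regions; var_b: among subpopulations within
    regions; var_c: within subpopulations."""
    total = var_a + var_b + var_c
    phi_ct = var_a / total
    phi_sc = var_b / (var_b + var_c)
    phi_st = (var_a + var_b) / total
    return phi_ct, phi_sc, phi_st


# Hypothetical variance components, for illustration only
phi_ct, phi_sc, phi_st = phi_statistics(var_a=0.5, var_b=0.3, var_c=1.2)

# Same multiplicative relation as the hierarchical F-statistics
identity_gap = (1 - phi_st) - (1 - phi_sc) * (1 - phi_ct)
print(phi_ct, phi_sc, phi_st)
```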
What if you don’t know how your
samples are organized into
populations (i.e., you don’t know
how many source populations you
have)?
What if reference samples aren’t
from a single population? What if
they are offspring from parents
coming from different source
populations (admixture)?
What’s a population anyway?
Defining populations on genetic criteria
Assume subpopulations are at
Hardy-Weinberg Equilibrium and
linkage equilibrium
Probabilistically ‘assign’
individuals to populations to
minimize departures from
equilibrium
Can allow for admixture
(individuals with different
proportions of each population)
and geographic information
Bayesian approach using a Markov chain Monte Carlo (MCMC) method to
explore parameter space
Implemented in the STRUCTURE program
Londo and Schaal 2007 Mol Ecol 16:4523
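One ingredient of this kind of approach can be sketched in Python: given allele frequencies for each candidate population, the probability of an individual's multilocus genotype under Hardy-Weinberg and linkage equilibrium is a product over loci, and normalizing across populations gives assignment probabilities. This is only a simplified illustration with known frequencies, equal priors, and no admixture; it is not the STRUCTURE algorithm, which estimates frequencies and assignments jointly by MCMC. All names and numbers are hypothetical.

```python
def genotype_likelihood(genotype, freqs):
    """genotype: list of (allele1, allele2) index pairs, one per locus.
    freqs: list of allele-frequency lists, one per locus.
    Assumes H-W proportions within loci and independence among loci."""
    likelihood = 1.0
    for (a1, a2), p in zip(genotype, freqs):
        if a1 == a2:
            likelihood *= p[a1] ** 2           # homozygote
        else:
            likelihood *= 2 * p[a1] * p[a2]    # heterozygote
    return likelihood


def assignment_probabilities(genotype, pop_freqs):
    """Normalize likelihoods across populations (equal priors)."""
    likes = [genotype_likelihood(genotype, f) for f in pop_freqs]
    total = sum(likes)
    return [like / total for like in likes]


# Hypothetical frequencies at two biallelic loci in two populations
pop_a = [[0.9, 0.1], [0.8, 0.2]]
pop_b = [[0.2, 0.8], [0.3, 0.7]]

# Individual: homozygous 0/0 at locus 1, heterozygous 0/1 at locus 2
probs = assignment_probabilities([(0, 0), (0, 1)], [pop_a, pop_b])
print([round(p, 4) for p in probs])
```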
Example: Taita Thrush data*
Three main sampling locations in Kenya
Low migration rates (radio-tagging study)
155 individuals, genotyped at 7 microsatellite loci
Slide courtesy of Jonathan Pritchard
Estimating K
Structure is run separately at different values of K. The
program computes a statistic that measures the fit of each
value of K (sort of a penalized likelihood); this can be
used to help select K.
Taita thrush data
Assumed value of K    Posterior probability of K
1                     ~0
2                     ~0
3                     0.993
4                     0.007
5                     0.00005
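One way posterior probabilities like these can be obtained is by applying Bayes' rule to the estimated log-likelihood of the data at each K, assuming a uniform prior over the values of K considered. The Python sketch below shows the arithmetic; the log-likelihood values are hypothetical and are not the Taita thrush estimates.

```python
from math import exp


def posterior_of_k(log_likelihoods):
    """log_likelihoods: dict mapping K to an estimate of ln P(X | K).
    Returns posterior probabilities under a uniform prior on K."""
    # Subtract the maximum before exponentiating to avoid underflow
    max_ll = max(log_likelihoods.values())
    weights = {k: exp(ll - max_ll) for k, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical log-likelihood estimates for K = 1..5
ln_p = {1: -3050.0, 2: -3032.0, 3: -3006.0, 4: -3011.0, 5: -3016.0}
posterior = posterior_of_k(ln_p)
best_k = max(posterior, key=posterior.get)
print(best_k, {k: round(p, 5) for k, p in posterior.items()})
```

Differences of only a few log-likelihood units translate into very lopsided posteriors, which is why most of the probability typically falls on a single K.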
Another method for inference of K
The ΔK method of Evanno et al. (2005, Mol.
Ecol. 14: 2611-2620)
Eckert, Population Structure, 5-Aug-2008
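The statistic of Evanno et al. is usually written ΔK: the absolute second-order change in log-likelihood at K divided by the standard deviation of the log-likelihood across replicate runs at that K. The Python sketch below is one common way to compute it, using the mean log-likelihood of the replicate runs at each K; the function name and the replicate values are hypothetical, and it assumes consecutive integer K values with some variation among replicates.

```python
from statistics import mean, stdev


def delta_k(runs):
    """runs: dict mapping K to a list of log-likelihoods from
    replicate runs. Returns dict K -> delta K for interior K values."""
    ks = sorted(runs)
    means = {k: mean(runs[k]) for k in ks}
    result = {}
    # Delta K is undefined at the smallest and largest K tested
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(runs[k])
    return result


# Hypothetical log-likelihoods from three replicate runs per K
runs = {
    1: [-4000.0, -4002.0, -3998.0],
    2: [-3700.0, -3705.0, -3695.0],
    3: [-3400.0, -3402.0, -3398.0],
    4: [-3395.0, -3410.0, -3380.0],
    5: [-3390.0, -3420.0, -3370.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, {k: round(v, 3) for k, v in dk.items()})
```

In this made-up example the log-likelihood climbs steadily up to K = 3 and then plateaus with noisier replicates, so ΔK peaks at K = 3.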
Inferred population structure
[Figure: plot of inferred cluster membership; groups labeled Africans, Europeans, MidEast, Cent/S Asia, Asia, Oceania, America]
Each individual is a thin vertical line that is partitioned into K colored
segments according to its membership coefficients in K clusters.
Rosenberg et al. 2002 Science 298: 2381-2385
Inferred population structure – regions
Rosenberg et al. 2002 Science 298: 2381-2385