GLYPHOSATE RESISTANCE Background / Problem

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium
November 12, 2012
Last Time
- Sequence data and quantification of variation
  - Infinite sites model
  - Nucleotide diversity (π)
- Sequence-based tests of neutrality
  - Tajima's D
  - Hudson-Kreitman-Aguade (HKA)
  - Synonymous versus nonsynonymous substitutions
  - McDonald-Kreitman
Today
- Signatures of selection based on synonymous and nonsynonymous substitutions
- Multiple loci and independent segregation
- Estimating linkage disequilibrium
Using Synonymous Substitutions to Control for Factors Other Than Selection
dN/dS or Ka/Ks Ratios
Types of Mutations (Polymorphisms)
Synonymous versus Nonsynonymous SNPs
- First- and second-position SNPs often change the amino acid
- UCA, UCU, UCG, and UCC all code for serine
- Third-position SNPs are often synonymous
- The majority of positions are nonsynonymous
- Not all amino acid changes affect fitness (e.g., allozymes)
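The claim that most positions are nonsynonymous can be checked directly against the genetic code. The sketch below counts synonymous "sites" per codon in the style of the Nei-Gojobori counting method: each codon position contributes the fraction of its three possible single-base changes that keep the amino acid. Treating changes to a stop codon as nonsynonymous and assuming uniform codon usage for the average are assumptions of this sketch; `synonymous_sites` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, DNA alphabet (T for U). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...), matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Count synonymous sites in one codon (site-counting sketch).

    Each position contributes (number of synonymous single-base changes) / 3.
    Changes to a stop codon ('*') count as nonsynonymous here; conventions differ.
    """
    codon = codon.upper().replace("U", "T")  # accept RNA codons like those on the slide
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                sites += 1 / 3
    return sites


if __name__ == "__main__":
    for codon in ["UCA", "UCU", "UCG", "UCC"]:
        s = synonymous_sites(codon)
        print(f"{codon}: synonymous sites = {s:.2f}, nonsynonymous sites = {3 - s:.2f}")

    # Average over the 61 sense codons, assuming uniform codon usage.
    sense = [c for c, aa in CODE.items() if aa != "*"]
    frac = sum(synonymous_sites(c) for c in sense) / (3 * len(sense))
    print(f"fraction of sites that are synonymous: {frac:.3f}")
```

Each of the four serine codons comes out at one synonymous and two nonsynonymous sites, and averaged over all sense codons only about a quarter of sites are synonymous.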
Synonymous & Nonsynonymous Substitutions
The synonymous substitution rate can be used to set the neutral expectation for the nonsynonymous rate
- dS is the number of synonymous substitutions per synonymous site
- dN is the number of nonsynonymous substitutions per nonsynonymous site
- ω = dN/dS
  - If ω = 1: neutral evolution
  - If ω < 1: purifying selection
  - If ω > 1: positive Darwinian selection
- For human genes, ω ≈ 0.1
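As a worked illustration of ω, the sketch below takes counts of nonsynonymous and synonymous differences and sites, converts the raw proportions to distances with the Jukes-Cantor correction, and classifies the ratio using the three cases above. This simple "count then correct" approach is an assumption chosen for illustration; it is not the likelihood machinery mentioned on the next slide. The counts and the helper names (`dn_ds`, `interpret`) are hypothetical, and `tol` is an arbitrary band around 1.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of an observed proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """omega = dN/dS from counted differences and sites (counting-method sketch)."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


def interpret(omega: float, tol: float = 0.0) -> str:
    """Map omega onto the three cases on the slide (tol is an arbitrary band around 1)."""
    if abs(omega - 1) <= tol:
        return "neutral evolution"
    return "purifying selection" if omega < 1 else "positive Darwinian selection"


if __name__ == "__main__":
    # Hypothetical alignment: 200 nonsynonymous sites with 10 differences,
    # 100 synonymous sites with 20 differences.
    omega = dn_ds(10, 200, 20, 100)
    print(f"omega = {omega:.3f} -> {interpret(omega)}")
```

With these made-up counts ω is roughly 0.22, which falls in the purifying-selection case.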
Complications in Estimating dN/dS
- Multiple mutations in a codon give multiple possible paths, e.g., CGT (Arg) -> AGA (Arg):
  CGT (Arg) -> AGT (Ser) -> AGA (Arg)
  CGT (Arg) -> CGA (Arg) -> AGA (Arg)
- Two types of nucleotide substitution produce SNPs, transitions and transversions, and they are not equally likely
- Back-mutations are invisible
- Complex evolutionary models using likelihood and Bayesian approaches must be used to estimate dN/dS (also called KA/KS or KN/KS, depending on the method), e.g., the PAML package
http://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html
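The two CGT -> AGA paths above imply very different histories: one path needs two nonsynonymous changes, the other two synonymous ones. A simple counting method sidesteps the ambiguity by averaging over all paths. The sketch below enumerates the paths and averages them with equal weights, dropping paths through intermediate stop codons; both choices are assumptions of this sketch. Equal weighting also ignores the transition/transversion bias listed above, which likelihood models such as those in PAML address by estimating a transition/transversion ratio. `mutational_paths` is a hypothetical name.

```python
from itertools import permutations, product

# Standard genetic code, DNA alphabet, codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def mutational_paths(start: str, end: str):
    """List every single-base-step path from start to end with its syn/nonsyn steps.

    Paths through an intermediate stop codon are dropped (a common convention).
    Returns (paths, mean synonymous steps, mean nonsynonymous steps), with all
    surviving paths weighted equally.
    """
    diffs = [i for i in range(3) if start[i] != end[i]]
    paths = []
    for order in permutations(diffs):
        path, current = [start], start
        for pos in order:
            current = current[:pos] + end[pos] + current[pos + 1:]
            path.append(current)
        if any(CODE[c] == "*" for c in path[1:-1]):
            continue
        syn = sum(CODE[a] == CODE[b] for a, b in zip(path, path[1:]))
        paths.append((path, syn, len(order) - syn))
    if not paths:
        raise ValueError("every path passes through a stop codon")
    mean_syn = sum(p[1] for p in paths) / len(paths)
    mean_nonsyn = sum(p[2] for p in paths) / len(paths)
    return paths, mean_syn, mean_nonsyn


if __name__ == "__main__":
    paths, s, n = mutational_paths("CGT", "AGA")
    for path, syn, nonsyn in paths:
        print(" -> ".join(f"{c}({CODE[c]})" for c in path), f"syn={syn}, nonsyn={nonsyn}")
    print(f"average: {s} synonymous, {n} nonsynonymous differences")
```

For CGT -> AGA, the averaged count is one synonymous and one nonsynonymous difference, even though neither path actually contains that combination.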
dN/dS ratios for 363 mouse-rat comparisons
- Most genes show purifying selection (dN/dS < 1)
- Some evidence of positive selection, especially in genes related to the immune system (e.g., interleukin-3, which acts on mast cells and bone marrow cells of the immune system)
Hartl and Clark 2007
McDonald-Kreitman Test
- Conceptually similar to the HKA test, but uses only one gene
- Compares the ratio of divergence to polymorphism at synonymous sites with the same ratio at nonsynonymous sites (under neutrality, Dn/Ds = Pn/Ps)
- The gene provides an internal control for evolutionary rates and demography
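The McDonald-Kreitman comparison reduces to a 2x2 table of nonsynonymous/synonymous differences that are fixed between species (Dn, Ds) or polymorphic within a species (Pn, Ps). The sketch below tests that table with a two-sided Fisher's exact test (a common choice, written here with the standard library only; `scipy.stats.fisher_exact` is an alternative). It also reports the neutrality index NI = (Pn/Ps)/(Dn/Ds) and alpha = 1 - NI, which are common summaries but are not on the slide. The counts and function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum every table no more probable than the observed one (small float tolerance).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> dict:
    """McDonald-Kreitman test on fixed (D) and polymorphic (P) differences."""
    if 0 in (dn, ds, ps):
        raise ValueError("dn, ds and ps must be nonzero to form the ratios")
    ni = (pn / ps) / (dn / ds)  # neutrality index; 1 under neutrality
    return {
        # Table rows: nonsynonymous, synonymous; columns: divergence, polymorphism.
        "p_value": fisher_exact_two_sided(dn, pn, ds, ps),
        "neutrality_index": ni,
        "alpha": 1 - ni,  # rough fraction of nonsynonymous divergence that is adaptive
    }


if __name__ == "__main__":
    # Hypothetical counts for one gene.
    print(mk_test(dn=20, ds=10, pn=5, ps=10))
```

NI below 1 (0.25 with these counts) points to an excess of nonsynonymous divergence, the pattern read as positive selection; NI above 1 points to excess nonsynonymous polymorphism, as expected when slightly deleterious variants are segregating.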
Application of the McDonald-Kreitman Test
- Aligned 11,624 gene sequences between human and chimpanzee
- Counted synonymous and nonsynonymous differences between species (divergence) and within humans (SNPs)
- Identified 304 genes showing evidence of positive selection (blue) and 814 genes showing purifying selection (red) in humans
  - Positive selection: defense/immunity, apoptosis, sensory perception, and transcription factors
  - Purifying selection: structural and housekeeping genes
Bustamante et al. 2005. Nature 437, 1153-1157
Genes showing purifying (red) or positive (blue) selection in the human genome, based on the McDonald-Kreitman test (Bustamante et al. 2005. Nature 437, 1153-1157)
How can you differentiate between the effects of selection and demographic effects on sequence variation?
Will this work for organellar DNA?
Extending to Multiple Loci
- So far, we have only considered the dynamics of alleles at single loci
- Loci occur on chromosomes, linked to other loci!
“The fitness of a single locus ripped from its interactive context is about as relevant to real problems of evolutionary genetics as the study of the psychology of individuals isolated from their social context is to an understanding of man’s sociopolitical evolution”
Richard Lewontin (quoted in Hedrick 2005)
- The size of the region that must be considered depends on linkage disequilibrium
Gametic (Linkage) Disequilibrium (LD)
- Nonrandom association of alleles at different loci into gametes
- Haplotype: genotype of a group of closely linked loci
- LD is a major factor in evolution
- LD itself provides insights into population history
- Estimation of LD is critical for ALL population genetic data
Nomenclature and concepts
- Two loci, two alleles each: A1 and A2 at locus 1 (frequencies p1 and p2), B1 and B2 at locus 2 (frequencies q1 and q2)
- Frequency of allele i at locus 1 is p_i
- Frequency of allele i at locus 2 is q_i
$\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$
Genotype is written as A1B1/A2B2 (one chromosome carries A1 and B1, its homolog carries A2 and B2)
- A1 and B1 are in coupling phase
- A1 and B2 are in repulsion phase
Gametic Disequilibrium
- Easiest to think about for physically linked loci, but LD does not require physical linkage
Meiosis in an A1B1/A2B2 individual produces A1B1, A1B2, A2B1, and A2B2 gametes.
What are the expected frequencies of gametes in a population under independent assortment?
Gamete   Expected frequency
A1B1     p1q1
A1B2     p1q2
A2B1     p2q1
A2B2     p2q2
What are the expected frequencies of gametes with complete linkage?
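The table above can be written as a one-line product of allele frequencies. The sketch also shows, for a single A1B1/A2B2 parent, how the gamete output shifts between the two extremes raised by the questions: recombination rate c = 0.5 (unlinked, all four gametes equally frequent) and c = 0 (complete linkage, parental gametes only). The function names and example allele frequencies are hypothetical.

```python
def gametes_independent(p1: float, q1: float) -> dict:
    """Expected gamete frequencies in a population with no LD (independent assortment)."""
    p2, q2 = 1 - p1, 1 - q1
    return {"A1B1": p1 * q1, "A1B2": p1 * q2, "A2B1": p2 * q1, "A2B2": p2 * q2}


def gametes_from_double_het(c: float) -> dict:
    """Gametes from a single A1B1/A2B2 parent with recombination rate c.

    c = 0.5 corresponds to independent assortment, c = 0 to complete linkage.
    """
    if not 0 <= c <= 0.5:
        raise ValueError("recombination rate must lie in [0, 0.5]")
    parental, recombinant = (1 - c) / 2, c / 2
    return {"A1B1": parental, "A1B2": recombinant, "A2B1": recombinant, "A2B2": parental}


if __name__ == "__main__":
    print(gametes_independent(p1=0.3, q1=0.6))  # hypothetical allele frequencies
    print(gametes_from_double_het(c=0.5))       # unlinked loci
    print(gametes_from_double_het(c=0.0))       # complete linkage
```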
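Observed gamete frequencies after meiosis in A1B1/A2B2 individuals (allele frequencies p1, p2 at locus 1 and q1, q2 at locus 2):
Gamete   Observed frequency
A1B1     x11
A1B2     x12
A2B1     x21
A2B2     x22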
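Linkage disequilibrium measure, D
Independent assortment: $x_{11} = p_1 q_1$
With LD: $x_{11} = p_1 q_1 + D$ (likewise $x_{12} = p_1 q_2 - D$, $x_{21} = p_2 q_1 - D$, $x_{22} = p_2 q_2 + D$)
Substituting from the above table ($p_1 = x_{11} + x_{12}$, $q_1 = x_{11} + x_{21}$):
$D = x_{11} x_{22} - x_{12} x_{21}$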
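The substitution step can be checked numerically: computing D as x11*x22 - x12*x21 gives the same value as x11 - p1*q1 once p1 and q1 are recovered from the gamete frequencies. The gamete frequencies and helper names below are hypothetical.

```python
def allele_freqs(x11: float, x12: float, x21: float, x22: float) -> tuple:
    """p1 and q1 recovered from the four gamete (haplotype) frequencies."""
    p1 = x11 + x12  # A1 is carried by A1B1 and A1B2 gametes
    q1 = x11 + x21  # B1 is carried by A1B1 and A2B1 gametes
    return p1, q1


def ld_d(x11: float, x12: float, x21: float, x22: float) -> float:
    """Linkage disequilibrium D = x11*x22 - x12*x21."""
    if abs(x11 + x12 + x21 + x22 - 1) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    return x11 * x22 - x12 * x21


if __name__ == "__main__":
    x = (0.4, 0.1, 0.1, 0.4)  # hypothetical gamete frequencies
    p1, q1 = allele_freqs(*x)
    print(ld_d(*x), x[0] - p1 * q1)  # the two forms of D agree
```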
Problem: D is sensitive to allele frequencies
- Gamete frequencies cannot be negative
- So the maximum D is set by the allele frequencies
Solution: D' = D/Dmax, which ranges from -1 to 1
Example, if D is positive:
- p1 = 0.5, q2 = 0.5: Dmax = 0.25
- but p1 = 0.1, q2 = 0.9: Dmax = 0.09
Dmax calculation:
- If D is positive, Dmax is the smaller of p1q2 and p2q1
- If D is negative, Dmax (in absolute value) is the smaller of p1q1 and p2q2
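The Dmax rules and the two slide examples translate directly into code. In the sketch below, `d_max` reproduces 0.25 and 0.09 for the two allele-frequency settings, and `d_prime` rescales D to the -1 to 1 range. The gamete frequencies in the usage lines and the function names are hypothetical.

```python
def d_max(p1: float, q1: float, positive: bool) -> float:
    """Largest |D| allowed by the allele frequencies for the given sign of D."""
    p2, q2 = 1 - p1, 1 - q1
    if positive:
        return min(p1 * q2, p2 * q1)  # keeps x12 and x21 non-negative
    return min(p1 * q1, p2 * q2)      # keeps x11 and x22 non-negative


def d_prime(x11: float, x12: float, x21: float, x22: float) -> float:
    """Normalized disequilibrium D' = D / Dmax, in [-1, 1]."""
    p1, q1 = x11 + x12, x11 + x21
    d = x11 * x22 - x12 * x21
    if d == 0:
        return 0.0
    return d / d_max(p1, q1, d > 0)


if __name__ == "__main__":
    # The two allele-frequency settings from the slide (D positive).
    print(d_max(p1=0.5, q1=0.5, positive=True))  # q2 = 0.5
    print(d_max(p1=0.1, q1=0.1, positive=True))  # q2 = 0.9
    print(d_prime(0.1, 0.0, 0.0, 0.9))           # complete association of A1 with B1
```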
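LD can also be estimated as the correlation between alleles:
$r^2 = \frac{D^2}{p_1 p_2 q_1 q_2}$, i.e., $r = \frac{D}{\sqrt{p_1 p_2 q_1 q_2}}$
- r can also be standardized to a -1 to 1 scale by dividing by its maximum given the allele frequencies
- It is then equivalent to D':
$r' = \frac{D / \sqrt{p_1 p_2 q_1 q_2}}{D_{max} / \sqrt{p_1 p_2 q_1 q_2}} = \frac{D}{D_{max}} = D'$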
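A short numerical check of the two formulas: with unequal allele frequencies, r and r' can differ substantially. The sketch computes r from D and the allele frequencies, then divides by r_max = Dmax/sqrt(p1 p2 q1 q2) to get r', which equals D'. The haplotype frequencies and the function name are hypothetical.

```python
import math


def ld_r(x11: float, x12: float, x21: float, x22: float) -> tuple:
    """Return (r, r') where r = D / sqrt(p1 p2 q1 q2) and r' = r / r_max = D'."""
    p1, q1 = x11 + x12, x11 + x21
    p2, q2 = 1 - p1, 1 - q1
    denom = math.sqrt(p1 * p2 * q1 * q2)
    if denom == 0:
        raise ValueError("r is undefined when a locus is monomorphic")
    d = x11 * x22 - x12 * x21
    if d == 0:
        return 0.0, 0.0
    d_max = min(p1 * q2, p2 * q1) if d > 0 else min(p1 * q1, p2 * q2)
    r = d / denom
    r_max = d_max / denom
    return r, r / r_max


if __name__ == "__main__":
    # Hypothetical haplotype frequencies with unequal allele frequencies (p1 = 0.1, q1 = 0.5).
    r, r_prime = ld_r(0.1, 0.0, 0.4, 0.5)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}, r' = {r_prime:.3f}")
```

Here A1 never occurs with B2, so r' (and D') equals 1, yet r is only 1/3 (r² about 0.11) because the allele frequencies at the two loci are so different.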
Recombination
Shuffling of parental alleles during meiosis: an A1B1/A2B2 individual produces parental gametes (A1B1, A2B2) and recombinant gametes (A1B2, A2B1)
- Occurs for both unlinked and linked loci
- The rate of recombination for linked markers is partially a function of physical distance
What is the expected recombination rate for unlinked loci?
Meiosis in an A1B1/A2B2 individual produces coupling-phase gametes (A1B1, A2B2) and repulsion-phase gametes (A1B2, A2B1):
$c = \frac{n_r}{n_r + n_c}$
where n_r is the number of repulsion-phase (recombinant) gametes and n_c is the number of coupling-phase (parental) gametes
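The recombination-rate estimate is a simple proportion. The sketch assumes, as on the slide, that the parent is A1B1/A2B2 (coupling phase), so repulsion-phase gametes are the recombinants. The gamete counts and the function name are hypothetical.

```python
def recombination_rate(n_coupling: int, n_repulsion: int) -> float:
    """c = n_r / (n_r + n_c) for gametes from an A1B1/A2B2 (coupling-phase) parent."""
    total = n_coupling + n_repulsion
    if total == 0:
        raise ValueError("no gametes counted")
    return n_repulsion / total


if __name__ == "__main__":
    # Hypothetical counts: 45 A1B1 + 45 A2B2 (coupling), 5 A1B2 + 5 A2B1 (repulsion).
    print(recombination_rate(n_coupling=45 + 45, n_repulsion=5 + 5))
```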
LD is partially a function of recombination rate
- Expected proportions of gametes produced by the various genotypes change between generations. With c the recombination rate and D0 the initial amount of LD, the next generation's gamete frequencies are:
$x'_{11} = x_{11} - cD_0$, $x'_{12} = x_{12} + cD_0$, $x'_{21} = x_{21} + cD_0$, $x'_{22} = x_{22} - cD_0$
Recombination degrades LD over time:
$D_1 = x'_{11} x'_{22} - x'_{12} x'_{21} = (x_{11} - cD_0)(x_{22} - cD_0) - (x_{12} + cD_0)(x_{21} + cD_0) = (1 - c) D_0$
$D_t = (1 - c)^t D_0 \approx e^{-ct} D_0$ (for small c)
where t is time (in generations) and e is the base of the natural log (≈ 2.718)
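The one-generation recursion can be iterated directly to confirm that D shrinks by a factor of (1 - c) each generation, and to compare with the e^(-ct) approximation. The sketch assumes an effectively infinite, randomly mating population with no selection, mutation, or drift. The starting frequencies, c, and function names are hypothetical.

```python
import math


def ld_d(x):
    """D = x11*x22 - x12*x21 for a tuple of gamete frequencies (x11, x12, x21, x22)."""
    x11, x12, x21, x22 = x
    return x11 * x22 - x12 * x21


def next_generation(x, c):
    """Gamete frequencies after one generation of random mating with recombination rate c."""
    d = ld_d(x)
    x11, x12, x21, x22 = x
    return (x11 - c * d, x12 + c * d, x21 + c * d, x22 - c * d)


def simulate(x0, c, generations):
    """Return [D_0, D_1, ..., D_t] by iterating the recursion."""
    x, history = x0, [ld_d(x0)]
    for _ in range(generations):
        x = next_generation(x, c)
        history.append(ld_d(x))
    return history


if __name__ == "__main__":
    c = 0.1
    history = simulate((0.5, 0.0, 0.0, 0.5), c, 10)  # hypothetical start: D0 = 0.25
    d0 = history[0]
    for t, d in enumerate(history):
        exact = (1 - c) ** t * d0
        approx = math.exp(-c * t) * d0
        print(f"t={t:2d}  D={d:.5f}  (1-c)^t D0={exact:.5f}  e^(-ct) D0={approx:.5f}")
```

The iterated D matches (1 - c)^t D0, while e^(-ct) D0 drifts slightly higher as t grows, which is why the exponential form is only an approximation for small c.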
Effects of recombination rate on LD
- Figure: decline in LD over time for different theoretical recombination rates (c)
- Even with independent segregation (c = 0.5), multiple generations are required to break up allelic associations
- Genome-wide linkage disequilibrium can be caused by demographic factors (more later)
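How many generations "multiple generations" means can be read off D_t = (1 - c)^t D0: solve for the smallest t at which D_t/D0 falls to a chosen fraction. The sketch assumes the same idealized population as the decay formula. The target fractions and the function name are hypothetical.

```python
import math


def generations_to_decay(c: float, fraction: float) -> int:
    """Smallest t with (1 - c)^t <= fraction, i.e. D_t / D_0 at or below `fraction`."""
    if not 0 < c <= 0.5:
        raise ValueError("c must be in (0, 0.5]")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be in (0, 1)")
    return math.ceil(math.log(fraction) / math.log(1 - c))


if __name__ == "__main__":
    for c in (0.5, 0.1, 0.01):
        print(c, generations_to_decay(c, 0.5), generations_to_decay(c, 0.01))
```

Even at c = 0.5, LD halves each generation, so reducing it to 1% of its starting value takes 7 generations. At c = 0.01, halving alone takes 69 generations.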