Lecture 21 : Introduction to Phylogenetics
November 9, 2015
Last Time
Sequence data and quantification of variation
Infinite sites model
Nucleotide diversity (π)
Sequence-based tests of neutrality
Ewens-Watterson Test
Tajima’s D
Today
Signatures of selection
Hudson-Kreitman-Aguade Test
Synonymous versus Nonsynonymous substitutions
McDonald-Kreitman
Molecular clock
Introduction to phylogenetics
Hudson-Kreitman-Aguade (HKA) Test
• Divergence between species should be
proportional to variation within species
(polymorphism)
• Provides a correction factor for mutation rates at
different sites
• Perform test for loci under selection and
supposedly neutral loci
• Loci with less polymorphism than expected are
candidates for selective sweeps within a species
Hudson-Kreitman-Aguade (HKA) test
Typical Gene
Selective Sweep
(Hamilton 266)
Purifying selection in
both lineages
Hudson-Kreitman-Aguade (HKA) test
                 Polymorphism   Divergence
Neutral Locus    8              20
Test Locus A     3              8
8/20 ≈ 3/8
Polymorphism: Variation within species
Divergence: Variation between species
Slide adapted from Yoav Gilad
Hudson-Kreitman-Aguade (HKA) test
                 Polymorphism   Divergence
Neutral Locus    8              20
Test Locus B     3              19
8/20 >> 3/19
Conclusion: polymorphism lower than
expected in Test Locus B: Selective sweep?
Slide adapted from Yoav Gilad
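The comparison on these two slides can be sketched in a few lines. This only reproduces the ratio of polymorphism to divergence for the counts shown above; it is not the full HKA test, which fits a goodness-of-fit statistic. The function name `poly_div_ratio` is made up for illustration.

```python
def poly_div_ratio(polymorphism, divergence):
    """Ratio of within-species polymorphism to between-species divergence."""
    return polymorphism / divergence

neutral = poly_div_ratio(8, 20)   # neutral reference locus
locus_a = poly_div_ratio(3, 8)    # Test Locus A
locus_b = poly_div_ratio(3, 19)   # Test Locus B

for name, ratio in [("Neutral", neutral), ("Test A", locus_a), ("Test B", locus_b)]:
    print(f"{name}: {ratio:.3f}")
```

Test Locus A sits close to the neutral ratio, while Test Locus B falls well below it, which is the pattern the slide flags as a possible selective sweep.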
Sequence Evolution
• DNA or protein sequences in different taxa
trace back to a common ancestral sequence
• Divergence of neutral loci is a function of the
combination of mutation and fixation by
genetic drift
• Sequence differences are an index of time
since divergence
Molecular Clock
• If neutrality prevails, nucleotide divergence between two sequences should be
a function entirely of mutation rate
$k = 2N\mu \times \frac{1}{2N} = \mu$
2Nμ: probability of creation of new alleles
1/(2N): probability of fixation of new alleles
Time since divergence should therefore be the reciprocal of the estimated
mutation rate
Expected time between fixations of new mutations:
$t = \frac{1}{\mu}$
Since μ is number of substitutions per unit time
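The neutral-theory argument on this slide, that the substitution rate k equals the mutation rate μ whatever the population size, can be checked numerically. This is a minimal sketch assuming a diploid population of N individuals (2N gene copies); the mutation rate value is hypothetical.

```python
from fractions import Fraction

def substitution_rate(pop_size, mu):
    """Neutral substitution rate: new mutations per generation times fixation probability."""
    new_mutations = 2 * pop_size * mu          # mutations arising among 2N gene copies
    fixation_prob = Fraction(1, 2 * pop_size)  # chance a new neutral allele drifts to fixation
    return new_mutations * fixation_prob

mu = Fraction(1, 10**8)  # hypothetical per-site mutation rate
for n in (100, 10_000, 1_000_000):
    print(n, substitution_rate(n, mu) == mu)
```

Exact fractions are used so that the cancellation of 2N is visible without floating-point noise.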
Variation in Molecular Clock
• If neutrality prevails, nucleotide divergence between two sequences should be
a function entirely of mutation rate
So why are rates of substitution so different for different classes of genes?
Using Synonymous Substitutions to Control for
Factors Other Than Selection
dN/dS or Ka/Ks Ratios
Types of Mutations (Polymorphisms)
Synonymous versus
Nonsynonymous SNP
First and second position
SNP often changes
amino acid
UCA, UCU, UCG, and UCC
all code for Serine
Third position SNP often
synonymous
Majority of positions are
nonsynonymous
Not all amino acid
changes affect fitness:
allozymes
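The codon-position pattern described on this slide can be explored with the standard genetic code. The sketch below builds the codon table from the usual compact encoding (bases ordered T, C, A, G) and counts, for each codon position, the fraction of single-base changes that leave the amino acid unchanged. The helper names are illustrative only.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(codon):
    """Translate one DNA or RNA codon to a one-letter amino acid ('*' is stop)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]

def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1, 2) that are synonymous."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # do not start from stop codons
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total

print({translate(c) for c in ("UCA", "UCU", "UCG", "UCC")})  # {'S'}
for pos in range(3):
    print("position", pos + 1, round(synonymous_fraction(pos), 3))
```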
Synonymous & Nonsynonymous Substitutions
• Synonymous substitution rate can be used to set neutral
expectation for nonsynonymous rate
• dS is the relative rate of synonymous mutations per
synonymous site
• dN is the relative rate of nonsynonymous mutations per
non-synonymous site
• ω = dN/dS
– If ω = 1, neutral evolution
– If ω < 1, purifying selection
– If ω > 1, positive Darwinian selection
• For human genes, ω ≈ 0.1
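The interpretation of the dN/dS ratio listed above can be written as a small helper. This is a sketch only: the function name and the `tolerance` band around 1 are arbitrary choices for illustration, and real analyses use statistical tests rather than a fixed cutoff.

```python
def classify_omega(dn, ds, tolerance=0.05):
    """Interpret omega = dN/dS; `tolerance` is an illustrative band around 1."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if abs(omega - 1) <= tolerance:
        label = "neutral"
    elif omega < 1:
        label = "purifying selection"
    else:
        label = "positive selection"
    return omega, label

print(classify_omega(0.01, 0.1))  # ratio near 0.1, as quoted for human genes
```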
Complications in Estimating dN/dS
Multiple mutations in a codon give multiple possible paths
CGT(Arg)->AGA(Arg)
CGT(Arg)->AGT(Ser)->AGA(Arg)
CGT(Arg)->CGA(Arg)->AGA(Arg)
Two types of nucleotide base substitutions resulting in SNPs:
transitions and transversions not equally likely
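The two paths between CGT and AGA shown on this slide can be enumerated by applying the differing bases in every possible order. A minimal sketch, using the standard genetic code with one-letter amino acid codes (R = Arg, S = Ser); `mutation_paths` is a made-up helper name.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def mutation_paths(start, end):
    """All orderings of single-base changes that turn codon `start` into `end`."""
    diff = [i for i in range(3) if start[i] != end[i]]
    paths = []
    for order in permutations(diff):
        codon, path = start, [start]
        for i in order:
            codon = codon[:i] + end[i] + codon[i + 1:]
            path.append(codon)
        paths.append(path)
    return paths

for path in mutation_paths("CGT", "AGA"):
    print(" -> ".join(f"{c}({CODON_TABLE[c]})" for c in path))
```

One path passes through a serine codon (one nonsynonymous step out and one back), the other stays on arginine throughout, so the number of nonsynonymous changes depends on which path is assumed.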
Back-mutations are invisible
Complex evolutionary models
using likelihood and Bayesian
approaches must be used to
estimate dN/dS (also called KA/KS or
KN/KS depending on method)
(PAML package)
http://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html
dN/dS ratios for 363 mouse-rat comparisons
Most genes show purifying
selection (dN/dS < 1)
Some evidence of positive
selection, especially in genes
related to immune system
interleukin-3: mast cells and
bone marrow cells in
immune system
Hartl and Clark 2007
McDonald-Kreitman Test
• Conceptually similar to HKA test
• Uses only one gene
• Contrasts the ratio of synonymous divergence to polymorphism with the
ratio of nonsynonymous divergence to polymorphism
• Gene provides internal control for evolution rates and
demography
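The McDonald-Kreitman comparison reduces to a 2x2 table of counts: divergent versus polymorphic sites, each split into nonsynonymous and synonymous. The sketch below computes a neutrality index and a two-sided Fisher's exact test on such a table. The counts are illustrative and not taken from the lecture, and Fisher's exact test is one common choice for a 2x2 table rather than necessarily the test used in the course.

```python
from math import comb

def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); values below 1 suggest an excess of nonsynonymous divergence."""
    return (pn / ps) / (dn / ds)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Illustrative counts: divergence (Dn, Ds) and polymorphism (Pn, Ps)
dn, ds, pn, ps = 7, 17, 2, 42
print("NI =", round(neutrality_index(dn, ds, pn, ps), 3))
print("p  =", fisher_exact_two_sided(dn, ds, pn, ps))
```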
Application of McDonald-Kreitman Test:
Aligned 11,624 gene sequences
between human and chimp
Calculated synonymous and
nonsynonymous substitutions
between species (Divergence)
and within humans (SNPs)
Identified 304 genes showing
evidence of positive selection
(blue) and 814 genes showing
purifying selection (red) in
humans
Positive selection: defense/immunity,
apoptosis, sensory perception, and
transcription factors
Purifying selection: structural and
housekeeping genes
Bustamante et al. 2005. Nature 437, 1153-1157
Phylogenetics
Study of the evolutionary relationships among individuals,
groups, or species
Relationships often represented as dichotomous
branching tree
Extremely common approach for detecting and displaying
relationships among genotypes
Important in evolution, systematics, and ecology
(phylogeography)
Evolution
[Figure: diagram with elements labeled A through Z and Ç]
Slide adapted from Marta Riutart
What is a phylogeny?
[Figure: diagram with elements labeled O through Z and Ç]
Homology: similarity that is the result of inheritance from a common ancestor
Slide adapted from Marta Riutart
Phylogenetic Tree Terms
Group, cluster, clade
Leaves, Operational Taxonomic Units (OTUs)
Terminal branches
Node
Interior branches
Root
[Figure: rooted tree with tips A-J illustrating these terms]
Slide adapted from Marta Riutart
Tree Topology
Taxa: Bacteria 1, Bacteria 2, Bacteria 3, Eukaryote 1, Eukaryote 2, Eukaryote 3, Eukaryote 4
(Bacteria1,(Bacteria2,Bacteria3),(Eukaryote1,((Eukaryote2,Eukaryote3),Eukaryote4)))
[Figure: two tree drawings with these taxa as tips]
Slide adapted from Marta Riutart
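The parenthetical notation on this slide (Newick format) encodes the topology as nested groups. A minimal sketch of a parser for strings without branch lengths follows; it also shows one way to compare topologies regardless of the order in which branches are drawn. Both function names are illustrative, and the parser assumes well-formed input.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples of tip names."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()

def topology(tree):
    """Order-independent form of a rooted tree: rotating branches does not change it."""
    return tree if isinstance(tree, str) else frozenset(topology(c) for c in tree)

tree = parse_newick(
    "(Bacteria1,(Bacteria2,Bacteria3),(Eukaryote1,((Eukaryote2,Eukaryote3),Eukaryote4)))"
)
rotated = parse_newick(
    "((Eukaryote1,(Eukaryote4,(Eukaryote3,Eukaryote2))),(Bacteria3,Bacteria2),Bacteria1)"
)
print(topology(tree) == topology(rotated))
```

Two drawings that differ only by rotating branches around nodes give the same `topology` value, which is the point behind the "Are these trees different?" question.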
Are these trees different?
How about these?
http://helix.biology.mcmaster.ca
Rooted versus Unrooted Trees
[Figure: an unrooted tree of archaea and eukaryote tips, and a tree rooted by a bacteria outgroup with the root and the monophyletic groups marked]
Slide adapted from Marta Riutart
Rooting with D as outgroup
[Figure: trees with tips A-G, rooted with D as outgroup]
Slide adapted from Marta Riutart
Now with C as outgroup
[Figure: trees with tips A-G, rooted with C as outgroup]
Which of these four trees is different?
Baum et al.
UPGMA Method
Use all pairwise
comparisons to make
dendrogram
UPGMA: Unweighted Pair Group
Method with Arithmetic Mean
Hierarchically link most
closely related
individuals
Read the Lab 12 Introduction!
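The clustering steps described on this slide can be sketched as follows: repeatedly join the closest pair of clusters, then recompute distances to the new cluster as the average over all tip pairs. This is a minimal illustration that returns only the topology (no branch lengths); the distance values are hypothetical.

```python
from itertools import combinations

def upgma(names, distances):
    """UPGMA clustering; `distances` maps (name_a, name_b) to a distance.
    Returns the resulting tree as nested tuples."""
    size = {name: 1 for name in names}  # cluster -> number of tips it contains
    d = {}
    for (a, b), value in distances.items():
        d[a, b] = d[b, a] = value
    while len(size) > 1:
        # join the closest pair of current clusters
        a, b = min(combinations(size, 2), key=lambda pair: d[pair])
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # size-weighted mean equals the average over all tip pairs
                avg = (size[a] * d[a, c] + size[b] * d[b, c]) / (size[a] + size[b])
                d[merged, c] = d[c, merged] = avg
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

# Hypothetical pairwise distances among four taxa
distances = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
             ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
print(upgma(["A", "B", "C", "D"], distances))
```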
Phenetics (distance) vs Cladistics
(discrete character states)
Lowe, Harris, and Ashton 2004
Parsimony Methods
Based on underlying genealogical relationships among alleles
Occam’s Razor: simplest scenario is the most likely
Useful for depicting evolutionary relationships among taxa or
populations
Choose tree that requires
smallest number of steps
(mutations) to produce
observed relationships
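Counting the steps a tree requires can be illustrated with Fitch's algorithm, one standard way to find the minimum number of changes for a single character on a given rooted binary tree (named here as an example, not necessarily the method used in the course). The tree is written as nested tuples and the character states are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost      # children agree: no extra change
    return left_set | right_set, left_cost + right_cost + 1  # disagreement costs one step

# Hypothetical nucleotide at one site in four taxa
states = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(fitch(tree_1, states)[1], fitch(tree_2, states)[1])
```

Under parsimony, the tree needing fewer steps summed over all characters is preferred; for this single site that is `tree_1`.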
Choosing Phylogenetic Trees
MANY possible trees can be
built for a given set of taxa
Very computationally
intensive to choose among
these
Lowe, Harris, and Ashton 2004
Number of unrooted trees: $U_n = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
Number of rooted trees: $R_n = \frac{(2n-3)!}{2^{n-2}(n-2)!} = (2n-3)\,U_n$
n = number of taxa
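How quickly the number of possible trees grows with the number of taxa can be seen by evaluating the standard counts of unrooted and rooted bifurcating trees, (2n-5)!/(2^(n-3) (n-3)!) and (2n-3)!/(2^(n-2) (n-2)!). A short sketch with illustrative function names:

```python
from math import factorial

def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (4, 5, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Each unrooted tree on n taxa has 2n-3 branches on which a root can be placed, which is why the rooted count is (2n-3) times the unrooted one.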
Choosing Phylogenetic Trees
Many algorithms exist for
searching tree space
Local optima are a problem:
need to traverse valleys to
get to other peaks
Heuristic search: cut trees up
systematically and
reassemble
Branch and bound: search for
optimal path through tree
space
Felsenstein 2004
Choosing Phylogenetic Trees
If multiple trees are equally likely, select a majority-rule or strict consensus tree
Strict consensus is most conservative approach
Bootstrap data matrix (sample with replacement) to determine
robustness of nodes
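The bootstrap step can be sketched as resampling alignment columns (sites) with replacement to build pseudo-replicate data matrices. In practice a tree is built from each replicate and the support for a node is the percentage of replicate trees that contain it; the tree-building step is omitted here. The alignment is toy data and the function name is illustrative.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample columns with replacement; `alignment` maps taxon name to aligned sequence."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}

alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}  # hypothetical toy data
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

The same column indices are used for every taxon, so each resampled site keeps its original pattern across taxa.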
[Figure: trees with tips A-F; nodes labeled 60]
Lowe, Harris, and Ashton 2004
Felsenstein 2004