Lecture 21 : Introduction to Phylogenetics
November 9, 2015
Last Time
Sequence data and quantification of variation
• Infinite sites model
• Nucleotide diversity (π)
Sequence-based tests of neutrality
• Ewens-Watterson Test
• Tajima’s D
Today
Signatures of selection
• Hudson-Kreitman-Aguade Test
• Synonymous versus Nonsynonymous substitutions
• McDonald-Kreitman
Molecular clock
Introduction to phylogenetics
Hudson-Kreitman-Aguade (HKA) Test
• Divergence between species should be
proportional to variation within species
(polymorphism)
• Provides a correction factor for mutation rates at
different sites
• Perform test for loci under selection and
supposedly neutral loci
• Loci with less polymorphism than expected are
candidates for selective sweeps within a species
Hudson-Kreitman-Aguade (HKA) Test
[Figure panels: Typical Gene; Selective Sweep; Purifying selection in both lineages (Hamilton 266)]
Hudson-Kreitman-Aguade (HKA) Test
Polymorphism: Variation within species
Divergence: Variation between species

                Neutral Locus    Test Locus A
Polymorphism    8                3
Divergence      20               8

8/20 ≈ 3/8
Slide adapted from Yoav Gilad
Hudson-Kreitman-Aguade (HKA) Test

                Neutral Locus    Test Locus B
Polymorphism    8                3
Divergence      20               19

8/20 >> 3/19
Conclusion: polymorphism lower than expected in Test Locus B: Selective sweep?
Slide adapted from Yoav Gilad
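The comparison on the two HKA slides can be reproduced with a few lines of arithmetic. The sketch below uses the counts from the slides and, as a stand-in only, a plain 2x2 chi-square statistic. The real HKA test uses a goodness-of-fit statistic built from neutral expectations, so the function names and the choice of statistic here are assumptions made for illustration. With counts this small the statistic is not reliable on its own.

```python
def poly_div_ratio(polymorphism, divergence):
    """Ratio of within-species polymorphism to between-species divergence."""
    return polymorphism / divergence


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts from the slides, as (polymorphism, divergence)
neutral = (8, 20)
test_a = (3, 8)
test_b = (3, 19)

for name, locus in [("Test Locus A", test_a), ("Test Locus B", test_b)]:
    ratio_neutral = poly_div_ratio(*neutral)
    ratio_test = poly_div_ratio(*locus)
    # Stand-in statistic only; not the actual HKA statistic
    stat = chi_square_2x2(neutral[0], neutral[1], locus[0], locus[1])
    print(f"{name}: neutral {ratio_neutral:.3f} vs test {ratio_test:.3f}, "
          f"chi-square {stat:.2f}")
```

A test locus whose ratio falls well below the neutral ratio, as for Test Locus B, is the pattern the slide flags as a candidate selective sweep.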
Sequence Evolution
• DNA or protein sequences in different taxa
trace back to a common ancestral sequence
• Divergence of neutral loci is a function of the
combination of mutation and fixation by
genetic drift
• Sequence differences are an index of time
since divergence
Molecular Clock
• If neutrality prevails, nucleotide divergence between two sequences should be
a function entirely of mutation rate
$k = 2N\mu \times \frac{1}{2N} = \mu$
($2N\mu$: probability of creation of new alleles; $\frac{1}{2N}$: probability of fixation of new alleles)
• Time since divergence should therefore be the reciprocal of the estimated mutation rate
Expected Time Until Fixation of a New Mutation:
$t = \frac{1}{\mu}$
Since μ is number of substitutions per unit time
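The cancellation of population size in the neutral substitution rate is easy to check numerically. The sketch below is a minimal illustration. The mutation rate and population sizes are hypothetical values, and the function names are made up for this example. It reads the slide's 1/μ as the average waiting time between successive substitutions.

```python
import math


def neutral_substitution_rate(pop_size, mu):
    """k = (new mutations per generation) x (fixation probability of each)."""
    new_mutations = 2 * pop_size * mu      # 2N gene copies, each mutating at rate mu
    fixation_prob = 1 / (2 * pop_size)     # a new neutral allele starts at frequency 1/(2N)
    return new_mutations * fixation_prob


def expected_time_between_substitutions(mu):
    """Reciprocal of the substitution rate, in the same time units as mu."""
    return 1 / mu


mu = 1e-8  # hypothetical per-site, per-generation mutation rate
for pop_size in (100, 10_000, 1_000_000):
    k = neutral_substitution_rate(pop_size, mu)
    print(f"N = {pop_size:>9}: k = {k:.3e}, equals mu: {math.isclose(k, mu)}")

print(f"Expected generations between substitutions: "
      f"{expected_time_between_substitutions(mu):.3e}")
```

Because N cancels, the neutral clock ticks at the mutation rate whether the population is small or large.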
Variation in Molecular Clock
• If neutrality prevails, nucleotide divergence between two sequences should be a function entirely of mutation rate
• So why are rates of substitution so different for different classes of genes?
Using Synonymous Substitutions to Control for
Factors Other Than Selection
dN/dS or Ka/Ks Ratios
Types of Mutations (Polymorphisms)
Synonymous versus Nonsynonymous SNP
• First and second position SNP often changes amino acid
• UCA, UCU, UCG, and UCC all code for Serine
• Third position SNP often synonymous
• Majority of positions are nonsynonymous
• Not all amino acid changes affect fitness: allozymes
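The codon-position pattern described above can be checked directly against the standard genetic code. The following sketch builds the standard codon table and counts, for each codon position, what fraction of single-base changes leave the amino acid unchanged. The function names are hypothetical. Stop codons are included in the tally, which is a simplifying assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codon):
    """Translate a DNA or RNA codon to a one-letter amino acid code."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid (or are both stops)."""
    return translate(codon_a) == translate(codon_b)


def synonymous_fraction_by_position():
    """Fraction of single-base changes that are synonymous at positions 1, 2, 3."""
    fractions = []
    for pos in range(3):
        same = total = 0
        for codon in CODON_TABLE:
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                same += is_synonymous(codon, mutant)
        fractions.append(same / total)
    return fractions


print([translate(c) for c in ("UCA", "UCU", "UCG", "UCC")])  # the serine codons from the slide
print(synonymous_fraction_by_position())
```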
Synonymous & Nonsynonymous Substitutions
• Synonymous substitution rate can be used to set neutral
expectation for nonsynonymous rate
• dS is the relative rate of synonymous mutations per synonymous site
• dN is the relative rate of nonsynonymous mutations per nonsynonymous site
• ω = dN/dS
– If ω = 1, neutral evolution
– If ω < 1, purifying selection
– If ω > 1, positive Darwinian selection
• For human genes, ω ≈ 0.1
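As a rough illustration of the ratio defined above, the sketch below computes dN, dS, and their ratio from raw counts. It uses uncorrected proportions (substitutions divided by sites), which is an assumption made for brevity. The counts, the function names, and the tolerance used to call a value "neutral" are all hypothetical.

```python
def omega(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """dN/dS from raw counts, with no correction for multiple hits."""
    d_n = nonsyn_subs / nonsyn_sites
    d_s = syn_subs / syn_sites
    return d_n / d_s


def interpret(w, tolerance=0.05):
    """Label a dN/dS value; the tolerance around 1 is an arbitrary choice."""
    if abs(w - 1) <= tolerance:
        return "consistent with neutrality"
    return "purifying selection" if w < 1 else "positive selection"


# Hypothetical gene: 10 nonsynonymous changes over 300 nonsynonymous sites,
# 10 synonymous changes over 100 synonymous sites
w = omega(10, 300, 10, 100)
print(f"dN/dS = {w:.3f}: {interpret(w)}")
```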
Complications in Estimating dN/dS
• Multiple mutations in a codon give multiple possible paths
  CGT(Arg)->AGA(Arg)
  CGT(Arg)->AGT(Ser)->AGA(Arg)
  CGT(Arg)->CGA(Arg)->AGA(Arg)
• Two types of nucleotide base substitutions resulting in SNPs: transitions and transversions not equally likely
  http://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html
• Back-mutations are invisible
• Complex evolutionary models using likelihood and Bayesian approaches must be used to estimate dN/dS (also called KA/KS or KN/KS depending on method) (PAML package)
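The multiple-path problem can be made concrete by enumerating every order in which the differing positions of two codons could have changed. The sketch below does this for the CGT to AGA example from the slide and also classifies a base change as a transition or transversion. It is an illustration with hypothetical function names, not the procedure used by PAML.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def is_transition(base_a, base_b):
    """Transition: purine<->purine or pyrimidine<->pyrimidine change."""
    return base_a != base_b and ((base_a in PURINES) == (base_b in PURINES))


def mutational_paths(start, end):
    """All single-step paths from start to end, one step per differing position."""
    differing = [i for i in range(3) if start[i] != end[i]]
    paths = []
    for order in permutations(differing):
        codon = start
        path = [codon]
        for pos in order:
            codon = codon[:pos] + end[pos] + codon[pos + 1:]
            path.append(codon)
        paths.append(path)
    return paths


for path in mutational_paths("CGT", "AGA"):
    print(" -> ".join(f"{codon}({CODON_TABLE[codon]})" for codon in path))
```

One path passes through a serine codon (two nonsynonymous steps) and the other stays on arginine throughout (two synonymous steps), which is why the counts depend on the assumed path.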
dN/dS ratios for 363 mouse-rat comparisons
• Most genes show purifying selection (dN/dS < 1)
• Some evidence of positive selection, especially in genes related to immune system
[Figure label: interleukin-3: mast cells and bone marrow cells in immune system]
Hartl and Clark 2007
McDonald-Kreitman Test
• Conceptually similar to HKA test
• Uses only one gene
• Contrasts the ratio of synonymous divergence to polymorphism with the ratio of nonsynonymous divergence to polymorphism
• Gene provides internal control for evolution rates and
demography
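The McDonald-Kreitman comparison amounts to a 2x2 table of nonsynonymous and synonymous counts, split into divergence and polymorphism. The sketch below tests such a table with a two-sided Fisher's exact test written in pure Python. The counts are illustrative only, and the neutrality index (the polymorphism ratio divided by the divergence ratio) is a common summary that does not appear on the slides.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); values below 1 suggest excess nonsynonymous divergence."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one
    return sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )


# Illustrative counts: divergence (Dn, Ds) and polymorphism (Pn, Ps)
dn, ds, pn, ps = 7, 17, 2, 42
print(f"Neutrality index: {neutrality_index(dn, ds, pn, ps):.3f}")
print(f"Fisher's exact p-value: {fisher_exact_two_sided(dn, ds, pn, ps):.4f}")
```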
Application of McDonald-Kreitman Test:
• Aligned 11,624 gene sequences between human and chimp
• Calculated synonymous and nonsynonymous substitutions between species (Divergence) and within humans (SNPs)
• Identified 304 genes showing evidence of positive selection (blue) and 814 genes showing purifying selection (red) in humans
• Positive selection: defense/immunity, apoptosis, sensory perception, and transcription factors
• Purifying selection: structural and housekeeping genes
Bustamante et al. 2005. Nature 437, 1153-1157
Phylogenetics
• Study of the evolutionary relationships among individuals, groups, or species
• Relationships often represented as dichotomous branching tree
• Extremely common approach for detecting and displaying relationships among genotypes
• Important in evolution, systematics, and ecology (phylogeography)
Evolution
[Figure: diagram labeled with the letters A through Z and Ç]
Slide adapted from Marta Riutart
What is a phylogeny?
[Figure: diagram labeled with the letters O through Z and Ç]
Homology: similarity that is the result of inheritance from a common ancestor
Slide adapted from Marta Riutart
Phylogenetic Tree Terms
[Figure: rooted tree with leaves A through J, annotated with the terms below]
• Group, cluster, clade
• Leaves, Operational Taxonomic Units (OTUs)
• Terminal branches
• Node
• Interior branches
• Root
Slide adapted from Marta Riutart
Tree Topology
[Figure: two tree drawings, each with tips Bacteria 1 to 3 and Eukaryote 1 to 4]
(Bacteria1,(Bacteria2,Bacteria3),(Eukaryote1,((Eukaryote2,Eukaryote3),Eukaryote4)))
Slide adapted from Marta Riutart
Are these trees different?
How about these?
http://helix.biology.mcmaster.ca
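Whether two drawings show the same topology can be checked by parsing the parenthetical (Newick-style) notation into a structure that ignores the order of branches at each node. The sketch below is a minimal parser for names and parentheses only (no branch lengths), with hypothetical function names. It compares trees as written, so two trees that differ only in root placement would count as different here. The rotated and rearranged strings are made-up examples.

```python
def parse_newick(text):
    """Parse names and parentheses into nested frozensets (branch order ignored)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
            return frozenset(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return tree


original = "(Bacteria1,(Bacteria2,Bacteria3),(Eukaryote1,((Eukaryote2,Eukaryote3),Eukaryote4)))"
# Same tree with branches rotated at several nodes
rotated = "((Eukaryote1,(Eukaryote4,(Eukaryote3,Eukaryote2))),(Bacteria3,Bacteria2),Bacteria1)"
# Hypothetical rearrangement that swaps Bacteria3 and Eukaryote1
rearranged = "(Bacteria1,(Bacteria2,Eukaryote1),(Bacteria3,((Eukaryote2,Eukaryote3),Eukaryote4)))"

print(parse_newick(original) == parse_newick(rotated))     # rotation only
print(parse_newick(original) == parse_newick(rearranged))  # different grouping
```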
Rooted versus Unrooted Trees
[Figure: an unrooted tree of archaea and eukaryotes, and the same taxa rooted by a bacteria outgroup, with the root and the monophyletic groups marked]
Slide adapted from Marta Riutart
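Rooting with D as outgroup
[Figure: unrooted tree of taxa A through G and the same tree rooted with D as outgroup]
Slide adapted from Marta Riutart
Now with C as outgroup
[Figure: unrooted tree of taxa A through G and the same tree rooted with C as outgroup]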
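Outgroup rooting can be sketched as placing the root on the branch that connects the outgroup to the rest of an unrooted tree. The example below uses a small hypothetical four-taxon tree (not the seven-taxon tree in the figures) stored as an adjacency list. The internal node names "x" and "y" and the function name are assumptions made for illustration.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to the outgroup leaf.

    Returns nested tuples of the form (outgroup, rest_of_tree).
    """
    def subtree(node, parent):
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # leaf
        return tuple(subtree(child, node) for child in children)

    (attachment,) = adjacency[outgroup]  # a leaf has exactly one neighbor
    return (outgroup, subtree(attachment, outgroup))


# Hypothetical unrooted tree: A and B join at x, C and D join at y, x connects to y
unrooted = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"],
    "y": ["x", "C", "D"],
}

print(root_with_outgroup(unrooted, "D"))
print(root_with_outgroup(unrooted, "C"))
```

The unrooted branching pattern is the same in both results. Only the position of the root, and therefore the direction in which the tree is read, changes.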
Which of these four trees is different?
Baum et al.
UPGMA Method
Use all pairwise comparisons to make dendrogram
UPGMA: Unweighted Pair Group Method with Arithmetic Mean
Hierarchically link most closely related individuals
Read the Lab 12 Introduction!
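The UPGMA procedure can be sketched in a few lines: repeatedly merge the two closest clusters, then set the distance from the new cluster to every other cluster to the size-weighted average of the two old distances. The version below is a minimal pure-Python illustration with a hypothetical distance matrix. It returns only the branching order as nested tuples and does not compute branch lengths.

```python
def upgma(labels, matrix):
    """Cluster taxa by UPGMA; returns the dendrogram as nested tuples."""
    n = len(labels)
    trees = {i: labels[i] for i in range(n)}   # cluster id -> subtree
    sizes = {i: 1 for i in range(n)}           # cluster id -> number of leaves
    dist = {(i, j): float(matrix[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)         # closest pair of clusters
        new_dist = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Arithmetic mean over all leaf pairs = size-weighted average
            new_dist[(k, next_id)] = (
                (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
            )
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Hypothetical pairwise distances among four individuals
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(labels, matrix))
```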
Phenetics (distance) vs Cladistics
(discrete character states)
Lowe, Harris, and Ashton 2004
Parsimony Methods
• Based on underlying genealogical relationships among alleles
• Occam’s Razor: simplest scenario is the most likely
• Useful for depicting evolutionary relationships among taxa or populations
• Choose tree that requires smallest number of steps (mutations) to produce observed relationships
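Counting the steps a tree requires is commonly done with Fitch's algorithm, which is one standard way to score parsimony and is not named on the slide. The sketch below scores three alternative rooted binary trees for a tiny hypothetical alignment. Trees are nested tuples of taxon names, and the function names are made up for this example.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one change is needed at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all sites in the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical two-site alignment for four taxa
alignment = {"A": "GA", "B": "GA", "C": "TA", "D": "TC"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for tree in candidates:
    print(tree, parsimony_score(tree, alignment))
best = min(candidates, key=lambda t: parsimony_score(t, alignment))
print("Most parsimonious:", best)
```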
Choosing Phylogenetic Trees
MANY possible trees can be built for a given set of taxa
Very computationally intensive to choose among these
Lowe, Harris, and Ashton 2004
Number of unrooted bifurcating trees: $U_n = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
Number of rooted bifurcating trees: $R_n = \frac{(2n-3)!}{2^{n-2}(n-2)!} = (2n-3)\,U_n$
n = number of taxa
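Evaluating the tree-count formulas shows how quickly tree space grows. The sketch below computes the number of unrooted and rooted bifurcating trees for a few values of n. The function names are hypothetical.

```python
from math import factorial


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (4, 5, 10, 20):
    print(f"n = {n:>2}: unrooted = {unrooted_trees(n):,}  rooted = {rooted_trees(n):,}")
```

Each unrooted tree has 2n - 3 branches on which a root could be placed, which is where the factor relating the two counts comes from.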
Choosing Phylogenetic Trees
Many algorithms exist for searching tree space
Local optima are a problem: need to traverse valleys to get to other peaks
Heuristic search: cut trees up systematically and reassemble
Branch and bound: search for optimal path through tree space
Felsenstein 2004
Choosing Phylogenetic Trees
• If multiple trees are equally likely, select a majority-rule or strict consensus
• Strict consensus is most conservative approach
• Bootstrap data matrix (sample with replacement) to determine robustness of nodes
[Figure: tree of taxa A through F with bootstrap support values of 60 on three nodes]
Lowe, Harris, and Ashton 2004
Felsenstein 2004
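The bootstrap and consensus steps can be sketched separately from tree inference itself. In the example below, `bootstrap_alignment` resamples alignment columns with replacement, and the consensus functions work on trees represented as sets of clades (each clade a frozenset of taxon names). The alignment, the three trees, and all function names are hypothetical. In a real analysis a tree would be inferred from each bootstrap replicate, a step omitted here.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def clade_support(trees):
    """Fraction of trees in which each clade appears."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: count / len(trees) for clade, count in counts.items()}


def majority_rule(trees):
    """Clades found in more than half of the trees."""
    return {clade for clade, support in clade_support(trees).items() if support > 0.5}


def strict_consensus(trees):
    """Clades found in every tree."""
    return set.intersection(*(set(tree) for tree in trees))


# Hypothetical alignment and one bootstrap replicate
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACCTTG", "D": "TCCTTG"}
replicate = bootstrap_alignment(alignment, random.Random(0))
print(replicate)

# Hypothetical trees, e.g. one per bootstrap replicate
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
for clade, support in clade_support(trees).items():
    print(sorted(clade), f"{100 * support:.0f}%")
```

Strict consensus keeps only clades present in every tree, which is why it is the more conservative of the two summaries.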