Transcript 3000_13_3c
PHYLOGENETICS CONTINUED
TESTS BY TUESDAY BECAUSE SOME PROBLEMS WITH SCANTRONS
consensus tree
• can also ask equally supported trees (equally parsimonious, equal likelihood) how well they all support the same nodes
• doesn’t have to involve a subset of the data, as in the bootstrap
• may summarize the stable parts of the tree across 2+ trees
(figure: trees drawn over tips a, b, c, d, e)
CONSENSUS phylogeny is not fully resolved where there is disagreement among equally-ranked trees
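One way to make the consensus idea concrete is a strict consensus: keep only the clades that appear in every input tree, so any point of disagreement collapses into an unresolved node. The sketch below is a minimal illustration, assuming trees are written as nested Python tuples; the three example trees for tips a to e are hypothetical and are not the trees drawn on the slide.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of tip names."""
    found = set()

    def tips(node):
        if isinstance(node, str):  # a tip label
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)  # record the clade defined by this internal node
        return members

    tips(tree)
    return found


def strict_consensus_clades(trees):
    """Clades present in all input trees (the nodes every tree supports)."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared


# Hypothetical equally parsimonious trees: they disagree only inside (a, b, c).
trees = [
    ((("a", "b"), "c"), ("d", "e")),
    (("a", ("b", "c")), ("d", "e")),
    ((("a", "c"), "b"), ("d", "e")),
]

for clade in sorted(strict_consensus_clades(trees), key=len):
    print(sorted(clade))
# ['d', 'e']
# ['a', 'b', 'c']
# ['a', 'b', 'c', 'd', 'e']
```

The clade (a, b, c) survives, but no grouping inside it does, which is the sense in which the consensus is not fully resolved.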
blue indicates dinosaurs with bifurcation of neural spine in vertebrae
http://svpow.com/papers-by-sv-powsketeers/wedel-and-taylor-2013-on-sauropod-neural-spinebifurcation/
support for the method
• do we believe phylogeny reconstruction works? need to test it against a known history
• (fish(salamander(bird(mouse,human)))) we feel pretty strongly about
• experimental phylogenetics uses virus evolution to go one step further
experimental evolution
(figure: lineages split repeatedly, 40 generations per branch)
growing T7 phage on E. coli plates; speed up mutation process by adding mutagen
experimental evolution
• so phylogeny is known, and ancestral strains can be kept in freezer
• sequence part of DNA and use parsimony, likelihood, and other approaches
• consistently got the right (TRUE) answer!
• can also track “traits” on this tree, e.g. changes in growth rate and plaque size on E. coli plates (and check against actual ancestors)
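To illustrate what "use parsimony and check it against the known tree" can look like, here is a minimal sketch of Fitch parsimony scoring: each candidate tree is scored by the minimum number of changes needed to explain the alignment, and the lowest score wins. The four lineage names, the five-site alignment, and the candidate trees are all made up for illustration; they are not data from the T7 experiment.

```python
def fitch_score(tree, site):
    """Minimum number of changes for one alignment column on a rooted binary tree."""
    changes = 0

    def states(node):
        nonlocal changes
        if isinstance(node, str):  # tip: its observed base
            return {site[node]}
        left, right = (states(child) for child in node)
        if left & right:  # children agree on at least one state
            return left & right
        changes += 1  # disagreement costs one change
        return left | right

    states(tree)
    return changes


def parsimony_score(tree, alignment):
    """Sum of Fitch scores over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {tip: seq[i] for tip, seq in alignment.items()})
        for i in range(length)
    )


# Hypothetical alignment for four lineages whose true history is ((A,B),(C,D)).
alignment = {"A": "AAGTC", "B": "AAGTT", "C": "GCGTC", "D": "GCATC"}

candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

scores = {name: parsimony_score(tree, alignment) for name, tree in candidates.items()}
print(scores)                       # {'((A,B),(C,D))': 4, '((A,C),(B,D))': 6, '((A,D),(B,C))': 6}
print(min(scores, key=scores.get))  # ((A,B),(C,D))
```

In this toy case the most parsimonious tree matches the assumed true history, which is the kind of agreement the experimental phylogeny makes testable.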
# DNA mutations on this branch
Rem: each branch is 40 generations
Text: “Because constructing phylogenies, and science more broadly, is often a
process of evaluating evidence, scientists often test the effectiveness of the
methodologies used to draw conclusions.”
case studies
• text goes through Origin of Tetrapods, Human phylogeny, Darwin’s finches, HIV
• show phylogeny, explain the likely mechanisms for pattern
well-supported phylogeny of rabies virus lineages, coded by host bat species
Phylogeny: how?
Methods from Streicker et al (2010 bat rabies phylogeny paper)
what gene region(s)?
PCR of gene with primers
sampling effort
how sequence data generated
Phylogeny: how?
Methods from Streicker et al (2010 bat rabies phylogeny paper)
tree criterion: uses statistical model of DNA evolution
every type of mutation happens at different rate, as observed
mutations happen at different rates across codons in protein-coding genes
are our data consistently supporting same phylogeny?
outgroup comparison ‘roots’ phylogeny at ancestral node
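The point that different kinds of mutation happen at different rates can be illustrated with a much simpler model than the one in the paper. The sketch below uses the Kimura two-parameter (K80) distance, which only separates transitions (A<->G, C<->T) from transversions; it is a stand-in for illustration, not the model Streicker et al. used, and the two sequences are invented. It assumes aligned, equal-length sequences containing only uppercase A, C, G, T.

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):  # purine<->purine or pyrimidine<->pyrimidine
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance: corrects for unseen multiple hits."""
    transitions, transversions = count_differences(seq1, seq2)
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 20-site sequences: 2 transitions and 1 transversion apart.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "GTTTACGTACGTACGTACGT"

raw = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
print(count_differences(seq_a, seq_b))  # (2, 1)
print(raw, k2p_distance(seq_a, seq_b))  # the corrected distance exceeds the raw 0.15
```

The corrected distance is larger than the raw proportion of differing sites because the model allows for sites that changed more than once.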
Phylogeny: how?
Methods from Streicker et al (2010 bat rabies phylogeny paper)
coalescent: statistical model of how different evolutionary histories of drift, selection, migration, and change in population size are associated with DATA
oh, no. now it is getting gnarly.
treat bat species as locations and ask how frequently migration of virus among species could explain pattern we see now?
analysis indicates rate of virus jumping from one host to another
For RNA viruses, rapid viral evolution and the biological similarity of closely related host species have been proposed as key determinants of the occurrence and long-term outcome of cross-species transmission. Using a data set of hundreds of rabies viruses sampled from 23 North American bat species, we present a general framework to quantify per capita rates of cross-species transmission and reconstruct historical patterns of viral establishment in new host species using molecular sequence data. These estimates demonstrate diminishing frequencies of both cross-species transmission and host shifts with increasing phylogenetic distance between bat species. Evolutionary constraints on viral host range indicate that host species barriers may trump the intrinsic mutability of RNA viruses in determining the fate of emerging host-virus interactions.
so this study requires TWO phylogenies (virus and bats)
CST: cross-species transmission
neutrality
• neutral: doesn’t affect fitness of organism
• compare mutations in protein coding regions: synonymous mutations do not change amino acid, nonsynonymous do
• if much of diversity is neutral (or nearly so), mutations will accumulate and fix (become a substitution) in populations regularly through time
“molecular clock” works for many genome partitions
neutrality acts as our NULL HYPOTHESIS
• different homologous genome regions have different rates, slower rates when more functional constraints
• remember: fossil record, biogeography/geology, mutation accumulation studies help us estimate substitution rate µ
isthmus closes via volcanic uplift ~3.5 mya
two locations - are they two populations?
different allele frequencies, distinct clades on tree: yes
compare cytochrome oxidase mtDNA gene: 7% divergence
• d=2µt: time (t), rate µ along 2 branches
• µ is the rate of mutations going to fixation (substitutions); under neutrality the mutation rate IS the substitution rate because selection doesn’t accelerate or halt or change probability of fixation
• here we know t=3,500,000 years, d=0.07
• µ = d/2t = (0.07)/(7,000,000) = 1x10^-8 substitutions per site per year
• another way to put it, rate of divergence (2µ) ~2% per million years
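The clock calibration above is just arithmetic on d = 2µt, and it can be run in both directions: calibrate µ from a known split time, then reuse µ to date another split. The sketch below redoes the isthmus numbers from the slide; the 3% divergence used to date a second pair is a hypothetical value added for illustration, and reusing µ assumes the same gene region evolves at the same rate in that pair.

```python
def substitution_rate(d, t_years):
    """Solve d = 2*mu*t for mu (per site per year); 2 because both branches accumulate changes."""
    return d / (2 * t_years)


def divergence_time(d, mu):
    """Solve d = 2*mu*t for t (years), given a calibrated rate mu."""
    return d / (2 * mu)


# Calibration from the slide: isthmus closed ~3.5 million years ago, 7% divergence.
mu = substitution_rate(d=0.07, t_years=3_500_000)
divergence_per_myr = 2 * mu * 1_000_000

print(f"mu = {mu:.1e} substitutions per site per year")  # mu = 1.0e-08 ...
print(f"divergence rate = {divergence_per_myr:.1%} per million years")  # 2.0%

# Hypothetical second species pair with 3% divergence at the same gene.
t = divergence_time(d=0.03, mu=mu)
print(f"estimated split: {t / 1e6:.1f} million years ago")  # 1.5
```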
• what is our assumption in those slides about clock calibration?
• how would YOU test that?
• idea is any mutation is equally likely to become a substitution
• how have we divided (point) mutations up so far?
neutrality
synonymous is assumed neutral
• so we can ask if nonsynonymous substitutions happen at a different rate
• neutrality: nonsynonymous divergence (dN) = synonymous divergence (dS) rate
• rate, not number of mutations - remember many more ways for a mutation to be nonsynonymous than synonymous
• does dN/dS = 1? (book, elsewhere often this is called kA/kS; adjusts for the “more ways” of nonsynonymy)
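The "more ways to be nonsynonymous" point can be checked by brute force: take every sense codon in the standard genetic code, make each of its 9 possible single-base changes, and classify the result. This sketch only counts the possible changes; it is not a dN/dS estimator, and it treats all changes as equally likely, which real mutation does not guarantee.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_single_changes():
    """Count every one-base change from a sense codon by its effect on the protein."""
    counts = {"synonymous": 0, "nonsynonymous": 0, "nonsense": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # start only from codons that encode an amino acid
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1  # change creates a stop codon
                else:
                    counts["nonsynonymous"] += 1
    return counts


counts = classify_single_changes()
print(counts)  # {'synonymous': 134, 'nonsynonymous': 392, 'nonsense': 23}
```

Roughly three amino-acid-changing possibilities exist for every silent one, so comparing raw counts of nonsynonymous and synonymous differences would be misleading; dN and dS are rates scaled by the number of opportunities of each kind.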
This is the dN:dS or kA:kS approach we have been discussing
if kA:kS >> 1, change has been selected FOR
if kA:kS << 1, change is generally BAD
if kA:kS ~ 1, neutrality
positive selection: amino acid change is favored
functional constraints lead to high levels of homology: change is generally bad (purifying selection)
region of high homology led to discovery of new functional region that influences mammalian heart disease