Molecular phylogenetics IV
Molecular phylogenetics 4
Level 3 Molecular Evolution and
Bioinformatics
Jim Provan
Page and Holmes: Sections 6.7-8
Have we got the true tree?
Several approaches developed to answer this question:
Analysis:
– In some cases (e.g. UPGMA) the phylogenetic method is simple
enough that we can establish mathematically the exact conditions
under which it will fail
– Parsimony can fail under particular distributions of edge lengths
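As a rough illustration of how a simple method can be analysed, the sketch below runs UPGMA on a hypothetical four-taxon distance matrix. The tree ((A,B),(C,D)) and its edge lengths are invented for this example and are not taken from the lecture; the function name `upgma` is likewise illustrative. Because A and C are given long edges, the distances are not clock-like, and UPGMA would be expected to join the two short-edged taxa first.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA; dist maps frozenset({a, b}) to a distance."""
    clusters = {name: 1 for name in names}   # cluster label -> number of taxa
    d = dict(dist)
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest average distance.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # New distance is the size-weighted mean of the old distances.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))

# Hypothetical true tree ((A,B),(C,D)): edges A=5, B=1, C=5, D=1, internal=1.
# Distances are path lengths on that tree (assumed values, for illustration).
true_dist = {
    frozenset("AB"): 6, frozenset("AC"): 11, frozenset("AD"): 7,
    frozenset("BC"): 7, frozenset("BD"): 3, frozenset("CD"): 6,
}
tree = upgma(["A", "B", "C", "D"], true_dist)
print(tree)
```

In this sketch B and D are grouped together even though the generating tree pairs A with B, which is the kind of failure that unequal rates can produce.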
Known phylogenies
– Best evidence for success of a tree-building method would be if it
could accurately reconstruct a known phylogeny
– Typically, only “known” phylogenies exist for crop plants and
laboratory animals and even these are often suspect
– Growth of bacteriophage T7 in the presence of mutagens allowed
comparison of tree building methods
Have we got the true tree?
Several approaches (continued):
Simulation:
– Provide software with a tree and “evolve” DNA sequences along
branches according to some model
– Supply the resulting sequences to a range of tree-building
methods and determine which (if any) recover the original tree
– An advantage of this approach is that we can explore the
effects of a wide range of parameters on the performance of
tree reconstruction methods
– A disadvantage is that the models used to generate the new
sequences may be unrealistic, and may bias the comparison
in favour of a particular method
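The simulation approach can be sketched in a few lines. The example below assumes the Jukes-Cantor model (one of many possible models, chosen here only for simplicity) and a hypothetical four-taxon tree with two long edges; the tree, edge lengths, sequence length and function names are all assumptions made for illustration.

```python
import math
import random

BASES = "ACGT"

def evolve(seq, branch_length, rng):
    """Evolve seq along one edge under Jukes-Cantor.

    branch_length is the expected number of substitutions per site.
    """
    # Probability that a site ends the edge as a different base.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate(node, seq, rng, leaves):
    """Evolve seq down a tree of (name, branch_length, children) tuples."""
    name, length, children = node
    seq = evolve(seq, length, rng)
    if not children:
        leaves[name] = seq          # record sequences at the tips only
    for child in children:
        simulate(child, seq, rng, leaves)
    return leaves

# Hypothetical tree ((A,B),(C,D)) in which A and C have long edges.
tree = ("root", 0.0, [
    ("AB", 0.05, [("A", 1.0, []), ("B", 0.05, [])]),
    ("CD", 0.05, [("C", 1.0, []), ("D", 0.05, [])]),
])

rng = random.Random(42)
root_seq = "".join(rng.choice(BASES) for _ in range(200))
leaves = simulate(tree, root_seq, rng, {})
for name, seq in sorted(leaves.items()):
    print(name, seq[:40])
```

The tip sequences in `leaves` could then be passed to each tree-building method in turn, and the edge lengths varied to explore where each method stops recovering the generating tree.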
The “Felsenstein Zone”
[Figure: panels labelled UPGMA and Parsimony]
Congruence
Congruence is the agreement between estimates of
phylogeny based on different characters:
If data sets are independent, the probability of obtaining similar
trees by chance alone is extremely small
Conversely, if different data sets give similar trees then this
suggests a common underlying cause, namely that both reflect
the same evolutionary history
Two ways of using congruence:
To validate a method of inference: a method that consistently
recovers similar trees from different data sets will be preferred to
a method that produces different trees from different data sets
To validate a new source of data: does a newly sequenced gene
contain phylogenetic information?
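To give a feel for why chance agreement is so unlikely, the sketch below counts the unrooted bifurcating trees possible for n taxa, which is (2n-5)!! = 3 × 5 × ... × (2n-5). Treating two independent data sets as if each picked a topology uniformly at random is a deliberately crude assumption made only for this illustration, and it covers identical trees rather than merely similar ones.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for i in range(3, n_taxa + 1):
        # Adding the i-th taxon can attach to any of the 2i - 5 existing edges.
        count *= 2 * i - 5
    return count

for n in (5, 10, 20):
    trees = unrooted_tree_count(n)
    # Chance that two uniformly random topologies are identical (crude null).
    print(n, trees, 1 / trees)
```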
Sampling error
If a data set contains homoplasy then different
nucleotide sites support different trees:
Which tree(s) a given data set supports depends on which
characters have been sampled
Estimates of phylogeny based on samples will be accompanied
by sampling error
Effects of sampling error are evident when comparing trees for
different mitochondrial genes:
Since there is no recombination, all mitochondrial genes share
the same evolutionary history
Nevertheless, several different trees were obtained
Sampling of taxa is also important
Bootstrapping
Bootstrapping is a way of estimating sampling error
without taking repeated samples from the population /
species under study:
Mimics the technique of repeated sampling from the original
population by resampling from the original sample
Each resampling is a pseudoreplicate
Bootstrapping can be applied to phylogenetics by taking
several pseudoreplicates:
Sampling with replacement gives a new data set based on the
original sample:
– Some sites represented more than once
– Some sites not represented at all
Each pseudoreplicate can be used to construct a new tree
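The resampling step can be sketched as follows. The alignment is transcribed from the original nine-site data set on the example slide; the function name and the fixed random seed are assumptions added for illustration, and tree building on each pseudoreplicate is left out.

```python
import random

# Nine-site alignment from the example slide (one string per taxon).
alignment = {
    "Human":      "TCCTTAAAA",
    "Chimp":      "TTCTATAAA",
    "Gorilla":    "TTACAATAA",
    "Orang-utan": "CCACAAATA",
    "Gibbon":     "CCACAAAAT",
}

def bootstrap_pseudoreplicate(alignment, rng):
    """Resample alignment columns (sites) with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Some site indices will be drawn more than once, others not at all.
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    resampled = {
        taxon: "".join(seq[i] for i in picks)
        for taxon, seq in alignment.items()
    }
    return picks, resampled

rng = random.Random(1)  # seed fixed only so the sketch is repeatable
picks, replicate = bootstrap_pseudoreplicate(alignment, rng)
print("sites drawn:", [i + 1 for i in picks])
for taxon, seq in replicate.items():
    print(f"{taxon:<11}{seq}")
```

Every column of the pseudoreplicate is an intact column of the original alignment, so the taxa stay aligned and only the sites are resampled.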
Bootstrapping
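Original data set (sites 1-9):
Site        1  2  3  4  5  6  7  8  9
Human       T  C  C  T  T  A  A  A  A
Chimp       T  T  C  T  A  T  A  A  A
Gorilla     T  T  A  C  A  A  T  A  A
Orang-utan  C  C  A  C  A  A  A  T  A
Gibbon      C  C  A  C  A  A  A  A  T
[Figure: Original tree]
Bootstrap pseudoreplicate (sites resampled with replacement):
Site        2  7  7  3  1  7  4  9  6
Human       C  A  A  C  T  A  T  A  A
Chimp       C  A  A  C  T  A  T  A  T
Gorilla     A  T  T  A  T  T  C  A  A
Orang-utan  A  A  A  A  C  A  C  A  A
Gibbon      A  A  A  A  C  A  C  T  A
[Figure: Bootstrap tree]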
Bootstrapping
[Figure: three alternative trees for the five taxa (tips labelled H, C, G, O and B), marked 41/100, 28/100 and 31/100]
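Values such as 41/100 are presumably the number of pseudoreplicates, out of 100, that recovered each tree. Under that reading, the tally can be sketched as below; the labels "tree 1" to "tree 3" are placeholders, since the topologies themselves are not recoverable here, and the function name is hypothetical.

```python
from collections import Counter

def bootstrap_proportions(topologies):
    """Return the fraction of pseudoreplicates that recovered each topology."""
    counts = Counter(topologies)
    total = len(topologies)
    return {tree: n / total for tree, n in counts.items()}

# One entry per pseudoreplicate; counts taken from the slide, labels assumed.
replicate_trees = ["tree 1"] * 41 + ["tree 2"] * 28 + ["tree 3"] * 31
support = bootstrap_proportions(replicate_trees)
for tree, fraction in sorted(support.items()):
    print(tree, fraction)
```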
What can go wrong?
Sampling error:
Almost all phylogenies are based on a sample of some sort
Especially true given the vagaries of homoplasy
Incorrect model of sequence evolution:
All methods make implicit or explicit assumptions about the
evolutionary process
An example is the problem of base composition:
– An AT rich part of a gene may be more similar to an AT rich
part of a different gene purely by chance
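The base composition effect can be made concrete with a small calculation. If two unrelated sequences share the same base frequencies, the expected proportion of matching sites is the sum of the squared frequencies. The AT-rich frequencies below are hypothetical values chosen for illustration, and sites are assumed to be independent.

```python
def expected_chance_identity(freqs):
    """Expected fraction of matching sites between two unrelated sequences
    that share the base frequencies in freqs (sites assumed independent)."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    # Both sequences show base b at a site with probability p_b * p_b.
    return sum(p * p for p in freqs.values())

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}  # hypothetical values

print(expected_chance_identity(uniform))
print(expected_chance_identity(at_rich))
```

With these assumed frequencies the AT-rich pair matches at about 34% of sites by chance, against 25% for equal frequencies, so shared composition alone can make unrelated sequences look more similar.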
Tree structure:
Evolutionary history is not always simple:
– Rapid cladogenesis
– Widely differing rates of divergence
– Horizontal gene transfer