class08Phylogeny

Download Report

Transcript class08Phylogeny

Phylogenetic Trees: Assumptions
• All existing species have a common
ancestor
• Each species is descended from a single
ancestor
• Each speciation gives rise to two derived
species
• This leads to a ‘tree’ topology for geneology
Tree structure
• ‘Leaves’ are existing species
• Interior nodes are hypothetical common
ancestors
• Tree may be rooted or unrooted
• Root (if it exists) is universal ancesestor
gorilla
human bonobo chimp
Limits of these assumptions
• Early life (probably wasn’t nicely
‘packaged’)
• Merger of symbiotes (origin of eukaryotes,
plants)
• Lateral gene transfer
Characteristics
• Relatedness is inferred from features, or
characters
• discrete character data
– example: has feathers, number of fingers
– data forms a n x m matrix
• distance data
– example: sequence edit distance
– data forms an n x n triangular matrix
Ordered or unordered characters
• Differences in characters are assumed to be
the result of transitions from a previous
uniform state
– (unordered) Such transitions may occur
between any two states
– (ordered) They may occur only in a fixed
sequence
Reversals
• Reversal - mutation into the primitive
(ancestral) state
• Reversals are possible but unlikely
• Examples
– nucleic acid sequence
– toes of a mammal
Perfect phylogeny
• Perfect phylogeny
– Each edge in tree = one state transition
– That is, the entire subtree must share this state
• Central problem of character state phylogeny:
– input: set O of n objects. Set c of m characters.
Set r of states.
– Output (y/n): is there a perfect phylogeny for O?
Solve Perfect Phylogeny problem
• Problem 1 • Problem 2
– acgac
– accag
– acgtt
– tccag
– acgat
– aagtt
– aagtt
– atgtc
– atgta
– atgtt
– aaatt
– ttgta
• Problem 3
– aagtt
– ttgtt
– tcctt
– tcaat
– tcagt
– acagt
Complexity of character problem
• Ordered states
– the problem is polynomial
• Unordered states
– If the number of states (r) is unbounded, PPP is
NP-complete.
– If the number of states is small (2, 3, 4) an
algorithm which runs in reasonable time is
known
Relaxing our assumptions
• Expecting perfect phylogeny is unrealistic,
because
– biological data contains errors
– reversals to occur
• Two approaches to relaxation
– minimize reversals (parsimony)
– throw away minimum of offending characters
(compatibility criterion)
Ideal distance matrix problem
• Input: a set O of objects and a (triangular)
matrix M of pairwise distances between
them
• Output: a tree in which the nodes are O and
the paths have weights M
Ideal distance matrix problem
• Input: a set O of objects and a (triangular)
matrix M of pairwise distances between
them
• Output: a tree in which the nodes are O and
the paths have weights M
• Nonideal considerations: reversals,
convergence, error
Relaxed distance matrix problem
• Input: a set O of objects and a pair of
(triangular) matrix Mmin and Mmax of
pairwise distances between them
• Output: a tree in which the nodes are O and
the path weights are bounded by Mmin and
Mmax
Relaxed distance matrix problem
• Input: a set O of objects and a pair of
(triangular) matrix Mmin and Mmax of
pairwise distances between them
• Output: a tree in which the nodes are O and
the path weights are bounded by Mmin and
Mmax
• Tractable only for ultrametric trees (all
leaves equidistant from root)
Neighbor joining
A simple and commonly used heuristic
while (|nodes| > 2)
find nearest neighbors, A, B
substitute with interior node at their midpoint