Phylogenetic trees
Download
Report
Transcript Phylogenetic trees
Phylogenies
Preliminaries
Distance-based methods
Parsimony Methods
Phylogenetic Trees
Hypothesis about the relationship between
organisms
Can be rooted or unrooted
A
B
B
C
D
E
Time
A
E
C
D
Root
CS/BIO 271 - Introduction to Bioinformatics
2
Tree proliferation
2n 3!
N R n2
2 n 2 !
NU
2n 5!
n 3
2 n 3!
Species Number of Rooted Trees
Number of Unrooted Trees
2
1
1
3
3
1
4
15
3
5
105
15
6
34,459,425
2,027,025
7
213,458,046,767,875
7,905,853,580,625
8
8,200,794,532,637,891,559,375
221,643,095,476,699,771,875
CS/BIO 271 - Introduction to Bioinformatics
3
Molecular phylogenetics
Specific genomic
sequence variations
(alleles) are much
more reliable than
phenotypic
characteristics
More than one gene
should be considered
CS/BIO 271 - Introduction to Bioinformatics
4
An ongoing didactic
Pheneticists tend to prefer distance based
metrics, as they emphasize relationships among
data sets, rather than the paths they have taken
to arrive at their current states.
Cladists are generally more interested in
evolutionary pathways, and tend to prefer more
evolutionarily based approaches such as
maximum parsimony.
CS/BIO 271 - Introduction to Bioinformatics
5
Distance matrix methods
Species
B
A
9
B
–
C
–
D
–
C
8
11
–
–
D
12
15
10
–
E
15
18
13
5
CS/BIO 271 - Introduction to Bioinformatics
6
UPGMA
Similar to average-link clustering
Merge the closest two groups
• Replace the distances for the new, merged group
with the average of the distance for the previous
two groups
Repeat until all species are joined
CS/BIO 271 - Introduction to Bioinformatics
7
UPGMA Step 1
Species
B
A
9
B
–
C
–
D
–
C
8
11
–
–
D
12
15
10
–
E
15
18
13
5
Merge D & E
D
Species
B
A
9
B
–
C
–
C
8
11
–
DE
CS/BIO 271 - Introduction to Bioinformatics
E
13.5 16.5 11.5
8
UPGMA Step 2
Species
B
A
9
B
–
C
–
C
8
11
–
DE
Merge A & C
A
C
D
E
13.5 16.5 11.5
Species
AC
DE
CS/BIO 271 - Introduction to Bioinformatics
B
10
AC
–
16.5 12.5
9
UPGMA Steps 3 & 4
Species
AC
DE
B
10
AC
–
Merge B & AC
A
C
B
D
E
16.5 12.5
Merge ABC & DE
A
C
B
D
E
(((A,C)B)(D,E))
CS/BIO 271 - Introduction to Bioinformatics
10
Parsimony approaches
Belong to the broader class of character based
methods of phylogenetics
Emphasize simpler, and thus more likely
evolutionary pathways
I: GCGGACG
II: GTGGACG
A
(C or T)
C
I
(C or T)
T
II
CS/BIO 271 - Introduction to Bioinformatics
C
I
T
II
11
Informative and uninformative sites
Position
Seq 1
2
3
4
5
6
1
G
G
G
G
G
G
2
G
G
G
A
G
T
3
G
G
A
T
A
G
4
G
A
T
C
A
T
For positions 5 & 6, it is
possible to select more
parsimonious trees –
those that invoke less
substitutions.
CS/BIO 271 - Introduction to Bioinformatics
12
Parsimony methods
Enumerate all possible trees
Note the number of substitutions events
invoked by each possible tree
• Can be weighted by transition/transversion
probabilities, etc.
Select the most parsimonious
CS/BIO 271 - Introduction to Bioinformatics
13
Branch and Bound methods
Key problem – number of possible trees grows
enormous as the number of species gets large
Branch and bound – a technique that allows
large numbers of candidate trees to be rapidly
disregarded
Requires a “good guess” at the cost of the best
tree
CS/BIO 271 - Introduction to Bioinformatics
14
Branch and Bound for TSP
Find a minimum cost
round-trip path that
visits each intermediate
city exactly once
NP-complete
Greedy approach:
A,G,E,F,B,D,C,A
= 251
CS/BIO 271 - Introduction to Bioinformatics
A
93
20
D
82
B
59
31
57
12
G
17
C
46
82
35
E
68
F
15
15
Search all possible paths
A
93
20
All paths
D
82
B
59
31
57
12
G
17
C
46
82
35
AG (20)
E
68
F
AB (46)
AC (93)
15
AGF (88)
AGFB
AGFE
Best estimate: 251
CS/BIO 271 - Introduction to Bioinformatics
AGE (55)
AGFC
ACB (175)
ACD
ACF
ACBE (257)
16
Parsimony – Branch and Bound
Use the UPGMA tree for an initial best estimate
of the minimum cost (most parsimonious) tree
Use branch and bound to explore all feasible
trees
Replace the best estimate as better trees are
found
Choose the most parsimonious
CS/BIO 271 - Introduction to Bioinformatics
17
Parsimony example
Position
Seq 1
2
3
4
5
6
1
G
G
G
G
G
G
2
G
G
G
A
G
T
3
G
G
A
T
A
G
4
G
A
T
C
A
T
Position 5:
All trees
(1,2) [0]
(1,3) [1]
(1,4) [1]
Etc.
CS/BIO 271 - Introduction to Bioinformatics
18