Phylogenetic trees

Download Report

Transcript Phylogenetic trees

Phylogenies
Preliminaries
Distance-based methods
Parsimony Methods
Phylogenetic Trees
 Hypothesis about the relationship between
organisms
 Can be rooted or unrooted
A
B
B
C
D
E
Time
A
E
C
D
Root
CS/BIO 271 - Introduction to Bioinformatics
2
Tree proliferation

2n  3!
N R  n2
2 n  2 !
NU

2n  5!
 n 3
2 n  3!
Species Number of Rooted Trees
Number of Unrooted Trees
2
1
1
3
3
1
4
15
3
5
105
15
6
34,459,425
2,027,025
7
213,458,046,767,875
7,905,853,580,625
8
8,200,794,532,637,891,559,375
221,643,095,476,699,771,875
CS/BIO 271 - Introduction to Bioinformatics
3
Molecular phylogenetics
 Specific genomic
sequence variations
(alleles) are much
more reliable than
phenotypic
characteristics
 More than one gene
should be considered
CS/BIO 271 - Introduction to Bioinformatics
4
An ongoing didactic
 Pheneticists tend to prefer distance based
metrics, as they emphasize relationships among
data sets, rather than the paths they have taken
to arrive at their current states.
 Cladists are generally more interested in
evolutionary pathways, and tend to prefer more
evolutionarily based approaches such as
maximum parsimony.
CS/BIO 271 - Introduction to Bioinformatics
5
Distance matrix methods
Species
B
A
9
B
–
C
–
D
–
C
8
11
–
–
D
12
15
10
–
E
15
18
13
5
CS/BIO 271 - Introduction to Bioinformatics
6
UPGMA
 Similar to average-link clustering
 Merge the closest two groups
• Replace the distances for the new, merged group
with the average of the distance for the previous
two groups
 Repeat until all species are joined
CS/BIO 271 - Introduction to Bioinformatics
7
UPGMA Step 1
Species
B
A
9
B
–
C
–
D
–
C
8
11
–
–
D
12
15
10
–
E
15
18
13
5
Merge D & E
D
Species
B
A
9
B
–
C
–
C
8
11
–
DE
CS/BIO 271 - Introduction to Bioinformatics
E
13.5 16.5 11.5
8
UPGMA Step 2
Species
B
A
9
B
–
C
–
C
8
11
–
DE
Merge A & C
A
C
D
E
13.5 16.5 11.5
Species
AC
DE
CS/BIO 271 - Introduction to Bioinformatics
B
10
AC
–
16.5 12.5
9
UPGMA Steps 3 & 4
Species
AC
DE
B
10
AC
–
Merge B & AC
A
C
B
D
E
16.5 12.5
Merge ABC & DE
A
C
B
D
E
(((A,C)B)(D,E))
CS/BIO 271 - Introduction to Bioinformatics
10
Parsimony approaches
 Belong to the broader class of character based
methods of phylogenetics
 Emphasize simpler, and thus more likely
evolutionary pathways
I: GCGGACG
II: GTGGACG
A
(C or T)
C
I
(C or T)
T
II
CS/BIO 271 - Introduction to Bioinformatics
C
I
T
II
11
Informative and uninformative sites
Position
Seq 1
2
3
4
5
6
1
G
G
G
G
G
G
2
G
G
G
A
G
T
3
G
G
A
T
A
G
4
G
A
T
C
A
T
For positions 5 & 6, it is
possible to select more
parsimonious trees –
those that invoke less
substitutions.
CS/BIO 271 - Introduction to Bioinformatics
12
Parsimony methods
 Enumerate all possible trees
 Note the number of substitutions events
invoked by each possible tree
• Can be weighted by transition/transversion
probabilities, etc.
 Select the most parsimonious
CS/BIO 271 - Introduction to Bioinformatics
13
Branch and Bound methods
 Key problem – number of possible trees grows
enormous as the number of species gets large
 Branch and bound – a technique that allows
large numbers of candidate trees to be rapidly
disregarded
 Requires a “good guess” at the cost of the best
tree
CS/BIO 271 - Introduction to Bioinformatics
14
Branch and Bound for TSP
 Find a minimum cost
round-trip path that
visits each intermediate
city exactly once
 NP-complete
 Greedy approach:
A,G,E,F,B,D,C,A
= 251
CS/BIO 271 - Introduction to Bioinformatics
A
93
20
D
82
B
59
31
57
12
G
17
C
46
82
35
E
68
F
15
15
Search all possible paths
A
93
20
All paths
D
82
B
59
31
57
12
G
17
C
46
82
35
AG (20)
E
68
F
AB (46)
AC (93)
15
AGF (88)
AGFB
AGFE
Best estimate: 251
CS/BIO 271 - Introduction to Bioinformatics
AGE (55)
AGFC
ACB (175)
ACD
ACF
ACBE (257)

16
Parsimony – Branch and Bound
 Use the UPGMA tree for an initial best estimate
of the minimum cost (most parsimonious) tree
 Use branch and bound to explore all feasible
trees
 Replace the best estimate as better trees are
found
 Choose the most parsimonious
CS/BIO 271 - Introduction to Bioinformatics
17
Parsimony example
Position
Seq 1
2
3
4
5
6
1
G
G
G
G
G
G
2
G
G
G
A
G
T
3
G
G
A
T
A
G
4
G
A
T
C
A
T
Position 5:
All trees
(1,2) [0]
(1,3) [1]
(1,4) [1]
Etc.
CS/BIO 271 - Introduction to Bioinformatics
18