Construction of Phylogenetic Trees
Walter M. Fitch and Emanuel Margoliash
Science, New Series, Volume 155, Issue 3760 (Jan. 20, 1967), pp. 279-284
Speaker: Fang-Ling Lin
Advisor: Prof. R. C. T. Lee
National Chi-Nan University
Outline
Basic terms
Constructing the phylogenetic tree
Analyzing the phylogenetic tree
Reconstruction of the ancestral cytochrome c amino acid sequences
Introduction
Biochemists have attempted to use
quantitative estimates of variance between
substances obtained from different species to
construct phylogenetic trees.
These methods have not been completely
satisfactory because
1. restricted
2. accuracy
3. mathematical
What is cytochrome c?
Cytochrome c is a small heme protein that takes part in
respiration in the mitochondrion (the electron transport chain).
When it is released from the mitochondrion into the
cytoplasm, it triggers programmed cell death (apoptosis).
Determining the Mutation Distance
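The mutation distance: the minimal number
of nucleotides that would need to be altered in
order for the gene for one cytochrome to code
for the other.
Example: ACTGAT vs. TCTATC, aligned as
A C T G A T -
T C T - A T C
One substitution (A/T), one deletion (G) and one insertion (C): three changes in all.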
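The example above aligns nucleotides directly. For protein sequences, one way to apply the same definition, assuming the standard genetic code and aligned, gap-free sequences, is to take at each position the fewest nucleotide changes that turn some codon of one amino acid into some codon of the other, and sum over positions. The sketch below does this; the function names are illustrative, and it ignores gaps and stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def codon_distance(x, y):
    """Fewest nucleotide changes turning a codon for amino acid x into one for y."""
    return min(sum(p != q for p, q in zip(cx, cy))
               for cx in CODONS[x] for cy in CODONS[y])


def mutation_distance(seq1, seq2):
    """Sum of per-position codon distances for two aligned, gap-free proteins."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(codon_distance(x, y) for x, y in zip(seq1, seq2))


print(codon_distance("F", "L"), codon_distance("W", "K"))  # 1 2
print(mutation_distance("MKW", "LKF"))                      # 3
```

Working per amino acid keeps the distance tied to the genetic code: two different amino acids are always 1 to 3 nucleotide changes apart.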
Problem
Given: a set of proteins and the mutation distance between every pair of them
Output: a phylogenetic tree
The construction of the tree
Assume there are three proteins, A, B and C, with the
following mutation distances:
      B    C
A    24   28
B         32
There are two fundamental problems:
1. Which pair does one join together first?
2. What are the lengths of legs a, b, and c?
Which pair does one join together first?
Simply choose the pair with the smallest mutation
distance. Here A and B (distance 24) are joined first,
and C is then attached to the pair (A, B).
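Choosing the first pair is a minimum over all pairs. A minimal sketch, assuming distances are stored in a dict keyed by frozenset pairs (a representation chosen here for illustration); ties go to whichever pair itertools.combinations yields first, an arbitrary choice rather than a rule from the paper:

```python
from itertools import combinations


def closest_pair(labels, dist):
    """Return the pair of labels with the smallest mutation distance."""
    # dist maps frozenset({x, y}) -> distance; min() keeps the first tie.
    return min(combinations(labels, 2), key=lambda p: dist[frozenset(p)])


dist = {frozenset("AB"): 24, frozenset("AC"): 28, frozenset("BC"): 32}
print(closest_pair("ABC", dist))  # ('A', 'B')
```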
What are the lengths of legs a, b, and c?
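Let a, b and c be the legs joining A, B and C to their
common junction. Each mutation distance is the sum of
the two legs on the path between the two proteins:
a + b = 24
a + c = 28
b + c = 32
Adding the three equations gives a + b + c = 42;
subtracting each equation in turn gives
a = 10
b = 14
c = 18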
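The three equations have a closed-form solution: half the sum of all three distances is a + b + c, and subtracting the distance that does not involve a leg leaves that leg. A short sketch (the name three_point is illustrative):

```python
def three_point(d_ab, d_ac, d_bc):
    """Solve a + b = d_ab, a + c = d_ac, b + c = d_bc for legs a, b, c."""
    half_total = (d_ab + d_ac + d_bc) / 2  # equals a + b + c
    return half_total - d_bc, half_total - d_ac, half_total - d_ab


print(three_point(24, 28, 32))  # (10.0, 14.0, 18.0)
```

Nothing forces the legs to be positive: if one distance is much larger than the sum of the other two, a leg comes out negative.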
When information from more than three
proteins is utilized
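When information from more than three
proteins is utilized, the basic procedure is the
same: the two subsets being joined play the roles
of A and B, and all remaining proteins together
play the role of C.
At each step one joins two subsets to create a
single subset, until all proteins are members of a
single subset.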
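When two subsets are joined, the worked example below gives the new subset a distance to every other subset equal to the plain average of the two old distances. A minimal sketch of that update, assuming subsets are frozensets of protein labels and distances are keyed by frozenset pairs of subsets (both representation choices, and the helper names merge and s, are hypothetical):

```python
def merge(dist, left, right):
    """Join subsets left and right; the new subset's distance to each other
    subset is the plain average of the two old distances."""
    joined = left | right
    others = {x for pair in dist for x in pair} - {left, right}
    new = {pair: d for pair, d in dist.items()
           if left not in pair and right not in pair}
    for x in others:
        new[frozenset((joined, x))] = (dist[frozenset((left, x))]
                                       + dist[frozenset((right, x))]) / 2
    return new


def s(*labels):
    """Hypothetical helper: a subset of protein labels."""
    return frozenset(labels)


raw = {(1, 2): 1, (1, 3): 13, (1, 4): 17, (1, 5): 16, (2, 3): 12,
       (2, 4): 16, (2, 5): 15, (3, 4): 10, (3, 5): 8, (4, 5): 1}
dist = {frozenset((s(i), s(j))): d for (i, j), d in raw.items()}
after = merge(dist, s(1), s(2))
for k in (3, 4, 5):
    print(k, after[frozenset((s(1, 2), s(k)))])  # 12.5, 16.5, 15.5
```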
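Example: 5 proteins
Input mutation distances:
      1     2     3     4     5
1     0     1    13    17    16
2           0    12    16    15
3                 0    10     8
4                       0     1
5                             0
Step 1: the smallest distance is 1 (proteins 1 and 2 tie with
proteins 4 and 5; 1 and 2 are joined first).
With A = 1, B = 2 and C = {3, 4, 5}:
a + b = 1
a + c = (13 + 17 + 16)/3 = 15.33
b + c = (12 + 16 + 15)/3 = 14.33
a = 1, b = 0, c = 14.33
The new subset {1, 2} takes the average of the old distances
to each remaining protein:
        1,2     3      4      5
1,2     0     12.5   16.5   15.5
3              0     10      8
4                     0      1
5                            0
12.5 = (13 + 12)/2, 16.5 = (17 + 16)/2, 15.5 = (16 + 15)/2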
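Step 2: the smallest distance is now 1, between proteins 4 and 5.
With A = 4, B = 5 and C = the subsets {1, 2} and {3}:
a + b = 1
a + c = (16.5 + 10)/2 = 13.25
b + c = (15.5 + 8)/2 = 11.75
a = 1.25, b = -0.25, c = 12
Updated distances:
        1,2     3     4,5
1,2     0     12.5    16
3              0       9
4,5                    0
16 = (16.5 + 15.5)/2, 9 = (10 + 8)/2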
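Step 3: the smallest distance is 9, between protein 3 and subset {4, 5}.
With A = 3, B = {4, 5} and C = {1, 2}:
a + b = 9
a + c = 12.5
b + c = 16
a = 2.75, b = 6.25, c = 9.75
The remaining distance between {1, 2} and {3, 4, 5} is
(12.5 + 16)/2 = 14.25.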
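Step 4: turn the legs to subsets into branch lengths.
A leg to a subset is measured to the average of its members,
so the internal branch x from the central junction to the
junction of 4 and 5 satisfies
((x + 1.25) + (x - 0.25))/2 = 6.25, so x = 5.75
and the internal branch y from the central junction to the
junction of 1 and 2 satisfies
((y + 1) + (y + 0))/2 = 9.75, so y = 9.25
Final branch lengths: 1: 1, 2: 0, junction(1,2) to center: 9.25,
3: 2.75, center to junction(4,5): 5.75, 4: 1.25, 5: -0.25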
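Putting the steps together, the sketch below reproduces the hand calculation for the 5 proteins. It is one reading of the procedure on these slides, not the authors' program: at each step the closest pair becomes A and B, all other current subsets together play the role of C (averaging their current distances), and a leg to a subset is corrected by the subset's mean internal depth, as in Step 4. Breaking ties by pair order and taking the mean depth over a subset's leaves are both assumptions; Fraction keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations


def three_point(d_ab, d_ac, d_bc):
    """Legs a, b, c solving a + b = d_ab, a + c = d_ac, b + c = d_bc."""
    half_total = (d_ab + d_ac + d_bc) / 2  # a + b + c
    return half_total - d_bc, half_total - d_ac, half_total - d_ab


def build_tree(labels, raw):
    """Join subsets as in the 5-protein example; needs at least 3 labels.

    raw maps (i, j) -> mutation distance. Returns {subset: branch length},
    where a subset (frozenset of labels) hangs from its parent junction.
    """
    clusters = [frozenset([x]) for x in labels]
    dist = {frozenset((frozenset([i]), frozenset([j]))): Fraction(v)
            for (i, j), v in raw.items()}
    # depth[X][leaf] = path length from subset X's own junction down to leaf
    depth = {X: {leaf: Fraction(0) for leaf in X} for X in clusters}
    branches = {}

    def d(X, Y):
        return dist[frozenset((X, Y))]

    def mean_depth(X):
        return sum(depth[X].values()) / len(depth[X])

    while True:
        # Join the closest pair; the first pair in order wins ties (assumption).
        A, B = min(combinations(clusters, 2), key=lambda p: d(*p))
        rest = [X for X in clusters if X not in (A, B)]
        # All other subsets play the role of C: average their current distances.
        d_ac = sum(d(A, X) for X in rest) / len(rest)
        d_bc = sum(d(B, X) for X in rest) / len(rest)
        a, b, c = three_point(d(A, B), d_ac, d_bc)
        # A leg reaches the average leaf of a subset; remove the subset's
        # mean internal depth to get the branch to its own junction.
        branches[A] = a - mean_depth(A)
        branches[B] = b - mean_depth(B)
        if len(rest) == 1:  # three subsets left: C's leg is final as well
            branches[rest[0]] = c - mean_depth(rest[0])
            return branches
        AB = A | B
        depth[AB] = {leaf: branches[A] + h for leaf, h in depth[A].items()}
        depth[AB].update({leaf: branches[B] + h for leaf, h in depth[B].items()})
        for X in rest:
            dist[frozenset((AB, X))] = (d(A, X) + d(B, X)) / 2
        clusters = rest + [AB]


raw = {(1, 2): 1, (1, 3): 13, (1, 4): 17, (1, 5): 16, (2, 3): 12,
       (2, 4): 16, (2, 5): 15, (3, 4): 10, (3, 5): 8, (4, 5): 1}
for subset, length in build_tree([1, 2, 3, 4, 5], raw).items():
    print(sorted(subset), float(length))
```

It reports branch lengths 1, 0, 1.25, -0.25, 2.75, 5.75 and 9.25, matching the hand calculation. Note the negative branch to protein 5: nothing in the procedure rules out negative lengths.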
Testing Alternative Trees
The procedure is deterministic: the same input
always produces the same tree.
Since a particular assignment of species to the A
and B subsets defines a tree, different
assignments of species to A and B produce
different trees, which can then be compared.
Fig. 1 is the best of the 40 phylogenetic trees tested.
[Fig. 1: Phylogenetic tree of 20 species]
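Reconstructed distances
[Table: rows i, columns j; lower left half = original input distances, upper right half = reconstructed values]
Values in the upper right half of the table are
reconstructed distances, found by summing the leg
lengths along the path between species i and j in Fig.1.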
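Summing leg lengths along a path is a simple tree walk. The sketch below applies it to the tree from the 5-protein example rather than to Fig. 1, whose branch lengths are not reproduced here; the internal junction names P, Q and R are hypothetical labels.

```python
from itertools import combinations

# Branch lengths of the 5-protein example tree. Junction names
# P (joins 1, 2), Q (joins 4, 5) and R (central) are hypothetical.
EDGES = [("1", "P", 1.0), ("2", "P", 0.0), ("P", "R", 9.25),
         ("3", "R", 2.75), ("R", "Q", 5.75), ("4", "Q", 1.25), ("5", "Q", -0.25)]


def path_length(edges, start, goal):
    """Sum the branch lengths on the unique path between two tree nodes."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack = [(start, None, 0.0)]  # depth-first walk; a tree has no cycles
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, total + w))
    raise ValueError(f"no path from {start} to {goal}")


for i, j in combinations("12345", 2):
    print(i, j, path_length(EDGES, i, j))
```

Most pairs reproduce the input exactly, while proteins 1 and 4, for example, come out at 17.25 instead of 17; such residuals are what the standard deviation below summarizes.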
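Standard deviation
Percent deviation: the percentage of change of each
reconstructed distance from the input data.
Percent “standard deviation”: computed from these percent
deviations, summed over all values of i < j.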
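The exact formula is not legible in this transcript. As a sketch, assume the usual root-mean-square form over the N = n(n-1)/2 pairs, with o_ij the input and e_ij the reconstructed distance:

$$s = 100\sqrt{\frac{1}{N}\sum_{i<j}\left(\frac{e_{ij}-o_{ij}}{o_{ij}}\right)^{2}}$$

The paper's normalization may differ (for example, a divisor other than N), so treat the result as illustrative. Applied to the 5-protein example:

```python
import math


def percent_sd(original, reconstructed):
    """Root-mean-square percent deviation over all pairs i < j.

    ASSUMPTION: divides by the number of pairs N; the paper's exact
    normalization may differ. Input distances must be nonzero.
    """
    devs = [100 * (reconstructed[p] - o) / o for p, o in original.items()]
    return math.sqrt(sum(x * x for x in devs) / len(devs))


original = {(1, 2): 1, (1, 3): 13, (1, 4): 17, (1, 5): 16, (2, 3): 12,
            (2, 4): 16, (2, 5): 15, (3, 4): 10, (3, 5): 8, (4, 5): 1}
# Path sums of branch lengths in the 5-protein example tree.
reconstructed = {(1, 2): 1.0, (1, 3): 13.0, (1, 4): 17.25, (1, 5): 15.75,
                 (2, 3): 12.0, (2, 4): 16.25, (2, 5): 14.75, (3, 4): 9.75,
                 (3, 5): 8.25, (4, 5): 1.0}
print(round(percent_sd(original, reconstructed), 2))  # 1.61
```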
The statistically optimal tree
In testing phylogenetic alternatives, one is
seeking to minimize the percent “standard
deviation.”
Fig.1 has a percent “standard deviation” of 8.7,
the lowest of the 40 alternatives so far tested.
The percent “standard deviation” for the initial
tree was 12.3.
Fig.1 is remarkably similar to the tree constructed
from classical zoological comparisons.
Almost all the alternative phylogenetic
schemes tested involved rearrangements within
the groups birds (turkey, chicken) and
nonprimate mammals (cow, sheep, pig).
Three noticeable deviations
1. Birds of flight (Neognathae) and the penguin (Impennae)
2. Kangaroo vs. nonprimate mammals, and placental
mammals vs. marsupials
3. The turtle appears more closely associated with the
birds than with its fellow reptile, the rattlesnake.
Fig.1
Indeed, from any phylogenetic ancestor, today’s
descendants are equidistant with respect to time but
not equidistant genetically.
The method indicates those lines in which the gene
has undergone the more rapid changes.
For example, the mutation distance from the ancestral
mammal to the primates is 7.5, whereas that to the
nonprimate mammals is 5.8. Since both lines have had the
same time since their common ancestor, the cytochrome c
gene has changed much more rapidly in the descent of the
primates than in that of the other mammals.
Fig.1
Reconstruction of the ancestral cytochrome c
amino acid sequences.
The procedure is dependent upon the
phylogenetic tree on which these sequence
data are arranged.
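The slide does not spell out the reconstruction rule. To illustrate why the result depends on the tree, here is a sketch of the bottom-up pass of Fitch's small-parsimony algorithm (published in 1971, after this paper, and not necessarily the procedure used here), applied to one position with hypothetical residues:

```python
def fitch(tree, residue):
    """Bottom-up pass of Fitch's small-parsimony algorithm for one position.

    tree: a leaf name or a (left, right) tuple; residue: leaf -> amino acid.
    Returns (candidate ancestral residues at the root, minimum number of changes).
    """
    if isinstance(tree, str):
        return {residue[tree]}, 0
    left_set, left_cost = fitch(tree[0], residue)
    right_set, right_cost = fitch(tree[1], residue)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical residues at one position (not taken from the table).
residue = {"Man": "I", "Monkey": "I", "Pig": "V", "Horse": "V"}
for tree in ((("Man", "Monkey"), ("Pig", "Horse")),
             (("Man", "Pig"), ("Monkey", "Horse"))):
    states, changes = fitch(tree, residue)
    print(tree, sorted(states), changes)
```

The same residues need one change on the first tree but two on the second, so the inferred ancestral history depends on how the sequences are arranged on the tree.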
[Table: ancestral and present-day cytochrome c amino acid residues at positions 17, 18, 21, 39, 41, 50, 52, 53, 56, 64, 66, 68, 89, 94, 95, 98 and 109, for the ancestral mammal, ancestral primate, monkey, man, kangaroo, rabbit, dog, ancestral ungulate, pig, ancestral perissodactyl, donkey and horse]
There is presently no detectable relationship
between the primary structures of cytochrome
c and those of hemoglobins. The
reconstruction and comparison of the ancestral
amino acid sequences may reveal a homology
that cannot be detected in present-day proteins.
The employment of such ancestral sequences
may be generally useful for detecting common
ancestry not otherwise observable.
Thank you!