Transcript Trees
Molecular Phylogenetics
Dan Graur
1
Objectives of molecular phylogenetics
• Reconstruct the correct evolutionary
relationships among biological entities
• Estimate the time of divergence between
biological entities
• Chronicle the sequence of events along
evolutionary lineages
2
Evolutionary relationships are illustrated by means of a
phylogenetic tree or a dendrogram.
3
Ernst Heinrich Haeckel
1834-1919
4
July 2007
July 1837
5
November 1859
6
The routes of inheritance represent the passage of genes from
parents to offspring, and the branching pattern depicts a gene tree.
7
Different genes, however, may have different evolutionary
histories, i.e., different routes of inheritance.
8
The routes of inheritance are confined by reproductive barriers, i.e.,
gene flow occurs only within a species. A species tree is a
representation of splitting of species lineages.
9
Terminology
10
A phylogenetic tree or dendrogram is a
graph composed of nodes and branches, in
which only one branch connects any two
adjacent nodes.
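The definition above can be made concrete with a small sketch. It assumes a hypothetical unrooted tree for four taxa (a, b, c, d) joined through two internal nodes (u, v); all node names are illustrative and are not taken from the slides. External nodes touch exactly one branch, internal nodes touch more than one.

```python
# Hypothetical unrooted tree, stored as an adjacency list:
# each node maps to the nodes it is directly connected to.
tree = {
    "a": ["u"], "b": ["u"],
    "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"],
    "v": ["c", "d", "u"],
}


def branches(adjacency):
    """Return every branch once, as a frozenset of its two end nodes."""
    return {frozenset((node, nb)) for node, nbs in adjacency.items() for nb in nbs}


def external_nodes(adjacency):
    """Nodes attached to exactly one branch (the tips of the tree)."""
    return sorted(n for n, nbs in adjacency.items() if len(nbs) == 1)


def internal_nodes(adjacency):
    """Nodes attached to more than one branch."""
    return sorted(n for n, nbs in adjacency.items() if len(nbs) > 1)


print(len(tree), "nodes,", len(branches(tree)), "branches")
print("external:", external_nodes(tree))
print("internal:", internal_nodes(tree))
```

In this sketch the tree has 6 nodes and 5 branches, one branch fewer than the number of nodes, as expected for a tree.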
11
Internal
External or Peripheral
Branch
12
13
Assumptions:
Bifurcation = Real
speciation event
Multifurcation = Lack
of resolution
14
Binary tree
15
Rooted and unrooted trees
16
How many unrooted topologies are here?
[Figure: four unrooted trees, numbered 1 to 4, each with external nodes a, b, c, d, and e]
17
In an unrooted tree with four external
nodes, the internal branch is referred to
as the central branch.
18
Cladograms & Phylograms
(collectively Dendrograms)
Cladograms show branching order; branch lengths are meaningless.
Phylograms show branching order and branch lengths.
[Figure: a cladogram and a phylogram of Bacterium 1 to 3 and Eukaryote 1 to 4]
19
Unscaled phylogram
Scaled phylogram
20
21
The Newick format
In computer programs, trees are
represented in a linear form by a string of
nested parentheses, enclosing taxon names
(and possibly also branch lengths and
bootstrap values), and separated by
commas. This type of representation is
called the Newick format. The originator
of this format in mathematics was Arthur
Cayley.
23
The Newick format
The Newick format for phylogenetic trees was adopted on June 26,
1986 at an informal meeting at Newick's Lobster House in Dover,
New Hampshire. The Newick format currently serves as the de facto
standard for representing phylogenetic trees and is employed by
almost all phylogenetic software tools. Unfortunately, it has never
been described in a formal publication; the first time it is mentioned
in a publication is in 1992.
24
The Newick format
In the Newick format, the pattern of the parentheses indicates the
topology of the tree by having each pair of parentheses enclose all
members of a monophyletic group. A phylogenetic tree in the Newick
format always ends in a semicolon (;).
25
The Newick format
One can use the Newick format to write
down rooted trees, unrooted trees,
multifurcations, branch lengths, and
bootstrap values.
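As an illustration, here is a minimal sketch of a Newick reader that recovers only the topology, as nested tuples of taxon names. The function name `parse_newick` and the example strings are hypothetical; quoted labels and comments are not handled, and real phylogenetic software should be preferred for actual data. Writing an unrooted tree with a trifurcation at the outermost level is a common convention, assumed here.

```python
def parse_newick(text):
    """Parse a Newick string into nested tuples of taxon names.

    Branch lengths (":0.1") and internal labels such as bootstrap values
    are read and discarded, so only the topology is returned.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end in a semicolon")
    pos = 0

    def read_label():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while text[pos] not in "(),:;":
            pos += 1
        return text[start:pos].strip()

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            children = []
            while text[pos] in "(,":
                pos += 1                      # skip "(" or ","
                children.append(read_node())
            if text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                          # skip ")"
            read_label()                      # optional internal label, ignored
            node = tuple(children)
        else:
            node = read_label()               # an external node (taxon name)
        if text[pos] == ":":                  # optional branch length, ignored
            pos += 1
            read_label()
        return node

    tree = read_node()
    if text[pos] != ";":
        raise ValueError("unexpected text before the semicolon")
    return tree


examples = {
    "rooted": "((a,b),(c,d));",
    "unrooted (trifurcation at the base)": "(a,b,(c,d));",
    "multifurcation": "(a,b,c,d);",
    "branch lengths and a bootstrap value": "((a:0.1,b:0.2)95:0.05,c:0.3);",
}
for description, newick in examples.items():
    print(description, "->", parse_newick(newick))
```

Each pair of parentheses becomes one tuple, so every tuple in the result corresponds to one group enclosed in the original string.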
26
3 OTUs
1 unrooted tree = 3 rooted trees
27
4 OTUs
3 unrooted trees = 15 rooted trees
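One way to see where these counts come from: a bifurcating unrooted tree with n OTUs has 2n - 3 branches, and placing the root on any one of those branches gives a different rooted tree. The symbols N_R and N_U below stand for the numbers of rooted and unrooted trees for n OTUs.

```latex
N_R(n) = (2n - 3)\, N_U(n)
\qquad\Rightarrow\qquad
N_R(3) = 3 \times 1 = 3,
\qquad
N_R(4) = 5 \times 3 = 15
```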
28
The number of possible bifurcating rooted trees (N_R) for n ≥ 2 OTUs:

$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

The number of possible bifurcating unrooted trees (N_U) for n ≥ 3 OTUs:

$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
29
Number of OTUs    Number of possible rooted trees
 2                1
 3                3
 4                15
 5                105
 6                945
 7                10,395
 8                135,135
 9                2,027,025
10                34,459,425
15                213,458,046,676,875
20                8,200,794,532,637,891,559,375
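The counts can be computed directly from the factorial formulas for bifurcating rooted and unrooted trees. The following is a small sketch; the function names `rooted_trees` and `unrooted_trees` are illustrative. Python integers are exact at any size, so the large values are not rounded.

```python
from math import factorial


def rooted_trees(n):
    """Number of bifurcating rooted trees for n >= 2 OTUs."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of bifurcating unrooted trees for n >= 3 OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20):
    print(f"{n:>2}  {rooted_trees(n):,}")
```

For six OTUs the formula gives 945 rooted trees, and for four OTUs it gives 3 unrooted and 15 rooted trees.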
30
Evolution is an historical process.
Only one historical narrative is true.
From 8,200,794,532,637,891,559,375
possibilities, 1 possibility is true and
8,200,794,532,637,891,559,374 are
false.
Truth is one, falsehoods are
many.
31
How do we know which of the
8,200,794,532,637,891,559,375
trees is true?
32
We don’t, we infer
by using decision
criteria.
33
True and inferred trees
The sequence of speciation events that has led to
the formation of a group of OTUs is historically
unique. A tree representing the true evolutionary
history is called the true tree.
A tree that is obtained by using a certain set of
data and a certain method of tree reconstruction is
called an inferred tree.
An inferred tree may or may NOT be the true tree.
34
Cladogenesis = the splitting of an evolutionary lineage into two
genetically independent lineages.
[Figure labels: ancestor, descendant 1, descendant 2]
35
Anagenesis = changes occurring along an evolutionary lineage.
[Figure labels: ancestor, descendant 1, descendant 2]
36
In molecular
phylogenetics,
we assume that
species are only
created by
cladogenesis.
37
A gene tree may differ from a species tree
38
Gene trees and species trees
[Figure: a gene tree (tips a, b, c) and a species tree (tips A, B, D)]
It is often assumed that gene trees always
equal species trees. This may not be
true.
39
Orthologs and paralogs
Ancestral gene
Duplication yields 2 copies (paralogs) on the same genome
A mixture of orthologs and paralogs is sampled
[Figure: a gene tree in which pairs of sampled gene copies are marked as orthologous or paralogous; sampled copies are starred]
40
41
Taxon (singular); Taxa (plural)
A taxon is a species or a group of species
that has been given a name, e.g., Homo
sapiens (modern humans), or Lepidoptera
(butterflies and moths), or herbs.
There are codes of biological nomenclature
which seek to ensure that every taxon has a
single and stable name, and that every name
is used for only one taxon.
42
Clades*
• Strictly: A clade is a group of all the taxa that have been
derived from a common ancestor plus the common ancestor
itself.
• In molecular phylogenetics: A clade is a group of taxa under
study that share a common ancestor, which is not shared by
any other species outside the group.
*also: monophyletic groups, natural clades
43
Paraphyletic Taxa
• A taxon whose common
ancestor is shared by any
other taxon is called a
paraphyletic taxon or
an invalid taxon.
Reptiles are paraphyletic.
44
• A named taxon that lacks phylogenetic validity,
but is nonetheless used, is called a convenience
taxon.
Fish (Pisces)
“a convenience fish”
45
Sister Taxa
• If a clade is composed
of two taxa, these are
referred to as sister
taxa.
Birds and crocodiles are
sister taxa.
46
= clades
Phenotypic distance
47
Which of the following groups are not monophyletic?
E. coli
rat
mouse
baboon chimp
human
a. human, chimpanzee, baboon
b. mouse, chimpanzee, baboon
c. rat, mouse
d. human, chimpanzee, baboon, rat, mouse
e. E. coli, human, chimpanzee, baboon, rat, mouse
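A question of this kind can be checked mechanically. The sketch below assumes the following topology, which is not written out in the text and is stated here only as an assumption: E. coli as the outgroup, rat and mouse as sister taxa, and baboon as the sister of the chimpanzee + human pair. A group is treated as monophyletic if it is exactly the set of tips below some node of the tree.

```python
# Assumed topology (an illustration, not taken verbatim from the slide).
tree = ("E. coli", (("rat", "mouse"), ("baboon", ("chimpanzee", "human"))))


def leaves(node):
    """Return the set of tip names found below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the tip set of every subtree, including the whole tree."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(group, tree):
    """True if the group is exactly the tip set of some subtree."""
    return frozenset(group) in set(clades(tree))


groups = {
    "a": ["human", "chimpanzee", "baboon"],
    "b": ["mouse", "chimpanzee", "baboon"],
    "c": ["rat", "mouse"],
    "d": ["human", "chimpanzee", "baboon", "rat", "mouse"],
    "e": ["E. coli", "human", "chimpanzee", "baboon", "rat", "mouse"],
}
for label, group in groups.items():
    print(label, is_monophyletic(group, tree))
```

Under the assumed topology, only group b fails the test: the smallest clade containing mouse, chimpanzee, and baboon also contains rat and human.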
48
50
A character
provides
information
about an
individual OTU.
A distance
represents a
quantitative
statement
concerning the
dissimilarity
between two
OTUs.
51
A character is a well-defined
feature that in a taxonomic unit
can assume one out of two or
more mutually exclusive
character states.
Mutually exclusive: If David is tall, David cannot be short.
52
53
54
Character
  Continuous
  Discrete
    Binary
      Polar
      Unpolar
    Multistate
      Unordered
      Ordered
        Polar
        Unpolar
55
A character is unordered if a change
from one character state to any other
character state can occur in one step.
56
A character is ordered if there exists a unique
symmetrical path of change from one character
state to another.
57
A character is polar if there exists a unique
asymmetrical (irreversible) path of change
from one character state to another.
Polar
58
In partially ordered characters the number of steps varies for the
different pairwise combinations of character states, but no definite
relationship exists between the number of steps and the character state.
Amino-acid sites are partially ordered characters. An amino acid cannot
change into every other amino acid in a single step; sometimes 2 or 3
steps are required. For example, a tyrosine may only change into a
leucine through an intermediate state, i.e., phenylalanine or histidine.
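The tyrosine example can be verified from the codons themselves. This sketch lists the standard-genetic-code codons (in the DNA alphabet) for the four amino acids mentioned and takes the number of steps to be the smallest number of nucleotide differences between any codon of one amino acid and any codon of the other. That definition of a step, and the names `CODONS` and `min_steps`, are assumptions made for the illustration.

```python
from itertools import product

# Standard genetic code, restricted to the four amino acids in the example.
CODONS = {
    "Tyr": ["TAT", "TAC"],
    "Phe": ["TTT", "TTC"],
    "His": ["CAT", "CAC"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}


def min_steps(aa1, aa2):
    """Smallest number of nucleotide differences between any codon pair."""
    return min(
        sum(x != y for x, y in zip(codon1, codon2))
        for codon1, codon2 in product(CODONS[aa1], CODONS[aa2])
    )


for pair in [("Tyr", "Leu"), ("Tyr", "Phe"), ("Phe", "Leu"),
             ("Tyr", "His"), ("His", "Leu")]:
    print(pair, min_steps(*pair))
```

Tyrosine to leucine takes at least 2 nucleotide changes, while each leg of the routes through phenylalanine or histidine takes 1.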
59
The number of steps in partially ordered
characters is specified by a step matrix,
the elements of which indicate the
number of steps required between any
two character states.
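To make the idea concrete, here is a sketch of step matrices for a hypothetical character with states numbered 0, 1, 2, ... under three simple assumptions: unordered (any change costs one step), ordered along a line (the cost is the number of intermediate moves), and polar (changes are allowed in one direction only). Marking a forbidden change with infinity is one common convention and is an assumption here, as are the function names.

```python
import math


def unordered(k):
    """Any state can change into any other state in one step."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


def ordered(k):
    """States lie on a line 0-1-2-...; the path is symmetrical."""
    return [[abs(i - j) for j in range(k)] for i in range(k)]


def polar(k):
    """As ordered, but irreversible: only changes from lower to higher states."""
    return [[j - i if j >= i else math.inf for j in range(k)] for i in range(k)]


# Row = starting state, column = final state.
for name, matrix in [("unordered", unordered(3)),
                     ("ordered", ordered(3)),
                     ("polar", polar(3))]:
    print(name)
    for row in matrix:
        print("  ", row)
```

The unordered and ordered matrices are symmetrical, whereas the polar matrix is not, which is what makes the path of change asymmetrical.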
60
61
Assumptions about character evolution
Methods of phylogenetic reconstruction
require that we make explicit assumptions
about:
(1) the number of discrete steps required for
one character state to change into another.
(2) the probability with which such a change
may occur.
62
Temporal Polarity of Character States
Character states may be ranked by relative
antiquity into:
(1) primitive or ancestral (plesiomorphy)
(2) derived or novel (apomorphy)
63
Taxonomic Distribution of Character States
A primitive state that is shared by several taxa is a
symplesiomorphy.
A derived state that is shared by several taxa is a
synapomorphy.
(syn-/sym- = together, as in: sympathy, synapse, syllable, system)
A derived character state unique to a particular taxon is an
autapomorphy.
A character state that is shared by several taxa due to
convergence, parallelism and reversals, rather than due to
common descent, is a homoplasy.
64
[Figure: a tree with character states A, B, C, and D mapped onto it, illustrating plesiomorphy, symplesiomorphy, apomorphy, synapomorphy, autapomorphy, and homoplasy]
65
66
Distance Data
67
68
Most molecular data yield character
states that are subsequently
converted into distances.
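One simple way to carry out such a conversion is the proportion of differing sites between two aligned sequences (often called the p-distance). The slides do not name a specific distance measure, so this choice is an assumption, and the three short sequences below are invented for illustration.

```python
from itertools import combinations

# Hypothetical aligned sequences; each site is a character, each base a state.
alignment = {
    "OTU1": "ACGTACGTAC",
    "OTU2": "ACGTACGTTC",
    "OTU3": "ACGAACGTTG",
}


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


# One distance per pair of OTUs.
distances = {
    (name1, name2): p_distance(alignment[name1], alignment[name2])
    for name1, name2 in combinations(alignment, 2)
}
for pair, d in distances.items():
    print(pair, d)
```

Each character describes a single OTU, while each resulting distance describes a pair of OTUs, so three OTUs yield three pairwise distances.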
69
Some molecular data can only be
expressed as distances.
70
71
72
73
74