
Example of bipartition
analysis for five genomes
of photosynthetic bacteria
Bipartitions supported by
genes from chlorophyll
biosynthesis pathway
total 10 bipartitions
R: Rhodobacter capsulatus,
H: Heliobacillus mobilis,
S: Synechocystis sp.,
Ct: Chlorobium tepidum,
Ca: Chloroflexus aurantiacus
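The total of 10 matches the number of non-trivial bipartitions that five taxa allow (splits with at least two taxa on each side). The enumeration below is a minimal Python sketch; the function name and the approach are illustrative and are not taken from the cited analysis.

```python
from itertools import combinations

TAXA = ["R", "H", "S", "Ct", "Ca"]

def nontrivial_bipartitions(taxa):
    """Return every split of `taxa` that has at least two taxa on each side."""
    taxa = sorted(taxa)
    anchor, rest = taxa[0], taxa[1:]   # pin one taxon to the left side so each split is listed once
    splits = []
    for k in range(1, len(taxa) - 2):  # left side holds the anchor plus k others (size 2 .. n-2)
        for extra in combinations(rest, k):
            left = frozenset((anchor,) + extra)
            right = frozenset(taxa) - left
            splits.append((left, right))
    return splits

splits = nontrivial_bipartitions(TAXA)
for left, right in splits:
    print(sorted(left), "|", sorted(right))
```

A fully resolved unrooted tree of five taxa contains only two of these splits, so any single gene tree can support at most two of them.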
[Figure: two five-taxon trees with tips R, H, S, Ct, and Ca, one labeled "Plurality" and one labeled "Chl. Biosynth."]
Zhaxybayeva, Hamel, Raymond, and Gogarten, Genome Biology 2004, 5: R20
(188 gene families)
Xiong et al. Science, 2000 289:1724-30
(extended datasets)
Phylogenetic Analyses of Genes from
chlorophyll biosynthesis pathway
PROBLEMS WITH BIPARTITIONS
• No easy way to incorporate gene families
that are not represented in all genomes.
• As more sequences are added, the
internal branches become shorter and
the bootstrap support for individual
bipartitions drops.
• A single misplaced sequence can destroy
all bipartitions.
Bootstrap support values for embedded quartets
[Figure: a tree calculated from one pseudosample, generated by bootstrapping
from an alignment of one gene family present in 11 genomes, together with the
embedded quartet for genomes 1, 4, 9, and 10; quartet tree tips: 1, 4, 9, 10]
This bootstrap sample supports the
topology ((1,4),9,10).
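One way to read an embedded quartet off a larger tree is to look for a subtree that contains exactly two of the four genomes. The sketch below assumes trees are written as nested Python tuples; the 11-genome tree and the function names are hypothetical and serve only as illustration.

```python
def leaf_sets(tree):
    """Return (leaves under `tree`, list of leaf sets of every internal subtree)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades

def embedded_quartet(tree, quartet):
    """Return the 2|2 split the tree induces on four taxa, or None if unresolved."""
    _, clades = leaf_sets(tree)
    q = frozenset(quartet)
    for clade in clades:
        inside = clade & q
        if len(inside) == 2:          # this branch separates two quartet members from the other two
            return frozenset([inside, q - inside])
    return None

# Hypothetical bootstrap tree for one gene family present in 11 genomes.
tree = (((1, 2), (3, 4)), 5, ((6, (7, 8)), ((9, 11), 10)))
topology = embedded_quartet(tree, (1, 4, 9, 10))
print(sorted(sorted(pair) for pair in topology))
```

For this hypothetical tree the quartet resolves as genomes 1 and 4 against genomes 9 and 10, that is, the topology ((1,4),9,10).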
Quartet spectral analysis of genomes iterates
over three loops:
Repeat for all bootstrap samples.
Repeat for all possible embedded quartets.
Repeat for all gene families.
[Figure: two quartet trees with tips 1, 4, 9, and 10]
Iterating over Bootstrap Samples
For the quartet of species A, B, C, D,
this gene family supports the topology
((A, D), B, C) with 70%
bootstrap support
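The three loops and the resulting support value can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each bootstrap tree is represented as a list of splits (the taxa on one side of a branch), and the data layout, function names, and the single toy gene family are hypothetical.

```python
from collections import Counter
from itertools import combinations

def quartet_topology(splits, quartet):
    """Return the 2|2 split a tree induces on four taxa, or None if unresolved."""
    q = frozenset(quartet)
    for side in splits:                      # each split is stored as the taxa on one side
        inside = side & q
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None

def quartet_spectrum(families):
    """Return {(family, quartet): {topology: bootstrap support in percent}}."""
    support = {}
    for name, family in families.items():                        # loop over gene families
        for quartet in combinations(sorted(family["taxa"]), 4):  # loop over embedded quartets
            votes = Counter()
            for splits in family["trees"]:                       # loop over bootstrap samples
                topology = quartet_topology(splits, quartet)
                if topology is not None:
                    votes[topology] += 1
            n = len(family["trees"])
            support[(name, quartet)] = {t: 100.0 * c / n for t, c in votes.items()}
    return support

# Toy input: 10 bootstrap trees, 7 grouping A with D and 3 grouping A with B.
families = {
    "fam1": {
        "taxa": {"A", "B", "C", "D"},
        "trees": [[frozenset("AD")]] * 7 + [[frozenset("AB")]] * 3,
    }
}
support = quartet_spectrum(families)
ad_bc = frozenset([frozenset("AD"), frozenset("BC")])
print(support[("fam1", ("A", "B", "C", "D"))][ad_bc])  # 70.0
```

With the toy input, the topology ((A, D), B, C) receives 70% support, the same kind of value reported on the slide.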
Illustration of one component of a quartet spectral analysis
Summary of phylogenetic information for one genome quartet for all gene
families
Total number of gene families
containing the species quartet
Number of gene families
supporting the same topology
as the plurality
(colored according to bootstrap
support level)
Number of gene families
supporting one of the two
alternative quartet topologies
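A single component of the spectrum can be tallied from one topology call per gene family. The sketch below assumes that the plurality topology is the one supported by the most gene families and that support levels are binned at 70% and 90%; both the thresholds and the example calls are hypothetical.

```python
from collections import Counter

def spectrum_component(calls, thresholds=(70, 90)):
    """Summarize one genome quartet.

    calls: one (topology, bootstrap_support) pair per gene family
    that contains all four genomes of the quartet.
    """
    counts = Counter(topology for topology, _ in calls)
    plurality = counts.most_common(1)[0][0]
    by_level = Counter()
    for topology, support in calls:
        if topology == plurality:
            level = sum(support >= t for t in thresholds)  # 0 = below 70, 1 = 70-89, 2 = 90 and above
            by_level[level] += 1
    return {
        "total": len(calls),
        "plurality": plurality,
        "plurality_by_level": dict(by_level),
        "alternatives": {t: c for t, c in counts.items() if t != plurality},
    }

# Hypothetical calls for one quartet across six gene families.
calls = [("AB|CD", 95), ("AB|CD", 80), ("AB|CD", 55),
         ("AC|BD", 72), ("AD|BC", 60), ("AB|CD", 91)]
component = spectrum_component(calls)
print(component)
```

The three parts of the returned summary correspond to the three labels on the slide: the total number of gene families, the families agreeing with the plurality (split by support level), and the families supporting either alternative topology.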
Quartet Spectrum of 11 cyanobacterial genomes
1128 datasets from relaxed core (core datasets + datasets with one or two taxa missing)
330 possible quartets
685 datasets show conflicts with plurality quartets
[Figure: quartet spectrum; axis: number of datasets]
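The figure of 330 is the number of ways to choose 4 genomes out of 11, and each quartet has three possible unrooted topologies. A short Python check, with genomes numbered 1 to 11 purely for illustration:

```python
from itertools import combinations
from math import comb

genomes = range(1, 12)                      # 11 genomes, numbered 1..11
quartets = list(combinations(genomes, 4))   # every set of four genomes

def topologies(quartet):
    """Return the three possible unrooted topologies for four taxa."""
    a, b, c, d = quartet
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

print(len(quartets), comb(11, 4))           # 330 330
print(topologies((1, 4, 9, 10)))
```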
PLURALITY SIGNAL
[Figure: plurality signal for the 11 cyanobacterial genomes]
G: Gloeobacter, mS: marine Synechococcus,
1P: 1Prochlorococcus, 2P: 2Prochlorococcus, 3P: 3Prochlorococcus,
N: Nostoc, A: Anabaena, Tr: Trichodesmium,
C: Crocosphaera, S: Synechocystis,
Th: Thermosynechococcus
Distribution of 1128 datasets in the relaxed core
[Figure: distribution among functional categories.
Categories: INFORMATION STORAGE AND PROCESSING; CELLULAR PROCESSES AND SIGNALING; METABOLISM; POORLY CHARACTERIZED; NOT PRESENT IN COGS.
Counts shown: 123, 212, 160, 192, 441]
Distribution of 624 datasets conflicting with the plurality
signal
[Figure: distribution among functional categories.
Categories: INFORMATION STORAGE AND PROCESSING; CELLULAR PROCESSES AND SIGNALING; METABOLISM; POORLY CHARACTERIZED; NOT PRESENT IN COGS.
Counts shown: 70, 127, 82, 96, 249]
Conflicts with the plurality signal are
observed in sets of orthologs across all
functional categories, including
genes involved in translation and
transcription
624/1128 ≈ 55%
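The totals and the 55% figure can be checked from the counts on the two distribution slides. In this small Python sketch the counts are listed in the order they appear on the slides, without assigning them to categories.

```python
relaxed_core = [123, 212, 160, 192, 441]   # counts on the 1128-dataset slide
conflicting = [70, 127, 82, 96, 249]       # counts on the 624-dataset slide

total = sum(relaxed_core)
conflicts = sum(conflicting)
fraction = conflicts / total

print(f"{conflicts}/{total} = {fraction:.1%}")
```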
Genes with orthologs outside the cyanobacterial phylum:
Distribution among Functional Categories
(using COG db, release of March 2003)
Example of interphylum transfer:
threonyl tRNA
synthetase
Species evolution versus plurality consensus
• In the case of the marine Synechococcus and Prochlorococcus spp., the plurality
consensus is unlikely to reflect organismal history.
• This is probably due to frequent gene transfer mediated by phages,
e.g.: [figure]
• These conflicting observations are not limited to prokaryotes. In incipient species of Darwin's finches, frequent
introgression can make some individuals that are assigned
to the same species by morphology and mating behavior
genetically more similar to a sister species
(Grant et al. 2004 "Convergent evolution of Darwin's
finches caused by introgressive hybridization and
selection" Evolution Int J Org Evolution 58, 1588-1599).
The Coral of Life (Darwin)
Coalescence – the
process of tracing
lineages backwards
in time to their
common ancestors.
Every two extant
lineages coalesce
to their most recent
common ancestor.
Eventually, all
lineages coalesce
to the cenancestor.
(Kingman,
1982)
Illustration is from J. Felsenstein, “Inferring Phylogenies”, Sinauer, 2003
Coalescence of ORGANISMAL and MOLECULAR Lineages
[Figure: simulated lineages; axis: Time]
•20 lineages
•One extinction and one speciation
event per generation
•One horizontal transfer event once in
5 generations (i.e., speciation events)
RED: organismal lineages (no HGT)
BLUE: molecular lineages (with HGT)
GRAY: extinct lineages
RESULTS:
•Most recent common ancestors are different for organismal and
molecular phylogenies
•Different coalescence times
•Long coalescence time for the last two lineages
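A simulation of this kind can be sketched in Python as below. This is an illustration under stated assumptions, not the code behind the slide: lineages occupy a fixed number of slots, the extinct lineage, the speciating lineage, the donor, and the recipient are all chosen uniformly at random, and the function names and the number of generations are hypothetical.

```python
import random

def simulate(n=20, generations=2000, hgt_every=5, seed=1):
    """Forward simulation; return per-generation parent maps for organisms and genes."""
    rng = random.Random(seed)
    org_maps, gene_maps = [], []
    for g in range(1, generations + 1):
        dead, mother = rng.sample(range(n), 2)   # one extinction and one speciation
        org_parent = list(range(n))
        org_parent[dead] = mother                # the daughter lineage fills the vacated slot
        gene_parent = list(org_parent)
        if g % hgt_every == 0:                   # one transfer every `hgt_every` generations
            donor, recipient = rng.sample(range(n), 2)
            gene_parent[recipient] = org_parent[donor]  # recipient's gene traces back through the donor
        org_maps.append(org_parent)
        gene_maps.append(gene_parent)
    return org_maps, gene_maps

def generations_to_mrca(parent_maps):
    """Trace all extant lineages backwards; return generations until one ancestor remains, or None."""
    current = set(range(len(parent_maps[-1])))
    for back, parents in enumerate(reversed(parent_maps), start=1):
        current = {parents[i] for i in current}
        if len(current) == 1:
            return back
    return None

org_maps, gene_maps = simulate()
print("organismal MRCA, generations back:", generations_to_mrca(org_maps))
print("molecular MRCA, generations back:", generations_to_mrca(gene_maps))
```

Because the gene history follows the organismal history except at transfer events, the two coalescence times can differ, which is the point the slide makes. A result of `None` means the lineages did not coalesce within the simulated span.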
Y chromosome Adam
Lived approximately 50,000 years ago
Thomson, R. et al. (2000) Proc Natl Acad Sci U S A 97, 7360-5
Underhill, P.A. et al. (2000) Nat Genet 26, 358-61
Mitochondrial Eve
Lived 166,000-249,000 years ago
Cann, R.L. et al. (1987) Nature 325, 31-6
Vigilant, L. et al. (1991) Science 253, 1503-7
Albrecht Dürer, The Fall of Man, 1504
Adam and Eve never met
The same is true for ancestral rRNAs, EF, ATPases!
EXTANT LINEAGES FOR THE SIMULATIONS OF 50 LINEAGES
Lineages Through Time Plot
[Figure: lineages through time plot; axis: log (number of surviving lineages)]
10 simulations of organismal evolution (green O), assuming
a constant number of species (200) throughout
the simulation;
1 speciation and 1 extinction per time step.
25 gene histories simulated (red x)
for each organismal history, assuming
1 HGT per 10 speciation events.
green: organismal lineages;
red: molecular lineages (with gene transfer)
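The quantity plotted, the number of ancestral lineages that still have surviving descendants at each step back in time, can be computed as in the Python sketch below. It is a simplified illustration: the gene history is simulated on its own rather than nested in a shared organismal history, lineages are picked uniformly at random, and the function name, step count, and seed are assumptions.

```python
import math
import random

def lineages_through_time(n=200, steps=5000, hgt_every=None, seed=0):
    """Return the number of ancestral lineages 0, 1, 2, ... steps before the present."""
    rng = random.Random(seed)
    parent_maps = []
    for step in range(1, steps + 1):
        dead, mother = rng.sample(range(n), 2)   # 1 extinction and 1 speciation per time step
        parent = list(range(n))
        parent[dead] = mother
        if hgt_every and step % hgt_every == 0:  # gene history: 1 HGT per `hgt_every` speciations
            donor, recipient = rng.sample(range(n), 2)
            parent[recipient] = parent[donor]
        parent_maps.append(parent)
    surviving, counts = set(range(n)), [n]
    for parent in reversed(parent_maps):         # walk from the present into the past
        surviving = {parent[i] for i in surviving}
        counts.append(len(surviving))
    return counts

organismal = lineages_through_time()             # no gene transfer
molecular = lineages_through_time(hgt_every=10)  # 1 HGT per 10 speciation events
log_organismal = [math.log(c) for c in organismal]
log_molecular = [math.log(c) for c in molecular]
```

Plotting the two log-transformed lists against the step index gives curves of the kind summarized on the slide.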
The deviation from the "long
branches at the base" pattern
could be due to
• undersampling
• an actual radiation
  - due to an invention that was
    not transferred
  - following a mass extinction
Bacterial 16S rRNA-based phylogeny
(from P. D. Schloss and J. Handelsman,
Microbiology and Molecular Biology Reviews,
December 2004.)