Lecture-TreeOfLife


In brief
• Vertical vs. Horizontal
• Homologous vs. Unequal
• Prokaryotes vs. Eukaryotes
• Mechanisms and Vectors
• Impact on Tree of Life
• Implications for prokaryotic species
Possible mechanisms for HT in Drosophila
From Heredity (2008) 100, 545–554
EVOLUTION: Genome Data Shake Tree of Life
E Pennisi - Science, 1998 - sciencemag.org
The ring of life provides evidence for a genome fusion origin of eukaryotes
MC Rivera, JA Lake - Nature, 2004
The net of life: reconstructing the microbial phylogenetic network
V Kunin, L Goldovsky, N Darzentas, CA … - Genome Research 2005
The tree of one percent
T Dagan, W Martin - Genome biology, 2006
Uprooting the tree of life
WF Doolittle - Evolution: a Scientific American reader, 2006
Clusters of Orthologous Groups (COGs)
Puigbo et al.
• 6901 ML trees
• 100 taxa total
• Objective – compare topological distance between trees
• New metric called IS (inconsistency score) = fraction of the time splits in a tree are found in all trees
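The idea of comparing trees by their splits can be sketched in a few lines of Python. This is only an illustration, not the published IS formula: it assumes trees are written as nested tuples of taxon names, that all trees share the same taxa, and it reports the average fraction of one tree's splits that also occur in the other trees. The names `leaves`, `splits`, and `shared_split_fraction` are hypothetical helpers introduced here.

```python
from itertools import chain


def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits (bipartitions) of a tree.

    Each split is stored as the side that does NOT contain the
    alphabetically first taxon, so identical splits compare equal.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if anchor in clade else clade
        # Skip trivial splits (one taxon, or all but one, on a side).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def shared_split_fraction(tree, others):
    """Average fraction of the splits of `tree` found in each other tree.

    Simplifying assumption: a plain average, not the normalised
    inconsistency score described in the paper.
    """
    own = splits(tree)
    if not own or not others:
        return 0.0
    per_tree = [len(own & splits(other)) / len(own) for other in others]
    return sum(per_tree) / len(per_tree)


# Toy example with five taxa (hypothetical data).
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("A", "B"), "C"), ("D", "E"))
print(shared_split_fraction(t1, [t2, t3]))
```

A value near 1 means the tree agrees with most of the forest; a low value means its splits are rarely seen elsewhere, which is the kind of disagreement an inconsistency measure is meant to capture.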
Many genes are not found in all taxa
Define 102 NUTs, or “nearly universal trees”, that include 90% of the prokaryotes under comparison.
Mostly translation and core transcription related.
J Biol. 2009;8(6):59.
The big divide?
• Look for evidence of HGT between bacteria and archaea
• 56% of NUTs separated the groups perfectly
• 44% show at least one HGT
– 13% from archaea to bacteria
– 23% from bacteria to archaea
– 8% both directions
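One way to picture "separated the groups perfectly" is as a test on a single tree: the archaea and the bacteria each fall on their own side of one branch. The sketch below is an assumption-laden illustration, not the authors' procedure. Trees are nested tuples of taxon names, the taxon labels are made up, and `clade_sets` and `separates_perfectly` are hypothetical helpers. It does not try to infer the direction of any transfer.

```python
def clade_sets(tree):
    """Return (leaves under this node, set of leaf sets for all nodes below)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    under = frozenset()
    collected = set()
    for child in tree:
        child_leaves, child_clades = clade_sets(child)
        under |= child_leaves
        collected |= child_clades
    collected.add(under)
    return under, collected


def separates_perfectly(tree, archaea):
    """True if one branch has all archaea on one side and all bacteria on the other.

    Assumption: every taxon not listed in `archaea` is a bacterium.
    """
    taxa, clades = clade_sets(tree)
    arch = frozenset(archaea) & taxa
    bact = taxa - arch
    if not arch or not bact:
        return False  # only one domain present, nothing to separate
    # In an unrooted sense, a branch separates the groups if either
    # group forms a clade in this rooted representation.
    return arch in clades or bact in clades


# Hypothetical taxa: A* are archaea, B* are bacteria.
archaea = {"A1", "A2", "A3"}
clean = ((("A1", "A2"), "A3"), (("B1", "B2"), "B3"))
mixed = ((("A1", "B1"), "A2"), (("A3", "B2"), "B3"))
print(separates_perfectly(clean, archaea))
print(separates_perfectly(mixed, archaea))
```

Applied to every NUT, a test of this kind would sort trees into the "perfect separation" group and the group with at least one candidate inter-domain transfer; the three sub-categories on the slide (13% + 23% + 8% = 44%) need an additional step to assign direction.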
The network of similarities among the nearly universal trees (NUTs). (a) Each node
(green dot) denotes a NUT, and nodes are connected by edges if the similarity between
the respective edges exceeds the indicated threshold. (b) The connectivity of 102 NUTs
and the 14 1:1 NUTs depending on the topological similarity threshold.
The supernetwork of the NUTs. For species abbreviations see Additional File 1.
Puigbò et al. Journal of Biology 2009 8:59 doi:10.1186/jbiol159
Network representation of the 6,901 trees of the forest of life. The 102 NUTs are shown as
red circles in the middle. The NUTs are connected to trees with similar topologies: trees
with at least 50% of similarity with at least one NUT (P-value < 0.05) are shown as purple
circles and connected to the NUTs. The rest of the trees are shown as green circles.
Puigbò et al. Journal of Biology 2009 8:59 doi:10.1186/jbiol159
Similarity of the trees in the forest of life to the NUTs. (a) For each of the 102 NUTs,
the breakdown of the rest of the trees in the forest by percent similarity is shown. (b)
The same breakdown for 102 random trees generated from the NUTs.
Puigbò et al. Journal of Biology 2009 8:59 doi:10.1186/jbiol159
Proc Natl Acad Sci U S A. 2005 Oct 4;102(40):14332-7.
Highways of obligate gene transfer within and among phyla and divisions of prokaryotes, based on analysis
of the 22,348 protein trees for which a minimal edit path could be resolved
Beiko R G et al. PNAS 2005;102:14332-14337
©2005 by National Academy of Sciences
Ratio of observed to expected discordant bipartitions among proteins in major TIGR role category
groupings
Beiko R G et al. PNAS 2005;102:14332-14337
Fig. 1. Two methods for assessing LGT in bacterial genomes, applied to available quartets of closely related, fully sequenced bacterial taxa. The reference topology, based on
SSU rRNA, is shown in the upper left, with taxon names listed in the rows below. The yellow box contains the numbers of gene acquisitions in genomes A and B, as
determined by parsimony in comparisons of complete genome contents. The blue box contains the numbers of orthologous genes supporting a topology that conflicts with
the reference topology. "Interspecies" and "Intraspecies" comparisons represent quartets of taxa in which phylogenetic incongruence can be explained, respectively, by a
transfer from another species or from another strain of the same species. For intraspecies comparisons, numbers of acquired and lost genes were not calculated because of
uncertainty about the actual tree topology (nd, not determined). (B. aphidicola strains are entirely isolated in different hosts and were thus considered as different species
despite having a single name. In B. aphidicola, amounts of gene loss and gene gain are similar, suggesting that LGT is overestimated due to independent losses of genes.)
Fig. 2. Relative frequencies of the three categories of alignments, i.e., those supporting the reference phylogeny (SSU rRNA), those supporting an
alternate phylogeny (LGT), and those with no statistical support for any phylogeny. Points represent quartets of genomes for which orthologous
genes have been inferred, aligned, and evaluated at the nucleic acid sequences level based on the SH test implemented in Puzzle 5.1 (19). The
left part of the plot (in blue) represents the area where LGT predominates.
“THE” E. coli genome
Blattner et al., Science 5 September 1997 277: 1453-1462
Figure 1. The overall structure of the E. coli genome. The origin and terminus of replication are shown as green lines, with
blue arrows indicating replichores 1 and 2. A scale indicates the coordinates both in base pairs and in minutes (actually
centisomes, or 100 equal intervals of the DNA). The distribution of genes is depicted on two outer rings: The orange boxes
are genes located on the presented strand, and the yellow boxes are genes on the opposite strand. Red arrows show the
location and direction of transcription of rRNA genes, and tRNA genes are shown as green arrows. The next circle illustrates
the positions of REP sequences around the genome as radial tick marks. The central orange sunburst is a histogram of
inverse CAI (1 - CAI), in which long yellow rays represent clusters of low (<0.25) CAI. The CAI plot is enclosed by a ring
indicating similarities between previously described bacteriophage proteins and the proteins encoded by the complete
E. coli genome; the similarity is plotted as described in Fig. 3 for the complete genome comparisons.
Perna et al., Nature 409, 529-533 (25 January 2001)
Outer circle shows the distribution of islands: shared co-linear backbone (blue); position of EDL933-specific sequences (O-islands) (red); MG1655-specific sequences (K-islands) (green); O-islands and K-islands at the same locations in the backbone
(tan); hypervariable (purple). Second circle shows the G+C content calculated for each gene longer than 100 amino acids,
plotted around the mean value for the whole genome, colour-coded like outer circle. Third circle shows the GC skew for
third-codon position, calculated for each gene longer than 100 amino acids: positive values, lime; negative values, dark
green. Fourth circle gives the scale in base pairs. Fifth circle shows the distribution of the highly skewed octamer Chi
(GCTGGTGG), where bright blue and purple indicate the two DNA strands. The origin and terminus of replication, the
chromosomal inversion and the locations of the sequence gaps are indicated. Figure created by Genvision from DNASTAR.
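The per-gene quantities plotted in the circles above are simple to compute. The sketch below is a minimal illustration under stated assumptions: the caption does not give formulas, so GC skew is taken here as the common convention (G - C) / (G + C) over third codon positions, sequences are plain uppercase or lowercase DNA strings with no ambiguity codes, and all function names are hypothetical.

```python
CHI = "GCTGGTGG"  # Chi octamer as given in the caption


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_skew(cds):
    """GC skew at third codon positions, assumed as (G - C) / (G + C)."""
    third = cds.upper()[2::3]  # every third base, starting at codon position 3
    g, c = third.count("G"), third.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def chi_counts(seq):
    """Return (Chi sites on the given strand, Chi sites on the opposite strand)."""
    seq = seq.upper()
    return (count_overlapping(seq, CHI),
            count_overlapping(seq, reverse_complement(CHI)))


# Toy sequences (hypothetical, not taken from either genome).
print(gc_content("ATGC"))
print(gc3_skew("ATGAAGAAC"))
print(chi_counts("AAGCTGGTGGTTCCACCAGCAA"))
```

To mimic the figure, one would apply `gc_content` and `gc3_skew` to each gene longer than 100 amino acids (300 coding bases) and plot the G+C values relative to the genome-wide mean.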
Shared E. coli proteins
Welch R A et al. PNAS 2002;99:17020-17024
©2002 by National Academy of Sciences