Comparative analysis of ribosomal proteins in complete genomes: ribosome “striptease” in Archaea
Odile Lecompte, Raymond Ripp, Jean-Claude Thierry, Dino Moras and Olivier Poch
Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS, INSERM, ULP), BP163, 67404 Illkirch Cedex, France
Abstract
A comprehensive investigation of ribosomal genes in complete genomes from 66 different species allows us to address the distribution of r-proteins between and within the three primary domains. Thirty-four r-protein families are represented in all domains, but 33 families are specific to Archaea and Eucarya, providing evidence for specialisation at an early stage of evolution between the bacterial lineage and the lineage leading to archaea and eukaryotes. With only one specific r-protein, the archaeal ribosome appears to be a small-scale model of the eukaryotic one in terms of protein composition. However, the mechanism of evolution of the protein component of the ribosome appears dramatically different in Archaea. In Bacteria and Eucarya, a restricted number of ribosomal genes can be lost, with a bias toward losses in intracellular pathogens. In Archaea, losses involve 15% of the ribosomal genes, revealing an unexpected plasticity of the translation apparatus, and the pattern of gene losses indicates a progressive elimination of ribosomal genes in the course of archaeal evolution. This first documented case of reductive evolution at the domain scale provides a new framework for discussing the shape of the universal tree of life and the selective forces directing the evolution of prokaryotes.
Ribosomal gene detection: cross-validation needed!
• Genes often missed during the annotation process
• Small size and biased composition of r-proteins
An initial set of ribosomal proteins classified into 102 families was obtained from http://www.expasy.ch/cgi-bin/lists?ribosomp.txt. For each family, representatives of various lineages across Bacteria, Archaea and Eucarya were used as probes and systematically compared to a non-redundant protein database consisting of SwissProt, SpTrEMBL and SpTrEMBLNEW using the BlastP program (1) with a cut-off of E < 0.001. The results of the BlastP comparison were cross-validated by a TBlastN search against a database of complete genomes from 66 different species. The putative new gene sequences detected by the TBlastN searches were examined in the light of their genomic context to eliminate false-positive hits. For each r-protein family, the likely r-protein sequences obtained by the BlastP and TBlastN searches were included in a multiple alignment constructed by MAFFT (2). All alignments were refined by RASCAL (3) and their quality was assessed by NorMD (4). These alignments were manually examined to remove the false positives observed in some ribosomal protein families, in particular those containing ubiquitous RNA-binding domains.
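The BlastP/TBlastN cross-validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: hits are assumed to be already parsed into simple records, the same E < 0.001 cut-off is assumed to apply to TBlastN, and the manual genomic-context check is replaced by a hypothetical rule (keep a TBlastN-only hit only if a neighbouring locus is a known r-protein gene). The `Hit` type and all identifiers (`g1`, `p1`, `r1`, ...) are made up for the example.

```python
from dataclasses import dataclass

# Cut-off stated for the BlastP searches (E < 0.001); applying the same value
# to TBlastN hits is an assumption of this sketch.
E_CUTOFF = 1e-3


@dataclass(frozen=True)
class Hit:
    genome: str    # genome identifier (hypothetical naming)
    locus: str     # protein ID (BlastP) or genomic region (TBlastN)
    family: str    # r-protein family of the probe that produced the hit
    evalue: float


def significant(hits, cutoff=E_CUTOFF):
    """Keep only hits strictly below the E-value cut-off."""
    return [h for h in hits if h.evalue < cutoff]


def cross_validate(blastp_hits, tblastn_hits, neighbours, rprotein_loci,
                   cutoff=E_CUTOFF):
    """Cross-validate BlastP hits with TBlastN hits, per (genome, family).

    Returns (confirmed, candidates):
    - confirmed: (genome, family) pairs found by BlastP;
    - candidates: {(genome, family): locus} for TBlastN-only hits, i.e.
      possible genes missed during annotation. A candidate is kept only if
      one of its neighbouring loci is a known r-protein gene, a hypothetical
      stand-in for the manual genomic-context check.
    """
    confirmed = {(h.genome, h.family) for h in significant(blastp_hits, cutoff)}
    candidates = {}
    for h in significant(tblastn_hits, cutoff):
        key = (h.genome, h.family)
        if key in confirmed:
            continue  # already annotated and found by BlastP
        context = neighbours.get(h.locus, [])
        if any(n in rprotein_loci for n in context):
            candidates[key] = h.locus
    return confirmed, candidates


if __name__ == "__main__":
    blastp = [Hit("g1", "p1", "L14E", 1e-20)]
    tblastn = [
        Hit("g1", "p1", "L14E", 1e-18),  # confirms the BlastP hit
        Hit("g1", "r1", "L40E", 5e-4),   # unannotated, r-protein neighbour
        Hit("g1", "r2", "S28E", 1e-4),   # unannotated, no r-protein neighbour
        Hit("g1", "r3", "L30P", 0.5),    # not significant
    ]
    neighbours = {"r1": ["p2"], "r2": ["p9"], "r3": ["p2"]}
    print(cross_validate(blastp, tblastn, neighbours, {"p2"}))
```

Keying results on (genome, family) rather than on individual loci keeps the comparison independent of whether a gene was annotated, which is exactly the situation a TBlastN search is meant to catch.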
Difficulty of protein detection by similarity search
BlastP hit between RL40_METJA (Query) and RL40_HUMAN
>SW:RL40_HUMAN P14793 60S RIBOSOMAL PROTEIN L40 (CEP52). 10/2001
Length = 52
Score = 31.6 bits (70), Expect = 1.8
Identities = 18/34 (52%), Positives = 20/34 (57%), Gaps = 3/34 (8%)
Query: 13 KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK 43
          K IC +C AR   RA  CR  KCG+   LRPK K
Sbjct: 17 KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK 50
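The statistics in the hit above can be recounted directly from the two aligned strings, which makes the arithmetic behind "Identities = 18/34" and "Gaps = 3/34" explicit and shows why this hit is lost at the E < 0.001 cut-off. A small sketch; the helper name `alignment_stats` is hypothetical.

```python
def alignment_stats(query_aln, sbjct_aln, q_start, s_start):
    """Recount length, identities, gaps and end coordinates of a BLAST HSP."""
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query_aln, sbjct_aln))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    # End coordinate = start + number of residues (gap characters excluded) - 1
    q_end = q_start + len(query_aln.replace("-", "")) - 1
    s_end = s_start + len(sbjct_aln.replace("-", "")) - 1
    return {"length": len(pairs), "identities": identities, "gaps": gaps,
            "q_end": q_end, "s_end": s_end}


QUERY = "KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK"  # RL40_METJA, residues 13-43
SBJCT = "KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK"  # RL40_HUMAN, residues 17-50

stats = alignment_stats(QUERY, SBJCT, q_start=13, s_start=17)
print(stats)

E_CUTOFF = 1e-3  # cut-off used for the BlastP searches
EXPECT = 1.8     # E-value reported for this hit
print("retained by BlastP filter:", EXPECT < E_CUTOFF)
```

This prints 34 aligned positions, 18 identities, 3 gaps and end coordinates 43 and 50, matching the BLAST header, and reports that the hit is not retained. Positives (20/34) are not recomputed because they depend on the substitution matrix used by BlastP (BLOSUM62 by default).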
Protocol of ribosomal gene detection
[Figure: detection protocol]
• R-protein families: 102 r-protein families, with several representatives for each protein family used as BlastP probes against the protein database
• Complete genomes: 66 complete genomes (45 Bacteria, 14 Archaea, 7 Eucarya), searched with TBlastN
• Genomic context analysis (example panel: gene neighbourhoods including L14E, L19E, L18P, S5P, L30P, L15P, L34E, SECY, ADK, CMK, GATA, TRUB (TRUBa, TRUBb) and hypothetical (Hyp) genes in Pa, Ph, Pf, Mj, Ap, Ss, St, Py, Mt and Mk)
• Multiple alignment of complete sequences (homology detection analysis)
• Creation of 24 missed genes
Protein detected by:
• 100% of the family representatives in both blastp and tblastn
• >50% of the family representatives in blastp
• <50% of the family representatives in blastp
• 0% of the family representatives in both blastp and tblastn
• 0% of the family representatives in blastp but detected by tblastn (gene missed during the annotation process)
Validation of protein sequences for each family:
• Coherence of the protein family
• Elimination of false positives
• Correction of protein sequences
All the alignments are available at http://www-igbmc.u-strasbg.fr/BioInfo/Rproteins
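The detection legend of the protocol figure bins each protein by the fraction of family representatives that find it. A sketch of that binning, assuming one boolean per representative for each search; how the exact 50% boundary and partial TBlastN detection were treated is not stated, so the choices below are assumptions, and the function and label names are hypothetical.

```python
# Category labels paraphrasing the detection legend of the protocol figure.
BOTH_ALL = "100% in both blastp and tblastn"
BLASTP_MAJORITY = ">50% in blastp"
BLASTP_MINORITY = "<50% in blastp"
NOT_DETECTED = "0% in both blastp and tblastn"
MISSED_GENE = "0% in blastp but detected by tblastn"


def detection_fraction(detected):
    """Fraction of family representatives (probes) that detected the protein."""
    return sum(detected) / len(detected) if detected else 0.0


def classify(blastp_detected, tblastn_detected):
    """Assign one protein to a legend category.

    Each argument is a list of booleans, one per family representative.
    Assumptions of this sketch: exactly 50% is grouped with '<50%', and a
    protein found by every probe with BlastP but not by every probe with
    TBlastN falls under '>50%'.
    """
    p = detection_fraction(blastp_detected)
    t = detection_fraction(tblastn_detected)
    if p == 1.0 and t == 1.0:
        return BOTH_ALL
    if p == 0.0:
        # Nothing found in the annotated proteins: a TBlastN hit points to
        # a gene missed during annotation.
        return MISSED_GENE if t > 0.0 else NOT_DETECTED
    return BLASTP_MAJORITY if p > 0.5 else BLASTP_MINORITY


if __name__ == "__main__":
    print(classify([False, False, False, False], [True, False, False, False]))
```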
A complex Last Universal Common Ancestor?
Interdomain distribution
[Figure: distribution of the 102 r-protein families among Bacteria (B), Archaea (A) and Eucarya (E); values in parentheses as given in the figure, likely small-subunit;large-subunit counts]
• Bacteria: 57 (23;34)
• Archaea: 68 (28;40)
• Eucarya: 78 (32;46)
• Shared by all three domains (BAE): 34 (15;19)
• Archaea and Eucarya only (AE): 33 (13;20)
• Bacteria only (B): 23 (8;15)
• Eucarya only (E): 11 (4;7)
• Archaea only (A): 1 (0;1)
• Bacteria and Archaea only (BA): 0
• Bacteria and Eucarya only (BE): 0
• Prevalence of r-proteins within the universal pool that may have been present in the last universal common ancestor (LUCA)
• Specialisation of bacterial versus archaeal/eukaryotic ribosomes
• The majority of archaeal and eukaryotic r-proteins appeared before the split between Archaea and Eucarya, suggesting a complex cenancestor
Ribosomal protein losses in each of the three domains
[Figure: tree of the three domains.
Bacteria: Aquifex, Thermotoga, Deinococcus, Spirochaetes, Chlamydia, Cyanobacteria, Proteobacteria, Gram positives
Archaea: Aeropyrum, Sulfolobus, Pyrobaculum, Pyrococcus, Methanococcus, Methanobacterium, Methanopyrus, Thermoplasma, Archaeoglobus, Halobacterium
Eucarya: Animals, Fungi, Plants, Ciliates*, Flagellates*, Trichomonads*, Diplomonads*, Microsporidia
R-proteins marked as lost: S1p, S21p, S22p, L25p, L30p, S21e, S25e, S26e, S30e, L13e, L14e, L28e, L30e, L34e, L35ae, L38e, LXa]
Full circles indicate proteins absent in all complete genomes investigated in the indicated taxon. Empty circles stand for proteins absent in some complete genomes of the indicated taxon.
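The interdomain counts are additive: each domain total is the sum of the diagram regions that contain that domain, and all regions together should give the 102 families of the initial set. The sketch below checks this and also recomputes the archaeal loss fraction from the 10 eliminated r-proteins reported on the poster. Reading the parenthesised pairs as (small subunit, large subunit) is an assumption; the code only adds them.

```python
# Families per region of the interdomain diagram, copied from the figure as
# (first, second) pairs; reading them as (small subunit, large subunit) counts
# is an assumption. B = Bacteria, A = Archaea, E = Eucarya.
REGIONS = {
    frozenset("BAE"): (15, 19),
    frozenset("AE"): (13, 20),
    frozenset("B"): (8, 15),
    frozenset("E"): (4, 7),
    frozenset("A"): (0, 1),
    frozenset("BA"): (0, 0),
    frozenset("BE"): (0, 0),
}


def domain_counts(domain):
    """Sum the (first, second) pairs over every region that contains `domain`."""
    first = sum(a for region, (a, _) in REGIONS.items() if domain in region)
    second = sum(b for region, (_, b) in REGIONS.items() if domain in region)
    return first, second


for name, code in (("Bacteria", "B"), ("Archaea", "A"), ("Eucarya", "E")):
    a, b = domain_counts(code)
    print(f"{name}: {a + b} ({a};{b})")

total = sum(a + b for a, b in REGIONS.values())
print("families in total:", total)

# Ten archaeal r-protein losses out of the 68 archaeal families.
print(f"archaeal losses: {100 * 10 / sum(domain_counts('A')):.1f}%")
```

The totals reproduce 57 (23;34), 68 (28;40) and 78 (32;46), the regions sum to 102, and 10/68 gives 14.7%, consistent with the rounded 15% quoted for Archaea.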
Localisation in the 3D structures
Bacteria-specific proteins (colored in different shades of red) are preferentially located at the periphery of the ribosome.
[Figure: the 30S ribosomal subunit of Thermus thermophilus (5) (back side). Labelled proteins: S2p, S3p, S4p, S5p, S6p, S7p, S8p, S9p, S10p, S11p, S12p, S13p, S14p, S15p, S16p, S17p, S18p, S19p, S20p, Thx]
[Figure: the 50S ribosomal subunit of Deinococcus radiodurans (6) (crown view rotated by 180°). Labelled proteins: L2p, L3p, L4p/L4e, L5p, L6p, L9p, L11p, L13p, L14p, L15p, L16p, L17p, L18p, L19p, L20p, L21p, L22p, L23p, L24p, L25p, L27p, L28p, L29p, L30p, L31p, L32p, L33p, L34p, L35p, L36p]
Progressive elimination of 10 r-proteins (15%) in the course of archaeal evolution
First example of reductive evolution at domain scale
Reductive evolution as a general trend in Archaea? In Prokaryotes?
A complex Last Universal Common Ancestor (LUCA)?
Which evolutionary scenario?
[Figure: alternative tree topologies for Bacteria (B), Archaea (A) and Eucarya (E), labelled Bacterial rooting, Eucarya rooting, Simple ancestor(s), Complex ancestor(s) and Symbiosis]
References:
1. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.
2. Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) Nucleic Acids Res., 30, 3059-3066.
3. Thompson,J.D., Thierry,J.C. and Poch,O. (2003) Bioinformatics, 19, 1155-1161.
4. Thompson,J.D., Plewniak,F., Ripp,R., Thierry,J.C. and Poch,O. (2001) J. Mol. Biol., 314, 937-951.
5. Wimberly,B.T., Brodersen,D.E., Clemons,W.M., Jr., Morgan-Warren,R.J., Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000) Nature, 407, 327-339.
6. Harms,J., Schluenzen,F., Zarivach,R., Bashan,A., Gat,S., Agmon,I., Bartels,H., Franceschi,F. and Yonath,A. (2001) Cell, 107, 679-688.