Causes of insertion sequence abundance in prokaryotic genomes?
A problem of size
Marie Touchon
E.P.C Rocha
Atelier de BioInformatique, Université Pierre et Marie Curie, Paris
Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris
IS elements :
the simplest form of transposable elements
- 700 to 2500 bp
- coding only the information allowing their mobility
ability to generate mutations :
- by inserting within genes
- by activating genes when inserted upstream
- by generating extensive DNA rearrangements
have been found to shuttle the transfer of adaptive traits such as :
- antibiotic resistance
- virulence
- new metabolic capabilities
Their exact nature is still debated : Selfish/Advantageous?
- genomic parasites
- beneficial agents
Causes of insertion sequence abundance in prokaryotic genomes?
Reasons largely unknown and widely speculated
Hypotheses :
- IS family specificity
- Genome size
- Frequency of horizontal gene transfer
- Pathogenicity
- Type of ecological associations
- Human sedentarisation
The current availability of hundreds of genomes renders testable many of these hypotheses.
IS elements Identification :
Problem : ISs annotations are heterogeneous, inaccurate or insufficient
Solution : Reannotation of ISs using comparative study
by adopting the nomenclature defined by Chandler (1998)
- ISs have one or two consecutive ORFs encoding transposase protein
- ISs are grouped into 21 distinct families
ISs Reannotation
(1) ISs CDS detection : all annotated CDS of genome x, ISs database (Chandler et al.)
(2) IS elements reconstitution
(3) ISs complete or partial
Partial elements :
- ISs fragments (> 20% length difference)
- ISs with internal insertion
[Diagram examples: IS1A-IS1B (IS1); IS1A-IS21A-IS21B-IS1B (IS1, IS21); IS1A-IS3A-IS3B-IS1A (IS1, IS3)]
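The complete/partial distinction above can be sketched as a small rule. This is only an illustration, not the authors' pipeline: the function name `classify_is_element`, the use of a reference length for the IS, and the reading of "> 20% length difference" as a relative difference to that reference are all assumptions.

```python
def classify_is_element(length_bp, reference_length_bp,
                        has_internal_insertion=False,
                        max_length_diff=0.20):
    """Label a reconstituted IS element as 'complete' or 'partial'.

    Assumptions (not stated on the slide): the length difference is measured
    relative to a reference IS length, and any element interrupted by another
    insertion counts as partial.
    """
    if reference_length_bp <= 0:
        raise ValueError("reference_length_bp must be positive")
    if has_internal_insertion:
        return "partial"
    relative_diff = abs(length_bp - reference_length_bp) / reference_length_bp
    return "partial" if relative_diff > max_length_diff else "complete"


if __name__ == "__main__":
    # Hypothetical lengths, for illustration only.
    print(classify_is_element(950, 1000))
    print(classify_is_element(600, 1000))
    print(classify_is_element(1000, 1000, has_internal_insertion=True))
```

A threshold on relative length keeps the rule independent of the absolute size of the IS, which ranges from 700 to 2500 bp.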
ISs Reannotation - Reassessment
262 genomes, 8123 ISs elements
(1) [Figure: number of detected ISs CDS against number of annotated ISs CDS; labelled point: Shigella flexneri]
Y = 0.77 (± 0.02) X + 5.86 (± 1.89)
R2 = 0.81 (P< 0.0001)
R = 0.95 (P< 0.0001)
(2) Annotated ISs CDS : 1194 (11%) ; Detected ISs CDS : 8823 (89%) ; 2115 (22%)
(3) 83% are complete (may be active)
Only 20% (1994) of GenBank ISs had a consistent classification
The absence of ISs is not anecdotal
24% of genomes lack ISs
48% of genomes have [0-10] ISs
High variability
of the number of ISs / Genome
of the number of ISs families / Genome
[Figure: Distribution of ISs in 262 genomes; axes: number of ISs, number of ISs families, number of genomes; labelled genomes: Sulfolobus solfataricus (archaebacteria), Bacillus halodurans (firmicute), Nitrobacter winogradskyi (proteobacteria), Bordetella pertussis (proteobacteria), Shigella sonnei (proteobacteria)]
Association with phylogenetic inertia
Rapid dynamics of gain and loss
The number of ISs evolves so fast that there is no historical correlation
The effect of IS family specificity
Incongruent phylogenetic trees
[Figure: phylogenetic trees; labels: Firmicute, Proteo, Proteo, Entero; 100%, 90%]
High diversity of ISs found within strains or closely related species
The effect of IS family specificity : Examples
Three closely related genomes : Pseudomonas syringae pv. tomato, Pseudomonas syringae pv. syringae, Pseudomonas syringae pv. phaseolicola
IS family counts per genome :
- 10 IS3 + 42 IS5 + 23 IS21 + 40 IS66 + 10 IS1111 + 13 ISNCY + 1 IS91 = 139 ISs
- 14 IS3 + 1 IS5 + 1 IS66 + 1 IS110 + 1 IS630 = 18 ISs (Pseudomonas syringae pv. syringae)
- 7 IS3 + 43 IS5 + 7 IS21 + 2 IS66 + 1 IS1111 + 1 ISNCY + 3 IS91 + 52 IS256 = 116 ISs
This effect is unlikely to explain the variability of ISs
The effect of genome size
Wilcoxon test : p<0.0001 (N = 64 and 198)
Spearman’s r=0.63, p<0.0001
Strong association between genome size and IS number (and density)
The larger the genome, the more IS elements it contains
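The rank correlation reported above can be illustrated with a short, dependency-free sketch of Spearman's coefficient (Pearson correlation computed on average ranks). The function names and the toy genome sizes and IS counts are hypothetical; they are not the 262-genome dataset.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: genome size (Mb) and number of ISs.
    genome_size_mb = [0.6, 1.8, 2.5, 4.6, 6.3]
    is_count = [0, 3, 1, 40, 25]
    print(spearman(genome_size_mb, is_count))
```

A rank-based coefficient suits this kind of data because IS counts are highly skewed (many genomes with none, a few with hundreds).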
The effect of horizontal gene transfer
Putative orthologs : reciprocal best hits, proteins with >90% similarity and <20% length difference.
Strain specific region : region exclusive to a strain, containing at least ten consecutive genes without an ortholog
[Diagram: lists of orthologs between strains A, B and C (genes i, j), delimiting a strain A specific region]
Prophage-Database (Nestle, Casjens, 2003)
HGT-Database (Garcia-Vallve, 2003)
E. coli O157:H7 Sakai
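The two definitions above (putative orthologs and strain specific regions) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the length difference is taken relative to the longer protein, and the chromosome is treated as linear (a run spanning the origin of a circular chromosome is not merged).

```python
def is_putative_ortholog(reciprocal_best_hit, similarity, len_a, len_b):
    """Apply the slide's criteria: reciprocal best hit, >90% similarity,
    <20% length difference (here relative to the longer protein)."""
    if not reciprocal_best_hit:
        return False
    length_diff = abs(len_a - len_b) / max(len_a, len_b)
    return similarity > 0.90 and length_diff < 0.20


def strain_specific_regions(has_ortholog, min_genes=10):
    """Return (start, end) gene indices (inclusive) of every run of at least
    `min_genes` consecutive genes that have no ortholog in the other strain.

    `has_ortholog` is a list of booleans in chromosomal gene order.
    """
    regions = []
    run_start = None
    for i, flag in enumerate(has_ortholog):
        if not flag:
            if run_start is None:
                run_start = i  # a run of genes without ortholog begins
        else:
            if run_start is not None and i - run_start >= min_genes:
                regions.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last gene.
    if run_start is not None and len(has_ortholog) - run_start >= min_genes:
        regions.append((run_start, len(has_ortholog) - 1))
    return regions


if __name__ == "__main__":
    # Hypothetical gene order: 3 shared, 12 specific, 2 shared, 5 specific, 1 shared.
    flags = [True] * 3 + [False] * 12 + [True] * 2 + [False] * 5 + [True]
    print(strain_specific_regions(flags))
```

Only the 12-gene run qualifies in this toy input; the 5-gene run is below the ten-gene threshold.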
The effect of horizontal gene transfer
Wilcoxon test : p<0.0001
t-test : p<0.001
Spearman’s r= 0.31 p>0.1 (NS)
[Figure values: 11.4%, 5.2%]
Genomes lacking ISs have fewer HGT
ISs are ~4 times more concentrated in HGT regions
HGT may be a determinant of the presence of ISs, but not of their abundance
The effect of horizontal gene transfer
Spearman’s r=0.84, p<0.0001
IS family diversity in HGT regions is almost as high as in the entire genome
HGT is a necessary but not sufficient condition for the presence of ISs
The intensity of HGT is not a significant determinant of IS abundance
The effect of pathogenicity
Yersinia pestis (plague)
Shigella flexneri, sonnei (dysentery)
Bordetella pertussis (whooping cough)
Wilcoxon test : p<0.001
Wilcoxon test : p>0.5
[Figure values: 4.3 and 3.6; N = 100 and 153; IS=0]
No association between the presence of IS and pathogenicity
The effect of the type of ecological association
[Figure values: 8%, 17%, 55%, 100%]
Strong association between the frequency of IS and the facultative character of the ecological associations
Stepwise multiple regression on the number of ISs
We removed genomes lacking IS (possibly under sexual isolation)
Covariate : Cumulative R2
- Genome size : 0.4
- Ecological association : 0.47
- Frequency HGT : 0.47
Kruskal-Wallis test : p>0.5 (NS)
Genome size is the most important variable
Lifestyle is a nonsignificant determinant
The effect of human sedentarisation
(Mira et al.,2006)
1) Genomes with many ISs are from prokaryotes associated with humans or
domesticated animals and plants.
2) Large intra-genomic IS expansions are recent.
Kruskal-Wallis test : p>0.5 (NS)
[Figure: genomes grouped by association with humans: not, indirectly, directly]
No evidence that man-related prokaryotes have more ISs.
Genome size explains ~40% of the variance in IS abundance
The smaller the genome, the lower the number and also the lower the density of ISs
- Selection could favor small genomes : optimal use of resources, shorter replication time (an increase in genome size caused by ISs could be counter-selected)
[Figure: density of ISs (/Mb) in fast- and slow-growing prokaryotes; Wilcoxon test : p<0.05]
Genomes with fewer ISs correspond to the slowest growing prokaryotes
- ISs are selected to generate genetic variation (such selection should be stronger in larger genomes)
One explanation fits the available data well
- Selection against transposition in genomes with a higher density of deleterious transposition targets
transposition inactivates genes with high probability
the total number of essential genes : ~300
+ 200-300 genes are nearly ubiquitous
= ~500 nearly essential genes
The abundance of IS elements in genomes could be mostly a question of space for transposition events that are not highly deleterious
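The "question of space" argument can be made concrete with a back-of-the-envelope calculation. The sketch below takes the figure of about 500 nearly essential genes from the slide and asks what share of the genes they represent in genomes of different sizes. The gene counts are hypothetical, and the sketch assumes that an insertion is equally likely to hit any gene, which is a simplification.

```python
NEARLY_ESSENTIAL_GENES = 500  # ~300 essential + 200-300 nearly ubiquitous (slide figure)


def deleterious_target_fraction(total_genes, nearly_essential=NEARLY_ESSENTIAL_GENES):
    """Fraction of genes whose disruption is assumed to be highly deleterious.

    Assumes every gene is an equally likely insertion target; gene length and
    intergenic insertions are ignored.
    """
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return min(1.0, nearly_essential / total_genes)


if __name__ == "__main__":
    # Hypothetical gene counts spanning small to large prokaryotic genomes.
    for total_genes in (600, 1000, 2500, 5000):
        fraction = deleterious_target_fraction(total_genes)
        print(f"{total_genes} genes: {fraction:.0%} of genic insertions hit a nearly essential gene")
```

Under these assumptions, the share of highly deleterious targets falls steeply as the genome grows, which is the intuition behind selection against transposition being strongest in small genomes.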
Conclusions
High diversity of ISs found within strains or closely related species
The number of ISs evolves so fast that there is no historical correlation
HGT may be a determinant of the presence of ISs, but not of their abundance
Surprisingly, genome size alone is the best predictor of IS number and density
Selection against transposition in genomes with a higher density of deleterious transposition targets
Impacts of IS abundance?
IS expansion :
- increases the rate of genome rearrangements
- increases the number of pseudogenes
[Figures: % of breakpoints that coincide with IS, observed and expected (Bordetella parapertussis, Bordetella bronchiseptica); O/E R gene/intergene against number of ISs]
Acknowledgements
E.P.C Rocha
Institut Pasteur
A. Danchin
La Région Ile de France
Examples
Pseudomonas syringae syringae = 18 ISs : 14 IS3, 1 IS5, 1 IS630, 1 IS66, 1 IS110
Nitrobacter winogradskyi = 117 ISs : 37 IS3, 32 IS5, 27 IS630, 2 IS21, 14 IS481, 4 ISNCY
Shigella sonnei = 372 ISs : 107 IS3, 157 IS1, 16 IS630, 33 IS4, 25 IS21, 1 IS66, 1 IS91, 18 IS110, 3 IS605, 3 IS1111, 4 ISAs1, 2 ISNCY
Association with stability ?
Large repeats decrease genome stability (Rocha, Trends Genetics, 03)
But not ISs elements ?
[Figures: stability against density of repeats; stability against number of ISs]
Association with phylogenetic inertia ?
The number of ISs evolves so fast that there is no historical correlation
Two scenarios : beneficial agents, genomic parasites
[Diagram: +IS acquisition, +IS expansion, -IS deletion, lineage loss]
Association with lifestyle ?
Burkholderia pseudomallei (facultative pathogen) : 36 ISs ; Burkholderia mallei (obligatory pathogen) : 152 ISs
Escherichia coli K12 (commensal) : 52 ISs ; Shigella flexneri (obligatory pathogen) : 298 ISs
Bordetella bronchiseptica (facultative pathogen) : 2 ISs ; Bordetella pertussis (obligatory pathogen) : 247 ISs
-> Link with lifestyle : host restriction, niche change, ..
Association with recent rearrangements ?
[Figures: genome comparisons involving Yersinia pestis, Yersinia pseudotuberculosis, Bordetella bronchiseptica, Bordetella parapertussis, B. pertussis, S. enterica Typhimurium, S. enterica serovar Typhi, E. coli K12 and Shigella flexneri; labels: 99% similarity, 90% similarity, 247 ISs, 32 ISs]
[Figure: % of breakpoints that coincide with IS, observed and expected, against number of ISs]
IS expansion promoted frequent genomic rearrangements
IS expansion increases the rate of genome rearrangements
Association with pseudogenes ?
[Diagram: IS insertions in genomes A and B relative to orthologs Or1/Or1' and Or2/Or2', counted either as ISs in genes or as ISs in the intergenic region]
R pseudo = Number of ISs in genes / Number of ISs in intergenes
[Figure: O/E R pseudo against number of ISs]
IS expansion increases the number of pseudogenes
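The ratio of ISs in genes to ISs in intergenes, and its observed/expected (O/E) value, can be sketched as below. The slides do not say how the expected ratio is obtained; this sketch assumes uniform random insertion, so that the expected ratio equals coding length over non-coding length. The function names, the `coding_fraction` parameter and the example numbers are hypothetical.

```python
def r_pseudo(is_in_genes, is_in_intergenes):
    """Observed ratio: number of ISs in genes / number of ISs in intergenes."""
    if is_in_intergenes <= 0:
        raise ValueError("need at least one IS in intergenic regions")
    return is_in_genes / is_in_intergenes


def expected_r(coding_fraction):
    """Expected ratio if insertions landed uniformly along the genome
    (an assumption of this sketch, not a method stated in the slides)."""
    if not 0 < coding_fraction < 1:
        raise ValueError("coding_fraction must be strictly between 0 and 1")
    return coding_fraction / (1 - coding_fraction)


def observed_over_expected(is_in_genes, is_in_intergenes, coding_fraction):
    """O/E of the genes/intergenes ratio; values below 1 mean genic
    insertions are rarer than uniform insertion would predict."""
    return r_pseudo(is_in_genes, is_in_intergenes) / expected_r(coding_fraction)


if __name__ == "__main__":
    # Hypothetical genome: 10 ISs inside genes, 40 in intergenic regions,
    # with 80% of the sequence coding.
    print(r_pseudo(10, 40))
    print(observed_over_expected(10, 40, 0.8))
```

Normalising by an expected value matters here because prokaryotic genomes are mostly coding, so a raw count of genic insertions would overstate how readily ISs disrupt genes.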
Conclusions
High variability :
- of the number of ISs / Genome
- of the number of ISs families / Genome
- of the number of ISs copies / Family
ISs have been recently acquired (HGT)
IS expansion :
- is associated with lifestyle/niche change
- increases the rate of genome rearrangements
- increases the number of pseudogenes
[Diagram: +IS acquisition, +IS expansion, -IS deletion, lineage loss]
Conclusions
ISs are frequent but not ubiquitous
ISs number and families vary a lot
Lack of association of the stability with the number of ISs
The presence of ISs is associated with lifestyle
IS expansion increases the rate of genome rearrangements
IS expansion increases the number of pseudogenes
beneficial agents / genomic parasites
How many IS ?
High variability
of the number of ISs / Genome
of the number of ISs families / Genome
of the number of ISs / Family
[Figures: number of genomes against number of ISs and against number of ISs families; Log(Number of ISs/Genome) per ISs family]
B. pertussis : 16 IS110, 229 IS481
S. sonnei : 157 IS1, 106 IS3, 33 IS4, 25 IS21
S. flexneri : 112-108 IS1, 126-124 IS3, 34-22 IS4
Hypothesis I
ISs induce short spikes of instability which are averaged out in a deep phylogenetic analysis
Hypothesis II
Invasions of highly replicative ISs lead to deleterious instability and lineage loss