A Comparative Study of Species Richness Estimates Obtained Using Near Complete Fragments and Simulated
Pyrosequencing-Generated Fragments in 16S rRNA Gene-Based Environmental Surveys
N-088
N. H. Youssef1, C. S. Sheik2, L. R. Krumholz2, F. Z. Najar2, B. A. Roe2, M. S. Elshahed1;
1Oklahoma State Univ., Stillwater, OK, 2Univ. of Oklahoma, Norman, OK.
Abstract
It is not yet clear how the number of operational taxonomic units
(OTUs), and hence species richness estimates, determined using
pyrosequencing-generated fragments correlate with those assigned using
near full-length 16S rRNA gene fragments. We constructed a 16S rRNA
clone library from an undisturbed tall grass prairie soil (1132 clones),
and used it to compare species richness estimates using 8
pyrosequencing-candidate fragments (99-361 bp in length) to the near
full-length fragment. While fragments encompassing the V1+V2, and V6
regions overestimated species richness, those encompassing V3, V7, and
V7+V8 regions underestimated species richness, and those
encompassing the V4, V5+V6, and V6+V7 provided estimates
comparable to the near full-length fragment. Similar results were
obtained when analyzing three other datasets. Regression analysis
indicated base variability within an examined fragment could potentially
explain those differences.
Introduction
Typical culture-independent 16S rRNA gene surveys of highly
diverse ecosystems allow for the identification of only abundant
members of the communities (1). Estimates obtained are highly
dependent on sample size.
The large number of 16S rRNA gene sequences produced with
pyrosequencing (7) allows access to rare members of the
community (4), as well as a relatively more accurate estimation
of species richness. However, it is unclear how pairwise
distances, and hence operational taxonomic unit (OTU)
assignments and species richness estimates, computed using
various shorter fragments will correlate with those computed using
the near-complete 16S rRNA gene.
Here, we constructed, sequenced, and analyzed a 16S rRNA
library of 1132 clones, and compared OTU numbers, and species
richness values obtained using the full-length datasets, and
fragments simulating pyrosequencing output. We show that the
choice of the pyrosequenced fragment could impact the number
of OTUs, and species richness estimates with some fragments
underestimating and others overestimating species richness when
compared to longer near complete 16S rRNA gene fragments.
Further, we established a regression analysis that explains the
nature of the observed discrepancy using the proportion of the
hypervariable, variable, and conserved bases within a fragment.
Materials and methods
Site. Undisturbed tall grass prairie soil in central Oklahoma.
DNA extraction. FastDNA spin kit for soil.
PCR and cloning. Primers 8f-1492r. TOPO-TA cloning kit.
Chimera. Bellerophon (version 3) function on Greengenes.
Alignments. ClustalX program, Greengenes NAST aligner.
Clipping of shorter fragments. Jalview (3).
Distance matrix, OTU assignments. PAUP. DOTUR. Scatter plot
slopes.
Species richness estimates. Chao and ACE estimators. Six parametric
distributions (http://www.stat.cornell.edu/~bunge/).
Other environments. Another soil ecosystem (5), digestive tract of
zebrafish (8), and ocean floor microbial community (9).
Regression analysis. Multiple regression using MS Excel.
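The Chao estimator listed above can be illustrated with a short sketch. It assumes the bias-corrected Chao1 form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed once and twice. The poster does not state which variant was used, and the function name and OTU sizes below are hypothetical.

```python
def chao1(otu_sizes):
    """Bias-corrected Chao1 richness estimate from OTU sizes (clones per OTU).

    Assumption: this is the common bias-corrected form; the exact variant
    used for the poster is not stated in the source.
    """
    sizes = [n for n in otu_sizes if n > 0]
    s_obs = len(sizes)                      # observed number of OTUs
    f1 = sum(1 for n in sizes if n == 1)    # OTUs seen exactly once
    f2 = sum(1 for n in sizes if n == 2)    # OTUs seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical library: 9 OTUs, of which 4 are singletons and 2 doubletons.
example_sizes = [10, 5, 3, 2, 2, 1, 1, 1, 1]
estimate = chao1(example_sizes)
print(f"Observed OTUs: {len(example_sizes)}, Chao1 estimate: {estimate}")
```

The correction term depends only on the singleton and doubleton counts, so a library with no singletons returns the observed OTU count unchanged.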
Comparison of number of OTUs obtained using near complete and shorter fragments
The number of OTUs obtained using short
simulated fragments ranged from 0.44 to
2.10 times the values obtained using the
near full-length 16S.
Fragments encompassing regions V1+V2
and V6 overestimated the number of OTUs
at all taxonomic cutoffs. Fragments
encompassing V3, V7, and V7+V8 regions
underestimated OTU numbers. Fragments
encompassing V4, V5+V6, and V6+V7
gave, in general, comparable OTU numbers
to the full sequence, as further evidenced by
slope values of 0.97, 1, and 0.98,
respectively (Table 2).
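The comparison above can be reproduced from the OTU counts reported in Table 2. The sketch below divides each short-fragment OTU count by the near full-length (NFL) count at the same cutoff. The counts are transcribed from Table 2; the variable and function names are illustrative only.

```python
# OTU counts transcribed from Table 2 (Soil Okla-A), keyed by taxonomic cutoff.
NFL_OTUS = {3: 639, 6: 397, 8: 306, 10: 242, 15: 135}
SHORT_OTUS = {
    "V1+V2": {3: 652, 6: 499, 8: 407, 10: 345, 15: 233},
    "V3":    {3: 495, 6: 331, 8: 261, 10: 203, 15: 122},
    "V4":    {3: 619, 6: 414, 8: 327, 10: 267, 15: 158},
    "V5+V6": {3: 570, 6: 393, 8: 328, 10: 267, 15: 166},
    "V6":    {3: 584, 6: 479, 8: 418, 10: 370, 15: 283},
    "V6+V7": {3: 514, 6: 375, 8: 300, 10: 241, 15: 142},
    "V7":    {3: 340, 6: 194, 8: 149, 10: 112, 15: 61},
    "V7+V8": {3: 412, 6: 241, 8: 187, 10: 143, 15: 73},
}


def otu_ratios(short, nfl):
    """Ratio of short-fragment to NFL OTU counts for every region and cutoff."""
    return {
        region: {cutoff: count / nfl[cutoff] for cutoff, count in counts.items()}
        for region, counts in short.items()
    }


RATIOS = otu_ratios(SHORT_OTUS, NFL_OTUS)
ALL_RATIOS = [r for per_cutoff in RATIOS.values() for r in per_cutoff.values()]
print(f"lowest ratio: {min(ALL_RATIOS):.2f}, highest ratio: {max(ALL_RATIOS):.2f}")
```

Both extremes occur at the 15 cutoff, with V7 giving the lowest ratio and V6 the highest.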
Table 1. Variable sites encompassed, and base composition for the short simulated regions studied and the near full-length fragment

Regions      Variable regions   V (%)   HV (%)   C (%)
27-355       V1+V2              47      18       35
338-548      V3                 44      14       42
530-826      V4                 57      5        38
805-1065     V5+V6              49      10       41
967-1065     V6                 45      19       36
967-1238     V6+V7              44      9        47
1046-1238    V7                 40      3        57
1046-1406    V7+V8              43      5        52
NFL                             51      10       39

Percentage of V (variable), HV (highly variable), and C (conserved) bases. NFL: near full-length sequences.
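As a quick consistency check on Table 1, the fragment lengths can be derived from the region coordinates. This sketch assumes the coordinates are inclusive at both ends, which is what the 99-361 bp range quoted in the abstract implies. The dictionary and function names are illustrative.

```python
# Region coordinates and base composition (V, HV, C percentages) from Table 1.
TABLE1 = {
    "V1+V2": ((27, 355), (47, 18, 35)),
    "V3":    ((338, 548), (44, 14, 42)),
    "V4":    ((530, 826), (57, 5, 38)),
    "V5+V6": ((805, 1065), (49, 10, 41)),
    "V6":    ((967, 1065), (45, 19, 36)),
    "V6+V7": ((967, 1238), (44, 9, 47)),
    "V7":    ((1046, 1238), (40, 3, 57)),
    "V7+V8": ((1046, 1406), (43, 5, 52)),
}


def fragment_length(start, end):
    """Length in bp, assuming both end coordinates are included."""
    return end - start + 1


LENGTHS = {name: fragment_length(*span) for name, (span, _) in TABLE1.items()}
COMPOSITION_SUMS = {name: sum(comp) for name, (_, comp) in TABLE1.items()}

for name, length in LENGTHS.items():
    print(f"{name:6s} {length:4d} bp, composition sums to {COMPOSITION_SUMS[name]}%")
```

Under that assumption the shortest fragment is V6 and the longest is V7+V8, matching the range given in the abstract.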
Table 2. Number of OTUs and ratios of species richness estimates obtained using the near full-length
sequences and each of the 8 short simulated regions studied at 5 different taxonomic cutoffs for Soil Okla-A clone library.
Table 4. Slopes obtained for 3 different clone libraries derived from soil, zebrafish
gut, and ocean floor as compared to KFS.

Environment            V1+V2   V3     V4     V5+V6   V6     V6+V7   V7     V7+V8
Trembling Aspen soil   1.23    0.87   0.97   1.05    1.8    1.07    0.68   0.73
Zebrafish gut          1.27    0.72   0.94   1.02    1.35   0.96    0.56   0.65
Basalt Oceanic Floor   1.24    0.86   0.96   1.02    1.74   0.94    0.56   0.66
KFS                    1.2     0.88   0.97   1       1.67   0.98    0.6    0.65
Elucidating factors behind pairwise distance discrepancies between
short and near full-length sequences.
We hypothesized that since the 16S rRNA molecule is made of sites with
varying levels of evolutionary conservation, the proportion of these
sites in a specific amplicon would impact the pairwise distance values
obtained in the dataset. To this end, we used the classification, put forward
in the reviews of Baker et al. (2) and Van de Peer et al. (10), of all base
pairs in the 16S rRNA gene of E. coli into conserved (C), variable (V),
and highly variable (HV) to determine the % of C, V, and HV base pairs in
each of the pyrosequencing fragments, and compared it to the near
full-length fragment. We used multiple regression and tested all possible
combinations of percentages and ratios of C, V, and HV bases. The best
model equation obtained was:
y (slope) = (30.5 × C/total) + (11.5 × HV/V) - (27.9 × HV/total) - (8.5 × C/V) + (5.25 × HV/C) - (0.001 × length) - 4.79
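The model can be evaluated against the Table 1 compositions to see how it behaves. This is a sketch under stated assumptions: the sign in front of the 27.9 term is taken as negative, fragment lengths are computed from inclusive coordinates, and the inputs are the rounded percentages from Table 1, so predictions only approximate the reported slopes. The names used are illustrative.

```python
# (C %, V %, HV %, length in bp, slope reported in Table 2) per region.
# Lengths assume inclusive coordinates from Table 1.
FRAGMENTS = {
    "V1+V2": (35, 47, 18, 329, 1.20),
    "V3":    (42, 44, 14, 211, 0.88),
    "V4":    (38, 57, 5, 297, 0.97),
    "V5+V6": (41, 49, 10, 261, 1.00),
    "V6":    (36, 45, 19, 99, 1.67),
    "V6+V7": (47, 44, 9, 272, 0.98),
    "V7":    (57, 40, 3, 193, 0.60),
    "V7+V8": (52, 43, 5, 361, 0.65),
}


def predicted_slope(c, v, hv, length_bp):
    """Evaluate the regression model; the minus on the 27.9 term is assumed."""
    total = c + v + hv
    return (30.5 * c / total
            + 11.5 * hv / v
            - 27.9 * hv / total
            - 8.5 * c / v
            + 5.25 * hv / c
            - 0.001 * length_bp
            - 4.79)


PREDICTED = {
    name: predicted_slope(c, v, hv, length)
    for name, (c, v, hv, length, _) in FRAGMENTS.items()
}

for name, (_, _, _, _, reported) in FRAGMENTS.items():
    print(f"{name:6s} predicted {PREDICTED[name]:.2f}  reported {reported:.2f}")
```

With these rounded inputs the model still ranks V6 highest and V7 lowest, in line with the reported slopes.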
Table 2 data (Soil Okla-A clone library). OTU: number of OTUs. Ratio: ratio of species richness estimates. NFL: near full-length sequences.

Region   Cutoff 3      Cutoff 6      Cutoff 8      Cutoff 10     Cutoff 15     Slope
         OTU   Ratio   OTU   Ratio   OTU   Ratio   OTU   Ratio   OTU   Ratio
V1+V2    652   1.3     499   1.48    407   1.54    345   1.52    233   2       1.2
V3       495   0.68    331   0.76    261   0.74    203   0.76    122   0.69    0.88
V4       619   1.09    414   1.19    327   1.11    267   1.12    158   1.07    0.97
V5+V6    570   0.92    393   0.96    328   1.19    267   1.12    166   1.07    1
V6       584   1       479   1.33    418   1.54    370   1.69    283   2.41    1.67
V6+V7    514   0.8     375   0.9     300   0.96    241   0.92    142   0.93    0.98
V7       340   0.4     194   0.37    149   0.38    112   0.31    61    0.35    0.6
V7+V8    412   0.54    241   0.45    187   0.47    143   0.43    73    0.43    0.65
NFL      639           397           306           242           135           NA
Comparing species richness estimates in short and long fragments at various taxonomic cutoffs in
Soil Okla-A clone library.
All three species richness estimation methods as well as slopes of scatter plots were in general agreement
with each other, as well as with results obtained from OTU assignments in describing the relationship
between long and short fragments.
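A scatter plot slope of the kind referred to here can be sketched as an ordinary least-squares fit of short-fragment pairwise distances against the corresponding near full-length distances. The fitting method is an assumption, since the poster does not say whether an intercept was fitted, and the distances below are hypothetical values chosen for illustration.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs, with an intercept fitted."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical pairwise distances for the same sequence pairs.
nfl_dist = [0.02, 0.05, 0.10, 0.20]          # near full-length sequences
short_dist = [1.5 * d for d in nfl_dist]     # a fragment that inflates distances
slope = ols_slope(nfl_dist, short_dist)
print(f"slope of short vs. NFL distances: {slope:.2f}")
```

Read this way, a slope above 1 means the short fragment yields larger distances than the near full-length sequence for the same pairs, and a slope below 1 means smaller distances.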
Table 3. Parametric species richness estimates obtained using the near full-length sequences and each of the 8
short simulated regions studied at 5 different taxonomic cutoffs for Soil Okla-A clone library

Region   Cutoff 3      Cutoff 6      Cutoff 8     Cutoff 10    Cutoff 15
V1+V2    4589 ± 178    2360 ± 94     1616 ± 70    1275 ± 59    737 ± 41
V3       2273 ± 91     1397 ± 68     686 ± 34     627 ± 37     190 ± 11
V4       3428 ± 125    2223 ± 100    1034 ± 48    792 ± 40     305 ± 17
V5+V6    2759 ± 125    1539 ± 68     1129 ± 53    904 ± 46     324 ± 17
V6       3179 ± 121    2302 ± 95     1488 ± 62    1407 ± 63    845 ± 41
V6+V7    2688 ± 108    1684 ± 79     1023 ± 50    865 ± 48     292 ± 19
V7       1209 ± 57     547 ± 32      343 ± 22     207 ± 13     91 ± 7
V7+V8    1959 ± 87     706 ± 38      383 ± 20     288 ± 17     123 ± 10
NFL      2819 ± 98     1790 ± 80     1036 ± 51    912 ± 52     347 ± 24
Comparing OTUs, species richness estimates and slopes of scatter plots in short and long fragments
in libraries derived from other ecosystems.
Trends obtained from OTU determinations and scatter plot slopes of a Trembling Aspen soil (1152
clones), the digestive tract of zebrafish (612 clones), and microbial communities inhabiting the ocean
crust in the east Pacific ridge (902 clones) were strikingly similar to those observed with the Soil Okla-A
clone library (Table 4). Species richness estimates for these three environments mirrored the same trends
(data not shown).
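The consistency across environments can be checked directly from the Table 4 slopes. The sketch below labels each slope as over, under, or comparable relative to 1. The tolerance of 0.1 is an illustrative assumption, not a threshold taken from the poster, and the names are hypothetical.

```python
REGIONS = ["V1+V2", "V3", "V4", "V5+V6", "V6", "V6+V7", "V7", "V7+V8"]

# Slopes transcribed from Table 4, in the same region order as REGIONS.
SLOPES = {
    "Trembling Aspen soil": [1.23, 0.87, 0.97, 1.05, 1.8, 1.07, 0.68, 0.73],
    "Zebrafish gut":        [1.27, 0.72, 0.94, 1.02, 1.35, 0.96, 0.56, 0.65],
    "Basalt Oceanic Floor": [1.24, 0.86, 0.96, 1.02, 1.74, 0.94, 0.56, 0.66],
    "KFS":                  [1.2, 0.88, 0.97, 1, 1.67, 0.98, 0.6, 0.65],
}


def classify(slope, tolerance=0.1):
    """Label a slope relative to 1; the tolerance is an assumed cutoff."""
    if slope > 1 + tolerance:
        return "over"
    if slope < 1 - tolerance:
        return "under"
    return "comparable"


# For each region, collect the set of labels seen across all environments.
CALLS = {
    region: {classify(slopes[i]) for slopes in SLOPES.values()}
    for i, region in enumerate(REGIONS)
}

for region, labels in CALLS.items():
    print(f"{region:6s} {sorted(labels)}")
```

With this tolerance every region receives a single label across all four libraries.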
Conclusions
• Regions V1+V2 and V6 overestimate diversity, regions V3, V7,
and V7+V8 underestimate diversity, while regions V4, V5+V6, and
V6+V7 give estimates comparable to near full-length fragments.
• This pattern held true for the various environments tested.
• The bias in species richness estimates could readily be explained by
base variability.
• While previous studies suggested using region V4 for phylogenetic
studies (6, 11), our evaluation of species richness suggests that the V4,
V5+V6, and V6+V7 regions provide estimates closest to longer
fragments. Collectively, the V4-encompassing region appears to
provide the best choice when both phylogenetic assignments and
species richness estimates are considered.
• Based on this study, we recommend the use of fragments (V4, V5+V6,
V6+V7) for pyrosequencing studies concerned with species richness
determination in microbial communities.
References
1. Axelrood, P. E., et al. 2002. Can. J. Microbiol. 48:655-674.
2. Baker, G. C., et al. 2003. J. Microbiol. Methods 55:541-555.
3. Clamp, M., et al. 2004. Bioinformatics 20:426-427.
4. Huber, J. A., et al. 2007. Science 318:97-100.
5. Lesaulnier, C., et al. 2008. Environ. Microbiol. 10:926-941.
6. Liu, Z., et al. 2007. Nucleic Acids Res. 35:120-130.
7. Margulies, M., et al. 2005. Nature 437:376-380.
8. Rawls, J. F., et al. 2006. Cell 127:423-433.
9. Santelli, C. M., et al. 2008. Nature 453:653-656.
10. Van de Peer, Y., et al. 1996. Nucleic Acids Res. 24:3381-3391.
11. Wang, Q., et al. 2007. Appl. Environ. Microbiol. 73:5261-5267.