How Quantitative are 16S surveys?


Metagenomics: From Bench to Data Analysis
19-23rd September 2016
16S rRNA-based surveys for Community Analysis: How Quantitative are they?
Dr Mark Alston
Computational Biologist
Organisms and Ecosystems Group
Outline
• Compare sequencing platforms and 16S rRNA regions
• Amplicon choice
– amplicons vs. full-length rRNA sequencing
• Bias and quantification
• Comparison to WGS approaches
www.earlham.ac.uk
16S Microbial Community Profiling
• 16S rRNA gene sequence
– conserved (green) and hypervariable (blue) regions
• Most common phylogenetic marker
– ‘gold standard’ in molecular surveys of bacterial and archaeal diversity
• Pros
– ubiquitous, highly conserved, evolutionarily stable
• Cons
– often multiple copy, little resolution at/below species level
Comparing Different Platforms and Target Regions
• ‘A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling’
– DOI: 10.1186/s12864-015-2194-9
• Compare sequencing platforms
– MiSeq (Illumina)
– Pacific Biosciences RSII
– 454 GS-FLX/+ (Roche)
– IonTorrent (Life Technologies)
• Compare target regions
• Assess performance via synthetic microbial communities
– mix gDNA from 49 bacterial and 10 archaeal species
– even / uneven distribution
[Table: Summary of primers and platforms used]
Ability of Different Platforms and Regions to Reconstruct the Synthetic Community
• Even synthetic community
• Platform had a significant effect
• Species’ frequencies highly unbalanced
• Possible causes
– primer mismatches
– rRNA copy number
– amplification bias (associated with target length)
[Figure axes: Bacterial Species; Target Region]
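As a rough illustration of the rRNA copy number cause listed above, the sketch below shows how an even community can look uneven when read counts scale with 16S copies per genome. The taxon names and copy numbers are hypothetical, and the model assumes no primer or amplification bias; it is not the procedure used in the benchmarking study.

```python
def expected_read_fractions(cell_fractions, copy_numbers):
    """Read share per taxon if reads scale with cells x 16S copies per genome."""
    weighted = {t: cell_fractions[t] * copy_numbers[t] for t in cell_fractions}
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}


def copy_number_corrected(read_fractions, copy_numbers):
    """Divide each read share by its copy number, then renormalise to sum to 1."""
    scaled = {t: read_fractions[t] / copy_numbers[t] for t in read_fractions}
    total = sum(scaled.values())
    return {t: s / total for t, s in scaled.items()}


# Hypothetical two-taxon even community (values chosen for illustration only).
cells = {"taxon_A": 0.5, "taxon_B": 0.5}
copies = {"taxon_A": 1, "taxon_B": 7}

reads = expected_read_fractions(cells, copies)
corrected = copy_number_corrected(reads, copies)
print(reads)
print(corrected)
```

In this idealised model the correction recovers the even split; with real data it can only be as good as the copy number estimates, and it does nothing for primer mismatches or length-dependent amplification bias.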
How do Different rRNA Regions reflect Composition?
• ‘Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities’
– DOI: 10.1111/1462-2920.12086
• Synthetic Bacteria community
• Heat map represents accuracy ratio
– Perfect agreement has value of 1
– ratio below 1: underestimated abundance
– ratio above 1: overestimated abundance
• Regions suffer from substantial bias
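One way to read the accuracy ratio is as observed relative abundance divided by expected relative abundance. The sketch below assumes that definition; the taxon names and read counts are made up for illustration and do not come from the cited study.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def accuracy_ratio(observed_counts, expected_fractions):
    """Observed / expected relative abundance; 1 means perfect agreement."""
    observed = relative_abundance(observed_counts)
    return {
        taxon: observed.get(taxon, 0.0) / expected_fractions[taxon]
        for taxon in expected_fractions
    }


# Hypothetical even community of four taxa and made-up read counts.
expected = {"taxon_A": 0.25, "taxon_B": 0.25, "taxon_C": 0.25, "taxon_D": 0.25}
reads = {"taxon_A": 500, "taxon_B": 250, "taxon_C": 125, "taxon_D": 125}

ratios = accuracy_ratio(reads, expected)
for taxon, ratio in ratios.items():
    label = "over" if ratio > 1 else "under" if ratio < 1 else "exact"
    print(taxon, ratio, label)
```

Heat maps of such ratios are often drawn on a log scale so that two-fold over- and underestimation sit symmetrically around perfect agreement.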
Which Region Should I Choose?
• 16S rRNA gene sequence
– conserved (green) and hypervariable (blue) regions
• Most common approach
– V4, V3–V4 or V4–V5 primers on Illumina platforms
– ~ 250–430 bp read length
[Figure: e.g. 16S for V4 on MiSeq. Source: http://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/appnote_miseq_16S.pdf]
Full-length vs. Amplicon 16S Sequencing
• Factors affecting taxon abundance estimates and tree-placement
– Sequencing platform, primer choice, read length, environmental source, reference database, assignment method [or a combination]
• New technologies
– short reads sequence ~15-30 % of the full 16S rRNA gene
– more quantitative information
– reduced taxonomic resolution
– species level assignment can be elusive
– implications for inferring metabolic traits in various ecosystems
• Use full-length 16S rRNA sequencing?
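The ~15-30 % figure can be checked with simple arithmetic. The sketch below assumes a full-length 16S rRNA gene of roughly 1,500 bp (an approximation, not a value from the slides) and uses the 250 bp and 430 bp lengths quoted earlier for the common Illumina amplicons.

```python
# Approximate full-length 16S rRNA gene size in bp (assumed for illustration).
FULL_LENGTH_16S_BP = 1500


def fraction_covered(read_bp, gene_bp=FULL_LENGTH_16S_BP):
    """Fraction of the gene spanned by an amplicon of the given length."""
    return read_bp / gene_bp


for read_bp in (250, 430):
    percent = 100 * fraction_covered(read_bp)
    print(f"{read_bp} bp covers {percent:.1f}% of a {FULL_LENGTH_16S_BP} bp gene")
```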
Full-length 16S rRNA Sequencing
• PacBio
– long-read, single-molecule real-time (SMRT) technology
– average read lengths > 8 kb at ~ 87% read accuracy
– only been used for a few environmental surveys
– ‘High-resolution phylogenetic microbial community profiling’
– DOI: 10.1038/ismej.2015.24
• MinION™
– USB stick-sized device
– per-base sequencing accuracy ~85% for 2D reads
– additional read length helps resolve 16S rRNA to species level
– ‘Species level resolution of 16S rRNA gene amplicons sequenced through MinION™ portable nanopore sequencer’
– DOI: 10.1186/s13742-016-0111-z
Full-length 16S rRNA Sequencing and Gene Variability
• non-homogeneous distribution of mutations
• varies across different phylogenetic groups
• leads to both over- and underestimation of community diversity
– 2 Salmonella spp. 97.4% identical across gene, 100% identical across V4 region: underestimate community diversity
– Mutations accumulated in V4 region: overestimate community diversity
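The effect of looking at one region only can be sketched with a toy pairwise comparison. The two sequences and the window coordinates below are invented for illustration (they are not Salmonella data or real V4 coordinates), and the sequences are assumed to be already aligned and of equal length.

```python
def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent identical positions between two aligned sequences in [start, end)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    window_a, window_b = seq_a[start:end], seq_b[start:end]
    matches = sum(a == b for a, b in zip(window_a, window_b))
    return 100.0 * matches / len(window_a)


# Toy aligned sequences: differences sit at positions 2 and 17 only.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACTTACGTACGTACGTAGGT"

full = percent_identity(seq_a, seq_b)
region = percent_identity(seq_a, seq_b, start=8, end=12)  # hypothetical window
print(f"full length: {full:.1f}%  region only: {region:.1f}%")
```

Two sequences that differ over the full length collapse into one when only the conserved window is compared; the reverse happens when the differences cluster inside the window.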
Compare FL vs. V4 [Sakinaw Lake samples]
• Community composition profile at genus level
• Colour pairs denote samples of the same depth
• Bubble sizes indicate read abundance
• FL vs. V4 discrepancies highlighted by boxes
– e.g. Bacillus greatly underrepresented by V4 cf. PacBio [50 m samples]
– ‘High-resolution phylogenetic microbial community profiling’
– DOI: 10.1038/ismej.2015.24
Platforms and Regions Suffer from Substantial Bias
• The observed relative frequencies do not reflect the true species frequencies in the community
• But, the observed differences between samples could still reflect true differences
• Can we have a quantitative method despite the bias?
Can 16S rRNA Sequencing be Quantitative?
• ‘A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling’
– DOI: 10.1186/s12864-015-2194-9
• Assembled 2 synthetic communities
– one with even distribution, one uneven
• Take pairs of samples
• Sequence on MiSeq and PacBio platforms
Can 16S rRNA Sequencing be Quantitative?
• Compare for each species
– true ratio of frequencies [known mixtures]
– and
– observed ratio of frequencies
• Highly significant correlation between the two ratios [blue line] and a slope of 1 [red line]
• Implies 16S rRNA sequencing is strongly quantitative despite being biased
• MiSeq more quantitative than PacBio
[Figure: Ratio of Observed Freq. vs. Ratio of True Freq.; panels: MiSeq, PacBio]
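Why ratios between samples can stay quantitative when each sample is biased can be sketched with a toy simulation. It assumes an idealised model in which every species has a fixed multiplicative bias that is identical in both samples; the species names, bias values and frequencies are randomly generated, and this is not the analysis performed in the benchmarking study.

```python
import math
import random


def normalise(values):
    """Scale a dict of positive numbers so that they sum to 1."""
    total = sum(values.values())
    return {key: value / total for key, value in values.items()}


def observe(true_freqs, bias):
    """Apply a per-species multiplicative bias, then renormalise to fractions."""
    return normalise({sp: true_freqs[sp] * bias[sp] for sp in true_freqs})


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
species = [f"species_{i}" for i in range(20)]
bias = {sp: math.exp(random.gauss(0.0, 1.0)) for sp in species}  # hypothetical

true_a = normalise({sp: random.uniform(0.1, 1.0) for sp in species})
true_b = normalise({sp: random.uniform(0.1, 1.0) for sp in species})
obs_a, obs_b = observe(true_a, bias), observe(true_b, bias)

log_true_ratio = [math.log(true_a[sp] / true_b[sp]) for sp in species]
log_obs_ratio = [math.log(obs_a[sp] / obs_b[sp]) for sp in species]
slope = ols_slope(log_true_ratio, log_obs_ratio)
print(f"slope of log observed ratio vs. log true ratio: {slope:.3f}")
```

Under this model the per-species bias cancels in the ratio, leaving only a constant offset from renormalisation, so the log-log slope is 1 up to floating-point error. Real data scatter around that line because bias is not perfectly constant between samples.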
MiSeq more quantitative than PacBio
• Species responsible for this difference?
• Which are more accurately quantified on one platform relative to the other?
[Figure: Ratio of Observed Freq. vs. Ratio of True Freq.; panels: MiSeq, PacBio]
MiSeq vs. PacBio
• Species with significantly different quantification accuracies:
• MiSeq the better platform
• Except for strain resolution
• Full-length 16S rRNA sequencing of benefit
– Shewanella baltica OS223
– Shewanella baltica OS185
16S Microbial Community Profiling
• 16S rRNA gene sequence
– conserved (green) and hypervariable (blue) regions
• Most common approach
– V4, V3–V4 or V4–V5 primers on Illumina platforms
– ~ 250–430 bp read length
• Economy of scale
– single MiSeq run > 10 million reads
• High base-calling accuracy
[Figure: e.g. 16S for V4 on MiSeq. Source: http://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/appnote_miseq_16S.pdf]
Compare Error Rates Across Platforms
• Even synthetic community
• Platform had a significant effect
• MiSeq has the most accurate sequence reads
Impact of Overlapping Reads on MiSeq V4 Error Rates
• Even synthetic community
• Overlapping forward and reverse reads greatly reduces errors
– ‘stitched’ reads
• MiSeq Dual Index barcode
– Illumina barcodes on both reads
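A simplified picture of why overlapping reads reduce errors: where the forward and reverse reads cover the same base, a disagreement can be settled in favour of the call with the higher quality score. The sketch below assumes the reverse read has already been reverse-complemented and aligned position by position with the forward read; the reads and Phred scores are invented, and real read-merging tools use more elaborate scoring.

```python
def merge_overlap(fwd_bases, fwd_quals, rev_bases, rev_quals):
    """Consensus over a fully overlapping region: on disagreement keep the
    base with the higher quality score (ties go to the forward read)."""
    if not (len(fwd_bases) == len(fwd_quals) == len(rev_bases) == len(rev_quals)):
        raise ValueError("reads and quality lists must have equal length")
    merged = []
    for fb, fq, rb, rq in zip(fwd_bases, fwd_quals, rev_bases, rev_quals):
        merged.append(fb if fb == rb or fq >= rq else rb)
    return "".join(merged)


def count_errors(read, truth):
    """Number of positions where the read differs from the known sequence."""
    return sum(a != b for a, b in zip(read, truth))


# Toy data: quality drops towards the 3' end of each read, so the forward
# read is weakest at the right and the reverse read weakest at the left.
truth = "ACGTACGTAC"
fwd, fwd_q = "ACGTACGTAA", [38, 38, 37, 37, 36, 35, 30, 25, 20, 10]
rev, rev_q = "TCGTACGTAC", [10, 20, 25, 30, 35, 36, 37, 37, 38, 38]

merged = merge_overlap(fwd, fwd_q, rev, rev_q)
print(count_errors(fwd, truth), count_errors(rev, truth), count_errors(merged, truth))
```

Because each read's low-quality tail is covered by the high-quality start of its partner, the stitched read can end up with fewer errors than either read alone.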
Shotgun Metagenomics vs. Amplicon Sequencing
• ‘Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities’
– DOI: 10.1111/1462-2920.12086
• Compare amplicon sequencing to Illumina [HiSeq] and 454 metagenomics sequencing
• Metagenomic data tends to outperform amplicon sequencing
Shotgun Metagenomics vs. Amplicon Sequencing
• ‘A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling’
– DOI: 10.1186/s12864-015-2194-9
• Metagenome sample
– benchmark
– should be relatively unbiased as fewer PCR amplification steps in library construction
• WGS gives the most accurate species estimations
[Figure labels: MiSeq MG sample; expected]
Is 16S “Metagenomics”?
• Many papers talk about
– “metagenomics analysis based on microbial 16S rRNA gene sequencing”
– “16S metagenomic studies” etc.
• But rRNA surveys focus on a single gene, not genomes
• Is this due to a fear of not getting funded if you don’t include a word containing ‘Meta*omics’?
• “Referring to 16S surveys as metagenomics is misleading and annoying #badomics #OmicMimicry”
– http://phylogenomics.blogspot.co.uk/2012/08/referring-to-16s-surveys-as.html
In Summary
• Many sources of bias when we sequence 16S rRNA
– e.g. platform, region etc.
• Can still be quantitative
• MiSeq V4 a good ‘all round bet’
– prior knowledge of taxa may suggest otherwise
– combinations of primers?
– full-length for strain resolution
• Whole genome shotgun
– better estimations of species abundances
Thank You for Listening