Web Valley 2014
16S sequencing for microbiome studies
Nicola Segata and Nick Loman
Principal Investigator
Laboratory of Computational Metagenomics
Centre for Integrative Biology
University of Trento
Italy
The human microbiome
• 10x more microbial than human cells
• 1M times as many microbes inside each of us than humans on earth
• 100x more microbial than human genes
Nature 486(7402)
Who’s there?
What are they doing?
Scientific American, May 2012
Metagenomics:
• Study of uncultured microorganisms from the environment, which can include humans or other living hosts
• Focus on taxonomic and functional characteristics of the total collection of microorganisms within a community
• Main experimental tool is high-throughput sequencing: ~10M short (~100nt) reads per dataset
16S sequencing
Liu, Bo, et al. "Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences." BMC genomics 12.Suppl 2 (2011): S4.
PROS:
• Cost-effective
• Avoids non-bacterial contamination
• The resulting dataset is reasonable in size and complexity
• Mature analysis software available
• Can potentially catch low abundance bacteria
CONS:
• Not genome-wide (so no metabolic potential)
• Limited taxonomic resolution
• Not effective for pathogen profiling
• Cannot catch viruses and eukaryotes
• Several (usually underestimated) biases
• Cross-study comparisons are almost impossible
16S-based “metagenomics”
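[Figure: from samples to counts. PCR amplifies the single 16S rRNA marker gene (variable regions such as V2 and V6 are marked) from the microbes in each sample; each sequence is then classified to a microbe and counted. Image credit: George Rice, Montana State University]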
The ribosome
Ribosomes are the universal machinery that translate the genetic code into proteins.
The ribosomal machinery is composed of:
• Two subunits
• Several proteins
• mRNAs
• tRNAs
• rRNA (5S, 16S, 23S)
The 16S rRNA
Center for Molecular Biology of RNA, University of California
The 16S rRNA gene 1/3
This annotation has been performed on a representative E. coli 16S sequence
Baker, G. C., J. J. Smith, and Donald A. Cowan. Journal of Microbiological Methods 55.3 (2003): 541-555.
The 16S rRNA gene 2/3
The 16S rRNA gene 3/3
The 16S rRNA
[Figure: 16S rRNA secondary structure with the variable regions V1 to V9 labeled. Center for Molecular Biology of RNA, University of California]
16S: The 530 loop structure of six species
The 16S gene: statistical view of the variable regions
Andersson, Anders F., et al. PloS one 3.7 (2008)
Variability within the 16S rRNA gene
[Figure: variability along the 16S rRNA gene, with the variable regions V1 to V9 marked]
Claesson, Marcus J., et al. Nucleic acids research 38.22 (2010)
Multiple variable regions can be targeted simultaneously
(if you have long enough reads!)
Which HTM would you choose?
• 454 historically well suited (~400nt reads spanning 3 regions), good cost/throughput trade-off
• Illumina (HiSeq) is not optimal (shorter reads, unnecessarily high throughput)
• Illumina MiSeq and IonTorrent can be a nice compromise.
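To make the read-length argument concrete, here is a minimal sketch that lists which variable regions fit entirely inside a read of a given length. The region coordinates are approximate E. coli positions commonly quoted in the literature; they are an assumption added here, not values from these slides, and the function name is hypothetical.

```python
# Approximate E. coli 16S coordinates of the variable regions.
# ASSUMPTION: commonly quoted values, not taken from the slides.
V_REGIONS = {
    "V1": (69, 99),
    "V2": (137, 242),
    "V3": (433, 497),
    "V4": (576, 682),
    "V5": (822, 879),
    "V6": (986, 1043),
    "V7": (1117, 1173),
    "V8": (1243, 1294),
    "V9": (1435, 1465),
}


def regions_covered(start, read_length):
    """Return the variable regions fully contained in a read.

    start is the 1-based position of the first base of the read,
    read_length is the number of bases sequenced.
    """
    end = start + read_length - 1
    return [
        name
        for name, (region_start, region_end) in V_REGIONS.items()
        if region_start >= start and region_end <= end
    ]


# A ~400nt read can span three regions, a ~100nt read only one.
print(regions_covered(800, 400))
print(regions_covered(800, 100))
```

In practice the start position is fixed by the forward primer, so the choice of primers and of sequencing platform go together.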
Which HTM would you choose?
Throughput:
• Very low (~1 seq / sample)
• Medium (~3k seqs / sample)
• High (~50k seqs / sample)
The data revolution is now
One of the challenges: which technology?
http://flxlexblog.files.wordpress.com/
Mol Ecol Resour. 2011 Sep;11(5):759-69
In silico primer validation/testing
The idea: use the available (taxonomically labeled) 16S sequences to check which
organisms are targeted by the primers
http://www.arb-silva.de/search/testprobe (to test single probes)
http://www.arb-silva.de/search/testprime (to test pairs of probes, below)
An example on “universal” primers
Fw: CCTACGGGRSGCAGCAG Rev: ATTACCGCGGCTGCT (our primers)
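The idea behind in silico primer testing can be sketched in a few lines. This is not how the SILVA tools are implemented; it is a minimal illustration that assumes exact matching (no mismatches allowed) and expands the degenerate IUPAC bases of the primers above into a regular expression. The function names and the toy template are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FW = "CCTACGGGRSGCAGCAG"  # forward primer from the slide
REV = "ATTACCGCGGCTGCT"   # reverse primer from the slide


def primer_to_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, IUPAC codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template, fw, rev):
    """Return the predicted amplicon, or None if a primer does not match.

    The forward primer must match the template as written; the reverse
    primer must match, as its reverse complement, downstream of it.
    """
    template = template.upper().replace("U", "T")
    fw_hit = re.search(primer_to_regex(fw), template)
    if fw_hit is None:
        return None
    downstream = template[fw_hit.end():]
    rev_hit = re.search(primer_to_regex(reverse_complement(rev)), downstream)
    if rev_hit is None:
        return None
    return template[fw_hit.start():fw_hit.end() + rev_hit.end()]


# Toy template (made up for illustration): both primer sites are present.
toy = "GGCCTACGGGAGGCAGCAGTTTTAGCAGCCGCGGTAATCC"
print(amplicon(toy, FW, REV))
```

Running this over a taxonomically labeled 16S collection and counting the non-None results per taxon would give coverage percentages of the kind reported by the online tools, under the no-mismatch assumption.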
Archaea, 49.2% matches
Bacteria, 94.7% matches
Proteobacteria, 97.1% matches
WS6 candidate division, 2.9% matches
BE AWARE: universal primers do not exist, and the choice of primers is going to bias your study no matter what!
Validation of hypervariable regions using a mock community
Ward, Doyle V., et al. PloS one 7.6 (2012): e39315.
Variability within hypervariable regions
A high level 16S analysis workflow
Hamady, Micah, and Rob Knight. Genome research 19.7 (2009): 1141-1152.
Schematic 16S analysis workflow
Input dataset (one sample):
CAAGCCGAAUGCAGCUAUUC
CAAGCCUGAUGCAGCCAUGC
CAUGCCUGAGACAGCCUUGC
CAAGCCUGAUGCAGCCAUGC
CAAGCCGAAUGCAGCUAUCC
CAAGGCUGAGACAGCCUUGC
CAAGCCUGAUGCUGCCAUGC
CAAGCCGAAUGCAGCUAUGC
CAAGCCGGAGACAGCCUUGC
CAAGCCUGAUGCAGCCAUGC
Multiple-sequence alignment:
CAAGCCGAAUGCAGCUAUUC
CAAGCCUGAUGCAGCCAUGC
CAUGCCUGAGACAGCCUUGC
CAAGCCUGAUGCAGCCAUGC
CAAGCCGAAUGCAGCUAUCC
CAAGGCUGAGACAGCCUUGC
CAAGCCUGAUGCUGCCAUGC
CAAGCCGAAUGCAGCUAUGC
CAAGCCGGAGACAGCCUUGC
AAAGCCUGAUGCAGCCAUGC
Operational taxonomic unit (OTU) definition:
OTU_1
CAAGCCGAAUGCAGCUAUUC
CAAGCCGAAUGCAGCUAUCC
CAAGCCGAAUGCAGCUAUGC
OTU_2
CAUGCCUGAGACAGCCUUGC
CAAGGCUGAGACAGCCUUGC
CAAGCCGGAGACAGCCUUGC
OTU_3
CAAGCCUGAUGCAGCCAUGC
CAAGCCUGAUGCAGCCAUGC
CAAGCCUGAUGCUGCCAUGC
AAAGCCUGAUGCAGCCAUGC
Relative abundances:
OTU_1 30%
OTU_2 30%
OTU_3 40%
Taxonomic assignment (16S DB with taxonomic information):
OTU_1 E. coli
OTU_2 S. aureus
OTU_3 S. pneumoniae
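The OTU definition and counting steps of the schematic can be sketched with a toy greedy clustering of the ten input reads. This is a simplified illustration, not the algorithm of any specific tool: it assumes equal-length, already aligned reads, uses Hamming distance, and uses a threshold of at most 2 mismatches over 20nt (90% identity), which is far looser than the 97% identity typically used on real amplicons. The function names are hypothetical, and the OTU numbers produced are arbitrary labels.

```python
from collections import Counter

# The ten reads of the input dataset shown in the schematic.
READS = [
    "CAAGCCGAAUGCAGCUAUUC",
    "CAAGCCUGAUGCAGCCAUGC",
    "CAUGCCUGAGACAGCCUUGC",
    "CAAGCCUGAUGCAGCCAUGC",
    "CAAGCCGAAUGCAGCUAUCC",
    "CAAGGCUGAGACAGCCUUGC",
    "CAAGCCUGAUGCUGCCAUGC",
    "CAAGCCGAAUGCAGCUAUGC",
    "CAAGCCGGAGACAGCCUUGC",
    "CAAGCCUGAUGCAGCCAUGC",
]


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have the same length")
    return sum(x != y for x, y in zip(a, b))


def greedy_otus(reads, max_mismatches=2):
    """Assign each read to the first centroid within max_mismatches.

    A read that matches no existing centroid becomes a new centroid.
    Returns (centroids, labels), where labels[i] is the OTU index of reads[i].
    """
    centroids, labels = [], []
    for read in reads:
        for index, centroid in enumerate(centroids):
            if hamming(read, centroid) <= max_mismatches:
                labels.append(index)
                break
        else:
            centroids.append(read)
            labels.append(len(centroids) - 1)
    return centroids, labels


def relative_abundance(labels):
    """Fraction of reads assigned to each OTU index."""
    counts = Counter(labels)
    return {otu: n / len(labels) for otu, n in counts.items()}


centroids, labels = greedy_otus(READS)
for otu, fraction in sorted(relative_abundance(labels).items()):
    print(f"OTU index {otu}: centroid {centroids[otu]}, {fraction:.0%}")
```

On these reads the sketch recovers three clusters holding 3, 4 and 3 reads, the same 30% / 30% / 40% split as in the schematic; assigning a taxonomic name would then be a lookup of each centroid against a labeled 16S database.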
Intro into diversity analysis
Alpha-diversity
• A measure of how diverse (complex) a microbial community is
• “within sample” diversity
• Species richness (i.e. the number of species) is a widely used alpha diversity index
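As a minimal sketch, richness can be computed directly from a table of OTU counts for one sample. The Shannon index is included as a second, commonly used alpha diversity measure; it is an addition for illustration and is not mentioned on the slide. The function names and the example counts are hypothetical.

```python
import math


def richness(counts):
    """Species richness: number of OTUs observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over the observed OTUs.

    Not from the slide: shown only as another common alpha diversity index.
    """
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total)
        for n in counts.values()
        if n > 0
    )


# Hypothetical counts for one sample; OTU_4 was not observed.
sample = {"OTU_1": 3, "OTU_2": 3, "OTU_3": 4, "OTU_4": 0}
print(richness(sample))
print(round(shannon(sample), 3))
```

Richness ignores how evenly the reads are spread across OTUs, which is what indices such as Shannon add.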
Beta-diversity
• A measure of how different two microbial communities are
• “between sample” diversity
• The inverse of the number of shared species is one way to estimate beta-diversity
Jurasinski, G., Retzer, V., & Beierkuhnlein, C. (2009). Oecologia, 159(1), 15-26.
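A minimal sketch of the "inverse of shared species" idea, working on the sets of species observed in two samples. The Jaccard distance is added as a widely used alternative that is bounded between 0 and 1; it is an assumption for illustration and is not on the slide. The function names and the two example samples are hypothetical.

```python
def shared_species(sample_a, sample_b):
    """Number of species observed in both samples."""
    return len(set(sample_a) & set(sample_b))


def inverse_shared(sample_a, sample_b):
    """Beta diversity as 1 / (number of shared species).

    Returns infinity when the samples share nothing.
    """
    shared = shared_species(sample_a, sample_b)
    return float("inf") if shared == 0 else 1 / shared


def jaccard_distance(sample_a, sample_b):
    """1 - |A and B| / |A or B|. Not from the slide; a common alternative."""
    a, b = set(sample_a), set(sample_b)
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


# Hypothetical species lists for two samples.
sample_1 = {"E. coli", "S. aureus", "S. pneumoniae"}
sample_2 = {"S. aureus", "S. epidermidis"}
print(inverse_shared(sample_1, sample_2))
print(jaccard_distance(sample_1, sample_2))
```

Both measures here use presence/absence only; abundance-aware measures exist but are outside the scope of this introduction.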
Practical tutorial time
http://nickloman.github.io