Why would we want to do it?

Bioinformatics and 16S rRNA Sequencing Analysis
http://okinbretest.ouhsc.edu/
http://okinbretest.ouhsc.edu/Bioinformatics.aspx
Three 16S rRNA Sequencing Questions to Answer
What is 16S rRNA sequencing?
How is it performed?
Why would we want to do it?
Terms to Define
What is bioinformatics?
• Biology
• Computer Science
• Math/Statistics
What is Amplicon Sequencing?
A particular gene or gene fragment is amplified (by PCR) and its sequence determined, to gain insight into microbiomes (all microorganisms in a particular environment).
http://users.ugent.be/~avierstr/principles/pcr.html
Metagenome: the genomes of the total microbiota in a community.
Metagenomics allows us to extract DNA sequences from a microbial community in nature, bypassing the need for cultures.
What is the 16S gene and why use it?
16S: named for the rate at which it sediments in an ultracentrifuge (Svedberg units, S)
1. Encodes 16S rRNA, an RNA component of the small (30S) ribosomal subunit, present in all bacteria and archaea
2. Ubiquitous and highly conserved, because mutations would probably be deleterious
3. Extremely conserved regions are useful as primer sites
4. Variable regions allow organism identification
5. Well-annotated reference databases
http://www.nature.com/nrmicro/journal/v12/n9/fig_tab/nrmicro3330_F1.html
http://www.alimetrics.net/en/index.php/dna-sequence-analysis
Adapted from: Kevin E. Ashelford et al. Appl. Environ. Microbiol. 2005;71:7724-7736
Fry and Ellis, Patent Publication number WO2014138119 A2
http://www.illumina.com/content/dam/illuminasupport/documents/documentation/chemistry_documentation/16s/16s-metagenomic-library-prep-guide-15044223b.pdf
Variable and conserved regions within the 16S rRNA gene
Example of primers
Valdivia-Anistro, J.A., et al., Front. Microbiol., 05 January 2016
http://dx.doi.org/10.3389/fmicb.2015.01486
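Primers sit in the conserved regions, and many are degenerate: one IUPAC letter stands for several bases, so a single primer can anneal to slightly different templates. A minimal sketch of checking where a degenerate primer matches a sequence, searching the forward strand only (a reverse primer would be checked against the reverse complement). The function names are hypothetical, and the example primer string (GTGCCAGCMGCCGCGGTAA, the widely used 515F primer) is an assumption, not taken from the slide; verify it against your own protocol.

```python
# IUPAC nucleotide codes: each letter maps to the bases it allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer: str, site: str) -> bool:
    """True if every base in `site` is allowed by the degenerate primer code."""
    return len(primer) == len(site) and all(
        base in IUPAC[code] for code, base in zip(primer, site)
    )


def find_primer(primer: str, seq: str) -> list[int]:
    """Return 0-based start positions where the primer matches `seq` exactly
    (forward strand only, no mismatches tolerated)."""
    primer, seq = primer.upper(), seq.upper()
    k = len(primer)
    return [i for i in range(len(seq) - k + 1) if primer_matches(primer, seq[i:i + k])]
```

For example, `find_primer("GTGCCAGCMGCCGCGGTAA", "TTTGTGCCAGCAGCCGCGGTAAGGG")` returns `[3]`, because the degenerate M accepts either A or C at that position.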
Variation in rRNA gene copy number
• Most bacteria contain more than one rRNA operon, and the copy number varies between taxa
• This can skew abundance estimates (taxa with more copies appear more abundant)
• Some bacteria also have high levels of sequence divergence among their rRNA operons
• This can inflate diversity estimates
• Tools such as PICRUSt and CopyRighter attempt to correct for copy number
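The core idea of copy-number correction can be sketched in a few lines: divide each taxon's read count by its 16S copy number, then renormalize to relative abundances. This is only a minimal sketch of that idea, not how PICRUSt or CopyRighter are implemented; those tools estimate the copy numbers themselves, whereas here they are supplied by hand. The function name, taxa, and copy numbers are hypothetical.

```python
def correct_copy_number(counts: dict[str, int], copies: dict[str, int]) -> dict[str, float]:
    """Divide each taxon's read count by its (assumed) 16S copy number,
    then renormalize so the corrected relative abundances sum to 1."""
    adjusted = {taxon: n / copies[taxon] for taxon, n in counts.items()}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}
```

For example, `correct_copy_number({"A": 700, "B": 300}, {"A": 7, "B": 1})` gives A 0.25 and B 0.75: taxon A made up 70% of the reads but, under these assumed copy numbers, only 25% of the cells.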
16S pipeline
1. Sampling
2. Extracting DNA
3. Prepping Sample
4. Sequencing
5. Bioinformatics
Image sources:
http://newsexaminer.net/opinion/were-treating-soil-likedirt-its-a-fatal-mistake-because-all-human-life-depends-on-it/
http://www.copybook.com/pharmaceutical/companies/anachem/articles/the-new-kapa-express-extract-from-anachem-ltd
http://mrdnalab.com/dna-sequencing/illumina-miseq.html
https://www.neb.com/protocols/2015/01/23/setting-up-the-pcr-reaction-e7600
http://www.clipartbest.com/cartooncomputer-pictures
Which sequencing chemistry to use?
Kuczynski et al., Nat. Rev. Genet. 13: 47-58 (2012)
Sequencing by Synthesis and Basecalling
http://openwetware.org/wiki/BioMicroCenter:Sequencing
Illumina Inc. videos illustrating prepping and sequencing by synthesis:
Intro to Sequencing by Synthesis: Illumina Sequencing Technology
Industry-leading Data Quality
https://www.youtube.com/watch?v=womKfikWlxM
https://www.youtube.com/watch?v=HMyCqWhwB8E
Bioinformatics
Workflow for deep sequencing of 16S rRNA gene amplicons
Ines Yang et al. FEMS Microbiol Rev 2013;37:736-761
Operational Taxonomic Units (OTUs) and Annotation - MiSeq Reporter
•Operational taxonomic unit (OTU): a working proxy for an extant taxon
–Cluster of similar amplicon sequences
–A 97% sequence-identity threshold is commonly used
–Pre-clustering, which collapses all identical sequences into one category, speeds analysis
•OTU determination
–De novo methods cluster by similarity with no reference to outside sequences
–Taxonomy-based methods cluster based on similarity to known sequences (MiSeq Reporter uses ClassifyReads, a proprietary algorithm)
–Combined taxonomy + de novo methods
•Known sequence databases
–Ribosomal Database Project (https://rdp.cme.msu.edu/)
–Greengenes (http://greengenes.lbl.gov/cgi-bin/nph-index.cgi)
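De novo OTU picking with pre-clustering can be sketched as greedy centroid clustering: collapse identical reads, visit the unique sequences from most to least abundant, and assign each one to the first existing centroid it matches at 97% identity or better, otherwise make it a new centroid. This is a minimal sketch under a simplifying assumption: reads are already trimmed to equal length and aligned, so identity is counted position by position. Real tools such as UCLUST or VSEARCH compute identity from pairwise alignments. The function names are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; this sketch assumes equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("sketch assumes equal-length aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads: list[str], threshold: float = 0.97) -> dict[str, int]:
    """Greedy centroid clustering of reads into OTUs.

    Returns a dict mapping each OTU's centroid sequence to its total read count.
    """
    # Pre-clustering (dereplication): identical sequences collapse into one entry
    uniques = Counter(reads)
    otus: dict[str, int] = {}
    # Most abundant unique sequences become centroids first
    for seq, count in uniques.most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count  # join the first sufficiently similar OTU
                break
        else:
            otus[seq] = count  # no centroid close enough: start a new OTU
    return otus
```

Sorting by abundance first is a deliberate design choice: abundant sequences are more likely to be real, so they make better centroids than rare reads that may carry sequencing errors.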
Statistically Speaking and Interpreting the Data
Species Diversity Results
•Species richness
–Number of species present in a sample
–Determined by the number of OTUs present
–Influenced by sequencing depth
•Species evenness
–How close in numbers each species is in a sample
•Species diversity
–Composite of species richness and species evenness
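Because observed richness depends on sequencing depth, samples are often compared after rarefying them to a common number of reads. A minimal sketch, assuming an OTU table stored as a dict of OTU name to read count; rarefying is one common way to handle uneven depth, not the only one, and the function names are hypothetical.

```python
import random


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample a sample's reads to `depth` reads without replacement."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    picked = random.Random(seed).sample(pool, depth)
    out: dict[str, int] = {}
    for otu in picked:
        out[otu] = out.get(otu, 0) + 1
    return out


def richness(counts: dict[str, int]) -> int:
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)
```

Computing `richness(rarefy(sample, d))` over increasing depths `d` traces a rarefaction curve; rare OTUs, such as one seen in a single read, often drop out at low depths.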
α-diversity and Shannon’s diversity index
•Diversity within a sample
•Measures the amount of information needed to describe every member of the community
•Shannon’s diversity index is an information-statistic index
•If p_i is the proportion of individuals belonging to species i, and S is the number of species, then the diversity (H’) is: $H' = -\sum_{i=1}^{S} p_i \ln p_i$
•From this, one can calculate evenness, the ratio of the actual H’ to its maximum value ($H'_{\max} = \ln S$, reached when all species are equally abundant), so evenness ranges from 0 to 1
Adapted from: http://ww2.tnstate.edu/ganter/B412%20L16%20Communities.html
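A minimal sketch of Shannon’s index and evenness computed from a list of per-species (or per-OTU) counts, assuming natural logarithms. The function names are hypothetical; evenness defined as H’/ln S is commonly called Pielou’s evenness (J’).

```python
import math


def shannon(counts: list[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def pielou_evenness(counts: list[int]) -> float:
    """Evenness J' = H' / H'_max, with H'_max = ln(S) for S observed species."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        return 0.0  # undefined for a single species; 0.0 is an arbitrary choice in this sketch
    return shannon(counts) / math.log(s)
```

Four equally abundant species give H’ = ln 4 and evenness 1; as one species comes to dominate, both values fall.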
https://www.greenthumb.co.uk/help-and-advice/lawn-problems/weeds
Flower species   Field 1   Field 2
Daisy                300        20
Dandelion            335        49
Buttercup            365       931
Total               1000      1000

Which field is more diverse? Field 1
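A worked check of the question, assuming Shannon’s index with the natural logarithm and evenness J’ = H’/ln S (S = 3 species in both fields); values are rounded:

```latex
\begin{aligned}
\text{Field 1: } H' &= -(0.300\ln 0.300 + 0.335\ln 0.335 + 0.365\ln 0.365) \approx 1.095,
  & J' &= H'/\ln 3 \approx 0.997 \\
\text{Field 2: } H' &= -(0.020\ln 0.020 + 0.049\ln 0.049 + 0.931\ln 0.931) \approx 0.293,
  & J' &= H'/\ln 3 \approx 0.266
\end{aligned}
```

Both fields have the same richness (three species, 1000 flowers), so the difference comes entirely from evenness: Field 2 is dominated by buttercups.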
Hierarchical Clustering Dendrogram
• Dendrogram: a tree diagram
• Topology: how closely the samples are related
• The two most similar samples (clusters) are joined first, forming a new cluster.
• At each subsequent step, the next closest pair of samples or clusters is joined, until all samples belong to one tree.
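The merging steps above can be sketched as average-linkage (UPGMA) agglomerative clustering. Two assumptions not on the slide: sample distances are Bray–Curtis dissimilarities between OTU count vectors (a common choice in microbiome work), and the function names are hypothetical. In practice a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` with `method="average"` does this.

```python
from itertools import combinations


def bray_curtis(x: list[int], y: list[int]) -> float:
    """Bray-Curtis dissimilarity between two OTU count vectors (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def upgma(names: list[str], dist: dict[frozenset, float]) -> list[tuple[str, str, float]]:
    """Average-linkage (UPGMA) agglomerative clustering.

    dist maps frozenset({a, b}) -> distance for every pair of sample names.
    Returns the merge steps in order as (cluster_a, cluster_b, merge_height).
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of samples in it
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of current clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        new = f"({a},{b})"
        na, nb = clusters.pop(a), clusters.pop(b)
        # Average linkage: distance to the new cluster is the size-weighted mean
        for c in clusters:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[new] = na + nb
        merges.append((a, b, height))
    return merges
```

Usage: build `dist = {frozenset((a, b)): bray_curtis(counts[a], counts[b]) for a, b in combinations(names, 2)}` from a dict of sample name to count vector, then call `upgma(names, dist)`. The merge heights are the branch points drawn in the dendrogram.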
What is a Principal Coordinates Analysis (PCoA)?
• Graphically represents similarities or dissimilarities in data.
• Begins with a distance matrix and ends with the computation of eigenvalues and eigenvectors.
• Objects ordinated closer to one another are more similar than those ordinated further away.
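The steps above (a distance matrix in, eigenvalues and eigenvectors out) can be sketched as classical PCoA: square the distances, double-center them (Gower’s method), and scale the leading eigenvectors by the square root of their eigenvalues to get sample coordinates. This pure-Python sketch finds eigenvectors by power iteration with deflation. That is adequate only for small toy matrices, and it assumes the dominant eigenvalues are positive, which holds for Euclidean distances. Non-Euclidean ones such as Bray–Curtis can produce negative eigenvalues. The function name is hypothetical; in practice a library routine such as scikit-bio’s `skbio.stats.ordination.pcoa` would be used.

```python
import math
import random


def pcoa(dist: list[list[float]], n_axes: int = 2, iters: int = 1000, seed: int = 0):
    """Classical PCoA of a square, symmetric distance matrix.

    Returns (eigenvalues, coords), where coords[i][k] is sample i on axis k.
    """
    n = len(dist)
    # 1. A = -0.5 * d_ij^2
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    # 2. Gower double-centering: B = A - row mean - column mean + grand mean
    row = [sum(r) / n for r in a]
    col = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    b = [[a[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    eigvals, axes = [], []
    for _ in range(n_axes):
        # 3. Power iteration for the dominant eigenvector of the (deflated) B
        v = [rng.random() - 0.5 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to explain on this axis
                v, lam = [0.0] * n, 0.0
                break
            v = [x / norm for x in w]
            # Rayleigh quotient v^T B v gives the eigenvalue estimate
            lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        eigvals.append(lam)
        axes.append(v)
        # 4. Deflate B so the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    # 5. Coordinates: eigenvector * sqrt(eigenvalue); negative eigenvalues clamped to 0
    coords = [[axes[k][i] * math.sqrt(max(eigvals[k], 0.0)) for k in range(n_axes)]
              for i in range(n)]
    return eigvals, coords
```

For three samples lying on a line at positions 0, 1 and 5, `pcoa([[0, 1, 5], [1, 0, 4], [5, 4, 0]])` recovers their centered positions (-2, -1, 3) on axis 1, up to an arbitrary sign, with an eigenvalue of 14 and nothing left for axis 2.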
Why do 16S rRNA sequencing?
http://commonfund.nih.gov/hmp/overview
Scheme for studying the “normal” human microbiome.
Bacterial distribution by body site. This figure shows the distribution by body site of
bacteria that have been sequenced under the HMP or are in the sequencing pipelines.
The NIH HMP Working Group, Peterson J, Garges S, et al. The NIH
Human Microbiome Project. Genome Research. 2009;19(12):2317-2323.
doi:10.1101/gr.096651.109.
α Diversity rarefaction curves of cutaneous microbiota in psoriasis (lesion), unaffected and control specimens. (A) Taxonomical richness trends
towards decreasing α diversity in unaffected and lesion specimens relative to control, with no statistically significant differences between skin
types. (B) Shannon index is significantly different (decreases from control to unaffected to lesion) among skin types at all taxonomic levels (P <0.05),
except at the operational taxonomical unit (OTU) level. (C) Analysis of taxa sharing. Taxa present in <3 samples excluded from the analysis. Taxa that
are only observed in one clinical skin type are denoted as ‘unique’. Taxa that are present in two types of skin are denoted as ‘shared’. The data show
that nearly all taxa are represented in all three types of skin. The shading represents the relative distribution (heatmap) for each column number
(green = low, yellow = intermediate, red = high).
Alekseyenko AV, Perez-Perez GI, De Souza A, et al. Community differentiation of the cutaneous microbiota in psoriasis. Microbiome. 2013;1:31.
doi:10.1186/2049-2618-1-31.
Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Morgan G I Langille, et al. Nature Biotechnology 31, 814–821 (2013) doi:10.1038/nbt.2676
[Figure: Sites 1-6]
Summary
• What?
Bioinformatics
Amplicon sequencing and metagenomics
16S gene
• How?
Pipeline and Different Sequencing Platforms
Bioinformatics Steps
Operational Taxonomic Units
Statistical Terms
Alpha Diversity and Shannon Index
Species Diversity, Richness and Evenness
Hierarchical Clustering Dendrogram and PCoA
• Why?
Human Microbiome Project – Microbiota in Psoriasis Demonstration Project
Cameron University’s Wolf Creek Microbial Diversity Project