University of Idaho IBEST seminar

Charting the function of microbes
and microbial communities
Curtis Huttenhower
Harvard School of Public Health
Department of Biostatistics
11-17-11
Valm et al, PNAS 2011
What to do with your metagenome?
Who’s there?
What are they doing?
• Reservoir of gene and protein functional information
• Comprehensive snapshot of microbial ecology and evolution
• Public health tool monitoring population health and interactions
• Diagnostic or prognostic biomarker for host disease
Who’s there varies: your microbiota is plastic and personalized.
This personalization is true at the level of phyla, genera, species, strains, and sequence variants.
What they’re doing is adapting to their environment: you, your body, and your environment.
Slides by Dirk Gevers
The NIH Human Microbiome Project (HMP):
A comprehensive microbial survey
• What is a “normal” human microbiome?
• 300 healthy human subjects
• Multiple body sites
– 15 male, 18 female
• Multiple visits
• Clinical metadata
www.hmpdacc.org
A three-tier study design (16S, WGS, ref) for mining metagenomic data
[Figure: processing flowchart. 16S branch (>3k reads per sample): filtering/trimming, chimera removal, clustering into OTUs, taxonomic classification (RDP), organismal census at different taxonomic levels. WGS branch (~100M reads per sample): filtering/trimming, assembly into contigs, annotation of genes (~90M proteins), BLAST against functional DBs, mapping on ref, pathways census. Percentages shown in the figure: ~50%, ~57%, ~36%.]
“Pathogen” carriage varies a lot
22 uniquely identifiable nonzero abundance “pathogens” from NIAID’s list of 135
[Figure: relative abundance and average relative abundance of these taxa by body site (buccal mucosa, tongue dorsum, supragingival plaque, stool, posterior fornix, anterior nares, retroauricular crease); panels report 124, 60, and 146 samples. Taxa shown:]
Gardnerella vaginalis
Alistipes putredinis
Gemella haemolysans
Actinomyces odontolyticus
Capnocytophaga sputigena
Capnocytophaga gingivalis
Capnocytophaga ochracea
Eikenella corrodens
Burkholderiales bacterium
Propionibacterium acnes
Parvimonas micra
Porphyromonas gingivalis
Proteus mirabilis
Streptobacillus moniliformis
Atopobium rimae
Ureaplasma urealyticum
Eggerthella lenta
Proteus penneri
Arcobacter butzleri
Salmonella enterica
Nocardia farcinica
Cryptobacterium curtum
Phenotypes that explain variation
(or not) can be surprising
[Figures on three consecutive slides; axis: normalized relative abundance]
A functional perspective on the
human microbiome
100 subjects
1-3 visits/subject
~7 body sites/visit
10-200M reads/sample
100bp reads
[Figure: analysis workflow. Metagenomic reads are searched with BLAST against functional sequence databases (KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS…) to obtain enzymes and pathways. Data types shown: taxon abundances, enzyme family abundances, pathway abundances, gene expression, SNP genotypes. Phenotypes shown: healthy/IBD, BMI, diet.]
HUMAnN: HMP Unified Metabolic Analysis Network
http://huttenhower.sph.harvard.edu/humann
HUMAnN: Metabolic reconstruction
[Figure: two panels, pathway coverage and pathway abundance; pathways against samples, with samples grouped by body site: Oral (BM), Oral (SupP), Oral (TD), Gut, Vaginal, Skin, Nares]
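The difference between pathway coverage (is the pathway present?) and pathway abundance (how much of it is there?) can be illustrated with a small sketch. This is not HUMAnN's algorithm: it assumes, purely for illustration, that coverage is the fraction of a pathway's gene families detected in a sample and that abundance is the mean abundance of those gene families. The gene family names and values are hypothetical.

```python
from statistics import mean

def pathway_coverage(gene_abundance, pathway_genes, min_abundance=0.0):
    """Fraction of the pathway's gene families detected above min_abundance.

    Illustrative definition only, not the one used by HUMAnN.
    """
    present = [g for g in pathway_genes
               if gene_abundance.get(g, 0.0) > min_abundance]
    return len(present) / len(pathway_genes)

def pathway_abundance(gene_abundance, pathway_genes):
    """Mean abundance over the pathway's gene families (missing ones count as 0)."""
    return mean(gene_abundance.get(g, 0.0) for g in pathway_genes)

# Hypothetical sample: relative abundances of gene families.
sample = {"geneA": 0.004, "geneB": 0.002, "geneC": 0.0}
# Hypothetical pathway made of four gene families.
pathway = ["geneA", "geneB", "geneC", "geneD"]

coverage = pathway_coverage(sample, pathway)    # 2 of 4 families detected
abundance = pathway_abundance(sample, pathway)  # (0.004 + 0.002) / 4
```

Under these assumptions a pathway can be fully covered yet rare, or abundant in a few genes yet incomplete, which is why the two quantities are shown separately.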
A portrait of the healthy human microbiome:
Who’s there vs. what they’re doing
[Figure: phylotype abundance and pathway abundance across subjects, by body site: Nares, Oral (BM), Oral (SupP), Oral (TD), Vaginal, Skin, Gut]
Niche specialization in human
microbiome function
[Figure: metabolic modules in the KEGG functional catalog enriched at one or more body habitats; pathway abundance across ~700 HMP communities]
• 16 (of 251) modules strongly “core” at 90%+ coverage in 90%+ individuals at 7 body sites
• 24 modules at 33%+ coverage
• 71 modules (28%) weakly “core” at 33%+ coverage in 66%+ individuals at 6+ body sites
• Contrast zero phylotypes or OTUs meeting this threshold!
• Only 24 modules (<10%) differentially covered by body site
• Compare with 168 modules (>66%) differentially abundant by body site
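The "core" thresholds above can be read as a simple rule: a module is core if, at enough body sites, enough individuals carry it at sufficient coverage. A minimal sketch of that rule follows. The function name, data layout, and toy values are assumptions for illustration, and the toy example uses only two body sites, so its site threshold is lowered accordingly.

```python
def is_core(coverage_by_site, min_coverage, min_individuals, min_sites):
    """Return True if the module passes the coverage rule at enough body sites.

    coverage_by_site: {body_site: [module coverage per individual, 0..1]}
    min_coverage:     coverage an individual must reach (e.g. 0.90 or 0.33)
    min_individuals:  fraction of individuals that must reach it (e.g. 0.90 or 0.66)
    min_sites:        number of body sites where that fraction must hold (e.g. 7 or 6)
    """
    sites_passing = 0
    for values in coverage_by_site.values():
        if not values:
            continue  # no individuals sampled at this site
        fraction = sum(v >= min_coverage for v in values) / len(values)
        if fraction >= min_individuals:
            sites_passing += 1
    return sites_passing >= min_sites

# Hypothetical coverage of one module in four individuals at two sites.
module_coverage = {
    "gut":  [0.95, 1.00, 0.92, 0.97],
    "skin": [0.95, 0.50, 0.91, 1.00],
}

strongly_core = is_core(module_coverage, 0.90, 0.90, min_sites=2)
weakly_core = is_core(module_coverage, 0.33, 0.66, min_sites=2)
```

In this toy case the module fails the strong rule because only three of four skin samples reach 90% coverage, but it passes the weak rule.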
Proteoglycan degradation
by the gut microbiota
[Figure labels: glycosaminoglycans (polysaccharide chains), AA core]
Proteoglycan degradation:
From pathways to enzymes
[Figure: enzyme relative abundance, color scale from 10^-8 to 10^-3]
• Heparan sulfate degradation
missing due to the absence of
heparanase, a eukaryotic enzyme
• Other pathways not bottlenecked
by individual genes
• HUMAnN links microbiome-wide
pathway reconstructions →
site-specific pathways →
individual gene families
Patterns of variation in human
microbiome function by niche
• Three main axes of variation
• Eukaryotic exterior
• Low-diversity vaginal
• Gut metabolism
• Oral vs. tooth hard surface
• Only broad patterns:
every human-associated habitat
is functionally distinct!
Normal varies a lot at the genus level (16S)
[Figure: relative frequency of genera within stool; 343 genera, 200 subjects; labeled genera: Bacteroides, Alistipes, Faecalibacterium, Parabacteroides (Dirk Gevers)]
Normal varies a lot at the species level (WGS)
[Figure: relative frequency of Bacteroides species within stool; 123 samples; labeled species: Bacteroides caccae, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, and two entries labeled Bacteroides sp. (Dirk Gevers)]
What’s wrong with this picture?
Species and strains matter – but so does your method for identifying them in a community!
[Figure: 52 posterior fornix microbiomes; strain labels follow]
Lactobacillus crispatus MV-1A-US
Lactobacillus crispatus JV-V01
Lactobacillus crispatus 125-2-CHN
Lactobacillus crispatus 214-1
Lactobacillus crispatus MV-3A-US
Lactobacillus crispatus ST1
Lactobacillus gasseri JV-V03
Lactobacillus gasseri 202-4
Lactobacillus gasseri 224-1
Lactobacillus gasseri MV-22
Bifidobacterium breve DSM 20213
Bifidobacterium dentium ATCC 27679
Mycoplasma hominis
Clostridiales genomosp BVAB3 str UPII9-5
Clostridiales genomosp BVAB3 UPII9-5
Gardnerella vaginalis AMD
Prevotella timonensis CRIS 5C-B1
Megasphaera genomosp type 1 str 28L
Porphyromonas uenonis 60-3
Gardnerella vaginalis 409-05
Gardnerella vaginalis 5-1
Atopobium vaginae DSM 15829
Gardnerella vaginalis ATCC 14019
Lactobacillus jensenii 1153
Lactobacillus jensenii 269-3
Lactobacillus jensenii SJ-7A-US
Lactobacillus jensenii 208-1
Lactobacillus jensenii JV-V16
Lactobacillus jensenii 27-2-CHN
Lactobacillus jensenii 115-3-CHN
Lactobacillus iners AB-1
Lactobacillus iners DSM 13335
Core gene families
A core gene is a gene strongly conserved within a clade.
Gene X is a core gene for Clade Y:
• All subclades of Clade Y must have Gene X as a core gene (strict definition)
• Gene X may be a core gene of several (unrelated) clades
We have to relax the definition to take into account:
• Low-level gene losses
• Sequencing errors
• Gene-calling errors
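One way to picture the relaxed definition is as a presence threshold: instead of requiring a gene in every genome of the clade, require it in at least some fraction of them. The sketch below makes that simplification. It works on a flat list of genomes rather than recursing over subclades, and the threshold value and toy gene sets are assumptions, not values from the talk.

```python
def core_genes(genomes, min_fraction=1.0):
    """Return gene families present in at least min_fraction of the genomes.

    genomes:      {genome_id: set of gene family ids} for one clade
    min_fraction: 1.0 is the strict rule; lower values tolerate gene loss,
                  sequencing errors, and gene-calling errors
    """
    counts = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    n_genomes = len(genomes)
    return {g for g, c in counts.items() if c / n_genomes >= min_fraction}

# Hypothetical Clade Y with four sequenced genomes.
clade_y = {
    "strain1": {"X", "A", "B"},
    "strain2": {"X", "A"},
    "strain3": {"X", "B"},
    "strain4": {"X", "A", "B"},
}

strict_core = core_genes(clade_y)                      # only genes in all four
relaxed_core = core_genes(clade_y, min_fraction=0.75)  # tolerate one missing call
```

Here only Gene X is core under the strict rule, while A and B also qualify once a single missing call per gene is tolerated.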
Examples of core genes
Clade-specific marker genes
Gene X
Gene X is a marker gene
(for Clade Y) if X is a core
gene for Y and X never
appears outside Clade Y
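The marker-gene definition combines two set operations: find the genes that are core within Clade Y, then discard any that occur in a genome outside it. A minimal sketch under the same simplifications as before (flat genome lists, hypothetical gene names, an assumed core threshold):

```python
def marker_genes(clade_genomes, outside_genomes, min_core_fraction=1.0):
    """Return genes that are core within the clade and never seen outside it.

    clade_genomes:   {genome_id: set of gene family ids} inside Clade Y
    outside_genomes: {genome_id: set of gene family ids} for all other genomes
    """
    n_clade = len(clade_genomes)
    candidates = set().union(*clade_genomes.values())
    # Core: present in at least min_core_fraction of the clade's genomes.
    core = {
        g for g in candidates
        if sum(g in genes for genes in clade_genomes.values()) / n_clade
        >= min_core_fraction
    }
    # Marker: core and absent from every genome outside the clade.
    seen_outside = set().union(*outside_genomes.values())
    return core - seen_outside

# Hypothetical example: A is core in Clade Y but also occurs elsewhere.
clade_y = {"y1": {"X", "A"}, "y2": {"X", "A", "B"}}
others = {"z1": {"A", "C"}}

markers = marker_genes(clade_y, others)
```

Gene A is core for the clade but is excluded because it also appears in an outside genome, leaving X as the only marker.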
Examples of marker genes
The BactoChip: high-throughput
microbial species identification
With Olivier Jousson, Annalisa Ballarini
BactoChip: detecting single species
MetaPhlAn: inferring microbial abundances
from metagenomic data using marker genes
• Map metagenomic reads to marker genes to infer microbial abundances
– Normalizing for copy number, gene length, etc.
• Much faster than existing approaches, as the marker gene database is ~50 times smaller than the whole microbial sequence DB
→ A few hours instead of weeks for Illumina samples with 100 Gb of sequence data
MetaPhlAn: Metagenomic Phylogenetic Analysis
http://huttenhower.sph.harvard.edu/metaphlan
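The idea of turning marker-gene read counts into abundances can be sketched as follows. This is not MetaPhlAn's implementation: it assumes single-copy markers, normalizes only for gene length, averages the per-marker values within each clade, and rescales so the clades sum to 1. All names and numbers are hypothetical.

```python
def relative_abundances(read_counts, marker_lengths, marker_clade):
    """Estimate clade relative abundances from reads mapped to marker genes.

    read_counts:    {marker_id: number of reads mapped}
    marker_lengths: {marker_id: marker length in bp}
    marker_clade:   {marker_id: clade the marker is specific to}
    """
    per_clade = {}
    for marker, clade in marker_clade.items():
        # Length normalization: reads per base of marker sequence.
        density = read_counts.get(marker, 0) / marker_lengths[marker]
        per_clade.setdefault(clade, []).append(density)
    # Average over each clade's markers (assumes one copy per genome).
    clade_signal = {c: sum(v) / len(v) for c, v in per_clade.items()}
    total = sum(clade_signal.values())
    if total == 0:
        return {c: 0.0 for c in clade_signal}
    return {c: s / total for c, s in clade_signal.items()}

# Hypothetical markers: two for clade A, one (longer) for clade B.
counts = {"m1": 100, "m2": 50, "m3": 100}
lengths = {"m1": 1000, "m2": 500, "m3": 2000}
clades = {"m1": "A", "m2": "A", "m3": "B"}

abund = relative_abundances(counts, lengths, clades)
```

Without the length normalization, clade B's single long marker would make it look as abundant as clade A; with it, A comes out at about two thirds and B at about one third.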
MetaPhlAn: synthetic validation on lognormal abundances
Summary of 8 synthetic communities composed of 2M reads from 200 organisms with log-normally distributed abundances
[Figure panels: species-level and class-level results]
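A synthetic community of this kind can be sketched by drawing per-organism abundances from a log-normal distribution and then assigning reads to organisms in proportion to those abundances. The log-normal parameters and the seed below are assumptions, not the values used for the validation, and the example draws fewer reads than the 2M on the slide so that it runs quickly.

```python
import random
from collections import Counter

def synthetic_community(n_organisms=200, n_reads=2_000_000,
                        mu=0.0, sigma=1.0, seed=0):
    """Return (true relative abundances, read count per organism).

    mu and sigma parameterize the underlying normal distribution and are
    illustrative choices only.
    """
    rng = random.Random(seed)
    raw = [rng.lognormvariate(mu, sigma) for _ in range(n_organisms)]
    total = sum(raw)
    abundances = [x / total for x in raw]
    # Assign each read to an organism with probability equal to its abundance.
    draws = rng.choices(range(n_organisms), weights=abundances, k=n_reads)
    counts = Counter(draws)
    return abundances, [counts.get(i, 0) for i in range(n_organisms)]

# Smaller read depth than the slide's 2M, to keep the example fast.
true_abund, read_counts = synthetic_community(n_reads=20_000)
```

Comparing abundances estimated from `read_counts` against `true_abund` gives the kind of accuracy check the slide summarizes.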
Matching 16S and more
The human microbiome at
species-level resolution
Whence enterotypes?
[Figure panels: Species, Genera]
Microbial community function and structure in
the human microbiome: the story so far?
• Who’s there varies even in health
– What they’re doing doesn’t (as much)
– Both correlate with niche
– By the way: both change during disease and treatment
• There are patterns in this variation
– Function correlates with membership and phenotype
– “Pathogenicity” correlates with lower prevalence
– Membership means species, strains, or variants
– Patterns aren’t always as simple as enterotypes
• ~1/3 to 2/3 of human metagenome characterized
– Job security!
Ask both what you can do for your microbiome
and what your microbiome can do for you
Thanks!
Human Microbiome Project
Nicola Segata
Levi Waldron
Xochi Morgan
Dirk Gevers
Owen White
George Weinstock
Karen Nelson
Sahar Abubucker
Joe Petrosino
Yuzhen Ye
Mihai Pop
Beltran Rodriguez-Mueller
Pat Schloss
Jeremy Zucker
Makedonka Mitreva
Qiandong Zeng
Erica Sodergren
Mathangi Thiagarajan
Vivien Bonazzi
Brandi Cantarel
Jane Peterson
Maria Rivera
Lita Proctor
Barbara Methe
Bill Klimke
Daniel Haft
HMP Metabolic Reconstruction
Joseph Moon
Fah Sathira
Tim Tickle
Ramnik Xavier
Harry Sokol
Bruce Birren
Mark Daly
Doyle Ward
Eric Alm
Ashlee Earl
Lisa Cosimi
Jacques Izard
Jeroen Raes
Karoline Faust
Vagheesh Narasimhan
Josh Reyes
Olivier Jousson
Annalisa Ballarini
Wendy Garrett
Michelle Rooks
http://huttenhower.sph.harvard.edu
Linking function to community composition
[Figure: taxa and correlated metabolic pathways across 52 posterior fornix microbiomes]
• Lactobacillus crispatus: phosphate and peptide transport
• Lactobacillus jensenii: sugar transport
• Lactobacillus gasseri: Embden-Meyerhof glycolysis, phosphotransferases
• Lactobacillus iners: F-type ATPase, THF
• Gardnerella/Atopobium: AA and small molecule biosynthesis
• Candida/Bifidobacterium: eukaryotic pathways
Plus ubiquitous pathways: transcription, translation, cell wall, portions of central carbon metabolism…
Linking communities to host phenotype
[Figure: top correlates with BMI in stool; normalized relative abundance against body mass index]
[Figure: normalized relative abundance against vaginal pH (posterior fornix)]
Vaginal pH, community metabolism, and community composition represent a strong, direct link between phenotype and function in these data.