PPTX - Tandy Warnow

TIPP: Taxonomic Identification
And Phylogenetic Profiling
Nam-phuong Nguyen
Computer Science And Engineering
University Of California, San Diego
Precision Medicine
• Personalized treatment based upon the patients’ phenotypes and genotypes
• Precision Medicine Initiative launched with $215M in 2016
• Many different aspects including genomics, epigenetics, microbiome
Image courtesy of gurdanhealth.com
Human Microbiome
• 10 times more bacterial cells than human cells
• Important role in regulating health
• Disruption associated with risk factors
for diseases
• Analysis through metagenomics
Image courtesy of humanlongevity.com
Metagenomics
• Analyzing DNA sequences from an environmental sample
• Typical datasets contain millions of reads
Fundamental Questions
• What is the identity of a read?
• What is the microbial profile of a sample?
• What genes/functions are present?
Metagenomic Taxon Identification
Objective: classify short reads in a metagenomic sample
Abundance Profiling
Objective: distribution of the species (or genera, or
families, etc.) within the sample
For example, the distribution of a sample at the species
level might be:
Species A: 10%
Species B: 25%
Species C: 55%
Species D: 1%
Species E: 9%
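As a minimal sketch of what such a profile is, the example below turns per-read labels into percentages and then rolls the species-level profile up to a higher rank. It assumes every read has already been assigned a species label; the read counts, the genus names, and the `genus_of` mapping are hypothetical and chosen only to reproduce the percentages above.

```python
from collections import Counter


def abundance_profile(labels):
    """Return {taxon: percentage of labels assigned to that taxon}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical per-read species labels (100 reads) matching the example.
read_labels = (
    ["Species A"] * 10
    + ["Species B"] * 25
    + ["Species C"] * 55
    + ["Species D"] * 1
    + ["Species E"] * 9
)
species_profile = abundance_profile(read_labels)

# Hypothetical species-to-genus mapping, used to profile at the genus level.
genus_of = {
    "Species A": "Genus X",
    "Species B": "Genus X",
    "Species C": "Genus Y",
    "Species D": "Genus Y",
    "Species E": "Genus Z",
}
genus_profile = abundance_profile([genus_of[s] for s in read_labels])

print(species_profile)
print(genus_profile)
```

The same function works at any rank because a profile is just a normalized count over whatever labels the reads carry.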
Genome-based profiling
Population of 2 bacteria, A and B (figure: two cells of A, one cell of B).
B has a genome twice as large as A's.
True profile: 67% A, 33% B
Profile estimated from reads: 50% A, 50% B
Single copy marker-based profiling
Population of 2 bacteria, A and B (figure: two cells of A, one cell of B).
B has a genome twice as large as A's.
Each has a single copy of gene C.
True profile: 67% A, 33% B
Profile estimated from reads: 67% A, 33% B
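The arithmetic behind the genome-based and marker-based examples can be checked with a short sketch. It assumes reads are sampled uniformly, without error, from whatever DNA is being counted, and that gene C has the same length in A and B (the slides do not state the gene length). The function name and the relative length units are illustrative.

```python
def expected_read_profile(cells, region_length):
    """Expected percentage of reads per taxon when reads are drawn uniformly
    from the counted region: share is proportional to cells * region length."""
    dna = {t: cells[t] * region_length[t] for t in cells}
    total = sum(dna.values())
    return {t: round(100.0 * dna[t] / total) for t in dna}


cells = {"A": 2, "B": 1}            # two cells of A, one cell of B
genome_length = {"A": 1, "B": 2}    # relative units: B's genome is twice A's
marker_length = {"A": 1, "B": 1}    # assumption: gene C has equal length in both

# True profile counts cells only, so every taxon gets the same weight per cell.
true_profile = expected_read_profile(cells, {"A": 1, "B": 1})
genome_based = expected_read_profile(cells, genome_length)
marker_based = expected_read_profile(cells, marker_length)

print(true_profile)   # {'A': 67, 'B': 33}
print(genome_based)   # {'A': 50, 'B': 50}
print(marker_based)   # {'A': 67, 'B': 33}
```

Counting reads from whole genomes weights each cell by its genome size, while counting reads from a single-copy gene weights each cell equally, which is why only the marker-based estimate matches the true profile here.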
TIPP: Taxonomic Identification And
Phylogenetic Profiling
Fragmentary unknown reads for a gene:
ACCG
CGAG
CGG
GGCT
…
ACCT
Known full length sequences for a gene, and an alignment and a tree:
AGG...GCAT (species1)
TAGC...CCA (species2)
TAGA...CTT (species3)
AGC...ACA (species4)
ACT..TAGAA (species5)
TIPP: Taxonomic Identification And
Phylogenetic Profiling
• Nguyen et al., Bioinformatics, 2014
Pipeline: Reads → Assign to marker genes → Marker genes → Classify reads → Compute profile
Abundance Profiling
• Objective: distribution of the species (or genera, or families, etc.) within the
sample.
• Leading techniques:
• PhymmBL (Brady & Salzberg, Nature Methods 2009)
• NBC (Rosen, Reichenberger, and Rosenfeld, Bioinformatics 2011)
• MetaPhyler (Liu et al., BMC Genomics 2011), from the Pop Lab at the University
of Maryland
• MetaPhlAn (Segata et al., Nature Methods 2012), from the Huttenhower Lab
at Harvard
• mOTU (Sunagawa et al., Nature Methods 2013), from the Bork Lab at EMBL
• MetaPhyler, MetaPhlAn, and mOTU are marker-based techniques (but use
different marker genes).
“Hard” genome datasets (known
genomes and high indel error)
Note: NBC, MetaPhlAn, and MetaPhyler cannot classify any sequences from at
least one of the high indel long sequence datasets. mOTU terminates with an error
message on all the high indel datasets.
“Novel” genome datasets
Note: mOTU terminates with an error message on the
long fragment datasets and high indel datasets.
TIPP Compared To Other Profiling
Methods
• TIPP is highly accurate, even in the presence of novel
genomes and high sequencing error
• All other methods are less robust
• Accurate profiles can be estimated using only a portion of
the reads
Do Individual Primates From The Same
Species Have Personal Microbiomes?
Humans have personalized microbiomes
Fierer et al., PNAS 2010 showed that you can identify who had previously used a
keyboard via the residual contact microbiome (three individuals in study)
Experimental Design
• Dataset (unpublished; in preparation)
• Data collected by Patton’s Lab at U of Washington
• Longitudinal study of the vaginal, rectal, and fecal microbiome in
39 female captive pigtailed macaques
• Weekly matched paired samples taken over a period of a month
from each individual
• 16S rRNA amplicon sequencing
• TIPP (Nguyen et al. 2014) used to generate profiles
• Questions
• How do the microbiomes differ by body site and individual?
• Can we identify an individual based upon the microbiome?
Experimental Design
Week 1
Week 2
Week 3
Which individual?
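The slides do not say how a query sample is matched to an individual. One simple possibility, shown here only as a sketch, is nearest-neighbor matching: compare the later profile against each individual's earlier profiles and pick the closest. The Bray-Curtis dissimilarity, the function names, and all profile values below are assumptions made for illustration; none of them are data or methods from the study.

```python
def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two profiles given as {taxon: abundance}."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def identify(query, reference_profiles):
    """Return the individual whose earlier profile is closest to the query."""
    best = None
    for individual, profiles in reference_profiles.items():
        for profile in profiles:
            d = bray_curtis(query, profile)
            if best is None or d < best[0]:
                best = (d, individual)
    return best[1]


# Made-up week 1 and week 2 profiles for two hypothetical individuals.
reference_profiles = {
    "individual_1": [
        {"Taxon A": 0.6, "Taxon B": 0.3, "Taxon C": 0.1},
        {"Taxon A": 0.5, "Taxon B": 0.4, "Taxon C": 0.1},
    ],
    "individual_2": [
        {"Taxon A": 0.1, "Taxon B": 0.2, "Taxon C": 0.7},
        {"Taxon A": 0.2, "Taxon B": 0.2, "Taxon C": 0.6},
    ],
}

# Made-up week 3 profile whose owner is to be identified.
week3 = {"Taxon A": 0.55, "Taxon B": 0.35, "Taxon C": 0.10}
print(identify(week3, reference_profiles))
```

If an individual's microbiome is stable over the study period and distinct from the others, the query profile should land closest to that individual's own earlier samples.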
Identification Results
Future Directions
• Expanding the marker set, both in the number of species and genes
• Statistical approach to combining profiles from different marker genes
• Developing TIPP for the virobiome
Acknowledgements
• Illinois
• Tandy Warnow
• Rebecca Stumpf
• Bryan White
• Mike Nute
• Brenda Wilson
• UCSD
• Siavash Mirarab
• UMD
• Mihai Pop
• Bo Liu
• U of Copenhagen
• Alonzo Alfaro-Núñez
• Tom Hansen
• Anders Hansen
• Funding
• NSF 09-35347
• NSF 08-20709
• NSF 0733029
• University of Alberta
Questions?
• TIPP tutorial tomorrow at 10:00-11:00 in MR7
• Instructions for downloading at
https://github.com/smirarab/sepp/blob/master/README.TIPP.md
• Tutorial at
https://github.com/smirarab/sepp/blob/master/tutorial/tipptutorial.md