The Whole Genome Sequencing Revolution

Martin Wiedmann
Gellert Family Professor of Food Safety
Department of Food Science
Cornell University, Ithaca, NY
E-mail: [email protected]
Phone: 607-254-2838
Outline
• Subtyping for disease surveillance: from PFGE to WGS
• WGS challenges: when are two isolates the same or
different? Can we find identical isolates in different
locations?
• Looking in the future
PulseNet allows international outbreak detection and traceback: a hypothetical example
[Figure: a food isolate deposited into PulseNet, linked to two human cases]
Whole Genome Sequencing
• It all started with the Human Genome Project
• Sequencing a bacterial genome is now feasible at a cost of <$100/isolate
• Costs will continue to drop
• Commonly used platforms include:
– Roche 454
– Illumina HiSeq/MiSeq
– Applied Biosystems SOLiD Systems
– Life Technologies/Thermo Fisher Ion Torrent
– PacBio RS
– Nanopore-based systems (e.g., Oxford Nanopore MinION)
The genome sequence revolution
DNA sequencing-based subtyping
[Figure: aligned DNA sequence fragments from four isolates]
Isolate 1  AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG
Isolate 2  AACATGCAGACTGACGATTCGTCGTAGGCTAGACGTTGACTG
Isolate 3  AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG
Isolate 4  AACATGCATACTGACGATTCGTCGAAGGCTAGACGTTGACTG
SNP: single nucleotide polymorphism
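The SNP comparison in this figure can be sketched in a few lines of Python. This is a minimal illustration that assumes the fragments are already aligned and contain no gaps or ambiguous bases; the function names are hypothetical and not taken from any particular pipeline.

```python
from itertools import combinations

# Aligned sequence fragments for isolates 1-4, copied from the slide.
ALIGNMENT = {
    "Isolate 1": "AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG",
    "Isolate 2": "AACATGCAGACTGACGATTCGTCGTAGGCTAGACGTTGACTG",
    "Isolate 3": "AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG",
    "Isolate 4": "AACATGCATACTGACGATTCGTCGAAGGCTAGACGTTGACTG",
}

def snp_positions(seq_a, seq_b):
    """Return 1-based positions where two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def snp_distance_matrix(alignment):
    """Pairwise SNP distance (number of differing positions) for every isolate pair."""
    return {
        (x, y): len(snp_positions(alignment[x], alignment[y]))
        for x, y in combinations(alignment, 2)
    }

if __name__ == "__main__":
    for (x, y), d in snp_distance_matrix(ALIGNMENT).items():
        print(f"{x} vs {y}: {d} SNP(s)")
```

On these four fragments it reports 0 SNPs between isolates 1 and 3, 1 SNP between isolate 2 and isolates 1 and 3 (position 22), 2 SNPs between isolates 2 and 4, and 3 SNPs between isolate 4 and isolates 1 and 3 (positions 9, 22, and 25).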
Challenges with the use of PFGE as a subtyping method in outbreak investigations
• Two isolates may show the same PFGE type even though they are genetically distinct
– PFGE interrogates only a small part of the genome
• Two isolates may show “slightly” different PFGE patterns (is the “3-band rule” appropriate?) despite sharing a very recent common ancestor
– Differences can arise from lateral gene transfer, plasmid loss, rearrangements, point mutations, etc.
[Figure: XbaI and SpeI PFGE patterns and a tip-dated maximum clade credibility tree based on SNP data for 47 Salmonella Montevideo isolates; includes isolates from a Salmonella outbreak linked to sausages (Rhode Island) and isolates from pistachios. Den Bakker et al. 2011. AEM.]
• Salmonella Enteritidis is the most common cause of human salmonellosis
– It is poorly resolved by current subtyping technologies
[Figure: type frequency distributions: PFGE (52 PFGE types), MLVA (98 MLVA types), and combined MLVA-PFGE typing (163 combined MLVA-PFGE types)]
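The figure's message, that combining MLVA and PFGE splits isolates into more types than either method alone, can be quantified with a discriminatory index. The sketch below uses Simpson's index of diversity as adapted by Hunter and Gaston, a measure not shown on the slides. The isolates are hypothetical: they reuse type labels from the figure only for readability and do not reproduce the study's counts.

```python
from collections import Counter

def simpsons_index_of_diversity(types):
    """Hunter-Gaston discriminatory index: probability that two isolates drawn
    at random without replacement carry different types."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same_type_pairs / (n * (n - 1))

# Hypothetical isolates as (MLVA type, PFGE type) pairs; NOT the study data.
isolates = [("B", "4"), ("B", "4"), ("B", "34"), ("G", "4"),
            ("B", "21"), ("BQ", "8"), ("G", "4"), ("I", "5")]

pfge = [p for _, p in isolates]
mlva = [m for m, _ in isolates]
combined = [m + p for m, p in isolates]  # combined labels such as "B4", as on the slide

if __name__ == "__main__":
    for name, types in [("PFGE", pfge), ("MLVA", mlva), ("MLVA-PFGE", combined)]:
        d = simpsons_index_of_diversity(types)
        print(f"{name}: {len(set(types))} types, D = {d:.3f}")
```

On these eight hypothetical isolates, PFGE yields 5 types (D ≈ 0.786), MLVA 4 types (D = 0.750), and the combined scheme 6 types (D ≈ 0.929). Combining two schemes can only split existing types, never merge them, so the combined index is at least as high as either alone.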
Full genome sequencing identified the following differences between these isolates:
(i) 28 single nucleotide polymorphisms (SNPs), and
(ii) three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns.
Both isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes.
Gilmour et al. BMC Genomics 2010, 11:120
In addition, whole genome sequencing showed that 5 Listeria isolates collected in 2010 from
the same facility were also closely related genetically to isolates from ill people.
Listeria Outbreaks and Incidence, 1983-2014
[Figure: number of outbreaks and incidence (per million population) by year, 1983-2014]
Era                   Outbreaks per year   Median cases per outbreak
Pre-PulseNet          0.3                  69
Early PulseNet        2.3                  11
Listeria Initiative   2.9                  5.5
WGS                   8                    4.5
Data are preliminary and subject to change
March 2015: Listeriosis cases linked to Blue Bell ice
cream
The challenge
• Identical bacteria (100% match over the whole genome) can be found in different places, each of which can be a potential source of foodborne disease outbreaks
The theoretical background
• Bacteria divide asexually: bacterial populations can be seen as large populations of “identical twins”
• The mutation rate during replication is low: suggested mutation rates range from 2.25 × 10⁻¹¹ to 4.50 × 10⁻¹⁰ per bp per generation
– With a genome size of around 5 million bp (5 × 10⁶), approx. 450 to 9,000 generations are needed for a single SNP difference to arise
– Eyre et al. estimated an evolutionary rate of 0.74 SNVs per successfully sequenced genome per year for C. difficile (N. Engl. J. Med. 2013)
• “Whole-genome sequencing … identified 13% of cases that were genetically related (≤2 SNVs) but without any evidence of plausible previous contact through a hospital, residential area, or family doctor.”
– Unknown bacterial generation times in different environments complicate interpretation
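The arithmetic behind the 450 to 9,000 generation range can be checked with a back-of-the-envelope sketch, assuming a constant per-base mutation rate and a 5 Mbp genome; `generations_per_snp` is a hypothetical helper name.

```python
GENOME_SIZE_BP = 5_000_000  # ~5 Mbp (5 x 10^6 bp), as on the slide

def generations_per_snp(rate_per_bp_per_generation, genome_size_bp=GENOME_SIZE_BP):
    """Expected generations before one new point mutation arises in a lineage:
    the reciprocal of the expected mutations per genome per generation."""
    mutations_per_genome = rate_per_bp_per_generation * genome_size_bp
    return 1 / mutations_per_genome

if __name__ == "__main__":
    for rate in (2.25e-11, 4.50e-10):  # extremes of the suggested mutation rates
        print(f"{rate:.2e} per bp per generation -> "
              f"~{generations_per_snp(rate):,.0f} generations per SNP")
```

This gives roughly 8,900 and 440 generations, in line with the slide's approx. 9,000 and 450.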
2000 US outbreak: environmental persistence of L. monocytogenes
• 1988: one human listeriosis case linked to hot dogs produced by plant X
• 2000: 29 human listeriosis cases linked to sliced turkey meats from plant X
Real world observations
In one case, isolates with <3 SNP differences were found in retail delis in three different states
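One simple way to turn a SNP threshold such as <3 SNPs into groups of isolates is single-linkage clustering. This sketch is an illustration under stated assumptions, not the method used in the case described: the isolate names, locations, and distances are hypothetical.

```python
from itertools import combinations

# Hypothetical isolates, sampling locations, and pairwise SNP distances (illustrative only).
LOCATIONS = {"iso1": "State A deli", "iso2": "State B deli",
             "iso3": "State C deli", "iso4": "State A deli"}
SNP_DIST = {
    frozenset({"iso1", "iso2"}): 1,
    frozenset({"iso1", "iso3"}): 2,
    frozenset({"iso2", "iso3"}): 2,
    frozenset({"iso1", "iso4"}): 40,
    frozenset({"iso2", "iso4"}): 41,
    frozenset({"iso3", "iso4"}): 39,
}

def single_linkage_clusters(isolates, dist, threshold):
    """Group isolates so that any pair differing by fewer than `threshold` SNPs
    ends up in the same cluster (single linkage via a small union-find)."""
    parent = {i: i for i in isolates}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in combinations(isolates, 2):
        if dist[frozenset({a, b})] < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in isolates:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

if __name__ == "__main__":
    for cluster in single_linkage_clusters(list(LOCATIONS), SNP_DIST, threshold=3):
        print(cluster, "->", sorted({LOCATIONS[i] for i in cluster}))
```

Single linkage can chain isolates together: two isolates land in the same cluster whenever a path of sub-threshold links connects them, even if they themselves differ by more SNPs. A cluster spanning three states, as here, still needs epidemiological data before it points to a common source.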
Conclusions
• Even with WGS, epidemiological data are still essential
• The number of SNP (or allele) differences that is meaningful differs by organism, strain, outbreak/cluster, and growth environment
– The number of bacterial generations per calendar year can differ hugely (think dry environment versus active infection in an animal population)
• The best way to determine “meaningful” SNP differences is to combine phylogenetic and epidemiological data
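Why a meaningful SNP threshold depends on the growth environment can be illustrated with a simple molecular-clock model. This sketch assumes mutations accumulate as a Poisson process at the slides' upper rate (4.50 × 10⁻¹⁰ per bp per generation) in a 5 Mbp genome; the generations-per-year values are hypothetical placeholders, not measurements.

```python
import math

GENOME_SIZE_BP = 5_000_000      # ~5 Mbp, as on the slides
RATE_PER_BP_PER_GEN = 4.50e-10  # upper mutation-rate value quoted on the slides

def expected_snp_distance(years, generations_per_year,
                          rate=RATE_PER_BP_PER_GEN, genome_size=GENOME_SIZE_BP):
    """Expected SNPs separating two isolates whose lineages split `years` ago.
    Both lineages mutate independently, hence the factor of 2."""
    return 2 * years * generations_per_year * rate * genome_size

def prob_at_most(k, mean):
    """Poisson CDF P(X <= k): chance the observed distance is k SNPs or fewer."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))

# Hypothetical generations per calendar year; illustrative numbers only.
SCENARIOS = {"dry environment": 10, "active infection": 5_000}

if __name__ == "__main__":
    for name, gens in SCENARIOS.items():
        mean = expected_snp_distance(1, gens)
        print(f"{name}: ~{mean:.3f} SNPs expected after 1 year, "
              f"P(<= 2 SNPs) = {prob_at_most(2, mean):.3f}")
```

With these placeholder numbers, two isolates from a dry environment would almost always still be within 2 SNPs after a year, whereas two isolates from an active infection would almost never be, so the same calendar time supports very different thresholds.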
Looking in the future
• WGS will get cheaper and will be used more
– STEC next, probably Salmonella Enteritidis after that
– Detection of more clusters and outbreaks
• WGS databases will grow rapidly with the inclusion of environmental isolates
– More outbreaks will be linked to their sources by using WGS matches between food or environmental isolates and human isolates as a starting point
• Broader application of WGS by private labs, and maybe by customers and consumers?
Conclusions
• WGS is a game changer and will significantly improve detection of outbreaks, adulteration, etc.
– False alarms will occur, though
• Pathogen detection in environments by regulatory agencies will lead to the inclusion of WGS data in CDC/FDA/USDA databases (GenomeTrakr)
– Environmental pathogen monitoring by industry will
become even more important
Analysis of genome-wide SNPs (wgSNPs)
• Identifies all high-confidence SNPs over the whole genome (approx. 3 to 5 million nucleotides)
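The idea of keeping only high-confidence SNPs can be sketched as a filter over per-position base calls. The `Call` record, the depth and quality cutoffs (`MIN_DEPTH`, `MIN_QUAL`), and the input layout are all hypothetical; real wgSNP pipelines differ in how they map reads, call variants, and mask regions.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; real pipelines tune these per organism and sequencing platform.
MIN_DEPTH = 10  # minimum number of reads covering the position
MIN_QUAL = 30   # minimum Phred-scaled base-call quality

@dataclass(frozen=True)
class Call:
    base: str   # base called at this reference position
    depth: int  # read depth at the position
    qual: int   # Phred-scaled call quality

def is_confident(call):
    return call.depth >= MIN_DEPTH and call.qual >= MIN_QUAL

def high_confidence_snps(calls_by_isolate):
    """Reference positions where every isolate has a confident call and
    at least two isolates carry different bases."""
    shared = set.intersection(*(set(calls) for calls in calls_by_isolate.values()))
    snps = []
    for pos in sorted(shared):
        per_isolate = [isolate_calls[pos] for isolate_calls in calls_by_isolate.values()]
        if all(is_confident(c) for c in per_isolate) and len({c.base for c in per_isolate}) > 1:
            snps.append(pos)
    return snps

# Hypothetical calls at three reference positions for two isolates.
EXAMPLE_CALLS = {
    "Isolate A": {100: Call("G", 50, 60), 200: Call("T", 40, 55), 300: Call("C", 5, 20)},
    "Isolate B": {100: Call("T", 45, 60), 200: Call("T", 38, 50), 300: Call("A", 30, 40)},
}

if __name__ == "__main__":
    print(high_confidence_snps(EXAMPLE_CALLS))
```

In the example, position 100 is reported as a SNP, position 200 is skipped because both isolates carry the same base, and position 300 is skipped because isolate A's call has too little read depth.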
Whole-genome multilocus sequence typing (wgMLST)
• Allows for simpler analysis and clear naming of subtypes
• Performs comparisons on a gene-by-gene level
              Isolate A   Isolate B   Isolate C
Gene 1        1           1           1
Gene 2        8           8           12
Gene 3        5           5           2
Etc.
Gene 1,005    4           4           4
wgMLST type   A           A           B
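The gene-by-gene comparison in the table can be sketched by treating each isolate as a profile of allele numbers. The sketch uses only the loci shown in the table (genes 4 to 1,004 are not listed), and its type-naming rule (letters in order of first appearance) is a hypothetical stand-in for a real nomenclature scheme.

```python
from string import ascii_uppercase

# Allele numbers copied from the table; truncated profiles used only for illustration.
PROFILES = {
    "Isolate A": {"Gene 1": 1, "Gene 2": 8, "Gene 3": 5, "Gene 1,005": 4},
    "Isolate B": {"Gene 1": 1, "Gene 2": 8, "Gene 3": 5, "Gene 1,005": 4},
    "Isolate C": {"Gene 1": 1, "Gene 2": 12, "Gene 3": 2, "Gene 1,005": 4},
}

def allele_differences(p, q):
    """Number of shared loci at which two allele profiles carry different allele numbers."""
    shared = p.keys() & q.keys()
    return sum(1 for locus in shared if p[locus] != q[locus])

def assign_types(profiles):
    """Give identical allele profiles the same type letter, in order of first appearance.
    Hypothetical naming rule; supports at most 26 distinct types."""
    labels, types = {}, {}
    for isolate, profile in profiles.items():
        key = tuple(sorted(profile.items()))
        if key not in labels:
            if len(labels) >= len(ascii_uppercase):
                raise ValueError("more than 26 distinct profiles")
            labels[key] = ascii_uppercase[len(labels)]
        types[isolate] = labels[key]
    return types

if __name__ == "__main__":
    print(assign_types(PROFILES))
    print(allele_differences(PROFILES["Isolate A"], PROFILES["Isolate C"]))
```

Isolates A and B share an identical profile and get type A; isolate C differs at two loci (genes 2 and 3) and gets type B, matching the table.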