GS Junior-First Results


www.454.com
GS Junior System – First Results
IMPORTANT NOTICE
Intended Use
Unless explicitly stated otherwise, all Roche Applied Science and 454 Life Sciences
products and services referenced in this presentation / document are intended for the
following use:
For Life Science Research Only.
Not for Use in Diagnostic Procedures.
Hemorrhagic Fever Virus Discovery in Native Host
http://www.ncbi.nlm.nih.gov/pubmed/21544192
• Darted Red Colobus monkey in the wild in Kibale National Park, Uganda
• Collected blood sample, isolated viral RNA/DNA
• Sequenced on GS Junior System
• Assembled using CLC genomics assembler, screened out host contigs
• Identified two novel SHFV (simian hemorrhagic fever virus) strains
• Generated near full-length viral sequences by filling in short gaps with PCR/Sanger sequencing and 3’RACE
• Significant findings:
– Not one, but TWO divergent SHFV viruses were present in one individual
– Red Colobus monkey is a native reservoir for these pathogenic viruses
– DNA was isolated from a healthy animal, demonstrating that these viruses can hide in apparently healthy
individuals
– Consequences for human contact, spreading viruses through research colonies
Plant Pathogen Sequencing
http://www.ncbi.nlm.nih.gov/pubmed/21131493
• Erwinia amylovora, fire blight pathogen, isolated from blackberry in Illinois
• Commercial apple and pear blight, reported in 1790s
• 3.81 Mb genome, 53% GC, three circular plasmids
• Sequenced using 3/8 of GS FLX run and one GS Junior run (equal to four GS Junior runs)
• 31x coverage, 375 bp avg. read length
• Assembled by 454 GS De Novo Assembler into 29 contigs, gaps closed in silico using LaserGene
• Used GenDB to assign gene function for 3869 coding sequences
• Comparative genomics with related strains
Rare Variant Detection for HIV-1
Saliou et al. Antimicrob. Agents Chemother April 2011
Why Detect HIV Variants?
• HIV variants or “quasispecies” can use CCR5 and/or CXCR4 cell-surface receptors to enter cells
• Drugs that block CCR5 receptors work only if CXCR4-binding variants are absent
• As a result, there are tests to be sure that there are no CXCR4 binding viral variants before administering this class of
HIV drugs to an individual
Why use 454 Sequencing System?
Potential to deliver speed, ease of use, cost savings
• Current high sensitivity assays can detect viral variants at 0.3%, but are slow, expensive and difficult
• Current Sanger sequencing assays are rapid, cheap but cannot detect quasi-species below 10-20%
• Sensitivity at 0.3% can best predict treatment outcomes
• 454 Sequencing Systems can deliver sequencing specificity for ~25 samples in one GS Junior run
Experimental Design
• 415 base cDNA amplicon covering V3 env. region of HIV-1
• Nested RT-PCR to generate amplicons with MIDs
• 23 individual samples sequenced in one GS Junior run, yielding ~3,500 reads/sample
• GS AVA software used to align to reference
• Processed the reads using third party prediction software
• Detected quasispecies to 0.6% reliably
• Calculated mean error rate of 0.000853 for pyrosequencing from control plasmids
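The relationship between read depth, background error rate, and detection limit can be sketched with a binomial tail calculation. This is only an illustration, not the statistical method used in the study: it assumes sequencing errors at a given position are independent, and the significance level `alpha` is a hypothetical choice. The read depth (~3,500) and error rate (0.000853) come from the slide.

```python
import math


def binom_log_pmf(n: int, p: float, i: int) -> float:
    """Log of the Binomial(n, p) probability of exactly i successes."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def tail_prob(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance that errors alone give k reads."""
    if k <= 0:
        return 1.0
    below = sum(math.exp(binom_log_pmf(n, p, i)) for i in range(min(k, n + 1)))
    return max(0.0, 1.0 - below)


def min_variant_reads(n: int, error_rate: float, alpha: float) -> int:
    """Smallest read count that errors alone reach with probability <= alpha."""
    for k in range(n + 2):
        if tail_prob(n, error_rate, k) <= alpha:
            return k
    return n + 1


READS_PER_SAMPLE = 3500   # from the slide (~3,500 reads/sample)
ERROR_RATE = 0.000853     # from the slide (mean pyrosequencing error rate)
ALPHA = 0.001             # hypothetical significance level, not from the source

threshold = min_variant_reads(READS_PER_SAMPLE, ERROR_RATE, ALPHA)
print(f"variant reads needed to exceed background: {threshold}")
print(f"as a fraction of reads: {threshold / READS_PER_SAMPLE:.4%}")
```

Under these assumptions a variant at 0.6% of 3,500 reads (about 21 reads) sits well above what the background error rate alone would be expected to produce.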
Results
Summary
- 84,000 reads
- 23 samples
- 0.6% detection limit
Critical Factors
- 415 bp amplicon
- 1600 or more reads per sample
Detection limited by software that predicts phenotype
First Publication using GS Junior System Data
Summary of Results
• Sequencing of MHC class I transcripts in macaques to discover all expressed transcripts from common class I
haplotypes
• Sequenced 3 amplicons from ~440 to 620 bases
• Combination experiment
– 7 individuals on GS FLX System, 3 using GS Junior System
– Identified all sequences found previously
– Discovered 2x more haplotypes than with previous Sanger-based approach
• 440-600 base amplicons allow resolution of haplotypes that are impossible with 190 base amplicons
GS Junior System
Primary applications
• de novo sequencing
– sequencing of whole microbial, viral and other small genomes
• Targeted sequencing
– Using sequence capture, PCR, amplicons, transcriptome cDNA sequencing
– Genotyping, rare variant detection, somatic mutation detection, disease associated genes, genomic regions
• Metagenomics
– characterization of complex environmental samples (16S rRNA and shotgun)
Whole Genome Shotgun Sequencing
Sequencing of three representative bacterial genomes

Organism          Genome Size (Kb)   System      Avg. Contig Size (Kb)   N50 Contig Size (Kb)   Largest Contig Size (Kb)   Number of Contigs
E. coli K-12      4563               GS FLX      39                      84                     209                        115
E. coli K-12      4563               GS Junior   58                      112                    352                        78
T. thermophilus   2120               GS FLX      44                      112                    474                        48
T. thermophilus   2120               GS Junior   53                      121                    578                        40
C. jejuni         1600               GS FLX      49                      115                    304                        33
C. jejuni         1600               GS Junior   46                      95                     173                        35

de novo Assemblies at 25x coverage using GS Junior and GS FLX Titanium reads
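For readers unfamiliar with the N50 metric reported for these assemblies, here is a minimal sketch of how it is commonly computed: the length of the contig at which the running total, taken from the largest contig downward, first reaches half of the assembled bases. The contig lengths in the example are made up for illustration and are not from these assemblies.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length reaches half the assembly
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for non-empty input; kept for completeness


# Hypothetical contig lengths in Kb, for illustration only
example_contigs = [40, 30, 20, 10]
print(n50(example_contigs))
```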
Data from GS Junior System Shotgun Runs
Variety of different microbes, early access site data

Run       Passed Filter Reads   Avg Length   Total Bases
1         117,636               445.1        52,350,254
2         83,045                323.6        26,867,086
3         90,415                386.7        34,954,101
4         128,225               350.6        44,939,653
5         43,321                353.2        15,297,828
6         66,100                367.2        24,265,407
7         100,335               433.4        43,475,724
8         79,145                394.8        31,242,875
9         109,894               422.6        46,430,503
10        108,779               437.8        47,613,708
11        94,605                457.4        43,271,233
12        61,975                398.7        24,706,557
13        99,273                384.2        38,134,165
14        115,776               429.5        49,716,849
15        115,972               419.3        48,622,874
16        115,031               414.4        47,661,170
Average   95,595                401          38,721,874
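As a rough planning aid, the average run yield can be turned into an expected coverage or a run count for a target coverage. The sketch below assumes coverage is simply total bases divided by genome size and that every run delivers the average yield of 38,721,874 bases, which individual runs clearly do not. The genome sizes are the ones listed on the Whole Genome Shotgun Sequencing slide; the function names are illustrative.

```python
import math

AVG_BASES_PER_RUN = 38_721_874  # average total bases per run, from the slide


def expected_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Mean fold coverage, assuming all bases map uniformly to the genome."""
    return total_bases / genome_size_bp


def runs_needed(genome_size_bp: int, target_coverage: float,
                bases_per_run: int = AVG_BASES_PER_RUN) -> int:
    """Whole runs needed to reach the target coverage at an average yield."""
    return math.ceil(genome_size_bp * target_coverage / bases_per_run)


# Genome sizes in bp, converted from the Kb values on the assembly slide
genomes = {"E. coli K-12": 4_563_000, "T. thermophilus": 2_120_000, "C. jejuni": 1_600_000}

for name, size in genomes.items():
    cov = expected_coverage(AVG_BASES_PER_RUN, size)
    print(f"{name}: {cov:.1f}x per average run, {runs_needed(size, 25)} run(s) for 25x")
```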
3kb paired end- 1M base genome, 1 run, one scaffold
Read Length
• One GS Junior System run produces reads from 50 to 600 or more bases in length
• Average is in 330-400 base range
• Most reads are in the 450-550 base range
[Figure: read length histogram; x-axis: Readlength (bases), y-axis: Number of reads]
CFTR Exon Resequencing on GS Junior System
Experimental design:
• 11 Coriell samples with known mutations in CF gene
• Each sample was MID-labeled (11 MIDs)
• Amplified all 27 coding exons with 34 amplicons
• Mixed 11x34 = 374 amplicons
• Sequenced in 1 GS Junior System run
• Average coverage 182x
• 96% of the reads mapped back to the CF gene region
[Figure: Numbers of reads per amplicon (across 11 samples); x-axis: 374 Individual Amplicons, y-axis: # of Reads; coverage range 27-551x]
Since multiplex PCR reactions could not be normalized, PCR efficiency dictated the coverage levels for each amplicon
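The arithmetic behind this pooling design can be sketched as follows. This assumes that "average coverage" means mapped reads divided by the number of sample-amplicon combinations, which the slide does not state explicitly; the derived read counts are therefore estimates, not reported values.

```python
import math

N_SAMPLES = 11            # MID-labeled samples, from the slide
N_AMPLICONS = 34          # amplicons per sample, from the slide
MEAN_COVERAGE = 182       # average coverage, from the slide
MAPPED_FRACTION = 0.96    # fraction of reads mapping to the CF gene region
MIN_COV, MAX_COV = 27, 551  # coverage range from the figure

pooled_amplicons = N_SAMPLES * N_AMPLICONS

# Estimated mapped reads if mean coverage = mapped reads / pooled amplicons
mapped_reads = pooled_amplicons * MEAN_COVERAGE

# Estimated total reads needed so that 96% of them supply the mapped reads
total_reads = math.ceil(mapped_reads / MAPPED_FRACTION)

# Spread between best- and worst-covered amplicon
fold_range = MAX_COV / MIN_COV

print(f"pooled amplicons: {pooled_amplicons}")
print(f"estimated mapped reads: {mapped_reads}")
print(f"estimated total reads: {total_reads}")
print(f"coverage fold range: {fold_range:.1f}x")
```

A fold range of this size is why the worst-performing amplicon, rather than the average, usually determines how many samples can be pooled per run.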
CFTR Variant Detection by GS Junior System
Heterozygous
• AVA output – showing 5 of 11 samples vs. variants discovered
ΔF508: known, phenotype-associated CFTR mutation
Sizes of actual amplicons
GS Junior and GS FLX reads are equivalent
CFTR Variant Detection
ΔF508
R668C
known, phenotype-associated CFTR mutation
Synonymous
same mutation detected in two separate, overlapping, amplicons
GS Junior Haplotyping of HLA Loci
• Read length and clonality critical for resolution of individual haplotypes- sequencing covers multiple alleles in each clonal read!
• The longer the read, the better haplotype discrimination
– below 200 bases=very poor
– 200-300=poor
– 300-500=good
– 500-800=excellent
[Figure: Allele 1 and Allele 2]
Studying SIV using GS Junior System
• Ben Burwitz in Dave O’Connor’s lab, Univ. of Wisconsin
• Follow changes in GAG gene as virus evolves to evade immune response
• Find genome-wide mutations in viral pool
[Images: Rhesus macaque; Simian Immunodeficiency Virus]
Amplicon Sequencing- Basic Amplicon
454 amplicon design using tailed primers
[Diagram: 454 Titanium A-primer (21 bp) - key - MID - Sequence of interest (200-600 bp) - MID - key - 454 Titanium B-primer (21 bp)]
Locus-specific PCR amplification, followed by emPCR Amplification and sequencing
• Long reads required to sequence through the locus specific primer, enable haplotyping over longer distances
• 100s to 1000s of amplicon clones sequenced simultaneously
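The tailed-primer layout above means each read begins with the key, then the MID, then the locus-specific sequence, so pooled samples can be separated by matching read prefixes. The sketch below shows that idea with exact matching only. The key `TCAG` and the 10-base barcodes are assumptions used as placeholders, not values taken from this presentation, and real pipelines typically also tolerate sequencing errors in the barcode.

```python
from typing import Dict, Optional, Tuple

KEY = "TCAG"  # assumed key sequence; placeholder, not stated in the slides

# Hypothetical sample-to-barcode map; these 10-mers are illustrative placeholders
MIDS: Dict[str, str] = {
    "sample_1": "ACGAGTGCGT",
    "sample_2": "ACGCTCGACA",
    "sample_3": "AGACGCACTC",
}


def assign_sample(read: str, mids: Dict[str, str] = MIDS,
                  key: str = KEY) -> Optional[Tuple[str, str]]:
    """Return (sample, trimmed read) if the read starts with key + a known MID."""
    if not read.startswith(key):
        return None
    body = read[len(key):]
    for sample, mid in mids.items():
        if body.startswith(mid):
            # Strip the barcode so only locus primer + insert remain
            return sample, body[len(mid):]
    return None


reads = [
    "TCAG" + "ACGCTCGACA" + "GGGTTTCCA",   # carries the sample_2 barcode
    "TCAG" + "TTTTTTTTTT" + "GGGTTTCCA",   # unknown barcode
]
for r in reads:
    print(assign_sample(r))
```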
Amplicon Sequencing- Long Range Amplicons
Using long range amplicons for whole viral or other genomic region sequencing
• Locus-specific long range PCR amplification of the sequence of interest (1,500-15,000 or more bp)
• Shear to 400-600 bases using gDNA protocol
• Ligate 454 primers onto the sheared amplicon fragments using gDNA protocol
• emPCR Amplification and sequencing
[Diagram: sheared fragments flanked by the 454 Titanium A-primer (21 bp) with key and MID, and the 454 Titanium B-primer (21 bp) with MID and key]
SIV Genome Sequencing
[Figure: SIV Proteome and SIV Genome (Viral RNA), 0bp to 10535bp, with the Direct Amplicon and Full Genome approaches marked]
* Slide courtesy of U Wisconsin
SIV Genome Sequencing - Direct Amplicon
# of Samples - 28
Total Reads - 82,079
Median Length - 356bp
[Figure: read length histogram; x-axis: Read Length (bp), y-axis: Number of Reads; peak labeled 354bp]
Viral Mutations in the Structural SIV Protein Gag evolve to escape immune response
Mutations in the SIV protein Gag affect viral fitness- Gag protein is the ‘particle making machine’
SIV Genome Sequencing - Amplicons
Total Reads - 59,097
Median Length - 321bp
[Figure: four ~2kb amplicons; read length histogram with x-axis: Read Length (bp), y-axis: Number of Reads]
SIV Full Genome Sequencing Coverage
[Figure: coverage plotted against SIV Genome - Base Pair Position]
454 Sequencing System vs. Sanger
[Figure: comparison for Animal 1, Animal 2, Animal 3]
Ben’s Conclusions
• GS Junior System detects low frequency genetic variants that are missed by traditional Sanger sequencing
• A bench-top GS Junior System improves turn around time and can be readily adapted to small academic lab settings
Acknowledgements
O’Connor Lab
Ben Burwitz
Roger Wiseman
Shelby O’Connor
Dawn Dudley
Julie Karl
Simon Lank
Charlie Burns
Ericka Becker
Ben Bimber
Dave O’Connor
Watkins Lab
Jonah Sacha
Matt Reynolds
Nick Maness
Nancy Wilson
David Watkins
Inherited Disease
• Looking for rare mutations in affected individuals
• Target gene from GWAS study
• Two PCR approaches- long range PCR and short amplicon
• MID sequences used to distinguish individuals in a pool
[Diagram: Target Gene with segments numbered 1-14; pooled individuals labeled MID 1, MID 2, MID 3]
Long Range Amplicon Sequencing Results
Shotgun processing

Run   Reads     Average Read Length (bases)   Total Bases   # of Samples Sequenced *
1     96,947    385                           37,363,295    8
2     134,252   389                           52,263,214    9
3     149,809   417                           62,540,439    10
4     143,498   417                           59,930,800    10
5     151,370   394                           59,732,290    8
Small Amplicon Sequencing Results
Amplicon Processing

Run   Reads     Average Read Length (bases)   Total Bases   # of Samples Sequenced
1     72,191    322                           23,289,440    11
2     75,424    313                           23,664,312    12
3     84,441    325                           27,443,160    12
4     101,395   339                           34,394,604    12
5     60,243    435                           26,248,268    12
6     25,884    374                           9,690,154     12
7     70,406    424                           29,905,454    12
8     71,587    434                           31,064,908    11
Amplicon Coverage- Accurate Pooling Required!
[Figure: coverage by Amplicons and Individual Samples, with callouts: Poorly Pooled Amplicon, Poor Performing Sample, Sampling Variability, Poor Performing Amplicon]
Verification of Novel Mutations

Sample ID   ASP Result     GS Junior       Agreement
1           Heterozygous   50.94% / 106    Y
2           Heterozygous   52.5% / 200     Y
3           Heterozygous   39.33% / 178    Y
4           Homozygous     94% / 100       Y
5           Heterozygous   48% / 125       Y
6           Heterozygous   47.06% / 221    Y
7           Homozygous     99.18% / 243    Y
8           Heterozygous   46.71% / 167    Y
9           Heterozygous   46.07% / 191    Y
10          Heterozygous   54.17% / 24     Y
11          Homozygous     97.57% / 288    Y
12          Heterozygous   42.33% / 163    Y
13          Heterozygous   41.88% / 191    Y
14          Heterozygous   47.02% / 151    Y
15          Heterozygous   48.07% / 441    Y
16          Heterozygous   17.86% / 252    N
17          Heterozygous   50.32% / 157    Y
18          Heterozygous   16.18% / 272    Y
19          Heterozygous   14.85% / 330    Y

Allele-Specific PCR: Selective PCR amplification of one of the alleles to detect Single Nucleotide Polymorphism (SNP).
Selective amplification is usually achieved by designing a primer such that the primer will match/mismatch one of the alleles at the 3'-end of the primer.

            Wild-Type Primer Set   Assay Primer Set   Genotype
Sample 1    Amplified              Not Amplified      Wild Type
Sample 2    Amplified              Amplified          Heterozygous
Sample 3    Not Amplified          Amplified          Homozygous
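The allele-specific PCR readout above reduces to a small lookup from the two amplification results to a genotype. A minimal sketch follows; the function name is illustrative, and the handling of the case where neither primer set amplifies (treated here as a failed reaction with no call) is an assumption, since the slide does not cover it.

```python
from typing import Optional


def asp_genotype(wild_type_amplified: bool, assay_amplified: bool) -> Optional[str]:
    """Map allele-specific PCR results to a genotype call.

    Follows the three cases shown on the slide; returns None when neither
    primer set amplifies (assumed to be a failed reaction, not from the source).
    """
    if wild_type_amplified and not assay_amplified:
        return "Wild Type"
    if wild_type_amplified and assay_amplified:
        return "Heterozygous"
    if assay_amplified:
        return "Homozygous"
    return None


# The three example samples from the slide
examples = {
    "Sample 1": (True, False),
    "Sample 2": (True, True),
    "Sample 3": (False, True),
}
for name, (wt, assay) in examples.items():
    print(name, asp_genotype(wt, assay))
```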
Pathogen Discovery on the GS Junior System
• Case from Sandton, South Africa
• Index case infected a paramedic during transfer, a nurse at the hospital, cleaning staff, and the nurse caring for the paramedic; 4 of 5 did not survive
• Serum and tissue samples from victims were subjected to unbiased pyrosequencing, yielding within 72 hours of sample receipt, multiple discrete sequence fragments that represented approximately 50% of a prototypic arenavirus genome.
• Recapitulated GS FLX System study in single GS Junior System run
• 250 hits to Lujo virus covering 57% of the L-segment and 79% of the S-segment
Coming Soon
• GS Junior System Publications in
– Metagenomic characterization of human environments
– Whole Genome Sequencing of bacterial pathogens
– Rare variant discovery in human disease- GWAS follow up experiments
– Viral pathogen sequencing
– Many more!
GS Junior System First Results
Disclaimer & Trademarks
Disclaimer:
For life science research only. Not for use in diagnostic procedures.
Trademarks:
454, 454 LIFE SCIENCES, 454 SEQUENCING, EMPCR, GS FLX, GS FLX TITANIUM, GS
JUNIOR and SEQCAP are trademarks of Roche.
Other brands or product names are trademarks of their respective holders.