presentation - Joachim De Schrijver
Sequencing technology
› Roche/454 GS-FLX (‘454’)
› Illumina
Prokaryotic profiling
› De novo genome sequencing
› Metagenomics
› SNP profiling
› Species quantification
Viral profiling
› De novo genome sequencing
Classic chain-terminator sequencing
Dye chain-terminator sequencing
Next-generation sequencing
Next-gen sequencing principle
› Massively parallel
› Add ACTGs
› Catch a signal
Roche/454 GS-FLX+ (‘454’)
› Pyrosequencing: problems with homopolymers (e.g. AAAAAA)
› Long-read sequencing: 500-1000 bp
› Variable sequencing length
› 1 million reads/run, 1 Gb/run
› Sequencing speed: ~1 day/run
› Next-next generation: IonTorrent PGM/Proton
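To make the homopolymer problem concrete, here is a minimal sketch of an idealised pyrosequencing flowgram. It assumes a cyclic flow order of T, A, C, G and a noise-free signal equal to the length of the homopolymer run. The function name `flowgram` and both assumptions are illustrative and not taken from the slides.

```python
def flowgram(sequence, flow_order="TACG"):
    """Return (nucleotide, signal) pairs for an idealised pyrosequencing run.

    Each flow offers one nucleotide; the signal is the number of identical
    bases incorporated during that flow (0 if the next base does not match).
    """
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0   # next base of the template still to be read
    flow = 0  # index of the current nucleotide flow
    while pos < len(sequence):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


example = flowgram("TAAAAAAG")
print(example)  # [('T', 1), ('A', 6), ('C', 0), ('G', 1)]
```

The six A's produce a single signal of height 6. On a real, noisy signal the difference between 6 and 7 is small relative to the signal itself, which is where homopolymer miscalls come from.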
Illumina
› Sequencing by synthesis
› Short-read sequencing: 36, 72, …, 150 bp
› Fixed sequencing length
› 1 billion reads/run, 100 Gb/run (= 33 x human genome!)
› Sequencing speed: 3-10 days/run, depending on read length
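The throughput figures follow from simple arithmetic: yield = reads x read length, and coverage = yield / genome size. A small sketch, assuming 100 bp reads and a human genome of about 3 Gb (both are round-number assumptions chosen to match the slide; the function names are hypothetical):

```python
def run_yield_bp(n_reads, read_length_bp):
    """Total number of bases produced by one run."""
    return n_reads * read_length_bp


def coverage(yield_bp, genome_size_bp):
    """Average number of times each genome position is sequenced."""
    return yield_bp / genome_size_bp


HUMAN_GENOME_BP = 3_000_000_000  # assumption: ~3 Gb

illumina_yield = run_yield_bp(1_000_000_000, 100)  # 1 billion reads of 100 bp
print(illumina_yield / 1e9, "Gb per run")
print(round(coverage(illumina_yield, HUMAN_GENOME_BP)), "x human genome")
```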
SOLiD
› Short-read sequencing (similar to Illumina)
454 / Illumina
› Price per run: $10,000/run
› Price per machine: $200,000-500,000
› Supporting IT hardware
› Peripheral devices such as a fragmentation instrument, PCR equipment …
› Negotiating power…
Use service centers!
› Nxtgnt (BE), GATC (EU), Baseclear (NL), BGI …
› No overhead cost, no maintenance etc.
› Cheaper
Next-generation sequencing has become 2nd generation sequencing
Next-next-generation sequencing is almost there: 3rd generation sequencing
› Helicos: True Single Molecule Sequencing
› IonTorrent/Life: Cheap and fast
› Nanopore: Unlimited read size
› …
The evolution of sequencing technology goes hand in hand with the evolution of
› IT infrastructure/hardware
› Analysis software
Hardware
› 1 Illumina run ~ 100 Gb text file ~ 5-million-page book
› Processing power/storage are an issue!
Software
› Mapping to a human genome: ‘couple of hours’
Prokaryotic genomics 101
› Prokaryotes = bacteria + archaea
› Prokaryotic genomes
Large circular genome (0.5-10 Mb): the ‘chromosome’
Small plasmids (1-1000 kb) (virulence factors, antibiotic resistance …)
(Almost) no introns
Easy ORF annotation
1953: Watson/Crick discover the DNA double helix
1977: First complete genome: bacteriophage φX174
1995: First genome of a free-living organism: H. influenzae
2001: First draft of the human genome
2006: >200 complete bacterial genomes
2012: Countless bacterial genomes have been sequenced using next-gen sequencing
Complete bacterial genomes used to be
› Expensive
› Difficult to obtain
› ‘Nature’ or ‘Science’ work
› Complex, until the invention of next-generation sequencing
Using next-generation sequencing, de novo sequencing has become
› Relatively easy
› Relatively cheap
› Routine research
Already >10 complete bacterial genomes published in 2012
› More than just an assembly!
Practical
1. Get some DNA from an isolated species of interest
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Assemble (1h): pure de novo assembly or guided assembly
5. Annotate the genome (days-weeks)
Assembly: multiple ‘short’ reads → 1 long sequence
Existing software
› Velvet
› SSAKE
› Newbler
› …
Source: Nature 2009, MacLean et al.
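The idea of turning multiple short reads into one long sequence can be illustrated with a toy greedy overlap assembler. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats), and the function names are hypothetical. It is not how the packages listed above work internally; Velvet, for example, is built on de Bruijn graphs.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # ['ATGGCGTACCTTA']
```

With real data the quadratic pairwise comparison is far too slow, and sequencing errors and repeats break the greedy choice, which is why dedicated assemblers exist.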
Relatively cheap
› Sequencing cost: depends on coverage
Illumina, 30x, 5 Mb genome: $10-$100
454, 30x, 5 Mb genome: $1000-$5000
› Equipment
IT infrastructure, sequencing equipment, people …
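The order of magnitude of these sequencing costs can be reproduced with a back-of-the-envelope calculation. The sketch below assumes a 5 Mb genome (within the 0.5-10 Mb range of prokaryotic chromosomes), the run yields and the $10,000 run price quoted earlier, and that a sample only pays for the fraction of a run it uses (i.e. it is pooled with other samples). The function name is hypothetical.

```python
def sequencing_cost(genome_size_bp, coverage, run_yield_bp, run_price_usd):
    """Cost of the fraction of one run needed to reach the target coverage."""
    bases_needed = genome_size_bp * coverage
    return bases_needed / run_yield_bp * run_price_usd


GENOME_BP = 5_000_000  # assumption: 5 Mb bacterial genome
COVERAGE = 30
RUN_PRICE_USD = 10_000

# Run yields from the technology overview: 100 Gb (Illumina), 1 Gb (454)
illumina_cost = sequencing_cost(GENOME_BP, COVERAGE, 100_000_000_000, RUN_PRICE_USD)
roche454_cost = sequencing_cost(GENOME_BP, COVERAGE, 1_000_000_000, RUN_PRICE_USD)

print(f"Illumina: ${illumina_cost:.0f}")
print(f"454:      ${roche454_cost:.0f}")
```

Under these assumptions Illumina lands in the $10-$100 bracket and 454 in the $1000-$5000 bracket; library preparation and labour are not included.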
Relatively easy
› Need for IT support
› No out-of-the-box standard solution for everything
› Several different software packages for assembly
De novo genome assembly
› Study of 1 single species
› Need for species isolation
Metagenomics analysis
› Study of a community of species
› No need for isolation (culturing bias!)
› Study the collective gene pool and function of the community/ecology
› No need for individual functions
Practical
1. Get bacterial DNA or RNA from a sample: soil, gut/fecal, ocean water (e.g. Craig Venter), …
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Map on a database of known genes (1 day)
5. Annotate/analyse the community (weeks)
2010: Giant Panda genome (2nd carnivore)
› No umami taste receptor -> no meat affinity
› The panda is more a dog than a bear
› The panda is a carnivore eating bamboo!
Still 2010!: Panda ‘microbiome’
› Gut microbiome of the panda reveals the presence of bamboo/cellulose-degrading pathways
A clinical example: gut microbiome can predict diabetes and malnourishment
› PLoS One (2011), Brown et al.
› PLoS One (2010), Valladares et al.
› Gut Pathology (2011), Gupta et al.
Classical SNP analysis - practical
1. Design PCR primers
2. Generate amplicons
3. Re-sequence using long-read sequencing: conserve ‘SNP blocks’
4. Detect SNPs
5. Correlate SNPs to drug resistance, severity of symptoms …
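The SNP detection step can be sketched as a per-position majority vote over aligned reads. This is a deliberately naive illustration: it assumes the reads are already aligned to the reference without indels, that uncovered positions are padded with "-", and that a simple frequency threshold is good enough. The function name and the thresholds are hypothetical; real variant callers also model base quality and sequencing error.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_fraction=0.8, min_depth=3):
    """Return (1-based position, reference base, variant base) tuples."""
    snps = []
    for pos, ref_base in enumerate(reference):
        # Collect the bases that cover this reference position
        column = [read[pos] for read in aligned_reads
                  if pos < len(read) and read[pos] != "-"]
        if len(column) < min_depth:
            continue  # not enough coverage to make a call
        base, count = Counter(column).most_common(1)[0]
        if base != ref_base and count / len(column) >= min_fraction:
            snps.append((pos + 1, ref_base, base))
    return snps


reference = "ACGTACGT"
reads = ["ACGAACGT", "ACGAACGT", "ACGAACGT", "ACGAACGT", "ACGTACGT"]
print(call_snps(reference, reads))  # [(4, 'T', 'A')]
```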
Amplicon resequencing is the same for human, prokaryotic, viral analyses
Many standardized out-of-the-box solutions available
Very simple analysis
Watch out for the overkill…
› Don’t use a bazooka to kill a fly!
› Throughput can be too high
Profile the coding region of hepatitis C (Lauck et al. 2012)
Use next-generation sequencing to predict the optimal HIV therapy (Thielen et al. 2012)
Imagine the following research questions
› Which (known) species/groups are present in a certain sample?
› Does this composition alter given a certain treatment, change of conditions, patients etc.?
No need for de novo genome sequencing
No metagenomics: species instead of functions
Prokaryotes have the 16S rDNA gene, coding for ribosomal RNA
The 16S rDNA region is 1.5 kb long
16S rDNA is specific for each species/strain
Theoretical: 4^1,500 ≈ 10^903 possibilities
In practice: 16S rDNA sequence known for millions of species
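The theoretical number comes from four possible bases at each of 1,500 positions: log10(4^1500) = 1500 x log10(4) ≈ 903.1. A quick check, relying on Python's arbitrary-precision integers (variable names are illustrative):

```python
import math

n_positions = 1500  # length of the 16S rDNA region in bases

# Exponent of the equivalent power of ten
exponent = n_positions * math.log10(4)

# Cross-check by counting the decimal digits of the exact integer
n_digits = len(str(4 ** n_positions))

print(f"4^{n_positions} is about 10^{exponent:.1f} ({n_digits} digits)")
```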
16S rDNA can be isolated in different species using universal PCR primers
› Isolate/amplify different regions using the same primers
Compare the isolated sequences against a database of known sequences
Practical procedure
1. Sample an environment and isolate DNA
2. Do a universal PCR amplification
3. Sequence using long-read sequencing: the longer the better!
4. Obtain sequences
5. Map sequences against a reference database
6. Annotate the data
Example: The Antarctica project
› Which parameters determine the composition of bacterial communities in Antarctic lakes?
› 20 different samples/lakes
› Sequence 16S rDNA genes
› 1 x 454 run (1 million 500 bp sequences)
› Map all sequences back to the RDP database
Analyse the data using computing power
› Compare different locations
Is species A present in location 1, location 2, …?
› Assess the distribution in a single location
How dominant is the most dominant species in location 1?
How many species are in location 1?
…
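Once every read has been mapped to a taxon, the questions above reduce to counting. A minimal sketch, assuming the mapping step has already produced one (location, taxon) pair per read; the function names, the input format and the example labels are hypothetical:

```python
from collections import Counter, defaultdict


def summarise(assignments):
    """Count taxa per location and derive richness and dominance."""
    counts = defaultdict(Counter)
    for location, taxon in assignments:
        counts[location][taxon] += 1
    summary = {}
    for location, taxa in counts.items():
        total = sum(taxa.values())
        top_taxon, top_count = taxa.most_common(1)[0]
        summary[location] = {
            "richness": len(taxa),                   # how many species?
            "dominant": top_taxon,                   # most abundant species
            "dominant_fraction": top_count / total,  # how dominant is it?
        }
    return counts, summary


def locations_with(counts, taxon):
    """Locations in which at least one read was assigned to the taxon."""
    return sorted(loc for loc, taxa in counts.items() if taxa[taxon] > 0)


reads = [("location1", "A"), ("location1", "A"),
         ("location1", "B"), ("location2", "B")]
counts, summary = summarise(reads)
print(locations_with(counts, "A"))  # ['location1']
print(summary["location1"])
```

Read counts are only a proxy for abundance here; primer bias and differences in 16S copy number between species are ignored in this sketch.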
Visualize!
Analyse different samples on different taxonomic levels
› Include taxonomic tree of life of bacteria
› Use a ‘taxonomy browser’
Analyse a single location
Compare different locations
Analysis                 Lab work difficulty       Analysis difficulty
De novo genome           ++ (isolate)              +
Metagenomics             +                         +++ (pathways etc.)
SNP                      +++ (design primers)      ++ (correlate)
Species quantification   ++ (universal primers)    ++
Viral profiling
› Viral profiling = prokaryotic profiling, but…
Cheaper
Faster
Easier
› De novo genome sequencing = OK
› Don’t spend $10,000 on a 100 kb genome!
› Multiplexing/pooling capacity is limited!
Watch out for the overkill
› An Illumina run can be split into 8 lanes
› >20 samples per lane can be combined
Still >100 Mb per sample…
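The overkill is easy to quantify. The sketch below assumes a 100 Gb Illumina run split evenly over 8 lanes with 20 samples per lane, and a 100 kb viral genome; the even split and the function name are assumptions made for illustration.

```python
def per_sample_yield_bp(run_yield_bp, lanes, samples_per_lane):
    """Bases available per sample when a run is split evenly."""
    return run_yield_bp / (lanes * samples_per_lane)


RUN_YIELD_BP = 100_000_000_000  # 100 Gb per Illumina run
VIRAL_GENOME_BP = 100_000       # 100 kb genome

per_sample = per_sample_yield_bp(RUN_YIELD_BP, lanes=8, samples_per_lane=20)
viral_coverage = per_sample / VIRAL_GENOME_BP

print(f"{per_sample / 1e6:.0f} Mb per sample")
print(f"{viral_coverage:.0f}x coverage of a 100 kb genome")
```

Even with 160 samples on one run, each sample still gets hundreds of megabases, i.e. thousands-fold coverage of a small viral genome.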
Thanks for your attention!