THE EFFECT OF NEXT-GENERATION
SEQUENCING TECHNOLOGY ON
COMPLEX TRAIT RESEARCH
CHALLENGE ANALYSIS
Presented by Ladang Auxane
Mombaerts Laurent
Uyttendaele Vincent
December 10, 2013
University of Liège
GBIO0009-1 : Krystel Van Steen, Kyrill Bessonov
TABLE OF CONTENTS
1. Introduction
2. Challenge analysis
   1. Optimizing parameters for study design
   2. Storing and handling data
   3. Mapping and aligning
   4. Variant calling
   5. Analyzing low frequency and rare variants
3. Applications
4. Discussion
5. Conclusion
INTRODUCTION
• What is Next-Generation Sequencing (NGS)?
   ─ High throughput
   ─ Low cost
• Applications
• From 1970 until now
[Figure: F. Sanger]
CHALLENGE ANALYSIS
1. Optimizing parameters for study design
• Three main parameters:
   ─ High cost-to-data ratio: sequence only parts of the genome?
   ─ Power depends on depth of coverage.
   ─ Sample selection.
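To make the trade-off between cost, sequenced region and depth of coverage concrete, here is a minimal sketch. It assumes the usual approximation that mean depth equals the number of reads times the read length divided by the size of the sequenced region, and that per-base depth is Poisson distributed. The read counts and region sizes are hypothetical figures chosen for illustration, not values from the presentation.

```python
import math


def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by target size."""
    return n_reads * read_length / target_size


def prob_depth_at_least(k: int, depth: float) -> float:
    """P(a base is covered by at least k reads), assuming Poisson-distributed depth."""
    below = sum(math.exp(-depth) * depth**i / math.factorial(i) for i in range(k))
    return 1.0 - below


# Hypothetical budget: 40 million reads of 100 bp each.
whole_genome = mean_depth(40_000_000, 100, 3_000_000_000)  # ~3 Gb genome
targeted = mean_depth(40_000_000, 100, 50_000_000)         # ~50 Mb target region

print(f"whole genome: {whole_genome:.2f}x, targeted region: {targeted:.0f}x")
print(f"P(depth >= 20) at targeted depth: {prob_depth_at_least(20, targeted):.4f}")
```

With the same sequencing budget, restricting the experiment to part of the genome buys a much higher depth per base, which is one reason the choice of region and the choice of depth cannot be made independently.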
CHALLENGE ANALYSIS
2. Storing and handling data
Two years ago
• The concept of NGS was still theoretical.
Today
• Devices are operational and affordable → raw data available.
CHALLENGE ANALYSIS
Production of raw data
• Using a fluorescent-dye DNA sequencer
• Labeling of DNA strands with 4 fluorescent dyes
• Separation of fragments by electrophoresis
• Monitoring by chromatography
Storage of raw data
• One run can produce up to 4 Tb of data
• → Requires a huge storage capacity
Handling of raw data
• Algorithms will be applied for mapping
• → Requires powerful computing tools
CHALLENGE ANALYSIS
3. Mapping and aligning algorithms
De novo assembly
• Sequencing a genome without the use of a reference genome.
• Reads are assembled by an overlapping method.
Mapping
• Building a sequence that is similar to a reference genome.
• Reads are aligned on the backbone.
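The overlapping method used in de novo assembly can be sketched with a toy greedy merger. This is only an illustration under strong assumptions (error-free reads, a single strand, no repeats). The function names `overlap` and `greedy_assemble` and the example reads are hypothetical. Real assemblers use far more elaborate graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Mapping, by contrast, never compares reads with each other: each read is only compared with the reference, which is why it scales better but depends on the quality of that reference.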
CHALLENGE ANALYSIS
• Improving the speed and efficiency of algorithms to cope with high throughput
• Using a more accurate reference genome (including individual sequences)
• Detecting non-unique mapping (reads that match several locations in the reference genome)
• Taking base qualities (degrees of certainty) into account
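Two of the points above lend themselves to a small illustration. The sketch below assumes base qualities are expressed on the standard Phred scale, where a quality Q corresponds to an error probability of 10^(-Q/10). It uses a brute-force, ungapped scan to show how a read falling in a repeated segment maps to more than one location. The function names and the toy reference are hypothetical. Production aligners rely on indexed data structures rather than a linear scan.

```python
def phred_to_error_prob(q: int) -> float:
    """Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def map_read(read: str, reference: str, max_mismatches: int = 1) -> list[int]:
    """Return every 0-based reference position where the read aligns without
    gaps and with at most max_mismatches mismatches (brute-force scan)."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


reference = "ACGTTGCAACGTTGCATTAGC"  # toy reference with a repeated segment
unique_hits = map_read("GCATTAG", reference, max_mismatches=0)
repeat_hits = map_read("ACGTTGCA", reference, max_mismatches=0)
is_ambiguous = len(repeat_hits) > 1  # non-unique mapping: placement is uncertain

print(unique_hits, repeat_hits, is_ambiguous)
print(phred_to_error_prob(20), phred_to_error_prob(30))
```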
CHALLENGE ANALYSIS
4. Variant calling
Distinguishing true variants from sequencing or mapping errors
→ Decrease the number of false-positive SNP calls
Detecting misalignment around indels
• Indel in the middle of a read: perfect match on either side → the algorithm opens a gap.
• Indel at one end of a read: the indel is hard to recognize → misalignment of the read → false-positive SNP call
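A toy example may help to see why the position of the indel matters. The sketch below places a read on the reference without gaps and lists the mismatching positions. The reference sequence and the helper `ungapped_mismatches` are hypothetical. The reasoning assumes an aligner that weighs a few mismatches against the penalty of opening a gap, which is how gapped aligners commonly score alignments.

```python
def ungapped_mismatches(read: str, reference: str, pos: int) -> list[int]:
    """Reference positions where the read, placed at pos without gaps, disagrees."""
    return [pos + i for i, base in enumerate(read) if reference[pos + i] != base]


reference = "ACGTACGGATCCATGCAAGT"

# Read starting at position 0 with the reference base at index 17 deleted
# (a deletion two bases from the end of the read).
read_end_del = reference[:17] + reference[18:20]
# Read starting at position 0 with the base at index 9 deleted (middle of the read).
read_mid_del = reference[:9] + reference[10:20]

end_mm = ungapped_mismatches(read_end_del, reference, 0)
mid_mm = ungapped_mismatches(read_mid_del, reference, 0)

print("deletion near the end:", end_mm)
print("deletion in the middle:", mid_mm)
```

When the deletion sits in the middle, the ungapped placement produces many mismatches, so opening a gap is clearly the better explanation. When it sits near the end, only a couple of mismatches appear, and an aligner may prefer them to a gap, so they can be reported as SNPs.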
CHALLENGE ANALYSIS
Considering different error rates depending on the base location
• Nucleobases at the ends of a read have a higher error rate.
• In case of misalignment: confident but false-positive SNP call.
• SOLUTION: algorithms that use a recalibrated “base quality score” and consider only the central portion of a read.
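The filtering idea can be sketched as follows. The example does not implement quality-score recalibration itself. It only shows the downstream step of ignoring low-quality bases and bases close to a read end when counting alleles at one site. The thresholds `min_quality` and `end_margin` and the pileup data are hypothetical.

```python
from collections import Counter


def count_alleles(observations, min_quality: int = 20, end_margin: int = 5) -> Counter:
    """Count bases at one site, skipping low-quality bases and read ends.

    Each observation is (base, phred_quality, offset_in_read, read_length).
    """
    counts = Counter()
    for base, quality, offset, read_length in observations:
        near_end = offset < end_margin or offset >= read_length - end_margin
        if quality >= min_quality and not near_end:
            counts[base] += 1
    return counts


# Hypothetical pileup at a single reference position (reference base: A).
pileup = [
    ("A", 38, 40, 100),
    ("A", 35, 55, 100),
    ("G", 12, 50, 100),  # low base quality
    ("G", 36, 98, 100),  # last bases of the read
    ("G", 37, 1, 100),   # first bases of the read
    ("A", 30, 62, 100),
]
raw = Counter(base for base, *_ in pileup)
filtered = count_alleles(pileup)

print("raw counts:", dict(raw))
print("filtered counts:", dict(filtered))
```

In the raw counts the site looks heterozygous. After filtering, the support for the alternative allele disappears, because every G comes from a low-quality base or from a read end.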
Decreasing the number of errors introduced by PCR artefacts
• PCR → non-uniform coverage of the reference genome → over-represented reads
• In case of misalignment: confident but false-positive SNP call.
• SOLUTION: paired-end sequencing libraries to identify and discard clonal reads
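Why paired-end data help can be sketched in a few lines. Two independent fragments rarely share both end coordinates, so read pairs that do are likely copies of the same PCR product. The sketch below keeps a single pair per coordinate signature. The record fields (`start1`, `start2`, `quality_sum`) and the rule of keeping the highest-quality pair are assumptions made for illustration, in the spirit of common duplicate-marking tools rather than a description of any specific one.

```python
from collections import defaultdict


def remove_clonal_pairs(pairs):
    """Keep one read pair per (chrom, start of read 1, start of read 2, strand),
    choosing the pair with the highest summed base quality."""
    groups = defaultdict(list)
    for pair in pairs:
        key = (pair["chrom"], pair["start1"], pair["start2"], pair["strand"])
        groups[key].append(pair)
    return [max(group, key=lambda p: p["quality_sum"]) for group in groups.values()]


# Hypothetical mapped read pairs.
pairs = [
    {"name": "p1", "chrom": "chr1", "start1": 100, "start2": 350, "strand": "+", "quality_sum": 6800},
    {"name": "p2", "chrom": "chr1", "start1": 100, "start2": 350, "strand": "+", "quality_sum": 7100},
    {"name": "p3", "chrom": "chr1", "start1": 100, "start2": 362, "strand": "+", "quality_sum": 6900},
]
kept = [p["name"] for p in remove_clonal_pairs(pairs)]
print(kept)
```

With single-end reads only the first coordinate would be available, so p3 would wrongly be treated as a duplicate of p1 and p2. The second coordinate is what separates it.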
CHALLENGE ANALYSIS
5. Analyzing low frequency and rare variants
May be a painful step!
Single-point analysis
• Low power
• Would require hundreds of thousands of individuals
Across sample sets (composite analysis)
• Somewhat lighter in terms of computing time and data volume
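A quick calculation shows why single-point analysis of rare variants needs such large samples. The sketch below assumes diploid individuals and independent chromosomes. The minor allele frequency of 0.0005 is a hypothetical value chosen for illustration.

```python
def expected_minor_alleles(n_individuals: int, maf: float) -> float:
    """Expected number of minor-allele copies among 2N chromosomes."""
    return 2 * n_individuals * maf


def prob_at_least_one_copy(n_individuals: int, maf: float) -> float:
    """P(the variant is seen at least once), assuming independent chromosomes."""
    return 1.0 - (1.0 - maf) ** (2 * n_individuals)


maf = 0.0005  # hypothetical rare variant
for n in (1_000, 10_000, 100_000):
    copies = expected_minor_alleles(n, maf)
    seen = prob_at_least_one_copy(n, maf)
    print(f"N={n}: expected copies={copies:.1f}, P(seen at least once)={seen:.3f}")
```

With a thousand individuals the variant is expected about once, which leaves nothing to compare between cases and controls. Only at around a hundred thousand individuals does the number of carriers become large enough for a test on that single variant.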
APPLICATIONS
The number of scientific publications has exploded!
DISCUSSION
• Development of new study designs
• Development of more effective methods to distinguish errors from low-frequency and rare variants
• Development of the most appropriate strategy to identify a given disease
DISCUSSION
Cost-benefit analysis
• Whole-genome sequencing is unlikely to be cost-effective, as it still presents huge challenges.
• → Couple a reduction in cost with an increase in efficiency and accuracy.
• → Make NGS platforms marketable, competitive and usable for clinical applications.
DISCUSSION
Validation analysis
• Standards for NGS clinical genomics are required, for instance to validate test accuracy.
• → Important downstream impact on patient diagnosis and management.
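One common way to express test accuracy is to compare the variants called by a pipeline with a validated truth set. The sketch below computes sensitivity and precision under that assumption. The function name `call_accuracy` and the variants are hypothetical, and an actual clinical standard would specify much more than these two figures.

```python
def call_accuracy(called: set, truth: set) -> dict:
    """Compare called variants with a validated truth set."""
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    return {
        "sensitivity": true_pos / (true_pos + false_neg) if truth else float("nan"),
        "precision": true_pos / (true_pos + false_pos) if called else float("nan"),
    }


# Hypothetical variants written as (chromosome, position, ref, alt).
truth = {("chr1", 1500, "A", "G"), ("chr1", 2300, "C", "T"),
         ("chr2", 410, "G", "A"), ("chr2", 990, "T", "C")}
called = {("chr1", 1500, "A", "G"), ("chr1", 2300, "C", "T"),
          ("chr2", 410, "G", "A"), ("chr3", 77, "A", "T")}
metrics = call_accuracy(called, truth)
print(metrics)
```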
DISCUSSION
Current knowledge and research
• Lack of knowledge
─ about what a SNP implies
─ about how to detect interactions between genes
─ about the influence of gene expression
─ about how genome variation should be interpreted …
The more tests we run,
the more knowledge we gain,
and the more associations between phenotype and genome we can establish.
Conclusion
Multiple issues
• Study design
• Error handling
• Data interpretation
Enable a wide variety of applications
REFERENCES
• A. G. Day-Williams, E. Zeggini, The effect of Next-Generation Sequencing technology on complex trait research, Eur J Clin Invest 2011, Vol 41: 561-567.
• http://videos.rennes.inria.fr/genopole/GenOuest-2010/ [online on 7th December 2013]
• http://www.qiagen.com/products/applications/next-generation-sequencing/#Dataanalysis [online on 9th December 2013]

Figures
• http://www.labtimes.org/labtimes/method/methods/2010_01.lasso
• http://nextgenseek.com/2012/01/illumina-launches-a-new-faster-sequencer-hiseq-2500/
• http://www.nucleics.com/DNA_sequencing_support/DNA-sequencing-dye-blobs.html
• http://videos.rennes.inria.fr/genopole/GenOuest2010/peterlongo_plateforme_2010_10_26.pdf
• http://www.genomicslawreport.com/index.php/tag/illumina/
• http://www.cancer.gov/cancertopics/understandingcancer/geneticvariation/page40