Chapter 4 part II

Chemical Synthesis, Amplification, and Sequencing of DNA (Part II)
DNA Sequencing
- The function of a gene can often be deduced from its nucleotide sequence.
  - A presumptive amino acid sequence, determined from the nucleotide sequence, can be compared with the protein sequences of known genes. A significant similarity suggests a protein with an equivalent function.
  - DNA-binding sites, receptor recognition sites, and transmembrane domains can be identified.
  - The non-coding regions may provide information about the regulation of a gene.
  - Sequence information is essential for molecular cloning studies and for characterizing gene activity.
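As a minimal illustration of deducing a presumptive amino acid sequence, the sketch below translates a coding sequence with the standard genetic code and then scores two protein sequences by percent identity. The function names are hypothetical, and the ungapped identity score is a simplification assumed for clarity; real database comparisons use alignment tools.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences (a simplification)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(translate("ATGGCTTGGTAA"))         # MAW
print(percent_identity("MAWK", "MAWR"))  # 75.0
```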
Dideoxynucleotide Procedure
- DNA sequencing techniques use modified nucleotides, dideoxynucleoside triphosphates (ddNTPs), carrying fluorescent “tags”.
- ddNTPs can be added to a growing strand, but they lack a hydroxyl group at the 3’ position, so no further nucleotide can be added after them.
Normal DNA synthesis
- An incoming dNTP base-pairs with the complementary nucleotide of the template strand.
- The internucleotide linkage forms between the 3’ hydroxyl group of the last nucleotide of the growing strand and the α-phosphate group of the incoming nucleotide.
Blocked DNA synthesis
- Chain growth is stopped by the addition of a dideoxynucleotide to the end of the growing strand.
- The internucleotide linkage between the last nucleotide (the dideoxynucleotide) and the next incoming nucleotide cannot be formed, because there is no 3’ OH group on the dideoxynucleotide sugar.
Simulated autoradiograph
- Each lane of the gel was loaded with the contents of one of the four reaction tubes.
- By convention, the bands of the autoradiograph are read from the bottom (shortest fragment) to the top (longest fragment).
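Reading a four-lane gel can be sketched as merging the lanes by fragment length: the shortest fragment, at the bottom, gives the first base. The band positions below are hypothetical, and the bases called belong to the newly synthesized strand, which is complementary to the template.

```python
def read_four_lane_gel(lanes: dict[str, list[int]]) -> str:
    """Merge band positions (fragment lengths) from the A, C, G, T lanes.

    Reading bottom to top means reading from the shortest fragment to the
    longest, so sorting all bands by length gives the synthesized sequence.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    lengths = [length for length, _ in bands]
    if len(set(lengths)) != len(lengths):
        raise ValueError("two lanes have a band at the same position")
    return "".join(base for _, base in bands)


# Hypothetical band positions, counted in nucleotides added after the primer.
lanes = {"A": [1, 4], "C": [3, 7], "G": [2, 6], "T": [5, 8]}
print(read_four_lane_gel(lanes))  # AGCATGCT
```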
DNA Sequencing
- Single-stranded DNA is mixed with DNA polymerase, short primer strands, the four normal dNTPs, and small amounts of fluorescently “tagged” ddNTPs.
- When a ddNTP is incorporated, growth stops.
- The result is a solution containing polynucleotides of different lengths, each ending with a tagged ddNTP.
DNA Sequencing
- Electrophoresis separates the strands by length.
- The color of the fluorescent tag indicates the type of ddNTP at the end of the strand.
- By checking the end color of successively longer strands, the sequence is revealed.
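The single-tube, four-color version can be sketched in two steps: list every ddNTP-terminated product the polymerase could make, then sort by length and read the terminal dye. This is a minimal sketch assuming an idealized reaction in which termination occurs at every position; the template and the dye-to-base color assignment are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye assignment; real instruments use their own dye sets.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_BASE = {color: base for base, color in DYE_COLOR.items()}


def terminated_fragments(template_3to5: str) -> list[tuple[int, str]]:
    """Return (length after primer, dye color) for each ddNTP-terminated product.

    The template is written 3'->5' so the new strand grows left to right; each
    product ends with the ddNTP complementary to the template base there.
    """
    fragments = []
    for i, template_base in enumerate(template_3to5):
        dd_base = COMPLEMENT[template_base]
        fragments.append((i + 1, DYE_COLOR[dd_base]))
    return fragments


def read_by_color(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by length (electrophoresis) and read each terminal dye."""
    return "".join(COLOR_BASE[color] for _, color in sorted(fragments))


template = "TACGGATC"             # hypothetical template, written 3'->5'
products = terminated_fragments(template)
print(read_by_color(products))    # ATGCCTAG (new strand, 5'->3')
```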
Electrophoresis Gel Images from an Automated DNA Sequencer
- The emission data are recorded, stored in a computer, and converted into nucleotide sequence information.
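As a rough illustration of converting emission data into bases, here is a minimal sketch of a naive base caller: at each peak it calls the base whose dye channel is brightest, and reports "N" when two channels are too close to call. It assumes the trace has already been split into peaks with one intensity per channel; the intensity values and the `min_ratio` cutoff are hypothetical, and production base callers also model peak spacing, dye cross-talk, and signal decay.

```python
def call_bases(peaks: list[dict[str, float]], min_ratio: float = 2.0) -> str:
    """Call one base per peak from four dye-channel intensities.

    If the brightest channel is not at least `min_ratio` times the second
    brightest (or no channel has signal), the call is reported as "N".
    """
    calls = []
    for peak in peaks:
        ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
        best_base, best = ranked[0]
        second = ranked[1][1]
        if best <= 0 or (second > 0 and best / second < min_ratio):
            calls.append("N")
        else:
            calls.append(best_base)
    return "".join(calls)


peaks = [
    {"A": 950.0, "C": 40.0, "G": 35.0, "T": 60.0},
    {"A": 30.0, "C": 20.0, "G": 880.0, "T": 25.0},
    {"A": 400.0, "C": 380.0, "G": 20.0, "T": 15.0},  # overlapping peaks
    {"A": 10.0, "C": 15.0, "G": 12.0, "T": 720.0},
]
print(call_bases(peaks))  # AGNT
```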
DNA Sequencing
- To produce large amounts of dideoxynucleotide-terminated fragments from small amounts of template DNA, PCR-based cycle sequencing is commonly used.
  - The setup and components are the same as for standard dideoxynucleotide sequencing, except that a thermostable DNA polymerase is required.
  - Since there is only a single primer in each reaction, amplification of the fragments is linear.
  - The high temperature prevents both secondary structures, which block elongation, and mismatched primer binding.
  - Cycle sequencing resolves between 600 and 800 nucleotides per reaction.
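The difference between linear and exponential amplification comes down to simple arithmetic. Assuming idealized, fully efficient cycles, a single primer copies each template once per cycle, so n cycles give n products per template, whereas a primer pair doubles the target every cycle (2^n). The cycle counts below are only examples.

```python
def linear_products(templates: int, cycles: int) -> int:
    """Cycle sequencing: one primer, so each cycle copies each template once."""
    return templates * cycles


def exponential_copies(templates: int, cycles: int) -> int:
    """Conventional PCR with two primers: copies double every cycle."""
    return templates * 2 ** cycles


for n in (10, 25):
    print(n, linear_products(1, n), exponential_copies(1, n))
# 10 10 1024
# 25 25 33554432
```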
Primer walking
- In primer walking, each new primer is designed from sequence near the end of the previous read, so the sequence is extended step by step along the DNA.
- Both strands of the DNA must be sequenced.
- False priming could give erroneous and ambiguous results.
- The primers are generally at least 24 nucleotides long.
- Highly stringent annealing conditions do not permit spurious binding of the primer to similar but not identical sequences.
- Primer walking has been used to sequence pieces of DNA that have been cloned into bacteriophage λ or cosmid vectors (~20 and ~45 kb, respectively).
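Primer walking can be sketched as a loop: after each read, a new primer is taken from sequence a short distance inside the end of that read, and it is accepted only if it occurs exactly once in the sequence known so far, as a guard against false priming. This is a minimal sketch under stated assumptions: the 700-nt read length, the 50-nt offset from the read end, the random test insert, and the helper names are hypothetical; the 24-nt primer length follows the text. Only one strand is walked here, although both strands must be sequenced.

```python
import random


def count_occurrences(seq: str, motif: str) -> int:
    """Count possibly overlapping occurrences of `motif` in `seq`."""
    count, i = 0, seq.find(motif)
    while i != -1:
        count += 1
        i = seq.find(motif, i + 1)
    return count


def walk(template: str, read_len: int = 700, primer_len: int = 24,
         offset: int = 50) -> list[int]:
    """Return the start positions of successive reads along one strand.

    Each new primer is the `primer_len` bases that end `offset` bases before
    the end of the previous read; extension starts right after the primer.
    """
    starts = [0]  # assumed: the first read is primed from a vector site
    while starts[-1] + read_len < len(template):
        read_end = starts[-1] + read_len
        primer_start = read_end - offset - primer_len
        primer = template[primer_start:primer_start + primer_len]
        known = template[:read_end]  # only the sequence read so far is known
        if count_occurrences(known, primer) != 1:
            raise ValueError(f"primer at {primer_start} is not unique; pick another site")
        starts.append(primer_start + primer_len)
    return starts


random.seed(0)
clone_insert = "".join(random.choice("ACGT") for _ in range(3000))  # hypothetical
print(walk(clone_insert))  # [0, 650, 1300, 1950, 2600]
```

With these assumed parameters each step advances 650 nt, so walking one strand of a ~45 kb cosmid insert would take roughly 70 reads.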
Pyrosequencing
- Pyrosequencing was the first of the second-generation sequencing technologies.
  - The technique is based on the detection of pyrophosphate that is released during DNA synthesis.
  - The α-phosphate of each incoming complementary dNTP is joined to the 3’ OH group of the last nucleotide of the growing strand.
  - The β- and γ-phosphates are cleaved off as a unit called pyrophosphate.
  - Pyrophosphate is formed only when the complementary nucleotide is incorporated at the end of the growing strand.
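The rule that pyrophosphate is released only when the dispensed nucleotide is complementary can be turned into a small flowgram simulation. This sketch assumes nucleotides are dispensed one at a time in a fixed, repeating order (the order "TACG" is an assumption), and that the signal for each flow is proportional to the number of nucleotides incorporated, so a run of identical bases gives a proportionally larger signal.

```python
def flowgram(new_strand: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Simulate (dispensed dNTP, number of nucleotides incorporated) per flow.

    `new_strand` is the sequence the polymerase will synthesize, 5'->3'.
    A flow incorporates every consecutive base that matches the dispensed
    dNTP, so a homopolymer run is read in a single flow.
    """
    if set(new_strand) - set(flow_order):
        raise ValueError("strand contains bases that are never dispensed")
    flows = []
    pos = 0
    while pos < len(new_strand):
        dntp = flow_order[len(flows) % len(flow_order)]
        count = 0
        while pos < len(new_strand) and new_strand[pos] == dntp:
            count += 1  # each incorporation releases one pyrophosphate
            pos += 1
        flows.append((dntp, count))
    return flows


print(flowgram("TTAGGGC"))
# [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
```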
Reversible chain terminators
- In pyrosequencing, DNA is sequenced by synthesis.
- Each of the 4 nucleotides must be added to the reaction sequentially, in separate cycles.
  - The process would be faster if all the nucleotides were added together in each cycle.
  - This requires that the growing DNA strands be extended by only a single nucleotide during each cycle, and that each incorporated nucleotide can be recognized individually.
  - These requirements can be met with reversible chain terminators and four-color fluorescence.
Reversible chain terminators
- The 3’ carbon of the deoxyribose sugar is capped with a chemical group that blocks the subsequent addition of nucleotides.
- A different fluorophore is attached to each nucleotide at a position that interferes with neither base pairing nor phosphodiester bond formation.
- After incorporation, the fluorescence emissions are recorded, and the 3’ blocking group and the fluorescent dye are then quickly removed.
- The decapping step must restore a hydroxyl group at the 3’ position.
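One cycle of reversible-terminator sequencing can be modeled as incorporate, record, then decap. Because every nucleotide carries a 3’ cap, each cycle adds exactly one base to the strand, so reading n bases takes n cycles even across a homopolymer run. This is a minimal sketch of the capping logic only; the class name, dye colors, and template are hypothetical.

```python
from dataclasses import dataclass, field

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}  # hypothetical


@dataclass
class GrowingStrand:
    template_3to5: str                      # template written 3'->5'
    bases: list[str] = field(default_factory=list)
    three_prime_capped: bool = False        # True while a blocking group is attached

    def incorporate(self) -> str:
        """Add one capped, dye-labeled nucleotide and return its dye color."""
        if self.three_prime_capped:
            raise RuntimeError("3' end is blocked; decap before the next cycle")
        base = COMPLEMENT[self.template_3to5[len(self.bases)]]
        self.bases.append(base)
        self.three_prime_capped = True
        return DYE[base]

    def decap(self) -> None:
        """Remove the blocking group and dye, restoring a 3' OH."""
        self.three_prime_capped = False


def sequence_by_synthesis(template_3to5: str) -> str:
    strand = GrowingStrand(template_3to5)
    color_to_base = {color: base for base, color in DYE.items()}
    read = []
    for _ in range(len(template_3to5)):    # exactly one base per cycle
        color = strand.incorporate()       # all four capped dNTPs present; one pairs
        read.append(color_to_base[color])  # emission recorded before cleavage
        strand.decap()
    return "".join(read)


print(sequence_by_synthesis("AAATGC"))  # TTTACG
```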
Sequencing by ligation
- A set of nonamers (9-mers) is added to the reaction. In each nonamer, one query position is fixed and the other eight positions are filled by any of the four nucleotides.
- In the first cycle, the anchor primer anneals to the adaptor sequence at the 3’ end of the template sequence. Nonamers with A, T, C, and G in the first query position are added, and the nonamer that is complementary to the template hybridizes next to the anchor primer and is ligated to it.
- The fluorescent signal is recorded, and the ligated primer-nonamer strand is released by melting.
- The cycle is then repeated using another pool of nonamers with fixed nucleotides in query position 2, to identify the nucleotide in the second position.
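The query-position logic can be sketched as follows: for query position k, four pools of nonamer (9-mer) probes are offered, each with a different fixed base at position k, and only the probe complementary to the template at that position is ligated to the anchor primer. This idealized model assumes perfect ligation fidelity and lets the eight degenerate positions always find a match; the function names and dye colors are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}  # hypothetical


def ligation_cycle(template_after_anchor: str, query_position: int) -> str:
    """Return the dye of the nonamer ligated in one cycle.

    `template_after_anchor` is the template (3'->5') immediately adjacent to
    the anchor primer; query positions are numbered 1-9 from the junction.
    """
    if not 1 <= query_position <= 9:
        raise ValueError("nonamers have query positions 1 to 9")
    template_base = template_after_anchor[query_position - 1]
    for fixed_base in "ACGT":              # one labeled pool per fixed base
        if COMPLEMENT[fixed_base] == template_base:
            return DYE[fixed_base]         # this nonamer hybridizes and is ligated
    raise ValueError(f"unexpected template base {template_base!r}")


def sequence_by_ligation(template_after_anchor: str) -> str:
    """Read 9 positions, one cycle each; each product is melted off afterwards."""
    color_to_base = {color: base for base, color in DYE.items()}
    return "".join(
        color_to_base[ligation_cycle(template_after_anchor, k)] for k in range(1, 10)
    )


print(sequence_by_ligation("TACGGATCA"))  # ATGCCTAGT
```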
Large-scale DNA Sequencing
- There are two categories of DNA sequencing projects: de novo genome sequencing and resequencing.
- Sequencing an entire genome that has not been sequenced before is de novo genome sequencing.
- Resequencing entails comparing a newly determined sequence with a known reference sequence.
- Applications include the identification of pathogenic strains, drug discovery, tests for disease-related mutations, forensic analyses, and the development of biological products.
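At its simplest, resequencing compares a new sequence position by position with the reference. This sketch assumes the two sequences are already aligned and of equal length, so it reports only substitutions; detecting insertions and deletions would need a gapped alignment. The function name and sequences are made up for illustration.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch needs pre-aligned, equal-length sequences")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


reference = "ATGGCTTGGAAACGT"
sample = "ATGGCTTGAAAACCT"
print(find_substitutions(reference, sample))  # [(9, 'G', 'A'), (14, 'G', 'C')]
```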
Genomic DNA Sequencing
- Determining the sequence of a long, contiguous genome is more complex than sequencing a single piece of DNA.
- First, the DNA fragments needed for Sanger sequencing have to be limited to a few hundred nucleotides in length.
- Reading the maximum number of base pairs requires a uniform concentration of each of the terminated fragments.
- However, this is difficult, since different concentrations of ddNTPs are needed for chains of different lengths.
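Why uniform fragment concentrations are hard to achieve can be seen with a simple model. Assume that at every position a fixed fraction p of growing strands incorporates a ddNTP and stops (a simplification that ignores base identity and polymerase preferences). The fraction of products ending exactly at position n is then p(1 - p)^(n - 1), which falls off with length. The values of p below are illustrative only.

```python
def fraction_terminating_at(n: int, p: float) -> float:
    """Geometric model: probability that a strand stops exactly at position n."""
    return p * (1 - p) ** (n - 1)


for p in (0.01, 0.002):
    short = fraction_terminating_at(50, p)
    long_ = fraction_terminating_at(700, p)
    print(f"p={p}: position 50 -> {short:.2e}, position 700 -> {long_:.2e}")
```

With p = 0.01, products ending at position 700 are several hundred times rarer than those ending at position 50; with p = 0.002 that ratio drops below 4, but the short products become about three times less abundant. A single ddNTP proportion therefore cannot give equal amounts of short and long fragments.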
Genomic DNA Sequencing
- Second, the polyacrylamide gel electrophoresis used in sequencing cannot discriminate between fragments longer than 800 bases.
- The number of bases that can be determined accurately in a single lane of a gel is about 750.
- Determining the sequence of any substantial segment of DNA therefore involves generating many short sequencing reads from overlapping sections of DNA.
- Combining these reads into one sequence is called “assembling.”
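Assembling overlapping reads can be illustrated with a greedy merge: repeatedly join the two reads with the longest suffix-prefix overlap. This is a toy algorithm chosen for clarity, not the method used by production assemblers, and it assumes error-free reads from one strand with no repeats; the minimum overlap of 3 and the reads themselves are arbitrary.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads pairwise by best overlap until no overlap >= min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATGGCTTG", "CTTGGAAAC", "AAACGTTA"]
print(greedy_assemble(reads))  # ['ATGGCTTGGAAACGTTA']
```

A repeated sequence longer than the reads would let this greedy procedure join reads in the wrong order, which is one reason genome-scale assembly proved difficult.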
Genomic DNA Sequencing
- Whole-genome sequencing is still difficult.
- There were questions about how to collect, catalog, and assemble very large numbers of sequencing reads, especially given the repetitive sequences throughout the genome.
- The computer programs for sequence assembly were not capable of handling the extremely large number of sequencing reads.
- In the early 1990s, the “map-based” strategy was employed to sequence S. cerevisiae and C. elegans.
Genomic DNA Sequencing
- Building maps is both time-consuming and expensive, so a “whole-genome shotgun” strategy was considered.
- It would be possible to sequence a genome by cloning it into many thousands of small plasmids, sequencing these at random, and assembling the reads without knowing the locations of the clones in the genome.
- This method was used to sequence the genome of H. influenzae.
Shotgun cloning strategy
- The DNA fragments were prepared by mechanically breaking genomic DNA into pieces of suitable size, which were cloned into vectors to make subclone libraries.
- It is important to cover the whole genome, that is, to estimate the number of independent subclones that will be needed to ensure a complete sequence.
- To ensure that most of a genome is represented in the sequence data, a coverage of 6X to 10X is typically needed.
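The coverage requirement can be made concrete with the usual estimate: coverage C = N × L / G for N reads of length L from a genome of size G. Under the Lander-Waterman assumption that reads start at random positions, the expected fraction of the genome not covered by any read is about e^(-C). The genome size and read length below are hypothetical.

```python
import math


def reads_needed(genome_size: int, read_length: int, coverage: float) -> int:
    """Number of reads N such that N * L / G reaches the target coverage."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases with no read."""
    return math.exp(-coverage)


genome_size = 2_000_000  # hypothetical 2 Mb bacterial genome
read_length = 500        # hypothetical read length
for c in (1, 6, 10):
    n = reads_needed(genome_size, read_length, c)
    missing = fraction_uncovered(c) * genome_size
    print(f"{c}X: {n} reads, ~{missing:,.0f} bases expected uncovered")
```

At 1X roughly a third of the genome (e^-1 ≈ 0.37) is expected to be missing; at 6X about 0.25%, and at 10X under 0.005%.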
Shotgun cloning strategy
- Genomic DNA is isolated and randomly fragmented by sonication, nebulization, or hydrodynamic shearing.
- The frayed DNA fragments are repaired and phosphorylated.
- Fragments are separated into small, medium, and large fractions and cloned into plasmid and fosmid vectors.
- After transformation of E. coli with the library, colonies with cloned DNA are picked and grown.
- Vector DNA is purified from the clones of each library and sequenced.
Shotgun cloning strategy
- Repair of the ends of frayed DNA and phosphorylation
- T4 DNA polymerase or a 3’→5’ exonuclease creates blunt ends.
- T4 polynucleotide kinase phosphorylates the 5’ ends of the blunt-ended fragments.
Shotgun cloning strategy
- Genomic DNA
- Size fractionate
- End repair and phosphorylate
- Ligate, clone, and transform E. coli
- Extract DNA and sequence
- Assemble contigs
- Scaffolding and gap closure
- Finished sequence
Paired-end Sequencing
- Assembling the genome from a large number of sequencing reads requires an extremely large number of pairwise comparisons to identify which sequencing reads overlap with which.
- However, sequencing reads are occasionally connected incorrectly.
- Paired-end sequencing can minimize this type of error.
Paired-end Sequencing
- The subclone plasmid library must carry inserts that are approximately the same length, about 3 kb.
- Both ends of the insert in each plasmid are sequenced in the shotgun-sequencing stage.
- When these reads are assembled, the computer checks whether any pair of reads from the same plasmid appears in the assembly farther apart or closer together than about 3000 bp.
- If so, this is an indication of an assembly error.
- Paired-end sequencing can also help sequence across repetitive regions.
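The paired-end check can be sketched as a scan over mate pairs: for each plasmid, compare the distance between its two end reads in the assembly with the expected insert size, and flag pairs that fall outside a tolerance. The 3000 bp insert size comes from the text; the 10% tolerance, the data layout, the plasmid IDs, and the function name are assumptions. A fuller check would also verify that the two reads lie on opposite strands and point toward each other.

```python
def flag_inconsistent_pairs(
    placements: dict[str, tuple[int, int]],
    expected_insert: int = 3000,
    tolerance: float = 0.10,
) -> list[str]:
    """Return plasmid IDs whose end reads are too close or too far apart.

    `placements` maps a plasmid ID to the assembly coordinates of the starts
    of its two end reads (hypothetical layout).
    """
    low = expected_insert * (1 - tolerance)
    high = expected_insert * (1 + tolerance)
    flagged = []
    for plasmid, (fwd_start, rev_start) in placements.items():
        distance = abs(rev_start - fwd_start)
        if not low <= distance <= high:
            flagged.append(plasmid)
    return flagged


placements = {
    "p001": (1_000, 4_020),    # ~3 kb apart: consistent
    "p002": (5_500, 8_450),    # ~3 kb apart: consistent
    "p003": (9_000, 15_800),   # ~6.8 kb apart: suspected misassembly
    "p004": (12_000, 12_900),  # ~0.9 kb apart: suspected collapsed repeat
}
print(flag_inconsistent_pairs(placements))  # ['p003', 'p004']
```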
Sequencing Longer Sequences
- Several types of problems with assembled sequences are identified:
  - Low-quality sequence
  - Uncertain orientation
  - Sequence gaps
  - Clone gaps
Cyclic array Sequencing
- The shotgun cloning strategy is very time-consuming and expensive, even though many of its steps have been automated.
- Cyclic array sequencing has been developed as an alternative.
- In comparison to the 8 months required to sequence a human genome, cyclic array sequencing can provide the sequence of a human genome in 2 months.
- The strategy: prepare libraries of DNA fragments for sequencing, immobilize the sequencing templates in a dense array on a surface, and use a sequence-by-synthesis approach.
- Features of adaptors A and B that are used for template preparation, PCR amplification, and sequencing using a cyclic array-sequencing strategy with the 454 sequencing platform.
- The adaptor A-genomic DNA-adaptor B strands without a biotin tag are released by melting, concentrated, and retained for sequencing.
- DNA capture bead: oligomers that are complementary to the PCR amplification sequence of adaptor B are attached at their 5’ ends to a bead.
- Each DNA capture bead hybridizes with only one adaptor A-genomic DNA-adaptor B strand.
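Whether each bead carries only one template depends on how many strands are mixed with the beads. Assuming template capture follows a Poisson distribution with a mean of lambda strands per bead (a standard modeling assumption, not a figure from this chapter), the fractions of beads with zero, one, or several templates can be computed as below; the lambda values are illustrative.

```python
import math


def poisson(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson mean `lam`."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


for lam in (0.1, 0.5, 1.0):
    empty = poisson(0, lam)
    single = poisson(1, lam)
    multiple = 1 - empty - single
    # Among beads that carry DNA, the share with a single (clonal) template.
    clonal_share = single / (1 - empty)
    print(f"lambda={lam}: empty {empty:.3f}, one {single:.3f}, "
          f"several {multiple:.3f}, clonal among loaded {clonal_share:.3f}")
```

Lowering lambda makes nearly all loaded beads clonal at the cost of leaving most beads empty: at lambda = 0.1, about 95% of DNA-carrying beads have a single template, but about 90% of beads carry none.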
Emulsion PCR
- The beads and PCR reagents, including PCR primers that anneal to sequences that are part of adaptors A and B, are stirred vigorously with oil to create a water-in-oil emulsion of “microreactors.”
- During the PCR cycles, strands with the same sequences as the isolated A-DNA-B molecules are synthesized.
- Following the PCR, the emulsion is broken, the beads are collected, and all the free DNA molecules are washed away.
- Pyrosequencing is used to determine the nucleotide sequence. The flow signals from each well are captured and stored in a computer.
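The flow signals recorded for each well can be converted back into sequence by rounding each flow's signal to the nearest whole number of incorporated nucleotides. This minimal sketch assumes the signals are already normalized so that one incorporation gives a value of about 1.0, and that nucleotides were dispensed in the repeating order "TACG" (an assumption); real base callers also correct for signal decay and out-of-phase strands, and long homopolymers remain the main source of error.

```python
def decode_flows(signals: list[float], flow_order: str = "TACG") -> str:
    """Turn normalized per-flow light signals into a base sequence.

    Flow i dispensed flow_order[i % len(flow_order)]; a signal near k means
    k identical nucleotides were incorporated in that flow.
    """
    bases = []
    for i, signal in enumerate(signals):
        count = max(0, round(signal))
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)


# Hypothetical normalized signals from one well.
signals = [2.1, 0.9, 0.05, 2.8, 0.1, 0.0, 1.05]
print(decode_flows(signals))  # TTAGGGC
```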