Tigger/pogo transposons in the Fugu genome

Download Report

Transcript Tigger/pogo transposons in the Fugu genome

IB404 - 2. Cloning and Sequencing - Jan 23
Cloning DNA fragments
Invented by Herbert Boyer (UCSF) and Stanley Cohen (Stanford) in the
early 1970s. Boyer founded Genentech and Stanford eventually ran out
of patent protection for the principles of cloning DNA.
Herbert Boyer
Stanley Cohen
Plasmids
1. Small circular double-stranded DNA molecules.
2. In bacteria, usually encoding antibiotic resistance proteins.
3. Some mediate conjugation in bacteria as a means of spreading.
4. pBR322 was original one used for cloning because it had:
a. Ampicillin and tetracycline resistance.
b. No ability to move by conjugation, hence contained.
c. Origin of replication causing many copies per cell.
5. Many thousands of variants based on this.
Other cloning vectors for genome projects
1. Plasmids - clone pieces ranging from 100bp-10,000bp.
2. Cosmids - utilize the cos sites of lambda phage to package
10-50,000bp pieces for introduction into bacteria - then
replicate as plasmids.
3. BACs - Bacterial Artificial Chromosomes - can obtain clones of
50-300,000bp in bacteria, but low copy number.
4. YACs - Yeast Artificial Chromosomes - linear molecules with yeast
centromere and two telomeres, and insert inbetween.
Can be 100-500,000bp, but subject to rearrangement in
yeast cells, so can be suspect (mostly for nematode and human).
DNA sequencing - 1975
Chemical degradation method by Walter Gilbert at Harvard (left).
Enzymatic extension or polymerase or termination or dideoxynucleotide
method by Fred Sanger in Cambridge, England (right).
Sanger dideoxy sequencing
Old manual radioactive method
1. Label primer or a nucleotide with S35 so
will only visualize the new strands.
2. Denature newly synthesized and template
strand DNA to single-stranded DNA.
3. Run the four reactions in separate lanes on
a thin long polyacrylamide gel.
4. Concentrated polyacrylamide allows
resolution of DNA strands 1base
different in length.
5. Transfer gel to a paper backing, dry, and
expose X-ray film overnight to obtain
an autoradiograph.
6. Read bases from the bottom (shortest
fragments) to top.
7. In the early 1990s I generated ~400,000bp
this way from transposons in insects.
A
C G T
Automated Sanger sequencing
1. Leroy Hood at Caltech invented method, and
Applied Biosystems Inc (ABI) comm.
2. Use fluorescent labels on primers and run
products off end of gel as machine scans.
3. ABI added the twist of four different
fluorescent labels at different wavelengths, one for each dideoxynucleotide,
so could run four reactions in one lane.
4. Finally, ABI and others introduced capillary
machines up to 99 capillaries per
machine, capable of reading up to 1000
bases per run in 2 hours.
5. Output of such a machine is 1000X100X10
or 1 million bases per day. This is the
basic technology used since mid-1990s to
sequence most genomes we will consider.
Massively-parallel pyrosequencing (454/ROCHE).
The first of the next-generation or second-generation sequencing methods. The most
sophisticated commercial application is from 454 Life Sciences, bought out by Roche,
and combines nanotechnology to isolate and amplify single DNA molecules and then
pyrosequencing. There are many advantages, including no need to clone and hence bias
samples, and it works well for entire bacterial genomes, and for fungi, and recently for
some insects, e.g. ants, a moth, a fruit fly, etc. One or two human genomes as well.
The nanotechnology part is responsible for the massively-parallel nature of this
method, in which fragments of DNA are mixed at low concentration with 40
micrometer diameter beads that bind them, roughly 1 molecule per bead (beads without
a DNA molecule yield nothing and those with 2 or 3 yield a mess, which is ignored).
The beads are individually enveloped in a tiny droplet of PCR reagents in an emulsion
and the single DNA fragments amplified a million times. Then beads are individually
trapped in tiny cavities in a plate - the current density being 3 million cavities in a
75X75mm plate, of which only about 30% are occupied with DNA-bearing beads,
allowing about 1 million sequencing reactions per plate at ~400 bp each yields ~400
Mb per run (5 hours). The beads are held in place by layers of tinier beads containing
the sequencing polymerase and reagents. Nucleotides are washed over the plate
sequentially, and as they are incorporated the released pyrophosphate is detected by
using it to phosphorylate AMP to ATP, which is detected using firefly luciferase and the
substrate luciferin. A sensitive CCD camera monitors the chemiluminescent light.
454 pyrosequencing
(First version, so “only”
200,000 reads each time, and
short ones. In third version,
1m reads, ~400 b per read)
In fourth version today,
longer reads to ~700b
The visual output is called a flowgram (because one “flows” the nucleotides sequentially across
the sequencing plate). A single flowgram from a single bead in a single well representing a single
fragment of DNA is shown below. Starting from the left, the first red peak indicates that when T
nucleotides were flowed across the plate a burst of light was detected from this well, representing
addition of a single T to each of the ~1 million identical DNA fragments on that bead in that well.
Then A nucleotides were flowed across the plate, but there is no signal from this well, so the next
nucleotide is NOT an A. Then C was flowed, and a single C was detected, then G but there is no
G, then T again, etc. When there are two nucleotides in a row in the template, they will be added
in quick succession, leading to a double-intensity flash of light, hence a 2-mer, and so on for 3and 4-mers. Sadly, this technology gets a little messy when you have long homopolymer strings.
Massively parallel sequencing-by-synthesis (SOLEXA/ILLUMINA)
A second type of next-generation nanotechnology-based DNA sequencing was introduced by
SOLEXA (bought out by ILLUMINA), while ABI does something similar with their ABI-SOLID
machine, trying to stay in the DNA sequencing game.
Like 454 this technology depends on starting with single DNA fragments, which are fixed to a
glass substrate, and PCR amplified in place (forming clusters or polonies), which are then
sequenced using chemical synthesis methods and fluorescently labelled nucleotides and high
resolution CCD cameras. The details are a bit too much for us, but you can see some of them in
schematics from the company here - http://www.illumina.com/pages.ilmn?ID=203.
Some details are worthwhile.
First, their machine can run 8 samples at a time, each in a different
lane of a flow cell, which is basically an enclosed microscope slide.
Second, ~30 million unique polonies are generated per lane.
Third, in their original machine they generated ~35 base reads per
fragment, their next machine did ~75 base reads, and their 3rd
machine available now generates ~100 base reads.
Fourth, each run of the current HiSeq2000 machine takes a week,
but it generates 30X100X8 million bases or roughly 25 Gbp!