PPT - Bruce Blumberg

BioSci 145B Lecture #5 5/4/2004
• Bruce Blumberg
– 2113E McGaugh Hall - office hours Wed 12-1 PM (or by appointment)
– phone 824-8573
– [email protected]
• TA – Curtis Daly [email protected]
– 2113 McGaugh Hall, 924-6873, 3116
– Office hours Tuesday 11-12
• lectures will be posted on web pages after lecture
– http://eee.uci.edu/04s/05705/ - link only here
– http://blumberg-serv.bio.uci.edu/bio145b-sp2004
– http://blumberg.bio.uci.edu/bio145b-sp2004
BioSci 145B lecture 5
page 1
©copyright
Bruce Blumberg 2004. All rights reserved
Genome sequencing
• The problem
– Genome sizes for most eukaryotes are large (10^8-10^9 bp)
– High-quality sequence reads are only about 600-800 bp per pass
• The solution
– Break genome into lots of bits and sequence them all
– Reassemble with computer
• The benefit
– Rapid increase in information about genome size, gene comparisons, etc
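The scale of the problem can be sketched with simple arithmetic. The example below is only an illustration: the 700 bp read length is the midpoint of the 600-800 bp range above, the 8x coverage target is borrowed from the Xenopus tropicalis slide later in this lecture, the Poisson estimate of unsampled bases is an assumption not stated on the slide, and the function names are hypothetical.

```python
import math


def reads_needed(genome_bp: float, read_bp: float, coverage: float) -> int:
    """Reads required so that total sequenced bases = coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases never sampled, assuming reads land
    uniformly at random (Poisson model): e^-coverage."""
    return math.exp(-coverage)


if __name__ == "__main__":
    for genome_bp in (1e8, 1e9):
        n = reads_needed(genome_bp, read_bp=700, coverage=8)
        missed = fraction_unsequenced(8)
        print(f"{genome_bp:.0e} bp genome: {n:,} reads at 8x, "
              f"about {missed:.3%} of bases expected to be missed")
```

Even under this idealized model, a 10^8 bp genome needs more than a million reads, which is why assembly has to be done by computer.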
Genome sequencing (contd)
• Shotgun sequencing NOT invented by Craig Venter
– Messing 1981 first description of shotgun
– Sanger lab developed current methods in 1983
– approach
• blast genome into small chunks
– Shearing is usual
– 4-cutters also used
• clone these chunks
– In the early days, try to make small insert libraries 0.5-1.5 kb
– Now typically make 3 library types
» 3-5 kb, 8 kb plasmid
» 40 kb fosmid - to jump repetitive sequences
Genome sequencing (contd)
• sequence + assemble by computer
– A priori difficulties
• how to assemble fragments
– Software now very good
• what to do about repeats?
– Fosmids and BAC STC help a lot
• how to get nice uniform distribution of sequences without too much redundancy?
– Biggest problem, not really well resolved
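To make "assemble by computer" concrete, here is a toy greedy overlap assembler. It is a minimal sketch, not the method used by any production assembler: it assumes error-free reads from one strand, and the function names and the minimum overlap of 3 bases are hypothetical choices for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b
    (at least min_len long), or 0 if there is none."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs
               if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge; gaps remain between contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
    print(greedy_assemble(reads))
```

A greedy merge like this is exactly what repeats defeat: two reads from different copies of a repeat overlap just as well as true neighbors, which is why longer linking clones help.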
Genome sequencing (contd)
– Assembled sequences always have gaps of various sizes
• how to cross these gaps?
– Quickly and cost-effectively
• Need to link sequences somehow
– How depends on the size of the gaps to be crossed
Genome sequencing (contd)
– For small gaps (up to 8 kb or so)
• often can close by sequencing both ends of clones
– For medium sized gaps (8-30 kb)
• Primer walking across a linking clone (cosmid or fosmid)
Genome sequencing (contd)
• Large gaps require much more effort
– Identify large insert clones that span gap
• Typically from BAC end sequences
• May have to screen libraries to find
– Shotgun sequence these and assemble
– Close any small gaps remaining with primer walking
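The gap-closing rules from the last three slides can be summarized as a small decision function. This is only a restatement of the slides in code; the 8 kb and 30 kb cutoffs are the approximate values given above, and the function name is hypothetical.

```python
def gap_closing_strategy(gap_kb: float) -> str:
    """Suggest a gap-closing approach from the estimated gap size in kb."""
    if gap_kb <= 0:
        raise ValueError("gap size must be positive")
    if gap_kb <= 8:
        # Small gap: a plasmid clone can span it.
        return "sequence both ends of clones that span the gap"
    if gap_kb <= 30:
        # Medium gap: needs a cosmid or fosmid linking clone.
        return "primer walk across a linking cosmid or fosmid clone"
    # Large gap: needs a large insert clone and a new round of shotgun.
    return ("find a spanning large insert clone (e.g. from BAC end sequences), "
            "shotgun sequence it, then primer walk any small gaps that remain")


if __name__ == "__main__":
    for size in (2, 15, 120):
        print(f"{size} kb gap: {gap_closing_strategy(size)}")
```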
Genome sequencing (contd)
• Shotgun sequencing (contd)
– How to minimize sequence redundancy (re-sequencing the same region)?
• Best way to minimize redundancy is map before you start
– C. elegans was done this way - when the sequence was finished, it was FINISHED
» mapping took almost 10 years
– mapping much too tedious and nonprofitable for Celera
» who cares about redundancy, let’s sequence and make $$
• why does redundancy matter?
– Finished sequence today costs about $0.50/base
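A quick calculation shows why redundancy matters at this price. The sketch below uses the $0.50/base figure from the slide and the 1.7 Gb Xenopus tropicalis genome size quoted later in the lecture; the 10% redundancy figure is an arbitrary assumption for illustration, and the function names are hypothetical.

```python
COST_PER_FINISHED_BASE_USD = 0.50  # figure quoted on the slide


def finishing_cost(genome_bp, cost_per_base=COST_PER_FINISHED_BASE_USD):
    """Cost in USD of finishing every base of the genome once."""
    return genome_bp * cost_per_base


def redundancy_cost(genome_bp, redundant_fraction,
                    cost_per_base=COST_PER_FINISHED_BASE_USD):
    """Extra cost in USD if redundant_fraction of the genome is finished twice."""
    return genome_bp * redundant_fraction * cost_per_base


if __name__ == "__main__":
    genome_bp = 1.7e9  # Xenopus tropicalis, from a later slide
    print(f"finish once:          ${finishing_cost(genome_bp):,.0f}")
    # 10% redundancy is an assumed value, not a number from the lecture.
    print(f"10% done twice costs: ${redundancy_cost(genome_bp, 0.10):,.0f} extra")
```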
Genome sequencing (contd)
– Mapping by hybridization
– Mapping by fingerprinting
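The idea behind mapping by fingerprinting is that two clones sharing many restriction fragments of the same size probably overlap. The sketch below is a simplified illustration under stated assumptions: fragment sizes are in bp, bands match if they agree within 1%, and two clones are called overlapping if at least half the bands of the smaller fingerprint are shared. Those thresholds, the function names, and the example band sizes are all invented; real fingerprinting software uses statistical scores instead of a fixed cutoff.

```python
def shared_bands(fp_a, fp_b, tolerance=0.01):
    """Count bands in fp_a that match a not-yet-used band in fp_b
    within a relative size tolerance."""
    remaining = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for k, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[k]  # each band may be matched only once
                break
    return shared


def likely_overlap(fp_a, fp_b, min_fraction=0.5, tolerance=0.01):
    """True if enough bands are shared to suggest the clones overlap."""
    shared = shared_bands(fp_a, fp_b, tolerance)
    return shared / min(len(fp_a), len(fp_b)) >= min_fraction


if __name__ == "__main__":
    # Invented fragment sizes (bp) for three hypothetical clones.
    clone_a = [1200, 2500, 3100, 4800, 7600]
    clone_b = [650, 2510, 3100, 4790, 9000]
    clone_c = [500, 1500, 2000, 6000]
    print("A-B overlap?", likely_overlap(clone_a, clone_b))
    print("A-C overlap?", likely_overlap(clone_a, clone_c))
```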
Genome sequencing (contd)
• Actual large insert fingerprinting gel
Traditional (map first) vs STC (map as you go along) mapping
[Figure: two panels, "Map before sequencing" and "Map as you go"]
The human genome
• On Feb 12, 2001, Celera and the Human Genome Project published “draft” human genome sequences
– Celera -> 39114 (WGS)
– Ensembl -> 29691 (map as you go)
– Consensus from all sources ~30K
• Number of genes
– C. elegans – 19,000
– Arabidopsis 25,000
• Predictions had been from 50-140k human genes
– What’s up with that?
– Are we only slightly more complicated than a weed?
– How can we possibly get a human with less than 2x the number of genes as C. elegans?
– Implications?
• “Unraveling the DNA Myth: The spurious foundation of genetic engineering,” Barry Commoner, Harper’s Magazine, Feb 2002
The human genome
• The answer – Somewhat sloppy science
– Gene sets don’t overlap completely
– Floor is 42K
– 105,680 UniGene clusters from ESTs (down from 128,826 last year)
= 42113
Genome sequencing (contd)
• Whole genome shotgun sequencing (Celera)
– premise is that rapid generation of draft sequence is valuable
– why bother trying to clone and sequence difficult regions?
• Basically just forget regions of repetitive DNA - not cost effective
– R0t analysis suggests not many genes there anyway
– using this approach, genome was alleged to be 90% finished in 2001
• More than 95% today
• rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%
– problems
• sequence may never be as complete as that of C. elegans
• much redundant sequence with many sparse regions and lots of gaps
• Fragment assembly for regions of highly repetitive DNA is dubious at best
• Map as you go method inherently more complete
– Sets up for finishing since an ordered set of overlapping BACs is produced
• Both methods produce reasonable data given enough sequencing
The human genome
• How finished is the human genome sequence?
– Draft sequence to high coverage
– Chromosome by chromosome finishing now
• Chr 22 – 1999
• Chr 21 – 2000
• Chr 20 – 2001
• Chr 15 – 2003
• Chr 6, 7, Y – 2003
• Chr 13, 19 – 2004
Genome sequencing (contd)
• Knowing what we know now – how to approach a large new genome?
– Xenopus tropicalis 1.7 Gb (about ½ human)
– BAC end sequencing
– Whole genome shotgun
– Gaps closed with BACs
– 8x coverage by end of 2004
– Finishing dependent on additional funding
Genome sequencing
• DOE – Joint Genome Institute
– http://www.jgi.doe.gov/
– Numerous advances in sequencing technology
• Increased pass rate from ~70% to > 90%
• Lowered cost nearly 3 fold
Other sequencing technologies
• Sequencing by hybridization is most interesting
– Construct a high-density microchip with all possible combinations of a short oligonucleotide
• Up to 25-mers
• By photolithography
– Synthesized on chip directly
– Label and hybridize fragment to be sequenced
– Wash stringently
– Read fluorescent spots
– Reconstruct sequence by computer
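The "reconstruct sequence by computer" step can be sketched as follows. This is a minimal illustration, not the algorithm of any particular chip vendor: it assumes the chip reports, without error, the set of k-mers present in the fragment (its "spectrum"), and it rebuilds the sequence by chaining k-mers that overlap by k-1 bases (an Eulerian path through the overlap graph). The function names and the example sequence are hypothetical.

```python
from collections import defaultdict


def spectrum(seq: str, k: int) -> set:
    """All k-mers present in seq, i.e. the spots that would light up."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers) -> str:
    """Rebuild a sequence from a non-empty k-mer spectrum.
    Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix."""
    graph = defaultdict(list)
    indegree = defaultdict(int)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
        indegree[kmer[1:]] += 1
    nodes = list(graph)
    # Start where out-degree exceeds in-degree; otherwise pick any node.
    start = next((n for n in nodes if len(graph[n]) - indegree[n] == 1), nodes[0])
    # Hierholzer's algorithm for an Eulerian path.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    target = "ATGCGTAC"
    spots = spectrum(target, k=3)
    print(sorted(spots))
    print(reconstruct(spots))
    print("probes on a chip of all 8-mers:", 4 ** 8)
```

When a (k-1)-mer occurs more than once in the fragment, several different paths use the same k-mers, so the reconstruction is no longer unique. That ambiguity grows with fragment length.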
Other sequencing technologies (contd)
• Sequencing by hybridization rarely used for de novo sequencing
– Extremely fast and useful for resequencing a region whose sequence is already known in order to identify mutations
– Disease causing changes
• e.g., in mitochondrial DNA
– SNP discovery
– Works best for examining sequence of <10 kb
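Resequencing a known region on a chip can be sketched as a base-calling problem. The example assumes a simplified design in which each position of the reference is interrogated by four probes that differ only at that base, and the brightest probe gives the base call; the intensity values are invented and the function names are hypothetical.

```python
def call_bases(intensities) -> str:
    """intensities: one dict per position mapping base -> fluorescent signal.
    The base whose probe gives the strongest signal is called."""
    return "".join(max(signals, key=signals.get) for signals in intensities)


def find_variants(reference: str, called: str):
    """List (position, reference base, called base), positions 1-based."""
    return [(i + 1, ref, obs)
            for i, (ref, obs) in enumerate(zip(reference, called))
            if ref != obs]


if __name__ == "__main__":
    reference = "ACGT"
    # Invented signals for a sample carrying a G-to-A change at position 3.
    sample = [
        {"A": 950, "C": 40, "G": 55, "T": 30},
        {"A": 35, "C": 900, "G": 60, "T": 45},
        {"A": 870, "C": 50, "G": 120, "T": 40},
        {"A": 25, "C": 45, "G": 50, "T": 910},
    ]
    called = call_bases(sample)
    print(called, find_variants(reference, called))
```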
Other sequencing technologies (contd)
• http://www.affymetrix.com/products/arrays/index.affx
• SNP discovery
– Photo shows mitochondrial chip
– Right panel shows pairs of normal (top) vs disease (bottom) (Leber’s Hereditary Optic Neuropathy)
• Top 3 disease mutations
• Bottom control with no change
Useful software for molecular biology (contd)
• NCBI – www.ncbi.nlm.nih.gov
– main information and analysis resource
– indispensable resource
Useful software for molecular biology (contd)
• NCBI – BLAST – how to find similar genes
– www.ncbi.nlm.nih.gov/BLAST/
Useful software for molecular biology (contd)
• Why pay Celera?
Practice midterm
1. (6 points) Your laboratory works on the strange organisms that live around
hydrothermal vents in the deep ocean as a model system for the first
multicellular organisms. Your PI has developed a new method of culturing
such organisms, making it possible to grow the wormlike animals found
around the vents in the laboratory. One of the first things that needs to be
done is to construct the molecular tools that will be required to characterize
your assigned animal, the Pompeii worm (Alvinella pompejana) which can
survive an environment as hot as 80° C. The ultimate goal will be to establish
an A. pompejana genome project including whole genome sequencing and
mapping, an EST project and DNA microarrays.
The first goal is to make a genomic library. What type of library will you
make, i.e., which type of vector? Justify your choice. What type of
equipment will be required to make your library?
Practice midterm
1. answer
You should choose to make a BAC or PAC library. BAC is best for genome
sequencing because it accepts large inserts, is stable and the vector is small,
facilitating shotgun sequencing.
Little equipment is required beyond standard molecular biology laboratory
equipment, an electroporator, and a PFGE (pulsed field gel electrophoresis)
apparatus. PFGE is indispensable for isolating the large DNA fragments needed
to make good genomic libraries.
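A related planning question is how many clones the library needs. The sketch below uses the standard Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - insert/genome), which is not part of the answer above and is added here as an assumption. The 110 Mb genome size is the value given in question 4, the 40 kb fosmid insert size comes from an earlier slide, and the 150 kb BAC insert size and function name are hypothetical.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of independent clones needed so
    that any given sequence is in the library with the stated probability."""
    fraction = insert_bp / genome_bp  # share of the genome in one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    genome_bp = 110e6  # A. pompejana genome size from question 4
    for name, insert_bp in (("BAC, assumed 150 kb insert", 150e3),
                            ("fosmid, 40 kb insert", 40e3)):
        print(f"{name}: {clones_needed(genome_bp, insert_bp):,} clones for P = 0.99")
```

Larger inserts sharply reduce the number of clones that must be picked and stored, which is one practical argument for BACs.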
Practice midterm
2. (4 points) Describe a method to make a physical map of the A. pompejana
genome in order to facilitate large-scale sequencing.
Use large insert genomic library to construct a map.
Map the clones by fingerprinting, map as you go, or hybridization.
Restriction mapping of the whole genome was NOT an acceptable
answer.
Practice midterm
3. (5 points) You received an E. coli strain with the following genotype from a
neighboring laboratory for the purposes of propagating your genomic library:
mcrA, Δ(mrr-hsdRMS-mcrBC), ΔlacX74, deoR, recA1, araD139, Δ(ara-leu)7697, galU, galK, endA1, nupG (in every case above, the bacteria are
DEFICIENT in the indicated gene product)
a) Is this a good strain for the type of genomic library you have chosen to
make, i.e., does it have the necessary genetic markers for your library to
be stable and readily screened?
b) If so, what are the desirable markers that the strain has. If not, which
ones are missing?
c) Would the strain be suitable if you had made a YAC library? Why?
a) Yes, the strain is suitable for PAC and BAC libraries.
b) It is restriction deficient and deoR. Some also pointed out that the
strain should have lacZΔM15 for blue-white selection if BACs were being
used.
c) The strain is not suitable for a YAC library because yeast artificial
chromosomes can only be propagated in YEAST.
Practice midterm
4. (5 points) A colleague has experimentally determined that the A. pompejana
genome is 110 Mb – right between C. elegans (97 Mb) and Drosophila
melanogaster (120 Mb). Describe a sequencing strategy that could allow the
rapid generation of a draft genome sequence. How might you combine the
mapping proposed in your answer to question 2 to facilitate the completion
of the genome sequence?
Whole genome shotgun will generate a rapid draft sequence. Combining this
with whole genome map made in 2 will enable closing gaps.
Practice midterm
5. (6 points) As a side project, you decide to see if the A. pompejana genome
contains homeobox genes. You dig into the laboratory archives and find a
cDNA probe that contains the Drosophila melanogaster Antennapedia
homeobox. What is the best way to find whether the A. pompejana genome
contains homeobox genes? If so, how will you isolate genomic clones
containing these homeobox genes? Let’s say you find 8 A. pompejana
homeobox genes. Describe a quick way to tell whether they are located in
one or more clusters as in Drosophila or C. elegans?
Run a genomic Southern blot of A. pompejana DNA probed with the Antp
homeobox to work out hybridization conditions.
Screen the genomic library you made with the Antp probe under these
conditions.
Once you recover the 8 genes, hybridize them back to the large insert
clones, or to a Southern blot of a PFGE-separated 8-cutter digest of
genomic DNA. Note whether more than 1 homeobox gene maps to each
clone or fragment.
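Scoring the hybridization results for clustering is a simple grouping exercise. In the sketch below, the gene names, clone names, and hit table are all invented for illustration, and the function name is hypothetical; it only shows how to tabulate which clones were hit by more than one homeobox probe.

```python
from collections import defaultdict


def clusters_from_hybridization(hits):
    """hits: dict mapping gene -> set of large insert clones its probe hit.
    Returns clone -> sorted genes, keeping only clones hit by 2 or more genes."""
    genes_by_clone = defaultdict(set)
    for gene, clones in hits.items():
        for clone in clones:
            genes_by_clone[clone].add(gene)
    return {clone: sorted(genes)
            for clone, genes in genes_by_clone.items()
            if len(genes) > 1}


if __name__ == "__main__":
    # Invented results for 5 of the 8 hypothetical homeobox genes.
    hits = {
        "hox1": {"BAC_012"},
        "hox2": {"BAC_012", "BAC_047"},
        "hox3": {"BAC_047"},
        "hox4": {"BAC_311"},
        "hox5": {"BAC_650"},
    }
    print(clusters_from_hybridization(hits))
```

Clones that share a gene (here the hypothetical BAC_012 and BAC_047 via hox2) can then be chained together to estimate the extent of a cluster.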
Practice midterm
7. (6 points) Remember that you also need to provide material for the EST
project. This means that it is time to make cDNA libraries, right? Assume that
the libraries you make will be used for more than just EST sequencing. What
sort of vector will you choose? Should you go to the trouble of enriching the
library for full-length cDNAs? If so, how? Should the libraries be standard,
normalized, or subtracted? Justify your answer. If normalized or subtracted
libraries are required, describe generally how you will make them.
• Plasmid vector (NOT PAC or BAC)
• Yes you should enrich for full-length cDNAs since the library will be used
for multiple purposes
• Cap trap, oligo-capping or cap-affinity chromatography gets full-length
mRNA which should yield a library enriched for full-length cDNAs
• The libraries should be normalized since EST sequencing is contemplated
and we don’t want to sequence the same thing many times
• Make normalized libraries by making driver from the library you wish to
normalize, then hybridizing it back to ss-cDNA from that library to a low
Cot value (5-10). After removing hybrids, use the remaining cDNA to
make the normalized library
Practice midterm
8. (4 points) What are the major differences between normalized and
subtracted cDNA libraries? If you want to use a cDNA library to isolate genes
expressed specifically in the tail of A. pompejana compared with the head,
would it be better to normalize or subtract the probe that you will use?
Explain your reasoning.
Normalized libraries are depleted in abundant genes and enhanced in
rare genes by self-hybridization.
Subtracted libraries are depleted in genes that are common between two
sources
A subtracted probe is appropriate here since you wish to identify genes
specifically expressed in the tail.
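The difference between the two treatments can be illustrated with a toy model of transcript abundance. This is a conceptual sketch only: real normalization and subtraction work by hybridization kinetics, not by exact counting, and the gene names, copy numbers, and function names below are invented.

```python
def normalize(abundance, cap):
    """Toy normalization: limit every transcript to at most `cap` copies,
    so abundant transcripts are depleted relative to rare ones."""
    return {gene: min(copies, cap) for gene, copies in abundance.items()}


def subtract(tester, driver):
    """Toy subtraction: remove every transcript that is also in the driver,
    leaving only transcripts specific to the tester."""
    return {gene: copies for gene, copies in tester.items() if gene not in driver}


if __name__ == "__main__":
    # Invented copy numbers per cell for hypothetical transcripts.
    tail = {"actin": 5000, "tubulin": 2000, "tail_specific_1": 20, "rare_tf": 2}
    head = {"actin": 4800, "tubulin": 2100, "head_specific_1": 15, "rare_tf": 3}
    print("normalized tail:", normalize(tail, cap=10))
    print("tail minus head:", subtract(tail, head))
```

In the toy output, normalization keeps every tail transcript but flattens their abundance, while subtraction leaves only the tail-specific transcript, which is the behavior wanted for the tail versus head comparison.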