Bruce Blumberg
mRNA frequency and cloning
• mRNA frequency classes
– classic references
• Bishop et al., 1974 Nature 250, 199-204
• Davidson and Britten, 1979 Science 204, 1052-1059
– abundant
• 10-15 mRNAs that together represent 10-20% of the total mRNA mass
• each is > 0.2% of the total
– intermediate
• 1,000-2,000 mRNAs together comprising 40-45% of the total
• each is 0.05-0.2% of the total
– rare
• 15,000-20,000 mRNAs comprising 40-45% of the total
• abundance of each is less than 0.05% of the total
• some of these might only occur at a few copies per cell
• How does one go about identifying genes that might only occur at a few copies per cell?
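One way to see why rare messages are hard to find is to ask how many randomly picked clones must be examined before a given mRNA is seen at least once. The sketch below assumes clones are drawn independently from an unbiased library; the function name, the 99% target and the "very rare" frequency of 0.001% are illustrative assumptions, not figures from the lecture.

```python
import math


def clones_needed(frequency: float, probability: float = 0.99) -> int:
    """Clones to sample so that an mRNA present at `frequency` (fraction of
    all clones) is seen at least once with the given probability.

    Solves 1 - (1 - f)**N >= P for N, assuming independent random picks.
    """
    if not 0 < frequency < 1 or not 0 < probability < 1:
        raise ValueError("frequency and probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - frequency))


# Class boundaries from the slide, plus one assumed very rare message.
examples = [
    ("abundant boundary (0.2%)", 0.002),
    ("rare boundary (0.05%)", 0.0005),
    ("assumed very rare message (0.001%)", 0.00001),
]
for label, f in examples:
    print(f"{label}: about {clones_needed(f):,} clones")
```

Under these assumptions the number of clones grows roughly as 1/frequency, which is why screening alone becomes impractical for the rare class.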
BioSci 145B lecture 1
page 1
©copyright
Bruce Blumberg 2004. All rights reserved
Normalization and subtraction
• How to identify genes that might only occur at a few copies per cell?
– alter the representation of the cDNAs in a library or probe
– Normalization - process of reducing the frequency of abundant mRNAs and increasing the frequency of rare mRNAs
• Bonaldo et al., 1996 Genome Research 6, 791-806
– Subtraction - removing cDNAs (mRNAs) expressed in both of two populations, leaving only the differentially expressed ones
• Sagerström et al. (1997) Ann Rev. Biochem 66, 751-783
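The two ideas can be illustrated with a toy model that treats a library as a table of clone counts. This is only a conceptual sketch: real normalization and subtraction are biochemical procedures and are never this clean, and the clone names and counts below are made up.

```python
from collections import Counter


def normalize(library: Counter) -> Counter:
    """Idealized normalization: every distinct cDNA ends up equally represented."""
    return Counter({clone: 1 for clone in library})


def subtract(tester: Counter, driver: Counter) -> Counter:
    """Idealized subtraction: drop every cDNA that also occurs in the driver."""
    return Counter({clone: n for clone, n in tester.items() if clone not in driver})


def frequencies(library: Counter) -> dict:
    """Fraction of the library made up by each cDNA."""
    total = sum(library.values())
    return {clone: n / total for clone, n in library.items()}


# Hypothetical libraries (clone -> number of copies).
tester = Counter({"actin": 500, "tubulin": 200, "rare_receptor": 2})
driver = Counter({"actin": 450, "tubulin": 250})

print(frequencies(tester))             # rare_receptor is a tiny fraction
print(frequencies(normalize(tester)))  # all three equally frequent
print(subtract(tester, driver))        # only rare_receptor remains
```

In this toy, normalization changes how often each clone is seen without removing any, while subtraction removes the shared clones outright.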
Adams et al., (1991) Science 252, 1651-1656
• The problem – completion of the human genome sequence was very far off in the distance
– The big debate (circa 1989)
• Sequence entire genome
– Will take a long time and lots of money
• Or sequence mRNAs (cDNAs)
– Will get coding sequences but how to be sure you have every one?
– How to get rare cDNAs?
Adams et al., (1991) Science 252, 1651-1656
• In 1991 only a few thousand mRNA sequences had been identified
– Brain mRNAs < 200
– Not good for solving neurological diseases
• Venter and colleagues from the National Institute of Neurological Disorders and Stroke
• How to get rapid sequence to use for
– Mapping
– Studying diseases
– Gene identification
• The solution?
– High throughput sequencing of random cDNAs (96/day!)
• Modern machines 8 x 384 /day each
– These Expressed Sequence Tags (ESTs) have many uses
– Venter proposes that they be used in place of STSs (sequence tagged sites)
• Provide more information with less cost and effort (no extensive validation required)
Adams et al., (1991) Science 252, 1651-1656
• What do you get from EST sequencing?
– Rapid survey of expressed genes in cell, tissue, organ or embryo
– Information for gene identification
– Tags for gene mapping
• Tested how to improve the frequency of new genes (Table 1)
Adams et al., (1991) Science 252, 1651-1656
• Table 2
– Table 2 shows that they identified a number of already known human genes
– Unsaid is that these are all relatively abundant transcripts
• At least in the intermediate class
– Suggests what subsequent EST sequencing shows to be the case
• Random EST sequencing overrepresents abundant and intermediate frequency sequences
• Underrepresents the rare frequency class
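A rough calculation shows how unevenly random sequencing covers the frequency classes. The sketch below takes approximate midpoints of the class figures quoted in this lecture and splits each class's mass evenly among its members; both are simplifying assumptions. The 600 reads match the number of sequences discussed for this paper.

```python
def fraction_of_class_sampled(n_species: int, class_share: float, n_reads: int) -> float:
    """Expected fraction of a class's mRNAs seen at least once in n_reads random ESTs.

    Assumes every mRNA in the class has the same frequency
    (class_share / n_species) and that reads are independent.
    """
    per_species = class_share / n_species
    return 1 - (1 - per_species) ** n_reads


# (number of distinct mRNAs, share of total mRNA): assumed midpoints.
classes = {
    "abundant": (12, 0.15),
    "intermediate": (1500, 0.425),
    "rare": (17500, 0.425),
}
N_READS = 600

for name, (n_species, share) in classes.items():
    seen = fraction_of_class_sampled(n_species, share, N_READS)
    print(
        f"{name:12s} expected reads: {share * N_READS:5.0f}   "
        f"distinct mRNAs seen: {seen * n_species:6.0f} of {n_species} "
        f"({100 * seen:.1f}% of the class)"
    )
```

With these assumed numbers, essentially every abundant mRNA is sequenced many times over, while only a small fraction of the rare class is touched at all.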
Adams et al., (1991) Science 252, 1651-1656
• Table 3 shows relationship of ESTs to other non-identical genes in the database
– Putative relatives, depending on degree of sequence similarity
– Ranges from nearly identical to about 57% (still fairly closely related)
– Conclude that EST sequencing can identify relatives of genes known in other species
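Similarity figures such as "about 57%" come from comparing aligned sequences position by position. The sketch below is one simple way to compute percent identity; it assumes the two sequences are already aligned, with "-" marking gaps, and that gapped columns are ignored. Other definitions use a different denominator, and the sequences shown are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped, so the
    denominator is the number of ungapped columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments.
print(percent_identity("ACGTACGT", "ACGTTCGA"))  # 6 of 8 columns match
print(percent_identity("AC-T", "ACGT"))          # gap column is ignored
```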
Adams et al., (1991) Science 252, 1651-1656
• Table 4
– Compare sequences with ProSite motif database
• This categorizes patterns seen in sequences
– NLS
– Zinc fingers
– ATP binding cassette
– etc.
– Found several that appear to be new members of particular classes
– Conclude that EST sequencing and analysis allows one to identify unknown members of known gene families
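Motif databases of this kind describe a pattern as a short string that can be translated into a regular expression and scanned against a translated sequence. The sketch below handles a small subset of PROSITE-style syntax. The example pattern is the widely cited P-loop (ATP/GTP-binding site motif A) pattern and the peptide is a Ras-like N-terminal fragment; both are chosen here for illustration and are not taken from Table 4 of the paper.

```python
import re

# One pattern element: x, a residue, [allowed], or {forbidden},
# optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                         # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except these
        else:
            core = residue                     # literal residue or [set]
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        pieces.append(core + repeat)
    return "".join(pieces)


P_LOOP = "[AG]-x(4)-G-K-[ST]"      # assumed example pattern
peptide = "MTEYKLVVVGAGGVGKSALT"   # illustrative input only

regex = prosite_to_regex(P_LOOP)
hit = re.search(regex, peptide)
print(regex)
if hit:
    print(f"match at position {hit.start()}: {hit.group()}")
```

A hit of this kind only suggests membership in a family; short patterns also match by chance, so the result would normally be checked against the rest of the sequence.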
Adams et al., (1991) Science 252, 1651-1656
• Table 5
– Evaluated accuracy of sequencing: 92-98%, depending on read length
• Limitation of separation technology (slab gels)
– Very poor by today’s standards (99+% at 600 bases)
• High error rate means one must sequence at greater redundancy to get the correct sequence
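The link between error rate and redundancy can be made concrete with a simple majority-vote model. The sketch below assumes errors in different reads are independent and, pessimistically, that all wrong reads agree with each other, so it gives an upper bound on the consensus error. The model and the function name are illustrative assumptions; the 2-8% per-read error rates follow from the 92-98% accuracy quoted above.

```python
from math import comb


def majority_error(per_read_error: float, n_reads: int) -> float:
    """Probability that more than half of n_reads are wrong at a given base."""
    if n_reads < 1 or n_reads % 2 == 0:
        raise ValueError("use an odd, positive number of reads to avoid ties")
    e = per_read_error
    return sum(
        comb(n_reads, k) * e**k * (1 - e) ** (n_reads - k)
        for k in range(n_reads // 2 + 1, n_reads + 1)
    )


for error in (0.08, 0.02):
    for n in (1, 3, 5):
        print(f"per-read error {error:.0%}, {n} reads: consensus error {majority_error(error, n):.5f}")
```

Under this model, going from one read to three reads at 8% per-read error brings the per-base error below 2%, which is the sense in which a high error rate has to be paid for with extra coverage.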
Adams et al., (1991) Science 252, 1651-1656
• Figure 1 – identification of human relatives of Drosophila neurogenic genes
– These are responsible for neuronal differentiation in Drosophila
– Proves that genes of known function from a model organism can be used to identify interesting human genes to study
Adams et al., (1991) Science 252, 1651-1656
• Figure 2
– Mapped ESTs to chromosomes
– Used PCR to check which members of an RH panel corresponded to the EST
• Maps the EST to a chromosome (provided that the RH has been so mapped)
– Why is this important?
• This enables mRNAs to be mapped to genomic loci and provides a quick entry point to gene identification
– Diseases
– Mutations
– Translocations
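The logic of panel mapping can be sketched as pattern matching: each panel member carries a known subset of human chromosomes, and the EST is assigned to the chromosome whose presence across the panel matches the PCR results. The panel contents, cell line names and exact-match rule below are hypothetical simplifications; real panels are larger and scored with some tolerance for discordant results.

```python
from typing import Dict, List, Set


def map_to_chromosome(pcr_positive: Set[str], panel: Dict[str, Set[str]]) -> List[str]:
    """Chromosomes whose presence across the panel exactly matches the PCR pattern.

    `panel` maps each hybrid cell line to the human chromosomes it retains;
    `pcr_positive` is the set of cell lines that gave a PCR product for the EST.
    """
    chromosomes = set().union(*panel.values())
    return sorted(
        chrom
        for chrom in chromosomes
        if {line for line, content in panel.items() if chrom in content} == pcr_positive
    )


# Hypothetical four-member panel.
panel = {
    "hybrid_A": {"1", "7"},
    "hybrid_B": {"7", "X"},
    "hybrid_C": {"1", "X"},
    "hybrid_D": {"7"},
}

# The EST primers amplify from hybrids A, B and D only.
print(map_to_chromosome({"hybrid_A", "hybrid_B", "hybrid_D"}, panel))
```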
Adams et al., (1991) Science 252, 1651-1656
• Conclusions
– EST sequencing is a rapid and efficient way to generate sequence tags with numerous uses
• 150-400 bp of sequence is enough to identify sequence, map to chromosome, determine homology with distant organisms
• Claimed matches with yeast and Neurospora sequences
– In fact, these were contaminants in the library from yeast RNA used as carrier for precipitations during library construction
» very sloppy
– 337/600 sequences were putative new genes – good method to quickly identify genes
– Way too many abundant genes – suggested that libraries must be normalized or subtracted to minimize redundancy
– Pioneered large scale automated sequence entry
– Suggested that in a few years, they would have mapped all of the mRNAs from human brain
• Overly optimistic
dbEST Summary
Organism                                         May 1999  August 2000  August 2002
Homo sapiens (human)                            1,380,737    2,232,809    4,533,427
Mus musculus + domesticus (mouse)                 521,672    1,604,115    2,624,752
Rattus sp. (rat)                                  112,390      188,625      351,827
Glycine max (soybean)                               8,236       96,930      268,299
Drosophila melanogaster (fruit fly)                83,197       90,777      256,583
Danio rerio (zebrafish)                            24,567       71,186      255,334
Hordeum vulgare + subsp. vulgare (barley)              80            ?      240,877
Bos taurus (cattle)                                   208            ?      235,495
Xenopus laevis                                        408            ?      220,132
Triticum aestivum (wheat)                               4            ?      196,047
Caenorhabditis elegans (nematode)                  72,567            ?      189,632
Arabidopsis thaliana (thale cress)                 37,745            ?      174,624
Ciona intestinalis                                    102            ?      174,272
Zea mays (maize)                                   13,177            ?      168,610
Medicago truncatula (barrel medic)                    899            ?      162,917
Dictyostelium discoideum                           15,199            ?      154,197
Lycopersicon esculentum (tomato)                    9,088            ?      148,346
Chlamydomonas reinhardtii                              82            ?      130,324
Sus scrofa (pig)                                    4,136            ?      110,213
Oryza sativa (rice)                                40,499            ?      105,019
Silurana+Xenopus tropicalis                             0            ?      104,619
Solanum tuberosum (potato)                             85            ?       94,420
Anopheles gambiae (African malaria mosquito)           86            ?       94,032
Sorghum bicolor (sorghum)                             107            ?       84,712
Gallus gallus (chicken)                               388            ?       62,476
total public ESTs                               2,464,337    5,462,530   12,190,151
? = August 2000 value whose row could not be recovered. The remaining August 2000 values, in source order: 92,987; 35,218; 39,150; 101,252; 111,736; 70,572; 72,828; 19,183; 87,680; 33,267; 60,237; 23; 34,738; 12,840
dbEST release 040904 - April 9, 2004
Homo sapiens (human)                             5,484,645
Mus musculus + domesticus (mouse)                4,088,831
Rattus sp. (rat)                                   592,060
Triticum aestivum (wheat)                          555,472
Ciona intestinalis                                 492,511
Danio rerio (zebrafish)                            484,827
Gallus gallus (chicken)                            481,956
Bos taurus (cattle)                                409,104
Zea mays (maize)                                   395,955
Xenopus laevis (African clawed frog)               368,783
Hordeum vulgare + subsp. vulgare (barley)          356,856
Xenopus tropicalis                                 349,052
Glycine max (soybean)                              346,582
Sus scrofa (pig)                                   287,741
Oryza sativa (rice)                                283,989
Drosophila melanogaster (fruit fly)                274,367
Saccharum officinarum                              246,301
Caenorhabditis elegans (nematode)                  231,096
Arabidopsis thaliana (thale cress)                 204,396
Sorghum bicolor (sorghum)                          190,864
Dictyostelium discoideum                           155,032
Lycopersicon esculentum (tomato)                   150,519
Oryzias latipes (Japanese medaka)                  149,697
Solanum tuberosum (potato)                         149,227
Oncorhynchus mykiss (rainbow trout)                142,967
Schistosoma mansoni (blood fluke)                  139,135
Vitis vinifera                                     137,660
Anopheles gambiae (African malaria mosquito)       134,784
Bombyx mori (domestic silkworm)                    116,541
Pinus taeda (loblolly pine)                        110,622
Lotus corniculatus var. japonicus                  110,563
Number of public entries:                       20,685,791