Transcript Lecture 6
FCH 532 Lecture 6
Chapter 5
Page 111
Figure 5-51 A degenerate oligonucleotide probe.
Page 113
Figure 5-52 Colony (in situ) hybridization.
Page 113
Figure 5-53 Chromosome walking.
Page 117
Figure 5-56 Construction of a recombinant DNA molecule by directional cloning.
Page 114
Figure 5-54 The polymerase chain reaction (PCR).
• Devised by Kary Mullis in 1985.
• Amplifies DNA segments of up to ~10 kb.
• Heat-denatured DNA is incubated with
– DNA polymerase
– dNTPs
– Two oligonucleotide primers
• Heat-stable polymerases are used
– Taq
– Pfu
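To make the amplification concrete, here is a minimal Python sketch of idealized PCR yield. It assumes each cycle multiplies the target by (1 + efficiency), so perfect efficiency means doubling per cycle. The function name `pcr_copies` and the `efficiency` parameter are illustrative assumptions, not part of the lecture.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a given number of PCR cycles.

    Assumption: every cycle multiplies the target by (1 + efficiency),
    where efficiency = 1.0 corresponds to perfect doubling.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Show how quickly a single template molecule is amplified.
    for n in (10, 20, 30):
        print(n, "cycles:", pcr_copies(1, n), "copies")
```

Real reactions fall short of perfect doubling and plateau in later cycles, so this is an upper bound rather than a prediction.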
PCR
• Amplified DNA can be used for RFLP analysis, Southern
blotting, and sequencing.
• Can be used for rapid detection of diseases and
mutations.
• Can be used to identify DNA from hair, sperm, or blood by
amplification of short tandem repeats (STRs): segments
of repeating DNA sequence (2-7 bp per repeat unit) such as
(CA)n and (ATGC)n.
• STRs are genetically variable and can be used as
markers of individuality. The number of tandem repeats
at an STR varies from one individual to another.
• STRs are amplified using primers that anneal to the unique
sequences flanking the tandem repeats.
• RNA can be amplified by PCR after first reverse transcribing
it to complementary DNA (cDNA) with reverse transcriptase.
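The STR idea above (counting how many times a short unit such as CA repeats in tandem) can be sketched in Python. This is only an illustration on made-up sequences. The function name `longest_str_run` and the flanking sequences are hypothetical, and real STR genotyping measures amplified fragment lengths rather than scanning raw sequence.

```python
import re


def longest_str_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    sequence = sequence.upper()
    unit = unit.upper()
    # One or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


if __name__ == "__main__":
    # Two hypothetical alleles that differ only in the number of CA repeats.
    allele_1 = "GGAT" + "CA" * 7 + "TTAG"
    allele_2 = "GGAT" + "CA" * 12 + "TTAG"
    print(longest_str_run(allele_1, "CA"), longest_str_run(allele_2, "CA"))
```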
Figure 5-57 Site-directed mutagenesis.
Allows for the “customization” of
a protein.
Page 118
An oligonucleotide containing a
short gene segment with the
desired altered base sequence,
corresponding to the new amino
acid sequence, is used as a
primer in the reaction.
In this case DNA polymerase I
is used. PCR can also be used
to amplify a gene of interest,
with the mutation introduced
through the primer.
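As a rough illustration of the primer idea above, the following Python sketch builds an oligonucleotide that carries one altered codon flanked by unchanged sequence. The function name `mutagenic_primer`, the `flank` length, and the toy gene are all hypothetical. The sketch also simplifies by returning the coding-strand sequence and ignoring primer design constraints such as melting temperature.

```python
def mutagenic_primer(gene: str, codon_index: int, new_codon: str, flank: int = 9) -> str:
    """Return an oligonucleotide carrying `new_codon` at codon `codon_index` (0-based).

    The altered codon is flanked by up to `flank` unchanged bases on each side
    (fewer if the codon lies near either end of the gene).
    """
    gene = gene.upper()
    new_codon = new_codon.upper()
    start = codon_index * 3
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    if start < 0 or start + 3 > len(gene):
        raise ValueError("codon_index is outside the gene")
    # Substitute the codon, then cut out the window around it.
    mutant = gene[:start] + new_codon + gene[start + 3:]
    return mutant[max(0, start - flank): start + 3 + flank]


if __name__ == "__main__":
    toy_gene = "ATGGTGCACCTGACTCCTGAGGAGAAGTCT"  # made-up coding sequence
    # Change codon 6 (GAG, Glu) to GTG (Val).
    print(mutagenic_primer(toy_gene, 6, "GTG"))
```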
Production of proteins
• Cloned structural genes can be inserted into an
expression vector to produce recombinant protein.
• A relaxed-control plasmid with an efficient promoter
can yield the product of the inserted structural gene
at up to 30% of the total cellular protein.
• Inclusion bodies: large amounts of insoluble,
denatured protein. The protein must be extracted
by dissolving it in a chaotrope such as urea
or guanidinium chloride and then slowly renatured.
Page 116
Figure 5-55 Electron micrograph of an inclusion body
of the protein prochymosin in an E. coli cell.
Production of proteins
• Can engineer a signal sequence to target the
protein to the periplasmic space of the bacteria so it
folds properly.
• Toxic proteins can be placed under the control of an
inducible promoter (e.g., the lac promoter) in a plasmid
that also carries the gene for the lac repressor protein.
– Binding of the lac repressor prevents expression
from the lac promoter.
– After cells have grown to high density, an inducer
(isopropylthiogalactoside, IPTG, a synthetic
nonmetabolizable analog of allolactose) is added to
release the lac repressor from the DNA.
Reporter genes can be used to
monitor transcription
• The rate at which a gene is expressed depends on its
upstream control sequences.
• Replace the gene you want to monitor with a reporter
gene.
• Reporter genes encode proteins that are easily
detected by an assay. lacZ can be assayed with X-gal, which yields a blue color.
• Another reporter is green fluorescent protein (GFP),
which fluoresces green when irradiated
with UV or ~400 nm light.
geneX → lacZ: replace geneX with the reporter gene in the correct reading
frame.
In the presence of X-gal, lacZ expression will
produce the blue color.
Page 119
Figure 5-58 Use of green fluorescent protein (GFP) as
a reporter gene.
Transgenic organisms
• Organisms expressing a foreign gene are considered
transgenic.
• The foreign gene is referred to as a transgene.
• For the change to be permanent, the transgene must be
stably integrated into the germ cells.
• Established in mice by microinjection of DNA into a
pronucleus of a fertilized ovum.
• Can also be accomplished in an embryonic stem cell.
Page 119
Figure 5-59 Microinjection of DNA into the pronucleus
of a fertilized mouse ovum.
Nucleic acid sequencing
• The development of DNA sequencing techniques has
produced a huge amount of DNA sequence data (>35
billion nucleotides in 2003 and growing!)
• Complete genomes have been determined for over 110
prokaryotes and over 11 eukaryotes.
Page 177
Table 7-3a
Some Sequenced Genomes
Page 177
Table 7-3b Some Sequenced Genomes.
Nucleic acid sequencing
• The chain-terminator method (also known as dideoxy sequencing) is used to sequence long stretches of DNA.
• Uses DNA polymerase to synthesize single-stranded
DNA.
• The polymerase assembles the four deoxynucleoside triphosphates
(dNTPs) into a sequence complementary to the template.
• Synthesis initiates from a primer sequence.
• Chain extension terminates after the incorporation of a 2’,3’-dideoxynucleoside triphosphate (ddNTP), which lacks the 3’-OH group needed to add the next nucleotide.
[Structure: 2’,3’-dideoxynucleoside triphosphate (ddNTP)]
Page 178
Figure 7-14 Flow diagram of the chain-terminator
(dideoxy) method of DNA sequencing.
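The chain-terminator logic described above can be sketched as a toy simulation in Python. The sketch assumes an idealized experiment in which each of the four ddNTP reactions yields a fragment ending at every position where that base occurs, and it ignores the primer length. The function names `dideoxy_fragments` and `read_gel` are illustrative, not from the lecture.

```python
from typing import Dict, List


def dideoxy_fragments(new_strand: str) -> Dict[str, List[int]]:
    """For each ddNTP reaction (A, C, G, T), list the terminated fragment lengths.

    A fragment of length n appears in the lane for base X when the n-th
    nucleotide of the newly synthesized strand is X.
    """
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Read the sequence 5' to 3' by ordering bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = dideoxy_fragments("GATTACA")
    print(lanes)            # fragment lengths per lane
    print(read_gel(lanes))  # sequence recovered from the band order
```

Reading the shortest band first mirrors how a sequencing gel is read from the bottom up.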
Page 179
Figure 7-15 Autoradiograph of a sequencing gel.
[Figure labels: structure of a 35S-labeled nucleoside triphosphate; gel lanes A, G, C, T.]
Page 179
The chain-terminator method has been automated.
Fluorescence-labeling techniques are used instead of radioactivity.
Two types of system are used:
1. Four reactions/one gel: the primers used in each of the four
chain-extension reactions are 5’-linked to a differently fluorescing dye.
The four reaction mixtures are combined and loaded into a single lane of a gel.
As each fragment exits the gel, its fluorescence is detected.
2. One reaction/one gel: each of the four ddNTPs used to
terminate chain extension is linked to a different fluorescing dye.
The extension is carried out in a single vessel and the mixture is
loaded into a single lane.
Advanced systems use capillaries instead of slab gels.
Genome sequencing
• To sequence entire genomes, sequenced segments need to be
assembled into contigs (contiguous blocks) to establish the correct
order of the sequence.
• Chromosome walking is one way to do so, but it is prohibitively
expensive.
• Two methods have been used recently:
• 1. Conventional genome sequencing: low-resolution maps are made by
identifying “landmarks” in ~250 kb inserts cloned in YACs.
• Landmarks are 200-300 bp segments known as sequence-tagged
sites (STSs); two clones with the same STS overlap.
• STS-containing inserts are sheared randomly into ~40 kb segments
and cloned into cosmid vectors, which are used to create high-resolution maps.
• The cosmid inserts are fragmented to smaller sizes and sequenced.
• Cosmid inserts are assembled by using the STS sequence overlaps
and cosmid walking.
• This approach cannot be used effectively with sequences containing large
amounts of repetitive sequence. (Use expressed sequence tags (ESTs).)
Genome sequencing
• 2. Shotgun strategy: the genome is randomly fragmented
– a large number of cloned fragments are sequenced.
– The genome is assembled by identifying overlaps between pairs of fragments.
• The probability that a given base is not sequenced is $e^{-c}$,
• where $c$ is the redundancy of coverage, $c = LN/G$,
– $L$ is the average length of the cloned inserts in base pairs,
– $N$ is the number of inserts sequenced,
– and $G$ is the length of the genome in base pairs.
• The aggregate length of the gaps between contigs is $G e^{-c}$, and the
average gap size is $G/N$.
• Bacterial genomes: the shotgun strategy is straightforward. Gaps are closed
using PCR primers synthesized from the sequences flanking each gap, which finishes the genome.
• Eukaryotic genomes: because of their larger size, sequencing must be carried out in stages, using
BACs and then sequencing ~500 bp from each end of the BAC inserts to yield
sequence-tagged connectors (STCs, or BAC ends).
• This allows assembly via the overlapping of STCs.
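The coverage formulas above can be evaluated directly. The following Python sketch computes the redundancy c = LN/G, the probability e^(-c) that a base is missed, the aggregate gap length G·e^(-c), and the average gap size G/N. The function name `shotgun_stats` and the example numbers are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict


def shotgun_stats(L: float, N: int, G: float) -> Dict[str, float]:
    """Evaluate the shotgun-coverage formulas from the lecture.

    L: average length of the cloned inserts (bp)
    N: number of inserts sequenced
    G: length of the genome (bp)
    """
    if L <= 0 or N <= 0 or G <= 0:
        raise ValueError("L, N, and G must all be positive")
    c = L * N / G                  # redundancy of coverage
    p_missed = math.exp(-c)        # probability a given base is not sequenced
    return {
        "coverage": c,
        "p_unsequenced": p_missed,
        "total_gap_length": G * p_missed,
        "average_gap_size": G / N,
    }


if __name__ == "__main__":
    # Hypothetical project: 2 Mb genome, 40,000 reads of ~500 bp each.
    for name, value in shotgun_stats(L=500, N=40_000, G=2_000_000).items():
        print(f"{name}: {value:.6g}")
```

Because the missed fraction falls off exponentially with c, modest increases in redundancy shrink the expected gaps sharply.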
Page 180
Figure 7-17 Genome sequencing strategies.
Human genome
• ~3.2 billion nucleotide genome; the sequence is ~90% complete
because the remainder is highly repetitive sequence.
• About half of the human genome consists of various
repeating sequences.
• Only ~28% of the genome is transcribed to RNA.
• Only 1.1% to 1.4% of the genome (~5% of the
transcribed RNA) encodes protein.
• Only ~30,000 protein-encoding genes (open reading
frames, or ORFs) have been identified; earlier predictions
ranged from 50,000 to 140,000 ORFs.
• Only a small fraction of human protein families are
unique to vertebrates; most occur in other life forms.
• Two randomly selected human genomes differ, on
average, by only 1 nucleotide per 1250; that is, any 2
people are likely to be >99.9% identical.
Chemical evolution
• Evolutionary aspects of amino acid sequences.
• Changes stem from random mutational events that alter
a protein’s primary structure.
• To persist, a mutational change must offer a selective advantage or,
at least, not decrease fitness.
• Most mutations are deleterious and often lethal, so they
are not passed on.
• Sometimes mutations occur that increase fitness of the
host in its natural environment.
• Example: Sickle-cell anemia.
Page 183
Figure 7-18a Scanning electron micrographs of human
erythrocytes. (a) Normal human erythrocytes revealing their
biconcave disklike shape.
Page 183
Figure 7-18b Scanning electron micrographs of human
erythrocytes. (b) Sickled erythrocytes from an individual with
sickle-cell anemia.
Page 184
Figure 7-20 A map indicating the regions of the world
where malaria caused by P. falciparum was prevalent before
1930.
Chemical evolution
• Pauling and co-workers showed by electrophoresis that normal human
hemoglobin (HbA) carries a more negative net charge than sickle-cell hemoglobin (HbS).
• Sickle-cell anemia is inherited according to the laws of
Mendelian genetics.
• Homozygotes for HbS have almost all HbS;
phenotype = sickle-cell anemia.
• Heterozygotes have ~40% HbS; phenotype = sickle-cell
trait.
• Homozygotes for HbA have normal human hemoglobin.
Mutations in α- or β-globin genes
can cause disease states
• Sickle-cell anemia: Glu 6 → Val 6 (E6V) in the β chain
• Val 6 binds to a hydrophobic pocket
in a neighboring deoxy-Hb molecule
• Deoxy-HbS polymerizes to form long filaments
• The filaments cause sickling of cells
• Sickle-cell trait offers an advantage against
malaria
• Cells sickle under low-oxygen conditions
and if infected with Plasmodium
falciparum.
• This causes the preferential removal of infected
erythrocytes from the circulation.
Variations in homologous proteins
• Similar proteins from related species are likely derived from the same ancestor.
• A protein that is well adapted to its function will continue to evolve.
• Neutral drift: mutational changes in a protein that, over time, do not affect its function.
• Homologous proteins: evolutionarily related proteins.
• Comparison of the primary structures of homologous proteins can be used
to identify which residues are essential to function, which are of lesser significance,
and which have little specific function.
• Invariant residue: the same side chain at a particular position in the amino acid
sequence of related proteins.
• If an invariant residue is observed among related proteins, it is likely necessary to
some essential function of the protein.
• Other positions may have less stringent side chain requirements, so that amino
acids may be conservatively substituted (replaced by an amino acid with
similar properties).
• If many amino acids are tolerated at a specific position, the position is hypervariable.
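As a small illustration of the invariant-residue idea above, this Python sketch scans the columns of an alignment and reports the positions at which every sequence has the same residue. It assumes the sequences are already aligned to equal length. The function names and the toy peptides are hypothetical, and deciding whether a variable position is conservatively substituted or hypervariable would need a residue-similarity scheme that is not modeled here.

```python
from typing import List, Set


def column_residues(aligned: List[str]) -> List[Set[str]]:
    """Return the set of distinct residues found at each alignment column."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return [set(column) for column in zip(*aligned)]


def invariant_positions(aligned: List[str]) -> List[int]:
    """Return 1-based positions where all sequences share the same residue."""
    return [
        position
        for position, residues in enumerate(column_residues(aligned), start=1)
        if len(residues) == 1
    ]


if __name__ == "__main__":
    toy_alignment = ["GDVEK", "GDAEK", "GDIEK"]  # made-up aligned peptides
    print(invariant_positions(toy_alignment))
    print(column_residues(toy_alignment)[2])     # residues seen at position 3
```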
Cytochrome c
• Cytochrome c is a nearly universal eukaryotic protein
necessary for electron transport.
• In vertebrates it has 103-104 residues; in other phyla
it has up to 8 more amino acids.
• Similarities are observed when the sequences are aligned.
• 38 of 105 residues are invariant, and most of the others are
conservatively substituted.
• 8 positions are hypervariable.
• His 18 and Met 80 form bonds with the redox-active Fe of the
heme group.
Page 184
Table 7-4a Amino Acid Sequences of Cytochromes c from 38 species.
Page 185
Cytochrome c
• Evolutionary differences between two homologous
proteins are determined by counting the amino acid
differences between them.
• The order of these differences parallels taxonomy, and the
differences can be put into a table.
• These data can be used to construct a phylogenetic
tree: a tree that indicates ancestral relationships among
organisms and their proteins.
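The counting step described above can be sketched in Python. The sketch assumes the sequences are already aligned to equal length, and it produces only the pairwise difference table, not the tree itself. The function names and the toy sequences are hypothetical and are not real cytochrome c data.

```python
from itertools import combinations
from typing import Dict, Tuple


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def difference_table(sequences: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the number of differences for every pair of named sequences."""
    return {
        (name_a, name_b): count_differences(sequences[name_a], sequences[name_b])
        for name_a, name_b in combinations(sequences, 2)
    }


if __name__ == "__main__":
    toy = {
        "species1": "GDVEKGKKIF",
        "species2": "GDVEKGKKIF",
        "species3": "GDAEKGKKLF",
    }
    for pair, n_diff in difference_table(toy).items():
        print(pair, n_diff)
```

Dividing a count by the sequence length and multiplying by 100 gives differences per 100 residues, which is how distances on such a tree are usually scaled.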
Page 186
Figure 7-21 Phylogenetic tree of cytochrome c.
Page 187
• Each branch point indicates a possible common ancestor of
everything above it.
• Relative evolutionary distances between neighboring branch
points are expressed as the number of amino acid differences
per 100 residues of the protein (percentage of accepted point
mutations, or PAM units).