7. Understanding a Genome Sequence
Learning outcomes
When you have read Chapter 7, you should be able to:
1. Describe the strengths and weaknesses of the computational and experimental
methods used to analyze genome sequences
2. Describe the basis of open reading frame (ORF) scanning, and explain why this
approach is not always successful in locating genes in eukaryotic genomes
3. Outline the various experimental methods used to identify parts of a genome
sequence that specify RNA molecules
4. Define the term ‘homology’ and explain why homology is important in computer-based
studies of gene function
5. Evaluate the limitations of homology analysis, using the yeast genome project as an
example
6. Describe the methods used to inactivate individual genes in yeast and mammals, and
explain how inactivation can lead to identification of the function of a gene
7. Give outline descriptions of techniques that can be used to obtain more detailed
information on the activity of a protein coded by an unknown gene
8. Describe how the transcriptome and proteome are studied
9. Explain how protein interaction maps are constructed and indicate the key features of
the yeast map
10. Evaluate the potential and achievements of comparative genomics as a means of
understanding a genome sequence
7.1. Locating the Genes in a Genome Sequence
7.2. Determining the Functions of Individual Genes
7.3. Global Studies of Genome Activity
7.4. Comparative Genomics
7.1. Locating the Genes in a Genome Sequence
Figure 7.1. A double-stranded DNA molecule has six reading frames. Both
strands are read in the 5′→3′ direction. Each strand has three reading frames,
depending on which nucleotide is chosen as the starting position.
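The six reading frames described in Figure 7.1 can be enumerated with a few lines of Python. This is a minimal sketch, not taken from the chapter: the function names and the example sequence are illustrative assumptions.

```python
# Minimal sketch: list the codons in each of the six reading frames.
# Function names and the example sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written in the 5'->3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Map a frame label (+1..+3, -1..-3) to the list of complete codons."""
    frames = {}
    for label, strand in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through the strand three nucleotides at a time,
            # dropping any incomplete codon at the end.
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for name, codons in six_frames("ATGACCGATTAA").items():
        print(name, " ".join(codons))
```

Both strands are read 5′→3′, so the frames on the second strand are taken from the reverse complement rather than from the simple complement.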
Figure 7.2. ORF scanning is an effective way of locating genes in a bacterial genome.
The diagram shows 4522 bp of the lactose operon of Escherichia coli with all ORFs
longer than 50 codons marked. The sequence contains two real genes, lacZ and lacY, indicated by the red lines. These real genes are unmistakable because they are much
longer than the spurious ORFs, shown in blue. See Figure 2.20A for the detailed
structure of the lactose operon.
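The kind of ORF scan summarized in Figure 7.2 can be sketched as follows. This is a simplified illustration under stated assumptions: an ORF is taken to run from an ATG to the first in-frame termination codon, the 50-codon cutoff from the caption is the default, and the function name `find_orfs` is hypothetical.

```python
# Minimal sketch of ORF scanning over all six reading frames.
# Assumption: an ORF starts at ATG and ends at the first in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_codons=50):
    """Return (strand, frame, start, end, n_codons) for each ORF found.

    Coordinates are 0-based on the strand being read; n_codons counts
    the initiation codon but not the termination codon.
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1]))
    orfs = []
    for strand_label, strand in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand_label, frame + 1, start, i + 3, n_codons))
                    start = None  # close it and look for the next ATG
    return orfs


if __name__ == "__main__":
    demo = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(find_orfs(demo, min_codons=5))
```

Raising `min_codons` filters out short spurious ORFs, which is why a length cutoff works well for a compact bacterial sequence.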
Figure 7.3. ORF scans are complicated by introns. The nucleotide sequence of a short
gene containing a single intron is shown. The correct amino acid sequence of the
protein translated from the gene is given immediately below the nucleotide sequence:
in this sequence the intron has been left out because it is removed from the transcript
before the mRNA is translated into protein. In the lower line, the sequence has been
translated without realizing that an intron is present. As a result of this error, the
amino acid sequence appears to terminate within the intron. The amino acid
sequences have been written using the one-letter abbreviations (see Table 3.1). The
genetic code was described in Section 3.3.2; introns are covered in detail in Section
10.1.3.
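The effect shown in Figure 7.3 can be reproduced with a toy example. The sequences below are invented for illustration and are not the gene shown in the figure; the only facts relied on are the standard genetic code and the convention that the intron is removed before translation.

```python
# Minimal sketch: translating a gene with and without removing its intron.
# EXON1, INTRON and EXON2 are invented sequences, not those of Figure 7.3.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, one-letter abbreviations, '*' = termination codon.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate from the first nucleotide until a termination codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


EXON1 = "ATGGCTAAG"
INTRON = "GTATGATTTCAG"  # contains an in-frame TGA
EXON2 = "GATTGGTAA"

if __name__ == "__main__":
    print("intron removed: ", translate(EXON1 + EXON2))
    print("intron ignored: ", translate(EXON1 + INTRON + EXON2))
```

When the intron is left in, translation stops at the termination codon that happens to lie in frame within the intron, so the predicted protein is truncated.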
Figure 7.4. Northern hybridization. An RNA extract is
electrophoresed under denaturing conditions in an
agarose gel (see Technical Note 4.4). After ethidium
bromide staining, two bands are seen. These are the
two largest rRNA molecules (Section 3.2.1) which are
abundant in most cells. The smaller rRNAs, which are
also abundant, are not seen because they are so short
that they run off the bottom of the gel. In most
cells, none of the mRNAs (the transcripts of protein-coding genes) is abundant enough to form a band
visible after ethidium bromide staining. The gel is
blotted onto a nylon membrane and, in this example,
probed with a radioactively labeled DNA fragment. A
single band is visible on the autoradiograph, showing
that the DNA fragment used as the probe contains part
or all of one transcribed sequence.
Figure 7.5. Zoo-blotting. The objective is to determine if a fragment of human DNA
hybridizes to DNAs from related species. Samples of human, chimp, cow and rabbit DNAs
are therefore prepared, restricted, and electrophoresed in an agarose gel. Southern
hybridization is then carried out with a human DNA fragment as the probe. A positive
hybridization signal is seen with each of the animal DNAs, suggesting that the human DNA
fragment contains an expressed gene. Note that the hybridizing restriction fragments from
the cow and rabbit DNAs are smaller than the hybridizing fragments in the human and
chimp samples. This indicates that the restriction map around the transcribed sequence is
different in cows and rabbits, but does not affect the conclusion that a homologous gene is
present in all four species.
Figure 7.6. RACE - rapid amplification of cDNA ends.
The RNA being studied is converted into a partial cDNA
by extension of a DNA primer that anneals at an
internal position not too distant from the 5′ end of the
molecule. The 3′ end of the cDNA is further extended
by treatment with terminal deoxynucleotidyl
transferase (Section 4.1.4) in the presence of dATP,
which results in a series of As being added to the cDNA.
This series of As acts as the annealing site for the
anchor primer. Extension of the anchor primer leads to
a double-stranded DNA molecule which can now be
amplified by a standard PCR. This is 5′-RACE, so-called
because it results in amplification of the 5′ end of the
starting RNA. A similar method - 3′-RACE - can be used
if the 3′ end sequence is desired.
Figure 7.7. S1 nuclease mapping. This method of
transcript mapping makes use of S1 nuclease, an
enzyme that degrades single-stranded DNA or RNA
polynucleotides, including single-stranded regions
in predominantly double-stranded molecules, but
has no effect on double-stranded DNA or on DNA-RNA hybrids. In the example shown, a restriction
fragment that spans the start of a transcription unit
is ligated into an M13 vector and the resulting
single-stranded DNA hybridized with an RNA
preparation. After S1 treatment, the resulting
heteroduplex has one end marked by the start of
the transcript and the other by the downstream
restriction site (R2). The size of the undigested DNA
fragment is therefore measured by gel
electrophoresis in order to determine the position
of the start of the transcription unit relative to the
downstream restriction site.
Figure 7.8. Exon trapping. The exon-trap vector consists of two exon sequences preceded
by promoter sequences - the signals required for gene expression in a eukaryotic host
(Section 9.2.2). New DNA containing an unmapped exon is ligated into the vector and
the recombinant molecule introduced into the host cell. The resulting RNA transcript is
then examined by RT-PCR to identify the boundaries of the unmapped exon.
7.2. Determining the Functions of Individual Genes
Figure 7.9. Two DNA sequences with 80% sequence identity.
Figure 7.10. Lack of homology between two sequences is often more apparent when
comparisons are made at the amino acid level. Two nucleotide sequences are shown, with
nucleotides that are identical in the two sequences given in red and non-identities given
in blue. The two nucleotide sequences are 76% identical, as indicated by the asterisks.
This might be taken as evidence that the sequences are homologous. However, when the
sequences are translated into amino acids the identity decreases to 28%. Identical amino
acids are shown in brown, and non-identities in green. The comparison between the
amino acid sequences suggests that the genes are not homologous, and that the similarity
at the nucleotide level was fortuitous. The amino acid sequences have been written using
the one-letter abbreviations (see Table 3.1).
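The comparison made in Figure 7.10 amounts to computing percent identity twice, once on the aligned nucleotide sequences and once on their translations. The sketch below uses two short invented sequences, not those in the figure, and assumes the sequences are already aligned without gaps.

```python
# Minimal sketch: percent identity at the nucleotide and amino acid levels.
# SEQ_A and SEQ_B are invented; they are assumed to be aligned and gap-free.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def percent_identity(a, b):
    """Percentage of aligned positions at which the two sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def translate(seq):
    """Translate every complete codon, starting at the first nucleotide."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


SEQ_A = "ATGGCTAAGGAT"
SEQ_B = "ATGCCTAAGCAT"

if __name__ == "__main__":
    print(f"nucleotide identity: {percent_identity(SEQ_A, SEQ_B):.1f}%")
    print(f"amino acid identity: "
          f"{percent_identity(translate(SEQ_A), translate(SEQ_B)):.1f}%")
```

Two nucleotide differences out of twelve each change an amino acid here, so the identity falls more sharply at the protein level than at the nucleotide level.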
Figure 7.11. The tudor domain. The top drawing shows the structure of the
Drosophila tudor protein, which contains ten copies of the tudor domain. The
domain is also found in a second Drosophila protein, homeless, and in the human
A-kinase anchor protein (AKAP149), which plays a role in RNA metabolism. The
proteins have dissimilar structures other than the presence of the tudor domains.
The activity of each protein involves RNA in one way or another.
Figure 7.12. Categories of gene in the yeast genome.
Figure 7.13. Gene inactivation by homologous recombination. The chromosomal
copy of the target gene recombines with a disrupted version of the gene carried
by a cloning vector. As a result, the target gene becomes inactivated. For more
information on recombination see Section 14.3.
Figure 7.14. The use of a yeast deletion cassette.
The deletion cassette consists of an antibiotic-resistance gene preceded by the promoter
sequences needed for expression in yeast, and
flanked by two restriction sites. The start and end
segments of the target gene are inserted into the
restriction sites and the vector introduced into
yeast cells. Recombination between the gene
segments in the vector and the chromosomal copy
of the target gene results in disruption of the latter.
Cells in which the disruption has occurred are
identified because they now express the antibiotic-resistance gene and so will grow on an agar
medium containing geneticin. The gene designation
‘kanʳ’ is an abbreviation for ‘kanamycin resistance’,
kanamycin being the family name of the group of
antibiotics that includes geneticin.
Figure 7.15. Artificial induction of transposition. Recombinant DNA techniques have
been used to place a promoter sequence (Section 3.2.2) that is responsive to
galactose upstream of a Ty1 element in the yeast genome. When galactose is absent,
the Ty1 element is not transcribed and so remains quiescent. When the cells are
transferred to a culture medium containing galactose, the promoter is activated and
the Ty1 element is transcribed, initiating the transposition process (Smith et al., 1995).
For more information on activation of eukaryotic promoters, see Box 9.6 and for
details of the retrotransposition process see Section 14.3.3.
Figure 7.16. RNA interference. The double-stranded RNA molecule is broken
down by the Dicer ribonuclease into ‘short interfering RNAs’ (siRNAs) of 21–25
bp in length. One strand of each siRNA base pairs to the target mRNA, which is
then degraded by the RDE-1 nuclease. For more details on RNA interference,
see Section 10.4.2.
Figure 7.17. Fusion with liposomes can be used to deliver double-stranded
RNA into a human cell.
Figure 7.18. Functional analysis by gene overexpression. The objective is to determine
if overexpression of the gene being studied has an effect on the phenotype of a
transgenic mouse. A cDNA of the gene is therefore inserted into a cloning vector
carrying a highly active promoter sequence that directs expression of the cloned gene
in mouse liver cells. The cDNA is used rather than the genomic copy of the gene
because the former does not contain introns and so is shorter and easier to
manipulate in the test tube.
Figure 7.19. Two-step gene replacement. See the text for details.
Figure 7.20. A reporter gene. The open reading frame of the reporter gene
replaces the open reading frame of the gene being studied. The result is that
the reporter gene is placed under control of the regulatory sequences that
usually dictate the expression pattern of the test gene. For more information on
these regulatory sequences, see Sections 9.2 and 9.3. Note that the reporter
gene strategy assumes that the important regulatory sequences do indeed lie
upstream of the gene. This is not always the case for eukaryotic genes.
Figure 7.21. Immunocytochemistry. The cell is treated with an antibody that is labeled
with a blue fluorescent marker. Examination of the cell shows that the fluorescent
signal is associated with the inner mitochondrial membrane. A working hypothesis
would therefore be that the target protein is involved in electron transport and
oxidative phosphorylation, as these are the main biochemical functions of the inner
mitochondrial membrane.
Technical Note 7.1. Site-directed mutagenesis
7.3. Global Studies of Genome Activity
Figure 7.22. SAGE. See the text for details. In this
example, the first restriction enzyme to be used is
AluI, which recognizes the 4-bp target site 5′-AGCT-3′ (see Table 4.3). The oligonucleotide that is ligated
to the cDNA contains the recognition sequence for
BsmFI, which cuts 10–14 nucleotides downstream,
and so cleaves off a fragment of the cDNA.
Fragments of different cDNAs are ligated to produce
the concatamer that is sequenced. Using this
method, the concatamer that is formed is made up
partly of sequences derived from the BsmFI
oligonucleotides. To avoid this, and so obtain a
concatamer made up entirely of cDNA fragments,
the oligonucleotide can be designed so that the end
that ligates to the cDNA contains the recognition
sequence for a third restriction enzyme. Treatment
with this enzyme cleaves the oligonucleotide from
the cDNA fragment.
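The tag-collecting step of SAGE can be imitated in silico. The sketch below is a simplification under several assumptions that the caption does not state: the tag is taken from the AluI site nearest the 3′ end of each cDNA, it begins at the AluI cut position (AG^CT), and its length is fixed at 12 nucleotides, a value chosen from within the 10–14 nucleotide range. The function names are hypothetical.

```python
# Minimal sketch: extract one SAGE-style tag per cDNA and count the tags.
# Assumptions: 3'-most AluI site is used, tag length is fixed at 12 nt.
from collections import Counter

ALU_I_SITE = "AGCT"  # AluI target site, as given in the caption
TAG_LENGTH = 12      # assumed fixed value within the 10-14 nt range


def sage_tag(cdna, site=ALU_I_SITE, tag_length=TAG_LENGTH):
    """Return the tag following the last AluI cut, or None if unavailable."""
    cdna = cdna.upper()
    pos = cdna.rfind(site)
    if pos == -1:
        return None  # no AluI site in this cDNA
    start = pos + len(site) // 2  # AluI cuts in the middle of its site
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(cdnas):
    """Count how often each tag occurs across a collection of cDNAs."""
    tags = (sage_tag(c) for c in cdnas)
    return Counter(t for t in tags if t is not None)


if __name__ == "__main__":
    library = ["AAAGCTACGTACGTACGTAAAA"] * 2 + ["GGGG"]
    print(count_tags(library))
```

The count for each tag stands in for the relative abundance of the corresponding mRNA in the sample.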
Figure 7.23. Transcriptome analysis. (A) Transcriptome analysis with a DNA chip
carrying oligonucleotides representing all the genes in a small genome. After adding
labeled cDNA, the positions of the hybridization signals on the chip indicate which
genes have contributed to the transcriptome under study. (B) With a larger genome,
cDNA clones prepared from the transcriptome of one tissue are immobilized as a
microarray and probed with cDNAs representing the same or a different transcriptome.
By comparing the hybridization patterns, genes that are expressed differently in the
tissues from which the transcriptomes are obtained can be identified.
Figure 7.24. Studying a proteome by two-dimensional gel electrophoresis
followed by MALDI-TOF. (A) After two-dimensional gel electrophoresis a protein
of interest is excised from the gel and digested with a protease such as trypsin,
which cuts immediately after arginine or lysine amino acids. This cleaves the
protein into a series of peptides which can be analyzed by MALDI-TOF. (B) In the
mass spectrometer the peptides are ionized by a pulse of energy from a laser
and then accelerated down the column to the reflector and onto the detector.
The time of flight of each peptide depends on its mass-to-charge ratio. The data
are visualized as a spectrum (C). The computer contains a database of the
predicted molecular weights of every trypsin fragment of every protein encoded
by the genome of the organism under study. The computer compares the masses
of the detected peptides with the database and identifies the most likely source
protein.
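The database comparison described in Figure 7.24 can be sketched in a few steps: digest each candidate protein in silico with trypsin, compute the mass of every peptide, and count how many observed masses each protein explains. This is a minimal illustration, not the software used with a real instrument. It treats the observed values as uncharged monoisotopic masses, cuts after every arginine or lysine as the caption states, and uses an arbitrary tolerance; the function names and the two-protein database are hypothetical.

```python
# Minimal sketch of peptide mass fingerprinting against a protein database.
# Assumptions: uncharged monoisotopic masses, cleavage after every K or R.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N and C termini


def trypsin_digest(protein):
    """Split a protein immediately after each lysine (K) or arginine (R)."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def best_match(observed_masses, database, tolerance=0.01):
    """Return (name, scores): the protein explaining most observed masses.

    The database must contain at least one protein.
    """
    scores = {}
    for name, sequence in database.items():
        predicted = [peptide_mass(p) for p in trypsin_digest(sequence)]
        scores[name] = sum(
            any(abs(obs - pred) <= tolerance for pred in predicted)
            for obs in observed_masses
        )
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    db = {"protA": "MAKGGR", "protB": "WWKDDR"}  # hypothetical database
    observed = [peptide_mass("MAK"), peptide_mass("GGR")]
    print(best_match(observed, db))
```

Leucine and isoleucine have the same mass, so peptides that differ only at those residues cannot be told apart by this comparison.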
Figure 7.25. Phage display. (A) The cloning
vector used for phage display is a
bacteriophage genome with a unique
restriction site located within a gene for a
coat protein. The technique was originally
carried out with the gene III coat protein of
the filamentous phage called f1, but has now
been extended to other phages including λ.
To create a display phage, the DNA sequence
coding for the test protein is ligated into the
restriction site so that a fused reading frame
is produced - one in which the series of
codons continues unbroken from the coat
protein gene into the test gene. After
transformation of Escherichia coli, this
recombinant molecule directs synthesis of a
hybrid protein made up of the test protein
fused to the coat protein. Phage particles
produced by these transformed bacteria
therefore display the test protein in their
coats. (B) Using a phage display library. The
test protein is immobilized within a well of a
microtiter tray and the phage display library
added. After washing, the phages that are
retained in the well are those displaying a
protein that interacts with the test protein.
Figure 7.26. The yeast two-hybrid system. (A) On the left, a gene for a human protein has been ligated to
the gene for the DNA-binding domain of a yeast activator. After transformation of yeast, this construct
specifies a fusion protein, part human protein and part yeast activator. On the right, various human DNA
fragments have been ligated to the gene for the activation domain of the activator: these constructs
specify a variety of fusion proteins. (B) The two sets of constructs are mixed and cotransformed into
yeast. A colony in which the reporter gene is expressed contains fusion proteins whose human segments
interact, thereby bringing the DNA-binding and activation domains into proximity and stimulating the
RNA polymerase. See Section 9.3.2 for more information on activators.
Figure 7.27. Using homology analysis to deduce protein-protein interactions. The 5′
region of the yeast HIS2 gene is homologous to Escherichia coli his2, and the 3′
region is homologous to E. coli his10.
Figure 7.28. The yeast protein interaction map. Each dot represents a
protein, with connecting lines indicating interactions between pairs of
proteins. Red dots are essential proteins: an inactivating mutation in the
gene for one of these proteins is lethal. Mutations in the genes for proteins
indicated by green dots are non-lethal; mutations in genes for proteins
shown in orange lead to slow growth. The effects of mutation in genes for
proteins shown as yellow dots are not known. From Jeong et al., Nature,
411, 41–42. Copyright 2001 Macmillan Magazines Limited.
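An interaction map of this kind is an undirected graph, with proteins as nodes and interactions as edges. The sketch below shows one way to hold such a map and count the interaction partners of each protein. The protein names and interaction pairs are invented for illustration and are not data from the yeast map.

```python
# Minimal sketch: a protein interaction map as an undirected graph.
# Protein names and pairs below are hypothetical.
from collections import defaultdict


def build_map(interactions):
    """Build an adjacency map from (protein_a, protein_b) pairs."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)  # interactions have no direction
    return dict(neighbours)


def degrees(neighbours):
    """Number of distinct interaction partners for each protein."""
    return {protein: len(partners) for protein, partners in neighbours.items()}


if __name__ == "__main__":
    pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P4")]
    for protein, n in sorted(degrees(build_map(pairs)).items()):
        print(protein, n)
```

Storing partners in sets means that an interaction reported twice is counted only once.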
7.4. Comparative Genomics