Gene Expression

• Gene expression: 2 basic steps: transcription and translation.
– Transcription: making an RNA copy of a region of chromosome (a
gene)
– Translation: using the information encoded in messenger RNA (mRNA)
to produce a polypeptide.
• Between transcription and translation: the primary RNA transcript of
the gene must be converted to mRNA, and then transported out of
the nucleus to a ribosome in the cytoplasm.
• RNA-only genes:
– for example, ribosomal RNA or snRNA (used for intron splicing). More
types described frequently.
– Need to be processed to become functional.
• After translation: polypeptides need to be converted to proteins
– activation/inactivation by phosphorylation, etc.
– protein degradation
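The two basic steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be an already-spliced coding-strand sequence (so the RNA processing step is skipped), the example sequence is made up, and the codon table is a small subset of the genetic code chosen to cover the example.

```python
# Minimal sketch: transcribe a coding-strand DNA sequence, then translate it.
# CODON_TABLE is deliberately partial (an illustrative subset of the genetic code).
CODON_TABLE = {
    "AUG": "M",                          # start codon (methionine)
    "UUU": "F", "GGC": "G", "GCU": "A", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = codon not in the subset
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


mrna = transcribe("CCATGTTTGGCGCTAAATAAGG")  # made-up example sequence
print(mrna)
print(translate(mrna))
```

The functions separate the two steps on purpose: everything discussed under post-transcriptional regulation happens between the call to `transcribe` and the call to `translate`.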
Levels of Regulation
• Gene expression is regulated at several different levels
• transcriptional control: Control over whether RNA polymerase will
transcribe the gene or not.
– Caused by the binding of proteins to control regions adjacent to the
gene that allow (or prevent) RNA polymerase to transcribe the gene.
– The most important level of control (probably).
– Each gene is controlled separately.
• post-transcriptional regulation: control of RNA after it has been
transcribed, control of translation, and control of the protein itself.
• RNA: splicing out of introns, transport of mRNA to specific parts of the cell,
RNA stability and destruction
• translation: can ribosomes translate the mRNA molecule into protein or
not?
• protein: processing of polypeptides into functional proteins, protein stability.
• Regulation of chromatin conformation: changes in chromatin
structure that allow or prevent access to groups of genes
– inherited between cells within an individual's lifetime, but not inherited
between generations.
• epigenetic mechanisms: control that is inherited between
generations (from parent to child) but which doesn’t involve altering
the DNA base sequence.
Transcriptional Control
• Proteins that bind to DNA regulatory sequences and affect transcription
are called transcription factors.
– Act in trans: they can affect any gene on any chromosome in the same
nucleus that has a matching binding site.
– Proteins are translated in the cytoplasm and migrate back into the
nucleus to function.
• DNA regulatory sequences adjacent to the gene are said to act in cis:
they only affect the gene they are attached to (and not other copies of
the gene in the cell).
• Classifying transcription factors:
– general transcription factors: involved in all transcription complexes,
– tissue-specific transcription factors: only used in certain tissues or
with certain external stimuli.
Cis vs. trans
Transcription in General
• RNA polymerase is an enzyme that transcribes DNA into RNA: it polymerizes
RNA out of nucleotides (NTPs). Most RNA polymerases are composed of
several different polypeptide subunits.
• There are 3 RNA polymerases in the nucleus (plus another for the
mitochondria).
– RNA polymerase 1 (pol1): ribosomal RNA (tandem arrays of 18S, 28S, 5.8S)
– RNA polymerase 2 (pol2): protein-coding genes, snoRNA (small nucleolar RNA),
miRNA (micro RNA)
– RNA polymerase 3 (pol3): 5S ribosomal RNA, transfer RNA, a few others
• We will mostly discuss pol2 transcription.
• Basic concept: 3 steps.
– Initiation: RNA polymerase binds to a promoter sequence on the DNA, opens up
the DNA double helix and starts making RNA
– Elongation: RNA polymerase moves down the DNA strand creating an RNA copy
of the gene, one nucleotide at a time.
– Termination: RNA polymerase stops transcription and falls off the DNA
Cis-Acting DNA Sequences
• The most important DNA regulatory sequence is the promoter, the place
where RNA polymerase binds and starts transcription.
• There is no one single defined promoter sequence. Each gene has a
different promoter sequence, with various conserved elements.
• Five short sequences are conserved in eukaryotic promoters, but not all
are found with all genes. All are close to the transcription start point,
with some upstream and some downstream of it.
• The best known is the TATA box, located about 25 bp upstream from the
transcription initiation point. Like all these elements, the TATA box is a
consensus sequence, and it is not present in all genes. (One count shows
only 32% of human genes have a TATA box.)
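Because the TATA box is a consensus rather than a fixed sequence, searching for it means matching a pattern, not an exact string. The sketch below assumes a simplified consensus (TATA, then A or T, then A, then A or T) and an arbitrary search window of 20 to 40 bp upstream; both are illustrative choices, and the function name `find_tata_boxes` is hypothetical.

```python
import re

# Assumed, simplified TATA-like consensus: TATA, then A/T, then A, then A/T.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def find_tata_boxes(upstream: str, window: tuple = (40, 20)) -> list:
    """Return positions of TATA-like matches relative to the start site.

    `upstream` is the sequence immediately 5' of the transcription start
    point, so its last base is position -1. Only matches whose first base
    lies between -window[0] and -window[1] are reported.
    """
    far, near = window
    hits = []
    for match in TATA_PATTERN.finditer(upstream.upper()):
        position = match.start() - len(upstream)  # negative = upstream
        if -far <= position <= -near:
            hits.append(position)
    return hits


# Made-up 40 bp upstream region with a TATA-like element starting at -30.
region = "GCGCGCGCGC" + "TATAAAA" + "G" * 23
print(find_tata_boxes(region))
```

An empty result is a normal outcome, since most genes have no TATA box at all.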
Initiation and Elongation
• Initiation: The first step in transcription initiation is the binding of
the general transcription factor TFIID to the promoter region.
– After that, several other transcription factors bind, as does RNA
polymerase 2, forming the initiation complex.
– At this point, the polymerase transcribes a very short RNA, but doesn’t
move away from the promoter.
– The transcription initiation complex is stalled.
• Elongation: The transcription complex switches to the elongation phase
when the helicase subunit (TFIIH) unwinds the DNA and then activates the
RNA polymerase by phosphorylating it.
– The activated polymerase then moves down the template strand, making an
RNA copy.
– The original transcription factors stay at the promoter, and new ones
bind to polymerase during elongation.
– Rate: about 20 nucleotides per second
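The elongation rate gives a feel for how long a single pass over a gene takes. The sketch below simply divides gene length by the rate of about 20 nucleotides per second; the gene lengths are hypothetical values picked for illustration, and a constant rate with no pausing is an assumption.

```python
def transcription_time_seconds(gene_length_nt: int, rate_nt_per_s: float = 20.0) -> float:
    """Time for one polymerase to transcribe a gene at a constant elongation rate."""
    return gene_length_nt / rate_nt_per_s


# Hypothetical gene lengths (in nucleotides), chosen only for illustration.
for length in (1_000, 30_000, 2_400_000):
    seconds = transcription_time_seconds(length)
    print(f"{length:>9} nt: {seconds:>9.0f} s = {seconds / 3600:.2f} h")
```

At this rate a gene of a few kilobases takes minutes, while a gene in the megabase range takes more than a day of continuous transcription.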
Termination
• Pol2 genes end at a polyadenylation signal, a short sequence that causes
an enzyme to cut the RNA and add about 100 adenines (poly A tail) to the 3'
end.
– Consensus sequence similar to AAUAAA
• However, RNA polymerase keeps on transcribing the DNA.
• An exonuclease starts chewing up the excess RNA. It's faster than RNA
polymerase: when the exonuclease catches up to the RNA polymerase,
transcription stops.
• Possible function of this excess RNA?
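The exonuclease chase is a simple pursuit problem: the polymerase has a head start, and the faster exonuclease closes the gap at the difference between the two speeds. In the sketch below the polymerase rate is the 20 nucleotides per second quoted for elongation, while the exonuclease rate and the head start are hypothetical numbers, used only to show the arithmetic.

```python
def catch_up(head_start_nt: float, pol_rate: float = 20.0, exo_rate: float = 30.0) -> tuple:
    """Return (seconds until the exonuclease reaches the polymerase,
    nucleotides transcribed past the cleavage site by that time).

    exo_rate is an assumed value; only "faster than the polymerase" is given.
    """
    if exo_rate <= pol_rate:
        raise ValueError("the exonuclease must be faster than the polymerase to catch up")
    seconds = head_start_nt / (exo_rate - pol_rate)
    return seconds, head_start_nt + pol_rate * seconds


# Hypothetical case: the polymerase is 100 nt past the cut site when the chase starts.
print(catch_up(100))
```

The closer the two rates are, the further the polymerase runs past the polyadenylation signal before transcription stops.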
Tissue-specific Transcription Factors
• Tissue-specific transcription factors activate transcription in specific
cell types, or in response to specific signals.
• They bind to short DNA sequences that are near the promoter.
– It used to be thought that these binding sites were always upstream from
the promoter, but it is now known they can be either upstream or
downstream from the promoter (but near it).
– They consist of short consensus sequences, 4-8 bp long. Lots of potential
sites, but most aren’t used.
• Lots of protein interactions between transcription factors and the
initiation complex.
• A few picky details:
– Some transcription factors bind at places distant from the promoter (to
enhancers and silencers).
– Co-activators and co-repressors bind to other proteins and not to the DNA
(a somewhat artificial distinction).
Figure: position of 3 transcription factor binding sites relative to the
transcription start.
Transcription Factors
• Transcription factors generally have two functional sections (domains):
– DNA-binding domain: attaches to the specific DNA sequence,
– Activation domain: works by binding to other proteins to create the
transcription complex.
• The DNA-binding domains fall into several general types, and proteins that
have one of these domains are usually assumed to be transcription factors.
– Leucine zipper motif: an alpha helix that has a leucine every 7 amino
acids, so all the leucines are on the same side of the molecule. This
allows the protein to form a dimer by hydrophobic interactions. This dimer
grips the DNA double helix.
– Zinc finger motif: binds a Zn2+ ion between two cysteines and two
histidines (C2H2 proteins) or between four cysteines (C4 proteins).
Sometimes a zinc finger protein will have more than one zinc finger motif.
– Helix-turn-helix (HTH) motif: consists of two alpha helices connected by a
short region of other amino acids. The two helices bind the DNA major
groove. This is a common motif in homeobox gene regulation.
– Helix-loop-helix (HLH) motif, which is different from the HTH motif. HLH
has a much longer connecting loop that allows more flexibility in the
molecule.
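The "leucine every 7 amino acids" rule is regular enough to scan for in a protein sequence. The sketch below is a rough screen, not a validated motif finder: the threshold of 4 consecutive heptad leucines and the test sequences are assumptions made for illustration.

```python
def leucine_heptad_runs(protein: str, min_repeats: int = 4) -> list:
    """Return 0-based start positions where leucine (L) recurs every
    7 residues at least `min_repeats` times in a row."""
    protein = protein.upper()
    starts = []
    for i, residue in enumerate(protein):
        if residue != "L":
            continue
        if i >= 7 and protein[i - 7] == "L":
            continue  # part of a run that started earlier; do not report it twice
        count = 0
        j = i
        while j < len(protein) and protein[j] == "L":
            count += 1
            j += 7
        if count >= min_repeats:
            starts.append(i)
    return starts


# Made-up sequence: two residues, then four heptads each starting with leucine.
print(leucine_heptad_runs("MK" + "LEDKVEE" * 4))
```

A hit only says the spacing is right; whether the region really forms a dimerizing alpha helix needs structural evidence.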
Figure panels: leucine zipper, zinc finger, helix-loop-helix.
Yeast Two-Hybrid System
• The yeast two-hybrid system is a way to detect interactions between
proteins. It is often used to find proteins that interact with the protein
you are studying. It is based on the two domains of a transcription factor.
• GAL4, a yeast transcription factor that regulates genes involved in
galactose utilization, was split into separate DNA binding and activation
domains.
• The “bait” protein (the protein you are studying) is fused to the binding
domain.
• A large number of other protein-coding genes are fused to activation
domains: a library of “prey” sequences.
• Each individual prey sequence is co-transformed into yeast along with the
bait.
• If the bait and prey proteins interact in the cell, the attached DNA
binding and activation domains will be brought together at a reporter gene
carrying GAL4 binding sites, causing it to be transcribed. This event can
be detected using a chromogenic (color-generating) substrate.
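The logic of a two-hybrid screen reduces to a single condition: the detected gene (called the reporter here) is transcribed only when bait and prey bind each other. The sketch below models that condition; the protein names and the interaction table are invented for illustration and stand in for what the experiment would actually discover.

```python
# Hypothetical interaction data; in a real screen this is the unknown being measured.
KNOWN_INTERACTIONS = {
    frozenset({"BaitX", "PreyA"}),
    frozenset({"BaitX", "PreyC"}),
}


def reporter_is_on(bait: str, prey: str) -> bool:
    """The reporter is transcribed only if bait and prey interact, which brings
    the DNA-binding domain and the activation domain together."""
    return frozenset({bait, prey}) in KNOWN_INTERACTIONS


def screen(bait: str, prey_library: list) -> list:
    """Return the prey whose co-transformation with the bait turns the reporter on."""
    return [prey for prey in prey_library if reporter_is_on(bait, prey)]


print(screen("BaitX", ["PreyA", "PreyB", "PreyC"]))
```

Each element of `prey_library` corresponds to one co-transformed yeast culture, and the returned list corresponds to the colonies that change color.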
Enhancers and Silencers
• Enhancers and silencers are tissue-specific cis-acting DNA sequences
that increase or decrease
transcription regardless of their
position (within limits, but can be
several Mbases away) or orientation:
they can be either 5’ or 3’ to the gene
itself.
– “locus control regions” are groups of
enhancers; roughly, this is a different
name for the same type of element.
• Transcription factors bind to these
elements.
• Enhancers and silencers work
because the DNA can bend and
allow the transcription factors to
interact with the promoter.
• Often discovered by chromosome
breaks that separate the enhancer
from its target gene (see next slide).
Chromosome breakpoints used to locate enhancers
(A) Translocation breakpoints (vertical dashed arrows) downstream from
PAX6 inactivate the gene in aniridia (absence of iris in the eye). Possible
enhancers are the red boxes, some in the introns of another gene (ELP4).
(B) Various chromosome changes affecting the sonic hedgehog (SHH) gene,
which controls limb development. Figure label: acheiropodia.
Post-transcriptional Regulation
• At least half of all human genes are expressed in different ways in
different tissues. Different transcriptional start sites, different intron
splicing patterns, and different poly A addition sites can give quite a
few different proteins from the same gene.
• Different proteins from the same gene are called isoforms.
• Isoforms are produced in different tissues, at different times in
development, in different subcellular locations (soluble vs.
membrane-bound, for instance), etc.
• Dystrophin, the Duchenne muscular dystrophy protein, has at least 7
different transcription start sites, used in different tissues. (B, brain;
M, muscle; P, Purkinje; R, retina; B,K, brain and kidney; S, Schwann cells;
G, general)
• A good example of alternative splicing patterns in different tissues is
tropomyosin, which has 5 optional exons. Tropomyosin is a protein in
striated muscle that binds to actin and prevents it from interacting with
myosin: thus it regulates muscle movements.
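Optional exons multiply quickly. If each of 5 optional exons could be included or skipped independently, one gene could give up to 2^5 = 32 exon combinations. The sketch below enumerates them for a hypothetical gene layout; independent choice is an assumption that gives an upper bound, since real splicing patterns use far fewer combinations.

```python
from itertools import product


def possible_isoforms(exons: list) -> list:
    """`exons` is an ordered list of (name, is_optional) pairs.

    Return every exon combination that keeps all constitutive exons,
    in their original order.
    """
    optional_count = sum(1 for _, is_optional in exons if is_optional)
    isoforms = []
    for choice in product([True, False], repeat=optional_count):
        picks = iter(choice)
        # Constitutive exons are always kept; each optional exon consumes one pick.
        isoforms.append([name for name, is_optional in exons
                         if not is_optional or next(picks)])
    return isoforms


# Hypothetical gene: 2 constitutive exons (c1, c2) and 5 optional exons (o1..o5).
gene = [("c1", False), ("o1", True), ("o2", True), ("o3", True),
        ("o4", True), ("o5", True), ("c2", False)]
print(len(possible_isoforms(gene)))
```

Adding alternative start sites and poly A sites on top of this multiplies the count again, which is how one gene can yield many isoforms.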
Control of Alternative Splicing
• RNA splicing is performed
by snRNPs, small nuclear
ribonucleoprotein
complexes, which are
RNA/protein hybrids.
• Variations in snRNPs (as
well as other proteins)
occur in different cells and
recognize slightly different
splicing signals.
• Some of the splicing
proteins also assist in
transporting mRNA out of
the nucleus.
Messenger RNA Stability and Translatability
• Micro RNAs (miRNAs) are a major cause of messenger RNA decay in the cell.
They can also prevent mRNA from being translated by the ribosome.
• miRNAs are produced from RNA-only genes. The RNA forms a stem-loop
structure.
• The Dicer enzyme processes the double-stranded region, incorporating one
strand of the RNA into the RISC complex.
• The miRNA in the RISC complex is complementary (antisense) to the 3’
region of a specific messenger RNA.
• The RISC complex binds to the messenger RNA and degrades it. This usually
happens if the miRNA is a perfect match to the mRNA.
• Alternatively, the RISC complex can inhibit translation of the messenger
RNA, especially if the match between miRNA and mRNA isn’t perfect.
• An important finding: large-scale studies have shown that the presence or
absence of any given miRNA changes the amount of protein by 2-fold or less
in most cases.
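The perfect-match versus imperfect-match rule can be written as a small classifier. The sketch below slides the reverse complement of a miRNA along a 3' UTR and counts mismatches; the cutoff of 3 mismatches for "imperfect but still effective" and the example sequences are arbitrary assumptions, and real target recognition is more subtle than whole-length matching.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the antisense partner of an RNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def mirna_outcome(mirna: str, utr3: str, max_mismatches: int = 3) -> str:
    """Classify the effect of a miRNA from its best antisense match in a 3' UTR.

    max_mismatches is an assumed cutoff, not a measured value.
    """
    target = reverse_complement(mirna)  # site the miRNA would pair with perfectly
    utr3 = utr3.upper()
    best = len(target)
    for i in range(len(utr3) - len(target) + 1):
        window = utr3[i:i + len(target)]
        mismatches = sum(1 for a, b in zip(window, target) if a != b)
        best = min(best, mismatches)
    if best == 0:
        return "degrade mRNA"
    if best <= max_mismatches:
        return "inhibit translation"
    return "no effect"


mirna = "UAGCAGCACGUAAAUA"  # made-up 16-nt miRNA
site = reverse_complement(mirna)
print(mirna_outcome(mirna, "GGGG" + site + "CCCC"))
print(mirna_outcome(mirna, "GGGG" + "A" + site[1:] + "CCCC"))
```

The first call contains a perfect target site and the second a site with one mismatch, so they fall into the two different branches.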
Translational Control
• Regulation of whether the messenger RNA is translated or not.
• The best studied example is ferritin, a protein that stores up to 4500
iron atoms (as iron hydroxyphosphate) in its center.
• The ferritin mRNA contains an iron-response element (IRE) in the 5’ UTR.
The IRE folds up into a hairpin loop, which can bind to the IRE-binding
protein (IRE-BP).
• When iron levels are low, IRE-BP binds and prevents translation of the
mRNA. This allows the ferritin mRNA to remain intact while preventing any
further sequestration of iron atoms.
• Transferrin is the major iron-carrying protein in the blood serum. The
mRNA for the transferrin receptor, which brings transferrin-bound iron into
cells, contains 5 IREs in the 3’ UTR. RNA degradation is prevented by
IRE-BP binding.
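The same binding event has opposite effects depending on where the IRE sits in the mRNA. The sketch below is a purely qualitative model of that switch; it assumes IRE-BP binds only when iron is low, and the dictionary keys are hypothetical names chosen for readability.

```python
def ire_regulation(iron_is_low: bool) -> dict:
    """Qualitative model of IRE-based control.

    Assumption: IRE-BP binds IREs when iron is low and releases them when iron is high.
    """
    ire_bp_bound = iron_is_low
    return {
        "ire_bp_bound": ire_bp_bound,
        # IRE in the 5' UTR (ferritin): bound IRE-BP blocks translation.
        "ferritin_translated": not ire_bp_bound,
        # IREs in the 3' UTR: bound IRE-BP protects the mRNA from degradation.
        "mrna_with_3utr_ires_protected": ire_bp_bound,
    }


for iron_is_low in (True, False):
    print(iron_is_low, ire_regulation(iron_is_low))
```

With low iron the cell stops making the storage protein while keeping the 3' UTR-regulated mRNA available; with high iron both outcomes flip.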
Control of Protein Degradation
• To react quickly to the environment, a cell must be able to remove
outdated signals quickly. Many proteins, especially regulatory signaling
proteins, are degraded by ubiquitin-mediated proteolysis.
• Ubiquitin is a small protein that is highly conserved in evolution.
• In this system, multiple copies of ubiquitin are covalently attached to
the target protein in long chains. The complex is then transported to the
proteasome, a large multi-subunit barrel-shaped structure. The proteasome
degrades the target protein to amino acids and recycles the ubiquitin.
• Target specificity is provided by the enzyme that attaches ubiquitin to
the target proteins: there are hundreds of different E3 ubiquitin ligases.
– One target is hydrophobic amino acids that are normally buried in the
protein’s interior or within membranes.
– N-end rule: On average, a protein's half-life correlates with its
N-terminal residue.
• Proteins with N-terminal Met, Ser, Ala, Thr, Val, or Gly have half-lives
greater than 20 hours.
• Proteins with N-terminal Phe, Leu, Asp, Lys, or Arg have half-lives of
3 min or less.
• The proteasome also degrades misfolded proteins unless they are protected
from degradation and re-folded by chaperone proteins. Misfolding is a
common result of heat shock.
• Ubiquitin plays a number of other roles in the cell, including cell
signaling and X chromosome inactivation.
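The N-end rule as stated is effectively a lookup table from the N-terminal residue to a half-life class. The sketch below encodes only the residues listed above; any other residue is reported as "not listed" rather than guessed, and the function name is hypothetical.

```python
# Residue classes taken from the N-end rule bullets; three-letter codes.
LONG_LIVED = {"Met", "Ser", "Ala", "Thr", "Val", "Gly"}  # half-life > 20 hours
SHORT_LIVED = {"Phe", "Leu", "Asp", "Lys", "Arg"}        # half-life 3 min or less


def n_end_half_life(n_terminal_residue: str) -> str:
    """Return the half-life class predicted from the N-terminal residue."""
    residue = n_terminal_residue.strip().capitalize()
    if residue in LONG_LIVED:
        return "> 20 hours"
    if residue in SHORT_LIVED:
        return "<= 3 minutes"
    return "not listed"


for residue in ("Met", "Arg", "Trp"):
    print(residue, n_end_half_life(residue))
```

Since the rule is a correlation "on average", the output is best read as a tendency for the class rather than a prediction for any single protein.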
Chromatin Conformation
• Recall that chromosomal DNA is wrapped up in nucleosomes: 8 histone
proteins with about 150 bp of DNA wrapped around them. Higher level
packaging also exists. All of this structure makes it difficult for RNA
polymerase and transcription factors to reach the target DNA.
• When it is tightly packed, chromatin is said to be “closed” and
unavailable for transcription. We see this as heterochromatin.
• Euchromatin is in an open conformation, accessible to transcription
factors and capable of being transcribed.
• Facultative heterochromatin is DNA that is euchromatin in some tissues but
heterochromatin in others. This is very typical of genes needed for the
functioning of specific cell types.
• We will look at several mechanisms that affect chromatin structure:
chromatin remodeling, histone modification, and DNA methylation. There are
also some alternate histone proteins that can affect chromatin structure.
Also, the position of the gene in the interphase nucleus affects gene
activity.
Histone Acetylation
• Histones are basic proteins: lysines have a + charge that is attracted to
the – charges on DNA phosphates.
• Histone acetylases add acetyl groups (CH3CO) to the amino group at the
end of the lysine side chain. This removes the + charge, and in consequence
the histones are less tightly bound to the DNA.
• Genes in the region of acetylated histones are active; non-acetylated
histones are associated with inactive genes. The chromatin in areas of
acetylated histones is less condensed.
• Histone acetylases and deacetylases can be part of transcriptional
complexes, helping to activate or repress specific genes.
The Histone Code
• Histones bind tightly to DNA because the negative charges on the DNA
phosphates form ionic bonds with the positively charged lysines and
arginines in the histones.
• Histone proteins have exposed N and C termini. The lysines in these tails
are frequently modified, which changes how tightly histones bind to DNA and
to the many other proteins found in chromatin.
• The histone code is a theoretical concept that proposes that specific
sets of histone modifications define the chromatin conformation and the
activity of the DNA. Probably things aren’t as clearly defined as this
concept implies: there are many factors involved in chromatin conformation.
More Histone Code
H3 and H4 are histones; K stands for lysine; the number following the K is the position
on the protein.
Chromatin Remodeling
• Moving nucleosomes around to allow transcription factors to reach the
cis-acting regulatory sites is accomplished by large protein structures
called chromatin remodeling complexes.
• Remodeling slides nucleosomes along the DNA, away from the region of the
promoter. The process requires energy, so it uses ATP.
• The DNA exposed by moving histones away is more accessible for restriction
enzymes and DNase in the lab: DNase hypersensitive sites are a sign of
active genes.
• Remodeling often occurs during development, as cells differentiate.
DNA Methylation
• DNA methylation is the addition of methyl groups to cytosine, creating
5-methyl cytosine. In mammalian DNA this almost always occurs when the C is
followed by a G: CpG.
• DNA methylation is associated with inactive genes, especially when it is
near the promoter. Specific proteins recognize and bind to it, which alters
the chromatin configuration.
• The methylation state of DNA is maintained through mitosis: daughter
cells are methylated in the same way as the parent cell. Methylation
changes are thus epigenetic changes: heritable changes that don’t alter the
DNA base sequence.
• When DNA replicates, an enzyme called maintenance methylase recognizes
methylated cytosines on the old strand (in a CpG dinucleotide), and
methylates the corresponding C on the new strand.
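The copying of methylation marks at replication can be modeled as a set operation. The sketch below tracks CpG sites by the index of their C on one strand and relies on the fact that CpG reads the same on both strands, so each site has a C on the old strand and a C on the new one. The sequence and the site indices are made up, and treating maintenance as all-or-nothing is a simplifying assumption.

```python
def cpg_positions(strand: str) -> list:
    """0-based positions of the C in every CpG on a strand read 5' to 3'."""
    strand = strand.upper()
    return [i for i in range(len(strand) - 1) if strand[i:i + 2] == "CG"]


def replicate(seq: str, methylated_sites: set, maintenance: bool = True) -> tuple:
    """Return the methylated CpG sites on (old strand, new strand) of one
    daughter duplex after a round of replication.

    Without the maintenance methylase the new strand stays unmethylated.
    """
    sites = set(cpg_positions(seq))
    old_strand = set(methylated_sites) & sites   # old strand keeps its methyl groups
    new_strand = set(old_strand) if maintenance else set()
    return old_strand, new_strand


seq = "ACGTTCGACG"  # made-up sequence with CpG sites at positions 1, 5 and 8
print(replicate(seq, {1, 8}))
print(replicate(seq, {1, 8}, maintenance=False))
```

With maintenance the daughter duplex carries the parental pattern on both strands; without it the pattern is halved at every division, which is one way methylation can be lost.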
DNA Methylation in Development
• DNA from sperm and egg are both heavily methylated, but at different
sites.
• Almost all methylation is removed in the early embryo (morula and
early blastocyst).
• As early development proceeds, new methylation patterns are
imposed on different cell lineages. These patterns permanently
inactivate some of the genes (at least for the life of the individual).
CpG Islands
• In human DNA, the dinucleotide CpG is quite rare.
– Note that this is a C followed by a G on the same DNA strand, not C
paired with G on the opposite strand. The “p” stands for a phosphodiester
bond.
• The rarity of CpG is tied up with DNA methylation.
• Areas where there are many CpG dinucleotides are often associated with
the promoter regions of genes. These areas are CpG islands.
Why CpG is rare
• Cytosine spontaneously loses its amino
group, which converts it to uracil.
• DNA repair enzymes notice this and
repair it back to cytosine.
• However, when 5-methyl-cytosine is
deaminated, it is converted to thymine.
• Since T is a legitimate base in DNA,
this change is not corrected.
• In human DNA, most CpGs are
methylated. So over evolutionary time
scales, most CpGs have been converted
to TpG.
• However, CpGs near promoters are
usually not methylated, so deaminations
are corrected back to CpG, and thus
CpG is more common near promoters
than elsewhere in the genome.
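One common way to quantify how rare CpG is in a stretch of DNA is the observed/expected ratio: the number of CpGs found, divided by the number expected from the C and G content alone. This ratio is not part of the argument above; it is added here as an illustration, and the sequences are made up. The second function mimics the uncorrected deamination of methylated CpGs by turning every CpG into TpG.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def deaminate_methylated_cpgs(seq: str) -> str:
    """Model long-term loss of methylated CpGs: every CpG becomes TpG.

    Assumes every CpG in `seq` is methylated and none is repaired.
    """
    return seq.upper().replace("CG", "TG")


island_like = "GCGCGCACGCGTCGCG"   # made-up CpG-rich sequence
decayed = deaminate_methylated_cpgs(island_like)
print(cpg_observed_expected(island_like))
print(decayed, cpg_observed_expected(decayed))
```

A ratio near 1 means CpG occurs about as often as base composition predicts, while a ratio well below 1 is the signature of the CpG depletion described above.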
Epigenetics and Imprinting
• Epigenetics is the study of differences that are inherited between
generations but don’t involve changes in the DNA sequence.
– Sometimes epigenetics is used for changes that persist between cell
generations (mitosis), but I will use the term more strictly here, to mean
changes that are transmitted from parent to child.
• The concept predates our knowledge of DNA, but these days most
epigenetic changes involve DNA methylation.
• Imprinting refers to epigenetic changes where the activity of a gene
depends on whether it came from the father or the mother.
• Imprinting seems to be the major reason why uniparental diploid
(UPD) embryos do not produce viable offspring: some genes require
an active, unmethylated gene from the father while others need an
active gene from the mother.
– UPD from father is a hydatidiform mole: extra-embryonic membranes but no
embryo. UPD from mother is an ovarian teratoma: a mass of disorganized tissue
that usually includes hair, teeth and bones.
Non-genic structural inheritance
Structures on the surface of ciliates can be altered and inherited
asexually for many generations. This is a form of epigenetic inheritance:
DNA mutations are not involved. These changes were created artificially.
A. Inversion of a row of cilia (BB=basal body). B. Siamese-twin paramecium,
with two contractile vacuole pores (CVP). C. Mirror image oral apparatuses (OA).
Frankel, J. 2008. Eukaryotic Cell 7(10):1617-1639
Methylation and Imprinting
• Prader-Willi syndrome (PWS) and Angelman syndrome (AS) are both caused by
deletions or uniparental disomy of 15q.
• Most are caused by unequal crossing over between two repeated sequences
that are 4.2 Mbp apart.
• Prader-Willi results when only the maternal gene is active, and Angelman
when only the paternal is active.
– PWS is characterized by obesity due to an insatiable appetite, small
hands and feet, short stature, and hypogonadism. In addition, there is a
common behavioral phenotype, including temper tantrums, stubbornness, and
controlling and manipulative behavior.
– AS is characterized by severe mental retardation, severe speech
impairment, and unsteady gait and/or tremulousness of the limbs. In
addition, individuals with AS present with inappropriate laughter and
excitability.
Olfactory Receptor Genes
• Some genes have only one allele expressed, but which allele is active
does not depend on which parent it came from.
– An important case: immunoglobulins. We will discuss them later.
• We have about 900 olfactory receptor genes: it is the largest gene family
in humans. They are found in clusters on many chromosomes.
• In each receptor cell, only one copy of one gene is active.
• This works by having a single active copy of a necessary enhancer (the
copy on the other chromosome is inactivated by methylation).
• The enhancer randomly associates with the promoter of one receptor gene,
allowing it to be transcribed.
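The one-enhancer mechanism amounts to each cell drawing a single receptor gene at random. The sketch below simulates that draw; the gene names are placeholders, and giving every gene the same probability is an assumption made for simplicity.

```python
import random


def choose_active_receptor(receptor_genes: list, rng: random.Random) -> str:
    """The single active enhancer associates with one promoter chosen at random.

    Uniform choice among genes is an assumption of this sketch.
    """
    return rng.choice(receptor_genes)


genes = [f"OR{i}" for i in range(1, 901)]  # placeholder names for about 900 genes
rng = random.Random(0)                     # seeded so repeated runs give the same draws
cells = [choose_active_receptor(genes, rng) for _ in range(5)]
print(cells)
```

Each simulated cell ends up with exactly one active receptor gene, and different cells generally end up with different ones, which is the pattern the mechanism is meant to explain.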