Transcript amino acid
Proteins, Enzymes, Biochemistry
Sept. 21, 2001
Duncan MacCannel: Historical Perspective on Molecular Biology / Genetics
Background
The Thread of Life. Susan Aldridge. Chapter 2
Molecular Biology of the Cell. Alberts et al. Garland Press
Suggested further reading
• Protein molecules as computational elements in living cells. D. Bray. Nature. 1995;376(6538):307-312.
• Signaling complexes: biophysical constraints on intracellular communication. D. Bray. Annu Rev Biophys Biomol Struct. 1998;27:59-75.
• Metabolic modeling of microbial strains in silico. M. W. Covert, et al. Trends Biochem Sci. 2001;26:179-186.
• Modelling cellular behaviour. D. Endy & R. Brent. Nature. 2001;409:391-395.
A - Introduction to Proteins / Translation
• The primary structure is defined as the sequence of amino acids in the
protein. This is determined by and is co-linear to the sequence of bases
(triplet codons) in the gene*.
DNA
5’---CTCAGCGTTACCAT---3’
3’---GAGTCGCAATGGTA---5’
transcription
RNA
5’---CUCAGCGUUACCAU---3’
translation
PROTEIN N---Leu-Ser-Val-Thr---C
* - this is not strictly true in most eukaryotic genomes
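The co-linearity of DNA, RNA and protein can be checked with a short Python sketch. It assumes the standard genetic code (packed here as a 64-letter string in the usual U, C, A, G table order) and treats the bottom strand above as the template strand; the function names are illustrative, not from any library.

```python
import itertools

# Standard genetic code: codons enumerated in U, C, A, G order; '*' = STOP.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Base pairing between a DNA template base and the RNA base laid down opposite it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (5'->3') by pairing each base of the template strand.

    The template is given as written on the slide (3'->5'), so the
    complementary RNA comes out already in 5'->3' order.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5)


def translate(mrna: str) -> str:
    """Translate complete codons starting at the first base (one-letter codes)."""
    codons = (mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


template = "GAGTCGCAATGGTA"   # bottom strand, written 3'->5'
mrna = transcribe(template)
print(mrna)                   # CUCAGCGUUACCAU
print(translate(mrna))        # LSVT (Leu-Ser-Val-Thr); the trailing "AU" is not a full codon
```

The mRNA has the same sequence as the top (coding) strand with U in place of T, which is why the slide can show the two lines one above the other.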
Structure of Genes In Eukaryotic Organisms
[Figure: a eukaryotic gene with control elements, exons and introns. Transcription produces hnRNA (heterogeneous nuclear RNA); RNA splicing removes the introns to give mRNA, and alternative RNA splicing can give more than one mRNA from the same gene.]
• Coding sequence can be discontinuous and the gene can be composed of
many introns and exons.
• The control regions (analogous to bacterial operators) can be spread over a large region of DNA and exert action-at-a-distance.
• There can be many different regulators acting on a single gene, i.e. more signal integration than in bacteria.
• Alternative splicing can give rise to more than one protein product from a single ‘gene’.
• Predicting genes (introns, exons and proper splicing) is very challenging.
• Because the control elements can be spread over a large segment of DNA, predicting the important sites and their effects on gene expression is not very feasible at this time.
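A minimal sketch of how alternative splicing yields several mRNAs from one gene, assuming a simple "cassette exon" model: the first and last exons are always kept and each internal exon is either included or skipped. The pre-mRNA, exon coordinates and function names are hypothetical.

```python
import itertools


def splice(pre_mrna, exons):
    """Join exon segments, given as half-open [start, end) coordinates, in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def exon_skipping_isoforms(pre_mrna, exons):
    """Enumerate mRNAs that keep the first and last exon and either include
    or skip each internal exon (a simple cassette-exon model)."""
    first, *internal, last = exons
    isoforms = []
    for keep in itertools.product([True, False], repeat=len(internal)):
        chosen = [first] + [e for e, k in zip(internal, keep) if k] + [last]
        isoforms.append(splice(pre_mrna, chosen))
    return isoforms


# Hypothetical hnRNA: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaag" + "GAAUUU" + "guccag" + "UGGUAA"
exons = [(0, 6), (12, 18), (24, 30)]
for mrna in exon_skipping_isoforms(pre_mrna, exons):
    print(mrna)
# AUGGCCGAAUUUUGGUAA  (all three exons)
# AUGGCCUGGUAA        (middle exon skipped)
```

Read in frame from the AUG, the two messages encode different polypeptides (Met-Ala-Glu-Phe-Trp and Met-Ala-Trp). Real splice-site choice is far more complex than this enumeration, which is part of why gene prediction is hard.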
Schematic Illustration of Transcription
The nucleotides in an mRNA are joined together to form a complementary copy of the template strand of the DNA.
Translation
• Translation is the synthesis of a polypeptide (protein) chain using the mRNA template.
• Note that the mRNA has directionality and is read from the 5’ end towards the 3’ end.
• The 5’ end is defined at the DNA level by the promoter, but this does not define the translation start.
• The translation start sets the ‘register’ or reading frame for the message.
• The end is determined by the presence of a STOP codon (in the correct reading frame).
Note that many ribosomes can read one message like beads on a
string generating many polypeptide chains simultaneously.
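The role of the start codon in setting the reading frame can be sketched as follows. This is a simplification that assumes translation begins at the first AUG in the message (in bacteria the rbs also matters) and uses the standard genetic code; the message and function names are hypothetical.

```python
import itertools

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from(mrna: str, start: int) -> str:
    """Read codons 5'->3' from `start` until an in-frame STOP ('*') or the end."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_first_orf(mrna: str) -> str:
    """Assume the first AUG is the start; it fixes the register for all later codons."""
    start = mrna.find("AUG")
    return translate_from(mrna, start) if start != -1 else ""


message = "GGAUGCUCAGCUAAGG"
print(translate_first_orf(message))   # MLS  (AUG CUC AGC, then UAA = STOP)
print(translate_from(message, 3))     # CSAK (same bases read one frame over; no STOP reached)
```

The second call shows why the register matters: the same bases read in a shifted frame give an unrelated peptide and miss the STOP codon.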
Schematic Illustration of Translation
Protein synthesis involves specialized RNA molecules called transfer RNA (tRNA).
Translation Start Position
In bacteria, the translation start is dependent on:
1) a sequence motif called a ribosome binding site (rbs)
2) an AUG start codon 5-10 nucleotides downstream from the rbs
3’ end of 16S rRNA
     3’-AUUCCUCA-//-5’
          ||||||
5’-NNNNNNNAGGAGU-N5-10-AUG-//-3’   mRNA
          rbs          start
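The pairing shown in the diagram suggests a simple way to scan a message for candidate starts. This sketch assumes an exact match to the rbs sequence AGGAGU (derived from the rRNA segment in the diagram) and counts the 5-10 nucleotide spacing from the end of the rbs to the A of AUG; real ribosome binding sites usually pair only partially with the rRNA, so this is an illustration rather than a gene finder. All names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


# rRNA segment from the diagram, 3'-UCCUCA-5', rewritten 5'->3'.
ANTI_RBS = "ACUCCU"
RBS_MOTIF = reverse_complement(ANTI_RBS)   # "AGGAGU": the mRNA sequence that pairs with it


def find_translation_starts(mrna, motif=RBS_MOTIF, min_gap=5, max_gap=10):
    """Return (rbs_position, aug_position) pairs where an AUG begins
    min_gap..max_gap nucleotides after the end of an exact rbs match."""
    hits = []
    pos = mrna.find(motif)
    while pos != -1:
        rbs_end = pos + len(motif)
        for gap in range(min_gap, max_gap + 1):
            aug = rbs_end + gap
            if mrna[aug:aug + 3] == "AUG":
                hits.append((pos, aug))
        pos = mrna.find(motif, pos + 1)
    return hits


message = "CCCCCCC" + "AGGAGU" + "CUAGCAU" + "AUGGCU"   # hypothetical message
print(find_translation_starts(message))                 # [(7, 20)]
```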
In bacteria a single mRNA molecule can code for several proteins. Such messages are said to be polycistronic. Since the coding regions for all genes in such a transcript are present at the same concentration (they are on the same molecule), one might predict that translation levels will be the same for all the genes. This is not the case: translation efficiency can vary for the different messages within a transcript.
[Figure: on the DNA, a promoter (start), Gene 1 to Gene 4 and a terminator (stop); the four genes are transcribed into a single mRNA: 4 genes, 1 message.]
Translation Efficiency is an important part of gene expression
[Figure: translation of a single polycistronic mRNA encoding Tar, Tap, R, B, Y and Z.]
Protein   Protein monomers per cell
Tar       5000
Tap       1000
R         <100
B         1000
Y         18000
Z         10000
A single mRNA may encode several proteins. The final level of each
protein may vary significantly and is a function of:
1) translation efficiency
2) protein stability
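Why both factors matter can be seen from a minimal steady-state model, assuming each mRNA copy is translated at a constant rate and the protein is degraded by first-order kinetics (rate constant ln 2 divided by its half-life). The numbers are hypothetical, not the measured values in the table above.

```python
import math


def steady_state_protein(mrna_copies, translation_rate, half_life):
    """Steady-state protein copies per cell: synthesis = degradation gives
    P = mrna_copies * translation_rate / k_deg, with k_deg = ln(2) / half_life.
    Units (hypothetical): proteins per mRNA per minute, half-life in minutes."""
    k_deg = math.log(2) / half_life
    return mrna_copies * translation_rate / k_deg


# Three cistrons on the same message (same mRNA copy number, 5 per cell).
print(steady_state_protein(5, 1.0, 60))   # efficient translation, stable protein
print(steady_state_protein(5, 0.1, 60))   # 10x lower translation efficiency
print(steady_state_protein(5, 1.0, 6))    # 10x shorter protein half-life
```

Either a 10-fold drop in translation efficiency or a 10-fold shorter half-life lowers the steady-state level 10-fold, even though the mRNA concentration is identical.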
B – Introduction to Proteins / Characteristics
There are 20 standard amino acids in proteins, each with a distinctive ‘side chain’ that gives it characteristic chemical properties.
[Figure: an amino acid (alanine), H2N-CH(CH3)-COOH: an amino group and a carboxylic acid group attached to the α-carbon, which also carries the side chain (-CH3, methyl, in alanine).]
Amino acids differ in the side chains on the α-carbon.
[Figure: alanine (ala, A) and tryptophan (trp, W) condense, releasing H2O and forming a peptide bond, to give the dipeptide Ala-Trp.]
Alanine + Tryptophan → Ala-Trp (dipeptide) + H2O
By convention polypeptides are
written from the N-terminus (amino)
to the C-terminus (carboxy)
Amino acid      3-letter  1-letter
Alanine         ala       A
Arginine        arg       R
Asparagine      asn       N
Aspartic acid   asp       D
Cysteine        cys       C
Glutamine       gln       Q
Glutamic acid   glu       E
Glycine         gly       G
Histidine       his       H
Isoleucine      ile       I
Leucine         leu       L
Lysine          lys       K
Methionine      met       M
Phenylalanine   phe       F
Proline         pro       P
Serine          ser       S
Threonine       thr       T
Tryptophan      trp       W
Tyrosine        tyr       Y
Valine          val       V
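The table above maps directly onto a lookup that converts between the two notations, reading peptides N-terminus to C-terminus by convention. A minimal sketch; the function names are illustrative, and residue numbers such as "Ile188" are simply stripped.

```python
# Three-letter -> one-letter codes, copied from the table above.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(peptide: str) -> str:
    """'Leu-Ser-Val-Thr' (N -> C) -> 'LSVT'. Case-insensitive; a residue
    number after the code (e.g. 'Ile188') is ignored."""
    return "".join(THREE_TO_ONE[r[:3].lower()] for r in peptide.split("-"))


def to_three_letter(seq: str) -> str:
    """'AW' -> 'Ala-Trp'."""
    return "-".join(ONE_TO_THREE[a].capitalize() for a in seq)


print(to_one_letter("Ile188-Gly189-Asp190-Gly191-Pro192-Val193"))   # IGDGPV
print(to_three_letter("AW"))                                        # Ala-Trp
```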
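[Figure: structures of glycine (side chain -H), proline (side chain bonded back to the backbone nitrogen, so the amino group is HN within a ring) and cysteine (side chain -CH2-SH).]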
The Newly Synthesized Polypeptide
• The information flow DNA → RNA → Protein is linear, and the final polypeptide synthesized will have a sequence of amino acids defined by the sequence of codons in the message.
• The sequence of amino acids is called the primary structure.
• Secondary structure refers to local regular/repeating structural elements.
• The folded three-dimensional structure is referred to as tertiary structure.
Protein function depends on an ordered, defined three-dimensional fold. The final three-dimensional folded state of the protein is an intrinsic property of the primary sequence. How the primary sequence defines the final folded conformation is generally referred to as the Protein Folding Problem.
Primary structure of green fluorescent protein
(single letter AA codes)
Sequence: 238 AA; MW 26886 Da
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLP
VPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYK
TRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNG
IKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEK
RDHMVLLEFVTAAGITHGMDELYK
The primary sequence can be derived directly from the gene sequence but
going from sequence to structure or sequence to function is not possible
unless there is a related protein for which structure or function is known.
Likewise, the structure alone rarely provides information about function
(only if the function of a related protein is known).
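The primary sequence itself is easy to work with as a plain string. This sketch copies the GFP sequence listed above, confirms its length and extracts the segment labelled Ile188-Gly189-Asp190-Gly191-Pro192-Val193 in the structure projections, assuming the usual 1-based residue numbering; the helper name is hypothetical.

```python
# GFP primary structure as listed above (one-letter codes, N -> C).
GFP = (
    "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLP"
    "VPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYK"
    "TRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNG"
    "IKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEK"
    "RDHMVLLEFVTAAGITHGMDELYK"
)


def residues(seq: str, first: int, last: int) -> str:
    """Residues first..last with 1-based, inclusive numbering (as in 'Ile188')."""
    return seq[first - 1:last]


print(len(GFP))                  # 238
print(residues(GFP, 188, 193))   # IGDGPV
```

Nothing in the string itself says how these residues are arranged in space; that is exactly the gap between sequence and structure described above.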
Projections of the Tertiary Structure of Green Fluorescent Protein
[Figure panels: backbone tracing, with the segment Ile188-Gly189-Asp190-Gly191-Pro192-Val193 marked; “ribbon diagram” showing secondary structures (α-helix, β-strand); “wireframe” model showing all atoms and chemical bonds, with the same segment marked; “stick” model showing all atoms and chemical bonds; “space filling” model where each atom is represented as a sphere of its van der Waals radius.]
The final folded three-dimensional (tertiary) structure is an intrinsic property of the primary structure.
[Figure: the GFP primary structure (MSKGEELFTG ... DELYK) undergoes “folding” to the “native”, “folded” tertiary structure; “denaturation” returns it to a “denatured”, “unfolded” random coil.]
In general, proteins are unstable outside of the cell and very sensitive to solvent conditions.
Active site - the region of a protein (enzyme) to which a substrate molecule
binds.
• The active site is formed by the three dimensional folding of the peptide
backbone and amino acid side chains. (lock and key / induced fit)
• The active site is highly specific in binding interactions (stereochemical
specificity).
The three dimensional structure of CAP and the cAMP ligand-binding site
(Figures 3-45 and 3-55 from Alberts)
Conformational Change in Protein Structure
Proteins can undergo changes in their three dimensional structure in
response to changing conditions or interactions with other molecules.
This usually alters the ‘activity’ of the protein.
Binding of the substrate (glucose) causes the protein (hexokinase) to shift from an open to a closed conformation. (Fig. 5-2, Alberts)
C - Introduction to Proteins / Protein Functions
Proteins carry out a wide variety of functions in, on and outside the cell. For
the purpose of this course, we will generalize these functions into three
categories. These are not mutually exclusive and many proteins fit into more
than one of these categories.
1 - Structural
2 - Enzymatic
3 - Signal Transduction (information processing)
C1 - Protein Functions: Structural
Proteins can form large complexes that function primarily as structural elements:
Protein coats of viruses. These are large, regular repeating structures composed of hundreds to thousands of protein subunits. (Figs 6-74 and 6-72, Alberts).
Electron micrographs of A) Phage T4, B) potato virus X, C) adenovirus, D) influenza virus.
SV40 structure determined by X-ray crystallography.
The cytoskeleton in eukaryotic cells is responsible not only for determining cell shape but also for cell movement, mechanical sensing, intracellular trafficking and cell division.
A human cell grown in tissue culture
and stained for protein (such that only
large regular structures are highlighted).
Note the variety of structures (Fig 16-1,
Alberts)
Microtubules form by the polymerization of tubulin subunits. Whether the polymer grows or shrinks is influenced by conditions in the cell: Dynamic Instability (Fig 16-33, Alberts; for a discussion of dynamic instability see Flyvbjerg H, Holy TE, Leibler S. Stochastic dynamics of microtubules: A model for caps and catastrophes. Phys Rev Lett. 1994 Oct 24;73(17):2372-2375).
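A common minimal way to picture dynamic instability is a two-state model: the microtubule end either grows or shrinks at a constant speed and switches at random between the two states (catastrophe and rescue). This is a simpler model than the cap model in the paper cited above; the speeds, switching frequencies and the rule that a fully depolymerized microtubule starts growing again are hypothetical choices.

```python
import random


def simulate_microtubule(steps, dt=0.1, v_grow=1.0, v_shrink=10.0,
                         f_cat=0.01, f_res=0.05, seed=0):
    """Two-state stochastic model of microtubule length.

    Hypothetical units: speeds in um/min, switching frequencies in 1/min,
    dt in min. Returns the length after each time step.
    """
    rng = random.Random(seed)
    length, growing = 0.0, True
    trace = []
    for _ in range(steps):
        if growing:
            length += v_grow * dt
            if rng.random() < f_cat * dt:     # catastrophe: switch to shrinking
                growing = False
        else:
            length -= v_shrink * dt
            if length <= 0.0:                 # fully shrunk: assume it regrows
                length, growing = 0.0, True
            elif rng.random() < f_res * dt:   # rescue: switch back to growing
                growing = True
        trace.append(length)
    return trace


trace = simulate_microtubule(5000)
print(f"final length {trace[-1]:.1f} um, maximum {max(trace):.1f} um")
```

Changing the switching frequencies (the "conditions in the cell" in this toy model) shifts the balance between long growth phases and rapid collapses.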
C2 - Protein Functions: Enzymatic
Enzyme: a protein* that catalyzes a chemical reaction, where a catalyst is
defined as a substance that accelerates a chemical reaction without itself
undergoing change.
* some RNA molecules can also be considered enzymes
[Figure: enzymes X and Y catalyzing the reactions A → B and A + B → C + D.]
• Specificity
• Accelerated reaction rates
• Control (regulation)
• Enzymes can only affect the rate (kinetics) of a reaction; they cannot make a reaction more energetically favorable.
• Enzymes can be saturated by substrate.
Basics of Enzyme Kinetics
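Michaelis-Menten Kinetics - for a simple enzyme reaction, the binding of enzyme and substrate is treated as a reversible step (with the ES complex at steady state), and the overall reaction is as follows:
$$E + S \underset{k_{-1}}{\overset{k_{+1}}{\rightleftharpoons}} ES \xrightarrow{k_{+2}} E + P$$
$$v = \frac{V s}{K_M + s}$$
v = velocity, reaction rate
V = maximum velocity (enzyme saturated with substrate)
s = substrate concentration
K_M = Michaelis constant:
$$K_M = \frac{k_{+2} + k_{-1}}{k_{+1}}$$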
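The two formulas above translate directly into code. In this sketch the rate constants are hypothetical (arbitrary units); it shows that v = V/2 when s = K_M and that v levels off at V as the enzyme saturates.

```python
def michaelis_constant(k_plus1: float, k_minus1: float, k_plus2: float) -> float:
    """K_M = (k_+2 + k_-1) / k_+1."""
    return (k_plus2 + k_minus1) / k_plus1


def mm_velocity(s: float, v_max: float, k_m: float) -> float:
    """v = V s / (K_M + s)."""
    return v_max * s / (k_m + s)


# Hypothetical rate constants (arbitrary units) give K_M = (1 + 3) / 2 = 2.
k_m = michaelis_constant(k_plus1=2.0, k_minus1=3.0, k_plus2=1.0)
for s in (0.1 * k_m, k_m, 10 * k_m, 1000 * k_m):
    print(f"s = {s:8.1f}   v/V = {mm_velocity(s, 1.0, k_m):.3f}")
```

Beyond roughly 10 K_M, adding more substrate barely changes the rate: the saturation noted in the bullet list above.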
C3 - Protein Functions: Signal Transduction
Signal Transduction
- in general the relaying of a signal from one physical form to another
- in biological terms, the process by which a cell responds to signals (which can be intracellular or extracellular).
Input → Signal Transduction → Output
Examples of ‘signals’ (inputs):
• chemicals
• light
• temperature
• electrical (ion gradients)
• other cells (cell-cell contact)
• mechanical sensing
Generalized Model of Response to Extracellular Signal
[Figure: ligand + receptor → activated receptor → “action”.]
• Ligand can activate or inactivate receptor
• Output (action) depends on the system and sometimes on the cell type
• In metazoans (multicellular eukaryotes), there are about 16 classes of intercellular signaling systems
Example 1: Transmembrane Tyrosine Kinase Receptors
[Figure: ligand binding converts the receptor into an activated, phosphorylated receptor (P~ ~P) that triggers an “action”.]
• Ligand binding results in receptor dimerization
• The cytoplasmic (intracellular) domains are tyrosine kinases which
phosphorylate each other on Tyr residue side chains.
• This sets off a series of intracellular events
Example 2 : Steroid Receptors
[Figure: the ligand binds its receptor, and the activated receptor moves into the nucleus.]
• The steroid binds to its receptor in the cytoplasm.
• The steroid-receptor complex, but not the free receptor, can move into the nucleus.
• The steroid-receptor complex binds to specific binding site(s) on the DNA
to regulate gene expression.
Example 3. Heterotrimeric G-Proteins
[Figure: ligand binding activates the receptor, which acts on the heterotrimeric G protein (α β γ complex); GDP is exchanged for GTP and the subunits dissociate.]
• Ligand binding causes activation of the α subunit, which promotes exchange of GDP for GTP.
• In the GTP form, the α subunit and the associated βγ subunits dissociate from the complex.
• Each subunit can go on to initiate a series of intracellular events.
D - Regulation of Protein Activity
The concentration of a protein in the cell is a function of the rate of synthesis
and the rate of degradation. Both these processes can be regulated.
[Figure: synthesis (DNA → transcription → RNA → translation → Protein) and degradation of the protein.]
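The balance between synthesis and degradation can be followed over time with a minimal model, assuming a constant synthesis rate k_syn and first-order degradation with rate constant k_deg, so that dP/dt = k_syn - k_deg P. The sketch integrates this with simple forward-Euler steps; the rates are hypothetical.

```python
def simulate_protein(k_syn, k_deg, p0=0.0, t_end=200.0, dt=0.01):
    """Forward-Euler integration of dP/dt = k_syn - k_deg * P.

    Hypothetical units: k_syn in copies/min, k_deg in 1/min, times in min.
    Returns P at t_end.
    """
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += (k_syn - k_deg * p) * dt
    return p


k_syn, k_deg = 10.0, 0.1
print(simulate_protein(k_syn, k_deg))       # approaches k_syn / k_deg = 100
print(simulate_protein(k_syn, 2 * k_deg))   # faster degradation: level approaches 50
```

Regulating either rate moves the steady-state level k_syn / k_deg; the degradation rate also sets how quickly the concentration responds (time constant 1 / k_deg).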
Proteins are often regulated such that the ‘activity’ of a protein is not a
constant function of its concentration.
[Figure: Protein Active ⇌ Protein Inactive]
Regulation of Enzyme Activity
Negative Feedback (Product Inhibition)
[Figure: pathways (A → B → C; D → E → F) in which a product inhibits (X) an earlier enzymatic step.]
Mechanistically, negative feedback can occur by direct competition of the product with the substrate for the active site, or indirectly through interaction with the enzyme away from the active site.
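The two mechanisms correspond to two standard textbook rate laws, sketched here under the assumption of simple Michaelis-Menten enzymes with a single product-binding site (inhibition constant K_i); real feedback inhibition is often cooperative, which these expressions do not capture. Parameter values are hypothetical.

```python
def competitive(s, p, v_max, k_m, k_i):
    """Product competes with substrate for the active site:
    apparent K_M rises by (1 + p/K_i); V is unchanged."""
    return v_max * s / (k_m * (1 + p / k_i) + s)


def noncompetitive(s, p, v_max, k_m, k_i):
    """Product binds away from the active site (pure noncompetitive case):
    V falls by (1 + p/K_i); K_M is unchanged."""
    return v_max * s / ((k_m + s) * (1 + p / k_i))


# V = K_M = K_i = 1, product present at p = K_i.
for s in (1.0, 1000.0):
    print(f"s = {s:6}: competitive {competitive(s, 1, 1, 1, 1):.3f}, "
          f"noncompetitive {noncompetitive(s, 1, 1, 1, 1):.3f}")
```

High substrate overcomes competitive inhibition (the rate returns towards V), whereas inhibition acting away from the active site still halves the maximal rate.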
Regulation of Enzyme Activity
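Positive Feedback (Product Activation)
Positive Feedforward
[Figure: pathways from A to B in which a metabolite activates (X) an enzyme of the same pathway: in positive feedback a product activates an earlier step; in positive feedforward an early metabolite activates a later step.]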
Cooperativity / Allosteric Regulation
Hypothetical examples of binding of a ligand to a dimeric protein. The binding curve is very sensitive to the effect that binding at one site has on the other.
[Figure: fraction bound vs ligand concentration (log scale, 0.01 to 10000) for two independent sites, positive cooperativity (n = 2, n = 3) and negative cooperativity (n = 0.5).]
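The n values on the curves suggest Hill coefficients, so a natural sketch assumes each curve follows the Hill equation θ = L^n / (K^n + L^n), with n = 1 for two independent sites. A convenient measure of steepness is the fold increase in ligand needed to go from 10% to 90% bound, which works out to 81^(1/n).

```python
def fraction_bound(ligand: float, k_half: float, n: float) -> float:
    """Hill equation: theta = L^n / (K^n + L^n); K gives half occupancy."""
    return ligand**n / (k_half**n + ligand**n)


def ligand_for_fraction(theta: float, k_half: float, n: float) -> float:
    """Inverse of the Hill equation: L = K * (theta / (1 - theta))**(1/n)."""
    return k_half * (theta / (1 - theta)) ** (1 / n)


cases = [("independent", 1), ("positive", 2), ("positive", 3), ("negative", 0.5)]
for label, n in cases:
    spread = ligand_for_fraction(0.9, 1.0, n) / ligand_for_fraction(0.1, 1.0, n)
    print(f"{label:12s} n={n}: 10% -> 90% bound needs a {spread:,.0f}-fold rise in ligand")
```

Positive cooperativity compresses the response into a narrow concentration window (switch-like), while negative cooperativity spreads it over several orders of magnitude.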
Allosteric protein: a protein that changes from one conformation to another upon binding a ligand or when it is covalently (chemically) modified. The change in conformation alters the activity of the protein. Historically, the concept was developed for multimeric proteins (e.g. hemoglobin).
[Figure: binding of a positive allosteric effector changes the conformation of the protein and its binding of the ligand.]
Regulation of Protein Activity by Covalent Modification
The activity of a protein can be modified by addition or removal of a chemical group to or from an amino acid side chain (i.e. the protein acts as a substrate for another enzyme).
The most common modifications are:
• Methylation (-CH3)
• Phosphorylation (-PO3)
• Nucleotidylation
• Fatty acid addition
• Myristoylation
Note that many proteins are modified in other ways, such as addition of sugar groups (glycosylation), but these are not ‘regulatory’ modifications.
Phosphorylation is the most common mechanism of regulation by
covalent modification
Kinase - an enzyme that phosphorylates
Phosphatase - an enzyme that removes phosphate
Regulation by Localization
Protein activity can be regulated by changing the localization of the protein. This
turns out to be a common theme in eukaryotic signal transduction.
Localization can be altered allosterically or by covalent modification.
• Addition of a fatty acid group can cause a cytoplasmic protein to associate with the cell membrane.
• Covalent modification of a protein (e.g. phosphorylation, ~P) can generate a binding site for another protein.
E - General Considerations
Proteins have a diverse range of functions and a variety of mechanisms
of regulation. The ability to form networks of proteins acting on proteins,
the sharing of common reaction intermediates and forming multi-step
chemical pathways allows for an endless number of possibilities.
Some general considerations about protein systems:
• A reaction can behave as a step function (digital, boolean) if there is significant cooperativity in the system or if there is a modifying enzyme that works near saturation.
• Since proteins can act in a catalytic manner, there can be signal amplification.
• Many systems are adaptive, in that the response to a signal is not necessarily constant over time (e.g. a signal transduction system may become desensitized and no longer respond to the presence of a ligand; cf. heterotrimeric G proteins).
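The "modifying enzyme near saturation" case can be illustrated with the Goldbeter-Koshland function, the standard closed-form steady state for a protein phosphorylated by a kinase and dephosphorylated by a phosphatase, both assumed to follow Michaelis-Menten kinetics. The sketch below uses hypothetical rates; j_kin and j_phos are the Michaelis constants divided by the total amount of target protein, so small values mean the enzymes work near saturation.

```python
import math


def goldbeter_koshland(v_kin, v_phos, j_kin, j_phos):
    """Steady-state fraction of the target protein in the ~P form.

    v_kin, v_phos: maximal rates of the kinase and phosphatase.
    j_kin, j_phos: their Michaelis constants / total target protein.
    """
    b = v_phos - v_kin + j_kin * v_phos + j_phos * v_kin
    disc = b * b - 4 * (v_phos - v_kin) * v_kin * j_phos
    return 2 * v_kin * j_phos / (b + math.sqrt(disc))


kinase_rates = (0.8, 0.9, 1.0, 1.1, 1.2)   # phosphatase rate fixed at 1.0
for j in (10.0, 0.01):                     # far from vs near saturation
    levels = [goldbeter_koshland(v, 1.0, j, j) for v in kinase_rates]
    print(f"J = {j:5}: " + "  ".join(f"{x:.2f}" for x in levels))
```

With unsaturated enzymes (J = 10) the fraction modified changes gradually around 0.5; near saturation (J = 0.01) a 10% change in kinase activity flips the protein from mostly unmodified to mostly modified, approximating a step function.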
EnvZ/OmpR system in E. coli bacteria
EnvZ is a histidine kinase (it phosphorylates a specific histidine residue) that responds to changes in osmolarity (salt concentration). The ~P group is transferred to OmpR to form OmpR~P. EnvZ also catalyzes the dephosphorylation of OmpR~P.
[Figure: increasing osmolarity leads to phosphorylation of EnvZ (EnvZ~P) and transfer of the phosphate to OmpR.]
OmpR~P is a transcriptional regulator of two genes (ompF and ompC). It binds to DNA only in the phosphorylated state.
OmpR~P can activate or repress expression of a gene depending on the
position of the binding site relative to the promoter.
[Figure: OmpR~P bound at a repressor site (X) turns a gene OFF; bound at an activator site it turns a gene ON.]
Activation and repression of the ompF promoter are mediated by a high-affinity and a low-affinity binding site, respectively. Activation of ompC is through a low-affinity activator site.
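[Figure: the ompF promoter region carries an activator (+) and a repressor (-) site; the ompC promoter region carries an activator (+) site.]
Note that OmpR~P is required for both ompF and ompC transcription.
Low osmolarity: ompF ON, ompC OFF
High osmolarity: ompF OFF, ompC ON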
[Figure: OmpF and OmpC protein levels as a function of osmolarity, via the level of OmpR~P.]
Not an ON/OFF switch but more like a thermostat (i.e. gradients of expression levels).
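The graded, thermostat-like behaviour follows from the different site affinities. This minimal occupancy sketch assumes that the OmpR~P level rises steadily with osmolarity, that the sites are bound independently, that ompF is expressed when its high-affinity activator site is occupied and its low-affinity repressor site is free, and that ompC is expressed when its low-affinity activator site is occupied. The dissociation constants are hypothetical.

```python
def occupancy(x: float, k_d: float) -> float:
    """Fraction of time a site is bound at OmpR~P concentration x."""
    return x / (x + k_d)


# Hypothetical dissociation constants (arbitrary units).
KD_HIGH = 1.0    # high-affinity ompF activator site
KD_LOW = 10.0    # low-affinity ompF repressor site and ompC activator site


def ompf_expression(x: float) -> float:
    # Activator site bound AND repressor site free.
    return occupancy(x, KD_HIGH) * (1 - occupancy(x, KD_LOW))


def ompc_expression(x: float) -> float:
    # Low-affinity activator site bound.
    return occupancy(x, KD_LOW)


for x in (0.1, 0.5, 3.0, 10.0, 100.0):   # OmpR~P rising with osmolarity
    print(f"OmpR~P = {x:6.1f}   ompF = {ompf_expression(x):.2f}   ompC = {ompc_expression(x):.2f}")
```

ompF rises, peaks at intermediate OmpR~P and then falls as the repressor site fills, while ompC climbs steadily: smooth gradients of expression rather than a single ON/OFF switch.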
Playing with Switches
[Figure: an increasing signal acts through receptor~P and regulator~P; plots of [output signal] vs [Signal] show a linear dependence, then the effect of adding cooperativity and adding more cooperativity, until the response approximates a step function (ON/OFF switch).]
Epidermal Growth Factor
Signaling Pathway
Not as bad as it looks!
Not all pathways will operate
in a single cell.
• Protein interactions
• Protein modification
(Activation/inhibition)
• Protein re-localization
• Transcriptional regulation
http://www.grt.kyushu-u.ac.jp/spad/pathway/egf.html