Transcript Proteins
From Sequences to Structure
Illustrations from: C Branden and J Tooze, Introduction to Protein Structure, 2nd ed. Garland Pub. ISBN 0815302703
Protein Functions
•Mechanoenzymes: myosin, actin
•Rhodopsin: allows vision
•Globins: transport oxygen
•Antibodies: immune system
•Enzymes: pepsin, renin, carboxypeptidase A
•Receptors: transmembrane signaling
•Vitellogenin: molecular velcro
–And hundreds of thousands more…
2
Proteins are Chains of Amino Acids
•Polymer – a molecule composed of repeating units
3
The Peptide Bond
•Dehydration synthesis
•Repeating backbone: N–Cα–C(=O)–N–Cα–C(=O)
–Convention – start at amino terminus and proceed to
carboxy terminus
4
Peptidyl polymers
•A few amino acids in a chain are called a
polypeptide. A protein is usually composed of 50
to 400+ amino acids.
•Since part of the amino acid is lost during
dehydration synthesis, we call the units of a
protein amino acid residues.
Figure labels: carbonyl carbon, amide nitrogen
5
Side Chain Properties
•Recall that the electronegativity of carbon is at
about the middle of the scale for light elements
–Carbon does not make hydrogen bonds with water
easily – hydrophobic
–O and N are generally more likely than C to h-bond to
water – hydrophilic
•We group the amino acids into three general
groups:
–Hydrophobic
–Charged (positive/basic & negative/acidic)
–Polar
6
The Hydrophobic Amino Acids
Proline severely
limits allowable
conformations!
7
The Charged Amino Acids
8
The Polar Amino Acids
9
More Polar Amino Acids
And then there’s…
10
Planarity of the Peptide Bond
11
Phi and psi
•φ = ψ = 180° is the extended
conformation
•φ: Cα to N–H
•ψ: C=O to Cα
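A minimal sketch of how a backbone torsion angle could be computed from four atom positions. It assumes the standard definitions (φ from C of the previous residue, N, Cα, C; ψ from N, Cα, C, N of the next residue) and the usual sign convention; the function name `dihedral` and the test points are illustrative, not from the slides.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)          # the central bond
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)        # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)        # normal of the plane p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# A planar zigzag (trans) arrangement gives the extended value of 180 degrees.
extended = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

For φ of residue i, the four points would be C(i−1), N(i), Cα(i), C(i); for ψ, N(i), Cα(i), C(i), N(i+1).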
OCCBIO 2006 – Fundamental
Bioinformatics
12
The Ramachandran Plot
Figure panels: Observed (non-glycine), Calculated, Observed (glycine)
•G. N. Ramachandran – first calculations of
sterically allowed regions of phi and psi
•Note the structural importance of glycine
13
Primary and Secondary Structure
•Primary structure = the linear sequence of amino
acids comprising a protein:
AGVGTVPMTAYGNDIQYYGQVT…
•Secondary structure
Regular patterns of hydrogen bonding in proteins result in
two patterns that emerge in nearly every protein structure
known: the α-helix and the β-sheet
The location and direction of these periodic, repeating
structures is known as the secondary structure of the protein
14
The alpha Helix
φ ≈ ψ ≈ −60°
15
Properties of the alpha helix
•φ ≈ ψ ≈ −60°
•Hydrogen bonds
between C=O of
residue n, and
NH of residue
n+4
•3.6 residues/turn
•1.5 Å/residue rise
•100°/residue turn
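The helix numbers above fit together arithmetically: 3.6 residues/turn × 1.5 Å/residue gives a pitch of about 5.4 Å per turn, and 360°/3.6 gives the 100° per residue. A small sketch, with illustrative function names that are not from the slides:

```python
RESIDUES_PER_TURN = 3.6   # from the slide
RISE_PER_RESIDUE = 1.5    # angstroms per residue, from the slide
TURN_PER_RESIDUE = 100.0  # degrees per residue, from the slide

def helix_pitch():
    """Rise along the helix axis for one full turn, in angstroms."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE

def helix_length(n_residues):
    """Approximate axial length of an ideal helix, in angstroms."""
    return n_residues * RISE_PER_RESIDUE

def wheel_angle(i):
    """Angular position of residue i when looking down the helix axis."""
    return (i * TURN_PER_RESIDUE) % 360.0
```

`wheel_angle` is the basis of a helical wheel: residues 3 to 4 positions apart land on roughly the same face of the helix.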
16
Properties of α-helices
•4 – 40+ residues in length
•Often amphipathic or “dual-natured”
–Half hydrophobic and half hydrophilic
–Mostly when surface-exposed
•If we examine many α-helices,
we find trends…
–Helix formers: Ala, Glu, Leu,
Met
–Helix breakers: Pro, Gly, Tyr,
Ser
17
The beta Strand (and Sheet)
φ ≈ −135°, ψ ≈ +135°
18
Properties of beta sheets
•Formed of stretches of 5-10 residues in
extended conformation
•Pleated – each Cα a bit
above or below the previous
•Parallel/antiparallel,
contiguous/non-contiguous
19
Parallel and anti-parallel β-sheets
•Anti-parallel is slightly energetically favored
Anti-parallel
Parallel
20
Turns and Loops
•Secondary structure
elements are connected by
regions of turns and loops
•Turns – short regions
of non-α, non-β
conformation
•Loops – larger stretches with
no secondary structure.
Often disordered.
•“Random coil”
•Sequences vary much more than
secondary structure regions
21
Levels of Protein
Structure
•Secondary structure
elements combine to
form tertiary structure
•Quaternary structure
occurs in multienzyme
complexes
–Many proteins are active
only as homodimers,
homotetramers, etc.
Disulfide Bonds
•Two cysteines in
close proximity can
form a covalent
bond
•Disulfide bond,
disulfide bridge, or
dicysteine bond.
•Significantly
stabilizes tertiary
structure.
23
Protein Structure Examples
24
Determining Protein Structure
•There are ~ 100,000 distinct proteins in the
human proteome.
•3D structures have been determined for 14,000
proteins, from all organisms
–Includes duplicates with different ligands bound, etc.
•Coordinates are determined by X-ray
crystallography
25
X-Ray diffraction
•Image is averaged
over:
–Space (many copies)
–Time (of the diffraction
experiment)
26
Electron Density Maps
•Resolution is
dependent on the
quality/regularity of
the crystal
•R-factor is a
measure of
“leftover” electron
density
•Solvent fitting
•Refinement
27
The Protein Data Bank
•http://www.rcsb.org/pdb/
ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228
28
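A rough sketch of reading ATOM records like the ones on this slide. It assumes the fields are separated by whitespace, which holds for these sample lines; real PDB files are defined by fixed column positions, so a whitespace split is a simplification. The field names in the dictionary are illustrative.

```python
import math

# Four records copied from the slide (3APR, chain E), trailing columns dropped.
SAMPLE = """\
ATOM 1 N ALA E 1 22.382 47.782 112.975 1.00 24.09
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40
ATOM 6 N GLY E 2 23.656 45.723 110.336 1.00 19.17
ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35
"""

def parse_atom(line):
    """Split one whitespace-separated ATOM line into named fields."""
    f = line.split()
    return {
        "serial": int(f[1]),
        "name": f[2],
        "res_name": f[3],
        "chain": f[4],
        "res_seq": int(f[5]),
        "xyz": (float(f[6]), float(f[7]), float(f[8])),
        "occupancy": float(f[9]),
        "b_factor": float(f[10]),
    }

atoms = [parse_atom(l) for l in SAMPLE.splitlines() if l.startswith("ATOM")]

# Distance between consecutive alpha carbons (Ala 1 and Gly 2), in angstroms.
ca = {a["res_seq"]: a["xyz"] for a in atoms if a["name"] == "CA"}
ca_distance = math.dist(ca[1], ca[2])
```

The Cα–Cα distance for neighbouring residues comes out close to 3.8 Å, a quick sanity check that the coordinates were read correctly.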
Views of a Protein
Wireframe
Ball and stick
29
Views of a Protein
Spacefill
Cartoon
CPK colors
Carbon = green, black
Nitrogen = blue
Oxygen = red
Sulfur = yellow
Hydrogen = white
30
The Protein Folding Problem
•Central question of molecular biology:
“Given a particular sequence of amino acid
residues (primary structure), what will the
tertiary/quaternary structure of the resulting
protein be?”
•Input: AAVIKYGCAL…
Output: φ1ψ1, φ2ψ2…
φ, ψ = backbone conformation
(no side chains yet)
31
Forces Driving Protein Folding
•It is believed that hydrophobic collapse is a key
driving force for protein folding
–Hydrophobic core
–Polar surface interacting with solvent
•Minimum volume (no cavities)
•Disulfide bond formation stabilizes
•Hydrogen bonds
•Polar and electrostatic interactions
32
Folding Help
•Proteins are, in fact, only marginally stable
–Native state is typically only 5 to 10 kcal/mole more
stable than the unfolded form
•Many proteins help in folding
–Protein disulfide isomerase – catalyzes shuffling of
disulfide bonds
–Chaperones – break up aggregates and (in theory)
unfold misfolded proteins
33
The Hydrophobic Core
•Hemoglobin A is the protein in red blood cells
(erythrocytes) responsible for binding oxygen.
•The mutation E6V in the β chain places a
hydrophobic Val on the surface of hemoglobin
•The resulting “sticky patch” causes hemoglobin S
to agglutinate (stick together) and form fibers
which deform the red blood cell and do not carry
oxygen efficiently
•Sickle cell anemia was the first identified
molecular disease
34
Sickle Cell Anemia
Sequestering hydrophobic residues in the protein core
protects proteins from hydrophobic agglutination.
35
Computational Problems in Protein Folding
•Two key questions:
–Evaluation – how can we tell a correctly-folded protein
from an incorrectly folded protein?
•H-bonds, electrostatics, hydrophobic effect, etc.
•Derive a function, see how well it does on “real” proteins
–Optimization – once we get an evaluation function, can
we optimize it?
•Simulated annealing/Monte Carlo
•Evolutionary computation (EC)
•Heuristics
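Simulated annealing and Monte Carlo searches are commonly built on the Metropolis acceptance rule: a move that lowers the energy is always accepted, and a move that raises it by ΔE is accepted with probability exp(−ΔE/T). A minimal sketch; the function name, and the choice to express temperature in the same units as the score, are assumptions for illustration.

```python
import math
import random

def metropolis_accept(delta_energy, temperature, rng=random):
    """Return True if a proposed move should be accepted.

    delta_energy: energy after the move minus energy before it.
    temperature:  positive number in the same (arbitrary) units.
    rng:          any object with a random() method returning [0, 1).
    """
    if delta_energy <= 0:
        return True  # downhill or neutral moves are always taken
    return rng.random() < math.exp(-delta_energy / temperature)
```

In simulated annealing the temperature is lowered gradually, so uphill moves become rarer as the search proceeds.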
36
Fold Optimization
•Simple lattice models (HP models)
–Two types of residues:
hydrophobic and polar
–2-D or 3-D lattice
–The only force is hydrophobic
collapse
–Score = number of HH contacts
37
Scoring Lattice Models
H/P model scoring: count noncovalent
hydrophobic interactions.
Sometimes:
Penalize for buried polar or surface hydrophobic
residues
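A sketch of the basic H/P score on a 2-D square lattice: count pairs of H residues that sit on neighbouring lattice points but are not neighbours in the chain. The optional burial penalties are left out. The function name and the coordinate-list input format are assumptions.

```python
def hp_score(sequence, coords):
    """Number of noncovalent H-H contacts for a fold on a square lattice.

    sequence: string of 'H' and 'P', one letter per residue.
    coords:   list of (x, y) lattice points, one per residue, in chain order.
    """
    index_at = {point: i for i, point in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("two residues occupy the same lattice point")
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every neighbouring pair once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score += 1
    return score

# A four-residue chain folded into a square brings its two ends together.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

For example, `hp_score("HPPH", square)` counts the single contact between the first and last residues.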
38
What can we do with lattice models?
•For smaller polypeptides, exhaustive search can
be used
–Looking at the “best” fold, even in such a simple
model, can teach us interesting things about the protein
folding process
•For larger chains, other optimization and search
methods must be used
–Greedy, branch and bound
–Evolutionary computing, simulated annealing
–Graph theoretical methods
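For short chains the exhaustive search can be written directly: enumerate every self-avoiding walk on the 2-D lattice and keep the one with the most H–H contacts. This is a sketch with assumed names; it does not remove rotated or mirrored duplicates, and the number of walks grows exponentially with chain length.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def count_hh_contacts(sequence, coords):
    """H-H pairs on neighbouring lattice points that are not chain neighbours."""
    index_at = {point: i for i, point in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                total += 1
    return total

def best_fold(sequence):
    """Return (best score, coordinates) over all self-avoiding 2-D walks."""
    if not sequence:
        return 0, []
    best_score, best_coords = -1, None
    coords = [(0, 0)]
    occupied = {(0, 0)}

    def extend():
        nonlocal best_score, best_coords
        if len(coords) == len(sequence):
            score = count_hh_contacts(sequence, coords)
            if score > best_score:
                best_score, best_coords = score, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue  # would bump into the chain
            coords.append(nxt)
            occupied.add(nxt)
            extend()
            coords.pop()
            occupied.remove(nxt)

    extend()
    return best_score, best_coords
```

Branch and bound would prune inside `extend` whenever the contacts still achievable cannot beat `best_score`.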
39
Learning from Lattice Models
The “hydrophobic zipper” effect:
Ken Dill ~ 1997
40
Representing a lattice model
Absolute directions
UURRDLDRRU
Relative directions
LFRFRRLLFFL
Advantage of relative directions: immediate reversals (UD or RL in absolute directions) cannot occur
Only three directions: LRF
What about bumps? LFRRR
Bad score
Use a better representation
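A sketch of decoding both encodings into lattice coordinates and checking for bumps. It assumes the chain starts at the origin, that every symbol places one new residue, and that the relative encoding starts out heading "up"; those conventions and the function names are illustrative.

```python
ABSOLUTE_STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def decode_absolute(moves):
    """Turn a string such as 'UURRDLDRRU' into a list of (x, y) points."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = ABSOLUTE_STEPS[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def decode_relative(moves, heading=(0, 1)):
    """Turn a string of L/R/F turns into points, starting with the given heading."""
    coords = [(0, 0)]
    dx, dy = heading
    for move in moves:
        if move == "L":
            dx, dy = -dy, dx      # rotate heading 90 degrees counter-clockwise
        elif move == "R":
            dx, dy = dy, -dx      # rotate heading 90 degrees clockwise
        elif move != "F":
            raise ValueError(f"unknown move {move!r}")
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def has_bump(coords):
    """True if two residues share a lattice point."""
    return len(set(coords)) != len(coords)
```

With these conventions `decode_relative("LFRRR")` revisits a point, which is the bump the slide warns about, while `decode_absolute("UURRDLDRRU")` is self-avoiding.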
41
Preference-order representation
•Each position has two “preferences”
–If it can’t have either of the two, it will take the “least
favorite” path if possible
•Example: {LR},{FL},{RL},
{FR},{RL},{RL},{FR},{RF}
•Can still cause bumps:
{LF},{FR},{RL},{FL},
{RL},{FL},{RF},{RL},
{FL}
42
More Realistic Models
•Higher resolution lattices (45° lattice, etc.)
•Off-lattice models
–Local moves
–Optimization/search methods and φ/ψ representations
•Greedy search
•Branch and bound
•EC, Monte Carlo, simulated annealing, etc.
43
The Other Half of the Picture
•Now that we have a more realistic off-lattice
model, we need a better energy function to
evaluate a conformation (fold).
•Theoretical force field:
G = G_{van der Waals} + G_{h-bonds} + G_{solvent} + G_{coulomb}
•Empirical force fields
–Start with a database
–Look at neighboring residues – similar to known
protein folds?
44
Threading: Fold recognition
•Given:
–Sequence:
IVACIVSTEYDVMKAAR…
–A database of molecular
coordinates
•Map the sequence onto
each fold
•Evaluate
–Objective 1: improve scoring
function
–Objective 2: folding
45
Secondary Structure Prediction
AGVGTVPMTAYGNDIQYYGQVT
…
A-VGIVPM-AYGQDIQYAG-GIIP--AYGNELQ-GQVT…
AGVCTVPMTA---ELQYYG-GQVT…
T…
AGVGTVPMTAYGNDIQYYGQVT
----hhhHHHHHHhhh-…
eeEE…
46
Secondary Structure Prediction
•Easier than folding
–Current algorithms can predict secondary structure
with 70-80% accuracy
•Chou, P.Y. & Fasman, G.D. (1974). Biochemistry,
13, 211-222.
–Based on frequencies of occurrence of residues in
helices and sheets
•PHD – Neural network based
–Uses a multiple sequence alignment
–Rost & Sander, Proteins, 1994 , 19, 55-72
47
Chou-Fasman Parameters
Name           Abbrv  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        A      142   83    66       0.06   0.076   0.035   0.058
Arginine       R      98    93    95       0.07   0.106   0.099   0.085
Aspartic acid  D      101   54    146      0.147  0.11    0.179   0.081
Asparagine     N      67    89    156      0.161  0.083   0.191   0.091
Cysteine       C      70    119   119      0.149  0.05    0.117   0.128
Glutamic acid  E      151   37    74       0.056  0.06    0.077   0.064
Glutamine      Q      111   110   98       0.074  0.098   0.037   0.098
Glycine        G      57    75    156      0.102  0.085   0.19    0.152
Histidine      H      100   87    95       0.14   0.047   0.093   0.054
Isoleucine     I      108   160   47       0.043  0.034   0.013   0.056
Leucine        L      121   130   59       0.061  0.025   0.036   0.07
Lysine         K      114   74    101      0.055  0.115   0.072   0.095
Methionine     M      145   105   60       0.068  0.082   0.014   0.055
Phenylalanine  F      113   138   60       0.059  0.041   0.065   0.065
Proline        P      57    55    152      0.102  0.301   0.034   0.068
Serine         S      77    75    143      0.12   0.139   0.125   0.106
Threonine      T      83    119   96       0.086  0.108   0.065   0.079
Tryptophan     W      108   137   96       0.077  0.013   0.064   0.167
Tyrosine       Y      69    147   114      0.082  0.065   0.114   0.125
Valine         V      106   170   50       0.062  0.048   0.028   0.053
48
Chou-Fasman Algorithm
•Identify α-helices
–4 out of 6 contiguous amino acids that have P(a) > 100
–Extend the region until 4 contiguous amino acids with P(a) < 100
are found
–Compute ΣP(a) and ΣP(b); if the region is > 5 residues
and ΣP(a) > ΣP(b), identify it as a helix
•Repeat for β-sheets [use P(b)]
•If an α and a β region overlap, the overlapping
region is predicted according to ΣP(a) and ΣP(b)
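A simplified sketch of the helix step, using the P(a) and P(b) values from the parameter table. It finds only the first nucleus and extends only toward the C-terminus, and it compares the summed P(a) and P(b) over the region; the function names and these simplifications are assumptions, not the full published procedure.

```python
AA = "ARDNCEQGHILKMFPSTWYV"  # row order of the parameter table
P_A = dict(zip(AA, [142, 98, 101, 67, 70, 151, 111, 57, 100, 108,
                    121, 114, 145, 113, 57, 77, 83, 108, 69, 106]))
P_B = dict(zip(AA, [83, 93, 54, 89, 119, 37, 110, 75, 87, 160,
                    130, 74, 105, 138, 55, 75, 119, 137, 147, 170]))

def find_nucleus(seq, table, window=6, needed=4):
    """Index of the first window with at least `needed` residues scoring > 100."""
    for start in range(len(seq) - window + 1):
        hits = sum(1 for r in seq[start:start + window] if table[r] > 100)
        if hits >= needed:
            return start
    return None

def extend_right(seq, end, table):
    """Grow the region until the next four residues all score < 100."""
    while end < len(seq):
        nxt = seq[end:end + 4]
        if len(nxt) == 4 and all(table[r] < 100 for r in nxt):
            break
        end += 1
    return end

def predict_first_helix(seq):
    """Return (start, end, sum P(a), sum P(b)) or None; end is exclusive."""
    start = find_nucleus(seq, P_A)
    if start is None:
        return None
    end = extend_right(seq, start + 6, P_A)
    region = seq[start:end]
    sum_a = sum(P_A[r] for r in region)
    sum_b = sum(P_B[r] for r in region)
    if len(region) > 5 and sum_a > sum_b:
        return start, end, sum_a, sum_b
    return None
```

The same functions can be pointed at `P_B` to look for β-strand nuclei.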
49
Chou-Fasman, cont’d
•Identify hairpin turns:
–P(t) = f(i) of the residue × f(i+1) of the next residue ×
f(i+2) of the following residue × f(i+3) of the residue at
position (i+3)
–Predict a hairpin turn starting at positions where:
•P(t) > 0.000075
•The average P(turn) for the four residues > 100
•Average P(a) < average P(turn) > average P(b) for the four residues
•Accuracy 60-65%
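The three turn conditions can be sketched as follows, with the values taken from the parameter table. Function names are illustrative, and the sketch tests each four-residue window independently without resolving overlaps with helix or sheet predictions.

```python
AA = "ARDNCEQGHILKMFPSTWYV"  # row order of the parameter table

def table(values):
    return dict(zip(AA, values))

P_A = table([142, 98, 101, 67, 70, 151, 111, 57, 100, 108,
             121, 114, 145, 113, 57, 77, 83, 108, 69, 106])
P_B = table([83, 93, 54, 89, 119, 37, 110, 75, 87, 160,
             130, 74, 105, 138, 55, 75, 119, 137, 147, 170])
P_TURN = table([66, 95, 146, 156, 119, 74, 98, 156, 95, 47,
                59, 101, 60, 60, 152, 143, 96, 96, 114, 50])
# F[k][residue] is f(i+k) for that residue.
F = [
    table([0.06, 0.07, 0.147, 0.161, 0.149, 0.056, 0.074, 0.102, 0.14, 0.043,
           0.061, 0.055, 0.068, 0.059, 0.102, 0.12, 0.086, 0.077, 0.082, 0.062]),
    table([0.076, 0.106, 0.11, 0.083, 0.05, 0.06, 0.098, 0.085, 0.047, 0.034,
           0.025, 0.115, 0.082, 0.041, 0.301, 0.139, 0.108, 0.013, 0.065, 0.048]),
    table([0.035, 0.099, 0.179, 0.191, 0.117, 0.077, 0.037, 0.19, 0.093, 0.013,
           0.036, 0.072, 0.014, 0.065, 0.034, 0.125, 0.065, 0.064, 0.114, 0.028]),
    table([0.058, 0.085, 0.081, 0.091, 0.128, 0.064, 0.098, 0.152, 0.054, 0.056,
           0.07, 0.095, 0.055, 0.065, 0.068, 0.106, 0.079, 0.167, 0.125, 0.053]),
]

def turn_probability(quad):
    """P(t): product of the positional turn frequencies for four residues."""
    p = 1.0
    for position, residue in enumerate(quad):
        p *= F[position][residue]
    return p

def mean(tbl, quad):
    return sum(tbl[r] for r in quad) / len(quad)

def is_turn(quad):
    """Apply the three conditions from the slide to one four-residue window."""
    avg_turn = mean(P_TURN, quad)
    return (turn_probability(quad) > 0.000075
            and avg_turn > 100
            and mean(P_A, quad) < avg_turn > mean(P_B, quad))

def predict_turns(seq):
    """Start positions (0-based) of every window that passes the test."""
    return [i for i in range(len(seq) - 3) if is_turn(seq[i:i + 4])]
```

For the window VRGP this gives P(t) of roughly 0.000085 and an average P(turn) of 113.25, so the window passes all three conditions.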
50
Chou-Fasman Example
•CAENKLDHVRGPTCILFMTWYNDGP
•CAENKL – Potential helix (4 of 6 residues have P(a) > 100; C and N do not)
•Residues with P(a) < 100: RNCGPSTY
–Extend: When we reach RGPT, we must stop
–CAENKLDHV: ΣP(a) = 972, ΣP(b) = 843
–Declare alpha helix
•Identifying a hairpin turn
–VRGP: P(t) = 0.000085
–Average P(turn) = 113.25
•Avg P(a) = 79.5, Avg P(b) = 98.25
51
Lots More to Come
•Microarray analysis
•Mass Spectrometry
•Interactions/ Knockouts
•Synthetic Lethality
•RPPA
•.....
52