
Class # 14.
Modeling the Structure of a
Protein (II)
Prof. Ramón Garduño Juárez
Molecular Modeling
Drug Design
Outline
• Protein structure
• Secondary structure prediction
• Protein folding
• Tertiary structure prediction
– ab initio structure predictions
– Homology modeling
– Fold recognition
Protein Structure
APRKFFVGGNWKMNGDKKSLG
ELIHTLNGAKLSADTEVVCGA
PSIYLDFARQKLDAKIGVAAQ
NCYKVPKGAFTGEISPAMIKD
IGAAWVILGHSERRHVFGESD
ELIGQKVAHALAEGLGVIACI
GEKLDEREAGITEKVVFEQTK
AIADNVKDWSKVVLAYEPVWA
IGTGKTATPQQAQEVHEKLRG
WLKSHVSDAVAQSTRIIYGGS
VTGGNCKELASQHDVDGFLVG
GASLKPEFVDIINAKH
= [figure: three-dimensional structure of the folded protein]
Protein secondary structures
Alpha-helix:
• Right-handed helix
• 3.6 residues per helix turn
• Hydrogen bond between residues n and n+4
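The helix geometry above can be turned into a small calculation. This is a minimal sketch assuming an ideal alpha-helix; the function names are illustrative and not from the lecture.

```python
# 3.6 residues per turn means each residue advances 360 / 3.6 = 100 degrees.
DEGREES_PER_RESIDUE = 100.0


def wheel_angle(i):
    """Angular position (degrees) of residue i around the helix axis."""
    return (i * DEGREES_PER_RESIDUE) % 360.0


def backbone_hbonds(n_residues):
    """Residue pairs (n, n+4) linked by the helix hydrogen-bond pattern."""
    return [(n, n + 4) for n in range(n_residues - 4)]


if __name__ == "__main__":
    for i in range(8):
        print(i, wheel_angle(i))
    print(backbone_hbonds(8))
```

With these numbers the pattern repeats exactly every 18 residues (5 full turns), which is why helical-wheel diagrams are often drawn with 18 positions.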
Beta strand and beta sheet
Side Chain Conformation
• The side chain atoms of amino acids are named in the Greek
alphabet according to this scheme.
• The side chain torsion angles are named chi1, chi2, chi3,
etc., as shown below for lysine.
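A chi angle is an ordinary dihedral angle over four consecutive atoms. The sketch below computes one from Cartesian coordinates. The atom quadruples listed for lysine follow the standard PDB atom names; the helper names are illustrative and the coordinates in the usage example are made up.

```python
import math

# Standard atom quadruples defining the lysine side chain torsions.
LYS_CHI = {
    "chi1": ("N", "CA", "CB", "CG"),
    "chi2": ("CA", "CB", "CG", "CD"),
    "chi3": ("CB", "CG", "CD", "CE"),
    "chi4": ("CG", "CD", "CE", "NZ"),
}


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: a planar trans arrangement around the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```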
Secondary structure prediction
• Rule-based approach
• Each residue is assigned to one of the three classes: alpha,
beta, coil.
• Propensity: each of the 20 amino acids is assigned a
probability of being alpha, beta, and coil.
• Some straightforward observations: with H = hydrophobic and P = polar, the pattern HHPPHHPP might be
alpha; HPHPHPHP might be beta.
• Neural network models and information theory are used.
The success rate is 65-70%.
• When using multiple sequence alignment, the rate can be
improved to > 70%
• In CASP4 and CASP5, PHD can achieve 80% accuracy.
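The HHPPHHPP / HPHPHPHP observation can be illustrated with a toy periodicity check. The sketch below uses the hydrophobic moment, a technique not named in the lecture and swapped in here for illustration: it compares how strongly a pattern repeats at 100 degrees per residue (alpha-helix) versus 180 degrees per residue (beta strand). The +1/-1 scoring and the function names are assumptions, not a real prediction method.

```python
import cmath
import math


def hydrophobic_moment(pattern, degrees_per_residue):
    """Magnitude of the periodic sum, with H scored +1 and P scored -1."""
    delta = math.radians(degrees_per_residue)
    total = sum((1.0 if c == "H" else -1.0) * cmath.exp(1j * n * delta)
                for n, c in enumerate(pattern))
    return abs(total)


def guess_class(pattern):
    """Return 'alpha' if the 100-degree periodicity dominates, else 'beta'."""
    alpha = hydrophobic_moment(pattern, 100.0)
    beta = hydrophobic_moment(pattern, 180.0)
    return "alpha" if alpha > beta else "beta"


if __name__ == "__main__":
    for p in ("HHPPHHPP", "HPHPHPHP"):
        print(p, guess_class(p))
```

A pattern with a period near 3.6 residues places its hydrophobic residues on one face of a helix, while strict alternation places them on one face of a strand.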
Secondary structure prediction: history
• 1974. Chou and Fasman propose a statistical method based on the propensities of amino acids to
adopt secondary structures, derived from their observed locations in 15 protein structures
determined by X-ray diffraction. These statistics clearly derive from the particular stereochemical
and physicochemical properties of the amino acids. Rather than a position-by-position analysis, the
propensity of a position is calculated as an average over the 5 or 6 residues surrounding that
position. On a larger set of 62 proteins, the base method reports a success rate of 50%. (page 446)
• 1978. Garnier improved the method by using statistically significant pairwise interactions between
residues as a determinant of the prediction. This improved the success rate to 62%. (page 447)
• 1993. Levin improved the prediction level by using multiple sequence alignments. The reasoning is
as follows. Conserved regions in a multiple sequence alignment provide a strong evolutionary
indicator of a role in the function of the protein. Those regions are also likely to have conserved
structure, including secondary structure, and their joint propensities strengthen the prediction.
This improved the success rate to 69%.
• 1994. Rost and Sander combined neural networks with multiple sequence alignments. The idea of a
neural net is to create a complex network of interconnected nodes, where progress from one node to
the next depends on satisfying a weighted function that has been derived by training the net with
data of known results, in this case protein sequences with known secondary structures. The success
rate is 72%. (page 450)
Chou-Fasman
• Calculate the frequency of each of the 20 amino
acids in helix, sheet, and turns.
• Frequency of i in structure s is divided by the
frequency of all residues in structure s.
• First scan the sequence to find a short stretch of residues
with a high probability of helix or sheet (4 out of 6 for
alpha, 3 out of 5 for beta, counting residues whose score is larger than 1)
• Then extend the stretch until the average prediction value for
four residues drops below 1.
• Turns are predicted when:
– The score for 4 residues in turn is larger than in helix and
sheet
– Position-dependent score in turn is larger than
7.5 × 10^-5
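The propensity, nucleation and extension steps above can be sketched as follows. This is a simplified illustration, not the published Chou-Fasman procedure: the function names, the default of 1.0 for unknown residues and the toy training data are assumptions, and turn prediction is left out.

```python
from collections import Counter


def propensities(seqs, labels, state):
    """Propensity of each residue type for `state`: the fraction of that
    residue found in `state` divided by the fraction of all residues in it."""
    total, in_state = Counter(), Counter()
    for seq, lab in zip(seqs, labels):
        for aa, s in zip(seq, lab):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    overall = sum(in_state.values()) / sum(total.values())
    return {aa: (in_state[aa] / total[aa]) / overall for aa in total}


def find_nuclei(seq, prop, window=6, needed=4):
    """Start indices of windows with at least `needed` residues scoring > 1."""
    return [i for i in range(len(seq) - window + 1)
            if sum(prop.get(aa, 1.0) > 1.0 for aa in seq[i:i + window]) >= needed]


def extend(seq, prop, start, end, probe=4):
    """Grow the segment [start, end) outward while the mean score of the
    four-residue window at each edge stays at or above 1."""
    def mean(i, j):
        return sum(prop.get(aa, 1.0) for aa in seq[i:j]) / (j - i)

    while start > 0 and mean(start - 1, start - 1 + probe) >= 1.0:
        start -= 1
    while end < len(seq) and mean(end + 1 - probe, end + 1) >= 1.0:
        end += 1
    return start, end


if __name__ == "__main__":
    # Toy training set: one sequence with its known states (H = helix, C = coil).
    helix = propensities(["AAAAGG"], ["HHHHCC"], "H")
    query = "GGGAAAAAAGGG"
    print(helix, find_nuclei(query, helix), extend(query, helix, 3, 9))
```

The defaults (4 of 6, window of four) match the helix rule on the slide; passing `window=5, needed=3` gives the corresponding sheet nucleation test.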
Protein folding and unfolding
• Levinthal paradox: By enumeration, a 100-residue protein
needs 10^29 years to find its native structure. Certain
pathways should exist to guide the folding.
• Lattice model and atomic model
• Atomic model is used mostly in unfolding simulations
• Folding@home
• IBM Blue Gene
• “new view” of protein folding (Peter Wolynes) through
funnels in the energy landscape
• Not all sequences have unique native structure
Protein Folding
• Proteins are synthesized from RNA templates as long polypeptide chains with specific
amino acid sequences that fold into three-dimensional bundles whose structure governs their function.
– In living organisms, the specific steps of the folding process have been hard
to discern experimentally and characterize theoretically.
– It seems that all the information needed to get to a precise three-dimensional
shape is "in there already," contained in the one-dimensional amino acid
sequence.
• Protein structures are determined by a large number of conflicting and largely
canceling forces exerted on the protein residues by the surrounding solvent and
other residues in the protein chain.
– Hydrophobic, entropic, electrostatic, vdW, etc.
– In one type of protein, the globular protein, the molecule can
spontaneously and reproducibly fold to a compact, well-defined structure.
Ab initio Prediction of Protein Structure
• Need to find a potential function where E(S, C_native) < E(S, C_non_native) holds.
• Need to construct an algorithm to find the global minimum of this function.
– Unsolved problems.
• Levinthal’s Paradox: For a protein with N residues, the size of its
conformational space is about 10^N states.
– Assume the main chain conformation of each residue is adequately
represented by 10 torsion angle states.
– Neglecting all the side chain conformations.
– For a chain of 100 residues, no physically achievable search algorithm
would enable it to complete its folding process.
• Even if the atoms could move at the speed of light, it would take about 10^82 seconds, but the age
of the universe is estimated as only 10^17 seconds.
• Protein does not fold by searching the entire conformational space.
• Are there folding pathways?
• Could proteins exist in metastable states?
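The Levinthal estimate above is simple arithmetic on powers of ten. The sketch below works in log10 to avoid huge numbers. The sampling rate of 10^18 conformations per second is an assumption, chosen only so that the result reproduces the order of magnitude quoted on the slide; the function name is illustrative.

```python
import math

LOG10_AGE_OF_UNIVERSE_S = 17.0  # order of magnitude quoted in the lecture


def log10_search_time_seconds(n_residues, states_per_residue=10, log10_rate=18.0):
    """log10 of the time (s) needed to visit every backbone conformation.

    log10_rate is an assumed sampling rate (conformations per second).
    """
    log10_states = n_residues * math.log10(states_per_residue)
    return log10_states - log10_rate


if __name__ == "__main__":
    t = log10_search_time_seconds(100)
    print(f"search time ~ 10^{t:.0f} s")
    print(f"that is 10^{t - LOG10_AGE_OF_UNIVERSE_S:.0f} times the age of the universe")
```

Changing the assumed rate by a few orders of magnitude does not change the conclusion, because the exponent grows linearly with chain length.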
Bovine Pancreatic Trypsin Inhibitor (BPTI)
• BPTI is composed of 58 amino acid residues folded into a single
compact domain.
– The folded conformation is stabilized by three disulfide bonds, and
reducing all three disulfides leads to nearly complete unfolding.
– In the absence of any one of the disulfides, however, the protein
retains nearly all of its folded structure but is significantly
destabilized.
• BPTI can be unfolded by reducing its disulfides and can then be
refolded upon the addition of an appropriate oxidant, such as the
disulfide forms of glutathione or dithiothreitol.
– The folding pathway was analyzed by chemically trapping and
analyzing intermediate species containing one or two disulfide
bonds (Creighton).
– After trapping, the intermediates were physically separated by
chromatography and their disulfide bonds identified by peptide
mapping.
– The rates of interconversion among the various species were
analyzed to determine a kinetic mechanism.
• The pathway shown below has been derived.
The various intermediates are identified by the disulfide bonds they contain.
Thus, the [30-51,5-14] intermediate contains two disulfide bonds, linking
cysteines 30 and 51 and cysteines 5 and 14. Each of the major intermediates
shown in the pathway, or an analog of the intermediate, has been studied by
NMR spectroscopy (e.g., in the laboratories of T.E. Creighton, P.S. Kim and
C.K. Woodward), and the schematic representations of the intermediates are
drawn to indicate qualitatively the extent to which they contain structure found
in the native protein.
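To see why trapping and separating intermediates is informative, it helps to count how many disulfide species are possible in principle. The sketch below enumerates them using the bracket labels from the text. It assumes the six cysteine positions of BPTI (5, 14, 30, 38, 51, 55), of which 38 and 55 come from the known sequence rather than from this lecture; the function names are illustrative.

```python
from itertools import combinations

# Cysteine positions in BPTI; the native pairing is 5-55, 14-38, 30-51.
CYS = (5, 14, 30, 38, 51, 55)


def disulfide_species(cysteines, n_bonds):
    """All ways to form n_bonds disulfides with no cysteine used twice."""
    pairs = list(combinations(cysteines, 2))
    species = []
    for combo in combinations(pairs, n_bonds):
        used = [c for pair in combo for c in pair]
        if len(set(used)) == len(used):
            species.append(combo)
    return species


def label(species):
    """Format a species in the bracket notation used in the text."""
    return "[" + ",".join(f"{a}-{b}" for a, b in species) + "]"


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "disulfide(s):", len(disulfide_species(CYS, k)), "possible species")
    print(label(((30, 51), (5, 14))))
```

This counts 15 one-disulfide, 45 two-disulfide and 15 three-disulfide species. Only a small subset of these appears as major intermediates in the pathway.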
Mutational analysis of protein folding
• One or more amino acid residues can be replaced to alter
interactions. By measuring the effects of these changes on the
native protein and folding intermediates, the roles of the
altered residues at various stages of folding can be inferred.
– Amino acid replacements at different sites can have quite
different effects on the stabilities of the various
intermediates.
– Mutations generally have small effects.
Phi-value analysis
Computer modeling such as molecular dynamics can be used in
generating the transition state structure, and in phi-value calculation.
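The slide names phi-value analysis without giving a formula. In its usual form, the phi-value of a mutation is the change in transition-state stability divided by the change in native-state stability, both measured relative to the denatured state. The sketch below computes it from folding rate constants. The function name, argument names and units (kJ/mol) are assumptions for illustration.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def phi_value(kf_wild_type, kf_mutant, ddg_native, temperature=298.15):
    """Phi = ddG(transition state - denatured) / ddG(native - denatured).

    ddG(TS - D) is estimated from the folding rates as RT ln(kf_wt / kf_mut).
    ddg_native is the measured destabilization of the native state in kJ/mol.
    """
    ddg_ts = R_KJ * temperature * math.log(kf_wild_type / kf_mutant)
    return ddg_ts / ddg_native


if __name__ == "__main__":
    # Hypothetical numbers: the mutation slows folding 3-fold and
    # destabilizes the native state by 5 kJ/mol.
    print(round(phi_value(30.0, 10.0, 5.0), 2))
```

A value near 1 suggests that the mutated site is already native-like in the transition state, and a value near 0 suggests that it is still denatured-like there.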
Protein Folding Landscape Theory
(Wolynes, Onuchic, Dill, Chan, Sali, Karplus, Brooks, etc.)
• Proteins fold on timescales ranging from a microsecond to a few minutes, so they
obviously drive or are driven quickly toward the native state.
• Folding can be described as the descent of the folding chain down a 'folding
funnel,' with local roughness of the funnel reflecting the potential for
transient trapping in local minima and the overall slope of the funnel
representing the thermodynamic drive to the native state.
• A key notion is that, in all but the final stages of folding, there exists an ensemble of
structures; protein folding consequently occurs via multiple pathways.
or, Funnel Theory
• There cannot be a single pathway.
– If an ensemble of denatured proteins all had to pass
through a single narrow pathway in their phase space,
there would be a large reduction in entropy upon
entering this path. This step would consequently be
very unlikely and rate limiting.
– It is much more likely that proteins fold via many
different pathways.
• This picture of protein folding dynamics, while similar to
classical transition state theory, is different in spirit.
– In this picture of two-state systems, the barrier is a
free energy barrier: an energetic barrier does not exist.
– The transition state is composed of a broad ensemble
of structures rather than one particular structure.
– This does not mean that the transition state is
completely random. The transition state may be
characterized by partial structure in the form of stable
pieces of secondary structure or partially correct
backbone shape.
• Protein folding can be given a lower-order description as a quasi-static evolution
of an ensemble of equilibrium structures from the denatured state to the native
state over a relatively modest free energy barrier.
– By grouping protein states according to a reaction coordinate, this process
can thus be re-expressed as a diffusion equation.
• What constitutes a well-designed protein (well designed as a folder, not well
designed functionally)?
– Well-designed proteins are those that have a large diffusion
constant at temperatures low enough for the native state to be stable and populated.
– That is, well-designed proteins can quickly find the native state. (Of course,
for real proteins this temperature must also be between 0 and 100 °C.)
– Probably one reason why a protein's native state is only marginally stable is
that greater stability of the native state would result in less specific residue-residue
interactions, leading to a much lower diffusion constant and difficulty
folding.
– This problem is evident in proteins with disulfide bridges: misformed disulfide
bridges slow down the folding process.
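The diffusion description mentioned above is not written out in the lecture. One common way to express it is a Smoluchowski-type equation along the reaction coordinate. All symbols here are assumptions introduced for illustration: Q is the reaction coordinate, P(Q,t) the population at Q, F(Q) the free energy profile, and D(Q) the configurational diffusion constant that a well-designed folder keeps large.

```latex
\frac{\partial P(Q,t)}{\partial t}
  = \frac{\partial}{\partial Q}
    \left[ D(Q) \left( \frac{\partial P(Q,t)}{\partial Q}
      + \frac{P(Q,t)}{k_B T} \, \frac{\partial F(Q)}{\partial Q} \right) \right]
```

In this form the stationary solution is the Boltzmann distribution, P(Q) proportional to exp(-F(Q)/k_B T), and the folding rate depends on both the barrier in F(Q) and the magnitude of D(Q).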
• A simplified lattice model system of 27 beads (Sali et al., 1994). In this model, protein
folding was simulated and the lowest energy state could be identified by enumerating
various states, and by calculating explicit contacts along the lattice.
– 200 random sequences of the 27 beads were generated and each was subjected to the
simulation.
– The number density of different folds was calculated versus energy, entropy, and
free energy at incremental stages of folding from Q=0 to Q=1 (Q = number of correct
contacts / total contacts).
– These data yielded a two-tiered time frame that models protein folding.
• First, a protein quickly progresses from a huge number of conformations
(10^16) to a much smaller population (10^10).
• These structures then progress through a slow, rate-limiting step until one of
the 10^3 native-like structures is found (Q>0.8).
• Once a suitable structure is found, the protein quickly folds into the native
form.
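The order parameter Q used above can be computed directly from lattice coordinates. This is a minimal sketch assuming a cubic lattice with integer bead coordinates, where a contact is a pair of beads that are lattice neighbors but not adjacent along the chain. The function names and the 4-bead example are illustrative, not the 27-bead model itself.

```python
def contacts(conf):
    """Set of non-bonded bead pairs (i, j) sitting on neighboring lattice sites."""
    found = set()
    for i in range(len(conf)):
        for j in range(i + 2, len(conf)):
            # Manhattan distance 1 means the beads are nearest lattice neighbors.
            dist = sum(abs(a - b) for a, b in zip(conf[i], conf[j]))
            if dist == 1:
                found.add((i, j))
    return found


def fraction_native(conf, native):
    """Q = number of native contacts present in conf / total native contacts.

    Assumes the native conformation has at least one contact.
    """
    native_contacts = contacts(native)
    return len(contacts(conf) & native_contacts) / len(native_contacts)


if __name__ == "__main__":
    native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]    # compact square
    extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]  # straight chain
    print(fraction_native(native, native), fraction_native(extended, native))
```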
• The topology of the protein native state
appears to influence the folding mechanism.
– In all-alpha proteins, the formation of
the tertiary (native) structure occurs
concurrently with the formation of
secondary structure.
– In a mixed α/β protein, a general
collapse (reduction in radius of
gyration) occurs first, followed by
evolution toward the native state.
• The corresponding diagrams show the
folding landscapes in terms of the radius of
gyration (in angstroms, vertical axis)
versus the fraction of native contacts
(horizontal axis).
– The all-alpha protein finds its native
structure straightforwardly;
– the others "collapse" in two stages, as
shown by the L-shaped landscapes.
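The radius of gyration plotted on the vertical axis is the root-mean-square distance of the atoms from their centroid. This is a minimal sketch assuming equal masses for all points; the function name and the coordinates are illustrative.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(sum((p[k] - center[k]) ** 2 for k in range(3))
                  for p in coords) / n
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    compact = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    # The compact chain has the smaller radius of gyration.
    print(radius_of_gyration(compact), radius_of_gyration(extended))
```

A collapse without gain in native contacts moves a trajectory down the vertical axis at nearly constant horizontal position, which is the first arm of the L-shaped landscapes.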
Quiz 5
• In an MD simulation, if one wants to raise the
temperature from 0 K to 300 K in a total of
100,000 timesteps, give two algorithms to
achieve this. Give detailed steps (such as
which calculation to perform from which step
to which step, etc.).