Transcript ppt
Protein Structure
Prediction
What is PSP ?
Primary sequence (1D)
Tertiary Structure (3D)
…ACLLYYTTCAT…
all bonds angles, dihedral angles
and bond lengths between each
amino acid residue in protein
“Solving” PSP
View PSP as a search
Given any primary sequence of an unknown protein (in
the sense of it’s 3D structure)
Consider PSP as performing a search through the
configuration space of the given protein
The space of
different
configurations
Steps in solving PSP
Given primary sequence predict the final 3D structure
(1D
3D).
2 Step process (1D 2D, 2D 3D)
1st find configuration for the secondary structure (SS Prediction)
2nd find configuration for the side-chains (side-chain
conformation)
Required “components” in
solving PSP
All methods require the definition of a protein
model
• A simplified protein structure model
• A potential energy function
Simplified structure model of
Protein
By the above we mean that the protein in question
has simpler physical properties then an actual protein
This is needed as trying to solve PSP is too complex
for real proteins
Good simplified models give a good approximation for
the actual shape of the protein
Determining a good model is a research area by itself
Example of Simplified Model of
Protein
Simplified model of
the protein backbone
“Actual” model of the
protein backbone
3 dihedral angles
Bond angles
is the only dihedral angle
Potential Energy Function
How do we know when a predicted structure is the
native shape of the protein ?
In thermodynamics,
A molecule is most stable when it’s free energy is at a minimum
native shape is at a free energy minimum
• The potential energy function is a simplification of actual
forces acting on a real protein molecule and it’s formulation is
based on the given simplified structural model
Example of Potential Energy
Function
Purpose : Minimize
Etotal = EHH + Evdw
Hydrophobic Interaction
Evdw = Cv ·
Summation over all
atoms with rij < 8A
A = Angstrom = 1 tenbillionth of meter
Van der Waals Interaction
fvdw
Van der Waals Potential
rij
Ri Rj
rij = distance between atom i and j
Ri = van der waals radii of atom i
Different approaches to PSP
Ab Initio Methods
Knowledge Based Methods
Ab Initio Methods
What is Ab Initio ?
• Ab Initio means from 1st principles
• Use thermodynamic laws to figure out the configuration
of the fold of the given protein protein folding problem
• Global/semi-global minimization of the function
• 1D
2D = secondary structure problem
• 2D
3D = side-chain conformation
Some Ab Initio Methods
Molecular Dynamic Simulation
Using complex energy functions simulate folding of the primary
sequence until it reaches it’s native state (1D->3D)
Genetic Algorithm
Used in refining a given potential function so that it can best predict the
native state of a protein
Simulated Annealing
Branch and Bound Methods (usually used in side-chain
conformation)
Approximation algorithms
Comparative/Homologue Modeling
Threading
Docking
Knowledge Based Methods
Knowledge Based Methods
Using knowledge of currently known protein folds,
predict the shape of the target protein
Assumption is the native fold of the target protein is
similar to a currently known one i.e in the same family
Unable to predict any novel folds, i.e new fold family
Some Knowledge-Based
Methods
Comparative/Homologue Modeling
Threading
Docking
Methodological Framework for
solving PSP
Primary Protein Sequence
Knowledge-base, e.g PDB
Ab Initio Methods
Homologue Modeling
Threading
Predicted 3D
Structure of Protein
Side-Chain Prediction
Find a conformation of the all the side chains along
the given main chain of a protein
Usually done as the 2nd step in predicting the 3D
structure of protein
Also useful in drug design, where drug structures have
to be designed to be easily docked by enzymes for
breaking down
Side-Chain Prediction
The main chain fold has been computed and given as
input
choose positions of all side chains so as to minimize
some potential energy function
Problem if solved Ab Initio is proven to be NP-Complete
(reduce Clique to it)
Central Dogma
The more tightly packed Side-chains are, the more stable
they will be.
Ponder & Richards have shown that there are a fixed set
of rotations (rotamers) side-chains can take.
Most methods now make use of this library of rotamers
(abt 67 different rotations)
Main concern is the search strategy to find the best
conformation
Methods in Side-chain prediction
Simulated Annealing
A* algorithm
Monte Carlo Minimization
Molecular Dynamics Simulation
Dead End Elimination
Genetic Algorithm
Dead End Elimination
Deterministic method to determine the global minimum
energy conformation (GMEC) of set of side-chains.
Continuously eliminate rotamers from consideration in
the GMEC, until only 1 rotamer is left in each side-chain
position (thus giving final conformation).
DEE can be viewed as a mathematical criteria that a
rotamer must fulfill in order not to be eliminated
Dead End Elimination
Potential function is described in terms of pair-wise
interactions of all rotamers at all positions.
Therefore energy function to minimized can be
formulated as
Sum of pairwise interaction energy between
rotamer r at position j and rotamer u at positions j
energy of the given backbone fold
Sum of energy of rotamer r at side chain i
Dead End Elimination
Assuming p side-chains and n rotamers for each sidechain
Time complexity of finding the configuration that minimizes
the energy function takes O(p*np)
Not feasible to use the original formulation
Original DEE
A rotamer ir can be eliminated from consideration if there is
an alternative rotamer it at the same position that satisifies
Energy resulting
from using rotamer r
at position i
Minimum pairwise
interaction energy
between ir and every
other side-chain j
Maximum pairwise
Energy resulting
interaction energy
from using rotamer t between it and every
at position i
other side-chain j
Original DEE
Given some relevant energy landscape, the previous
inequality in fact says the following
Rotamer r at side-chain i can
only be eliminated only if
maximum energy conformation
using rotamer t at side-chain i is
smaller than the minimum
energy conformation of using
rotamer r
Original DEE
A simplistic implementation of Original DEE by simply
translating the inequality to code result in a time complexity
of O(n2*p2)
There is however still a problem if the following happens
Simple Goldstein DEE
A rotamer ir can be eliminated from consideration if there is
an alternative rotamer it at the same position that satisifies
Energy resulting
from using rotamer r
at position i
Energy resulting
from using rotamer t
at position i
Difference in energy of
conformation using ir
and conformation using
it which are at the point
of closest contact
Simple Goldstein DEE
Given some relevant energy landscape, the previous
inequality in fact says the following
Rotamer r at side-chain i can only be
eliminated by both totamer t1 and
rotamer t2 since the difference is +ve
at the points of closest contact.
Meaning for any given conformation
using t1 or t2 will result in a smaller
overall energy than using r
Simple Goldstein DEE
A simplistic implementation of Simple Goldstein DEE by
simply translating the inequality to code result in a time
complexity of O(n3*p2)
There is however still a problem if energy profiles of
rotamer r and every other rotamer intersect.
More powerful criteria will have to be used
General GoldStein DEE, Simple Split DEE and general
Split DEE
The more powerful the criterion the higher it’s time
complexity
Conclusion
Myriad of methods to attempt to solve the protein
prediction problem
Knowledge-based methods have gained a edge over Ab
initio methods
However not much improvement in the prediction power of
modern heuristics, since the 1st experiment by Anfisen 3
decades ago
Either problem is too hard / More discovery awaits the
adventurous researcher