Survey of Protein Structure Prediction Methods
Protein Structure
Prediction
Historical Perspective
Protein Folding: From the Levinthal
Paradox to Structure Prediction, Barry
Honig, 1999
A personal perspective on advances and
developments in protein folding over the
last 40 years
Levinthal Paradox
Cyrus Levinthal, Columbia University,
1968
Observed that there is insufficient time to
randomly search the entire conformational
space of a protein
Resolution: Proteins have to fold through
some directed process
Goal is to understand the dynamics of this
process
Old vs. New Views
Old:
Hierarchical view of protein folding
Secondary structures form, then interact to form
tertiary structures
General order of events
New:
Statistical ensembles of states
Potential energy landscape
Folding “Funnel”
Not all that different; most important ideas were
theorized many years ago
Secondary Structures
Consensus view is that secondary structure
formation is the earliest part of the folding
process
Numerous studies indicate that local sequence
codes for local structures
Helical sequences in a folded protein tend to be
helical in isolation
SSE prediction algorithms were about 70%
accurate as of 1993. The failures suggest that tertiary
interactions play some role in stabilizing SSEs
However…
Not clear what sequence elements code
for overall topology
One factor is the existence of hydrophobic
faces on the surface of SSEs
Still challenges in predicting topology of
SSEs, even when protein class is known
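One common way to quantify a hydrophobic face is the helical hydrophobic moment: each residue's hydrophobicity is treated as a vector pointing outward from the helix axis, and the vectors are summed. The sketch below assumes the Kyte-Doolittle hydropathy scale and an ideal alpha-helix with 100 degrees of rotation per residue; the two test peptides are hypothetical examples, and none of this is taken from the paper being summarized.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; other scales are also used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Mean hydrophobic moment per residue for a periodic structure.

    degrees_per_residue = 100 models an ideal alpha-helix.
    """
    step = math.radians(degrees_per_residue)
    sum_cos = sum(KYTE_DOOLITTLE[aa] * math.cos(i * step)
                  for i, aa in enumerate(sequence))
    sum_sin = sum(KYTE_DOOLITTLE[aa] * math.sin(i * step)
                  for i, aa in enumerate(sequence))
    return math.hypot(sum_cos, sum_sin) / len(sequence)


# Hypothetical peptides with identical composition (8 Leu, 8 Lys).
amphipathic = "LKKLLKKLLKKLLKKL"   # Leu recurs with helical periodicity
segregated = "LLLLLLLLKKKKKKKK"    # Leu and Lys in two blocks

print(f"amphipathic: {hydrophobic_moment(amphipathic):.2f}")
print(f"segregated:  {hydrophobic_moment(segregated):.2f}")
```

A high moment means the hydrophobic residues line up on one side of the helix, which is the kind of face that can pack against other SSEs.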
Atomic level calculations
Molecular calculations have made great
impact in our understanding of protein
folding
Harold Scheraga, 1968
Shneior Lifson, 1969
Martin Karplus’s laboratory, ~1979
Early calculations had trouble dealing with
solvent effects
Secondary Structure
Many of the essential elements of protein
energetics can be derived from looking at SSE
formation
Early experimental work: Ingwall et al., 1968
Baldwin et al., 1989: worked on stabilizing
shorter helices
Dyson, Wright, 1991, demonstrated that even
short peptides in solution can be partially
structured
Results
Yang and Honig, 1995
Alpha-helices stabilized by hydrophobic
interactions and close packing; hydrogen
bonding has little effect
Beta-sheets stabilized by non-polar
interactions between residues on adjacent
strands
Work supports idea that SSEs coded for
locally in the sequence
Folding Pathways
SSEs can change conformation in the presence
of a relatively small number of tertiary
interactions
Free-energy difference between alpha-helix,
beta-sheet, and coil is not great
Individual helices can be changed into beta-sheets
by changing just a few amino acids
This suggests that proteins have a “structural
plasticity” which allows for changes in
conformation
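To see why a small free-energy difference implies plasticity, it can help to convert free energies into equilibrium populations with the Boltzmann distribution. The sketch below uses hypothetical relative free energies for a peptide segment; the numbers are assumptions chosen for illustration, not measured values.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal / (mol K)


def boltzmann_populations(free_energies, temperature=298.0):
    """Equilibrium populations from relative free energies in kcal/mol."""
    rt = GAS_CONSTANT * temperature
    weights = {state: math.exp(-dg / rt) for state, dg in free_energies.items()}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


# Hypothetical relative free energies (kcal/mol) for one segment.
relative_dg = {"helix": 0.0, "sheet": 0.5, "coil": 1.0}

populations = boltzmann_populations(relative_dg)
for state, fraction in populations.items():
    print(f"{state}: {fraction:.2f}")
```

With gaps of 1 kcal/mol or less, every state keeps a substantial population, so a few mutations or tertiary contacts that shift the balance slightly can change which conformation dominates.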
Folding Pathways
Early in folding processes, many different
combinations of SSEs have very similar
stabilities
In the end, it is the tertiary interactions which
drive towards the native topology
Early in folding, “flickering” of SSEs, eventually
stabilized by tertiary interactions and converge
to native state
Suggests that multiple folding pathways exist,
which can all lead to the same end result once
stabilized
Structure Prediction
Recently, a split has been seen
Protein prediction problem
Trying to predict the end result of folding, using a large
amount of comparison between known and unknown
structures
Protein folding problem
Trying to understand the folding path which leads to the end
result of folding, typically by MD simulations or energy
calculation
The author contends that both areas will need to
be used together to fully understand protein
folding
PrISM
Yang and Honig, 1999
Software suite which integrates prediction based
on simulations and known information about
structures
Sequence analysis
Structure based sequence alignment
Fast structure-structure superposition using a
structural domain database
Multiple Structure alignment
Fold recognition and homology model building
Used to make predictions for all 43 targets of
CASP3 conference (more on CASP later)
Conclusions
Much of the current understanding of
protein folding was theorized long ago
Vague and speculative ideas have been
replaced by carefully defined theoretical
concepts and rigorous experimental
observations
Conclusions
Polypeptide backbone is the most
important determinant of structure
SSEs are “meta-stable”; statement that
sequence determines structure not wholly
accurate
More accurate statement is that sequence
chooses from a limited set of available
SSEs and determines how they are
ordered in space
Conclusions
Free-energy differences between alternate
conformations are not large; this may provide a
basis for rapid evolutionary change
CASP
A decade of CASP: progress, bottlenecks
and prognosis in protein structure
prediction, John Moult
CASP = Critical Assessment of Structure
Prediction
First held in 1994, every 2 years
afterwards
Teams make structure predictions from
sequences alone
CASP
Two categories of predictors
Automated
Automatic Servers, must complete analysis within
48 hours
Shows what is possible through computer analysis
alone
Non-automated
Groups spend considerable time and effort on
each target
Utilize computer techniques and human analysis
techniques
CASP
CASP6, 2004
200 prediction teams from 24 countries
Over 30,000 predictions for 64 protein targets
collected and evaluated
Conference held after to discuss results, with
many teams presenting individual results and
methodologies
Helps to steer future work
Modeling classes
Comparative modeling based on a clear
sequence relationship
Modeling based on more distant
evolutionary relationships
Modeling based on non-homologous fold
relationships
Template free modeling
Comparative modeling based on a
clear sequence relationship
Easily detectable sequence relationship
between the target protein and one or more
known protein structures, typically through
BLAST
Copy from template, however:
Must align target and template sequences
In general, reliably building regions not present in the
template is still a challenge
Sidechain accuracy is poor
Refinement remains a challenge
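The alignment step can be illustrated with a minimal Needleman-Wunsch global alignment. This is a sketch under simplifying assumptions: it uses a flat match/mismatch score and a linear gap penalty, whereas production tools typically use substitution matrices and affine gap penalties. The sequences and scoring values are hypothetical.

```python
def needleman_wunsch(target, template, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (aligned_target, aligned_template, score)."""
    n, m = len(target), len(template)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if target[i - 1] == template[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two residues
                score[i - 1][j] + gap,                   # gap in template
                score[i][j - 1] + gap,                   # gap in target
            )

    # Trace back from the bottom-right corner.
    aligned_target, aligned_template = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_target.append(target[i - 1])
            aligned_template.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_target.append(target[i - 1])
            aligned_template.append("-")
            i -= 1
        else:
            aligned_target.append("-")
            aligned_template.append(template[j - 1])
            j -= 1
    return "".join(reversed(aligned_target)), "".join(reversed(aligned_template)), score[n][m]


aligned_target, aligned_template, total = needleman_wunsch("ACDEFG", "ACEFG")
print(aligned_target)
print(aligned_template)
print("score:", total)
```

Target residues that align to a gap in the template have no coordinates to copy, which is why regions absent from the template are the hard part of comparative modeling.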
Comparative modeling based on a
clear sequence relationship
Progress in MD needed for refinement
Models useful for identifying which members of a
protein family have similar functionalities, and
which are different
Modeling based on more distant
evolutionary relationships
Makes use of PSI-BLAST and hidden Markov
models
Compile a profile for the sequence, compare this
profile to other known profiles
Allows for prediction of structures, even when
sequence is not close
Use of metaservers to find consensus structures
between CASP4 and CASP5 has led to
improved accuracy
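The profile idea can be sketched in a few lines. The example below is a deliberately crude stand-in, not PSI-BLAST or a hidden Markov model: it builds per-column amino-acid frequencies from a toy alignment and compares two profiles column by column with a dot product. The alignments, the pseudocount, and the similarity measure are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=0.1):
    """Per-column amino-acid frequencies for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gap characters are ignored when counting.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def profile_similarity(profile_a, profile_b):
    """Mean column-wise dot product of two equal-length profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of columns")
    column_scores = [
        sum(col_a[aa] * col_b[aa] for aa in AMINO_ACIDS)
        for col_a, col_b in zip(profile_a, profile_b)
    ]
    return sum(column_scores) / len(column_scores)


# Toy alignments (hypothetical sequences).
family_a = ["ACDKL", "ACEKL", "ACDRL"]
distant_relative = ["GCDKV", "ACDKI", "SCEKL"]
unrelated = ["WWPHY", "WFPHY", "YWPHF"]

profile_a = build_profile(family_a)
related_score = profile_similarity(profile_a, build_profile(distant_relative))
unrelated_score = profile_similarity(profile_a, build_profile(unrelated))
print(f"distant relative: {related_score:.3f}")
print(f"unrelated:        {unrelated_score:.3f}")
```

Because a profile records which substitutions a family tolerates at each position, two families can score well against each other even when no single pair of their sequences looks similar.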
Modeling based on more distant
evolutionary relationships
Limitations:
Correct template may not be identified
Alignment of target sequence to template is not trivial
Significant fraction of residues will have no structural
equivalent in the template; modeling of these regions
is hit or miss
Although regions are similar, they are not identical,
and the greater the difference, the higher the error
Details are thus not accurate, but overall
structure can be useful
For improvements, must work together with
template-free methodologies
Modeling based on non-homologous fold relationships
Protein “threading”
In recent CASP experiments, these
methods have not been competitive with
template free models
Template-free Modeling
For sequences where no template is available
Historically, physics-based approaches were
used
Newer methods focus on substructures
While we have not seen all folds, we have probably
seen nearly all substructures
Make use of substructure relationships
From a few residues through SSEs to
super-secondary structures
Template-free Modeling
A range of possible conformations is
considered
Most successful package has been ROSETTA
For proteins shorter than ~100 residues, it produces
one or several approximately correct structures
(4-6 Å RMSD for C-alpha atoms)
Selecting the most accurate structures from all
candidates remains unsolved; clustering is
typically used at present
Development of atomic models is crucial to
further progress
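The clustering step can be sketched as follows. To keep the example dependency-free it uses a distance-matrix RMSD (dRMSD), which compares internal C-alpha distances and so needs no superposition; this is a swapped-in measure, not the superposed RMSD quoted above, and it cannot tell mirror images apart. The selection rule (pick the model with the most neighbours within a cutoff), the cutoff, and the coordinates are assumptions for illustration, not the procedure of any particular package.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equal-length C-alpha traces."""
    if len(coords_a) != len(coords_b):
        raise ValueError("traces must have the same number of atoms")
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = 0.0
    for i, j in pairs:
        dist_a = math.dist(coords_a[i], coords_a[j])
        dist_b = math.dist(coords_b[i], coords_b[j])
        squared += (dist_a - dist_b) ** 2
    return math.sqrt(squared / len(pairs))


def largest_cluster_center(models, cutoff):
    """Index of the model with the most neighbours within cutoff, plus them."""
    best_index, best_neighbours = 0, []
    for i, model in enumerate(models):
        neighbours = [j for j, other in enumerate(models)
                      if j != i and drmsd(model, other) <= cutoff]
        if len(neighbours) > len(best_neighbours):
            best_index, best_neighbours = i, neighbours
    return best_index, best_neighbours


# Toy 4-atom traces in angstroms (hypothetical): three similar, one outlier.
models = [
    [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)],
    [(0, 0, 0), (3.8, 0, 0), (7.6, 0.5, 0), (11.4, 0, 0)],
    [(0, 0, 0), (3.8, 0.3, 0), (7.6, 0, 0), (11.4, 0.2, 0)],
    [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)],
]

center, neighbours = largest_cluster_center(models, cutoff=1.0)
print("cluster center:", center, "neighbours:", neighbours)
```

The underlying assumption is that conformations sampled many times are more likely to lie near the native state than isolated ones, which is a heuristic rather than a guarantee.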
CASP Progress