Survey of Protein Structure Prediction Methods


Protein Structure Prediction

Historical Perspective
Protein Folding: From the Levinthal Paradox to Structure Prediction, Barry Honig, 1999
- A personal perspective on advances and developments in protein folding over the last 40 years

Levinthal Paradox
Cyrus Levinthal, Columbia University, 1968
- Observed that there is insufficient time to randomly search the entire conformational space of a protein
- Resolution: proteins have to fold through some directed process
- Goal is to understand the dynamics of this process
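
The scale of the paradox can be shown with a back-of-the-envelope calculation. The sketch below is illustrative only: the chain length, the number of states per residue, and the sampling rate are assumed round numbers, not figures taken from the slides, and `levinthal_search_years` is a hypothetical helper name.

```python
def levinthal_search_years(residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation once by exhaustive search."""
    # Each residue is assumed to adopt its states independently,
    # so the count of chain conformations grows exponentially.
    conformations = states_per_residue ** residues
    seconds = conformations / samples_per_second
    return seconds / (365.25 * 24 * 3600)


# Assumed inputs: 100 residues, 3 states each, 10^13 conformations per second.
years = levinthal_search_years(
    residues=100, states_per_residue=3, samples_per_second=1e13
)
print(f"Exhaustive search would take about {years:.1e} years")
```

With these assumed inputs the estimate is on the order of 10^27 years, far longer than the age of the universe (roughly 1.4 x 10^10 years), which is why folding cannot be a random search.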

Old vs. New Views

Old:
- Hierarchical view of protein folding
- Secondary structures form, then interact to form tertiary structures
- General order of events

New:
- Statistical ensembles of states
- Potential energy landscape
- Folding “funnel”

Not all that different; the most important ideas were theorized many years ago

Secondary Structures
- Consensus view is that secondary structure formation is the earliest part of the folding process
- Numerous studies indicate that local sequence codes for local structure
  - Helical sequences in a folded protein tend to be helical in isolation
- Current secondary structure element (SSE) prediction algorithms are about 70% correct (1993). The failures indicate that some tertiary interactions help stabilize SSEs
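
To make the "about 70% correct" figure concrete, the sketch below assumes it refers to per-residue three-state accuracy (helix, strand, coil), a measure commonly called Q3. The slides do not say how accuracy was measured, so this is an assumption, and the two strings are made-up examples rather than real predictions.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up 20-residue example: H = helix, E = strand, C = coil.
observed = "CC" + "H" * 8 + "CC" + "E" * 5 + "CCC"
predicted = "CC" + "H" * 6 + "CCCC" + "E" * 4 + "CCCC"

score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.2f}")
```

In this toy case the errors sit at the ends of the helix and the strand, which is also where real predictors tend to struggle.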

However…
- It is not clear which sequence elements code for overall topology
- One factor is the existence of hydrophobic faces on the surface of SSEs
- There are still challenges in predicting the topology of SSEs, even when the protein class is known
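
One common way to quantify a hydrophobic face is the hydrophobic moment: each residue's hydrophobicity is treated as a vector pointing out from the helix axis, and the vectors are summed. The sketch below is not from the slides. It assumes an ideal alpha-helix with 100 degrees between successive residues and uses the Kyte-Doolittle hydropathy scale; the test peptides are artificial.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(sequence, angle_deg=100.0):
    """Mean hydrophobic moment for residues spaced angle_deg apart around the axis."""
    sum_x = sum_y = 0.0
    for i, residue in enumerate(sequence):
        hydropathy = KYTE_DOOLITTLE[residue]
        theta = math.radians(i * angle_deg)
        sum_x += hydropathy * math.cos(theta)
        sum_y += hydropathy * math.sin(theta)
    return math.hypot(sum_x, sum_y) / len(sequence)


# Artificial amphipathic helix: leucine on one side of the axis, lysine on the other.
amphipathic = "".join(
    "L" if math.cos(math.radians(i * 100.0)) > 0 else "K" for i in range(18)
)
# Same composition, but hydrophobic residues grouped along the chain instead.
blocked = "L" * 9 + "K" * 9

print(amphipathic, round(hydrophobic_moment(amphipathic), 2))
print(blocked, round(hydrophobic_moment(blocked), 2))
```

The amphipathic arrangement gives a clearly larger moment than the blocked one, even though both peptides have the same composition. That is the sense in which a hydrophobic face is a property of the sequence pattern rather than of overall hydrophobicity.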

Atomic level calculations
- Molecular calculations have made a great impact on our understanding of protein folding
  - Harold Scheraga, 1968
  - Shneior Lifson, 1969
  - Martin Karplus’s laboratory, ~1979
- Early calculations had trouble dealing with solvent effects

Secondary Structure




- Many of the essential elements of protein energetics can be derived from looking at SSE formation
- Early experimental work: Ingwall et al., 1968
- Baldwin et al., 1989, worked on stabilizing shorter helices
- Dyson and Wright, 1991, demonstrated that even short peptides in solution can be partially structured

Results
Yang and Honig, 1995
- Alpha-helices are stabilized by hydrophobic interactions and close packing; hydrogen bonding has little effect
- Beta-sheets are stabilized by non-polar interactions between residues on adjacent strands
- This work supports the idea that SSEs are coded for locally in the sequence

Folding Pathways




- SSEs can change conformation in the presence of a relatively small number of tertiary interactions
- The free-energy difference between alpha-helix, beta-sheet, and coil is not great
- Individual helices can be changed into beta-sheets by changing just a few amino acids
- This suggests that proteins have a “structural plasticity” which allows for changes in conformation
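
The consequence of small free-energy differences can be illustrated with Boltzmann populations, where the population of state i is proportional to exp(-G_i / RT). The free energies below are hypothetical values chosen only to show the effect. They are not measurements from the slides or from Honig's paper.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal / (mol * K)


def boltzmann_populations(free_energies_kcal, temperature_k=298.15):
    """Equilibrium populations for states with the given relative free energies."""
    rt = GAS_CONSTANT_KCAL * temperature_k
    lowest = min(free_energies_kcal.values())
    # Subtracting the lowest energy keeps the exponentials well scaled.
    weights = {
        state: math.exp(-(g - lowest) / rt)
        for state, g in free_energies_kcal.items()
    }
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


# Hypothetical relative free energies in kcal/mol for one short segment.
segment = {"alpha-helix": 0.0, "beta-sheet": 0.5, "coil": 1.0}
populations = boltzmann_populations(segment)
for state, fraction in populations.items():
    print(f"{state}: {fraction:.2f}")
```

When the states differ by only about 1 kcal/mol, every state keeps a sizeable population at room temperature, so a few added tertiary contacts or substitutions are enough to change which one dominates.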

Folding Pathways
- Early in the folding process, many different combinations of SSEs have very similar stabilities
- In the end, it is the tertiary interactions which drive the protein towards the native topology
- Early in folding, SSEs “flicker”; they are eventually stabilized by tertiary interactions and converge to the native state
- This suggests that multiple folding pathways exist, all of which can lead to the same end result once stabilized

Structure Prediction

- Recently, a split has been seen:
  - Protein prediction problem: trying to predict the end result of folding, using a large amount of comparison between known and unknown structures
  - Protein folding problem: trying to understand the folding path which leads to the end result of folding, typically by MD simulations or energy calculations
- The author's contention is that both areas will need to be used together to fully understand protein folding

PrISM


- Yang and Honig, 1999
- Software suite which integrates prediction based on simulations and known information about structures
  - Sequence analysis
  - Structure-based sequence alignment
  - Fast structure-structure superposition using a structural domain database
  - Multiple structure alignment
  - Fold recognition and homology model building
- Used to make predictions for all 43 targets of the CASP3 conference (more on CASP later)

Conclusions
- Much of the current understanding of protein folding was theorized long ago
- Vague and speculative ideas have been replaced by carefully defined theoretical concepts and rigorous experimental observations

Conclusions
- The polypeptide backbone is the most important determinant of structure
- SSEs are “meta-stable”; the statement that sequence determines structure is not wholly accurate
- A more accurate statement is that sequence chooses from a limited set of available SSEs and determines how they are ordered in space

Conclusions
- Free-energy differences between alternate conformations are not large; this may provide a basis for rapid evolutionary change

CASP
A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, John Moult
- CASP = Critical Assessment of Structure Prediction
- First held in 1994, every 2 years afterwards
- Teams make structure predictions from sequences alone

CASP

Two categories of predictors
- Automated
  - Automatic servers, which must complete analysis within 48 hours
  - Shows what is possible through computer analysis alone
- Non-automated
  - Groups spend considerable time and effort on each target
  - Utilize computer techniques and human analysis techniques

CASP

CASP6, 2004
- 200 prediction teams from 24 countries
- Over 30,000 predictions for 64 protein targets collected and evaluated
- Conference held afterwards to discuss results, with many teams presenting individual results and methodologies
- Helps to steer future work

Modeling classes
- Comparative modeling based on a clear sequence relationship
- Modeling based on more distant evolutionary relationships
- Modeling based on non-homologous fold relationships
- Template-free modeling

Comparative modeling based on a clear sequence relationship
- Easily detectable sequence relationship between the target protein and one or more known protein structures, typically through BLAST
- Copy from template, however:
  - Must align target and template sequences
  - In general, reliably building regions not present in the template is still a challenge
  - Sidechain accuracy is poor
- Refinement remains a challenge

Comparative modeling based on a clear sequence relationship
- Progress in MD is needed for refinement
- Models are useful for identifying which members of a protein family have similar functionalities, and which are different

Modeling based on more distant evolutionary relationships




- Makes use of PSI-BLAST and hidden Markov models
- Compile a profile for the sequence, compare this profile to other known profiles
- Allows for prediction of structures, even when sequence is not close
- Use of metaservers to find consensus structures between CASP4 and CASP5 has led to improved accuracy
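
The profile idea above can be sketched in a few lines. This is a deliberately simplified illustration, not how PSI-BLAST or hidden Markov model tools work internally: it assumes ungapped, equal-length toy alignments, uses a flat pseudocount, and compares profiles column by column with cosine similarity. The function names and the three toy families are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, pseudocount=0.5):
    """Per-column amino-acid frequencies for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gap characters and unknown letters are simply ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def profile_similarity(profile_a, profile_b):
    """Mean cosine similarity between matching columns of two profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of columns")

    def cosine(col_a, col_b):
        dot = sum(col_a[aa] * col_b[aa] for aa in AMINO_ACIDS)
        norm_a = math.sqrt(sum(v * v for v in col_a.values()))
        norm_b = math.sqrt(sum(v * v for v in col_b.values()))
        return dot / (norm_a * norm_b)

    scores = [cosine(a, b) for a, b in zip(profile_a, profile_b)]
    return sum(scores) / len(scores)


# Toy families: A and B share a hydrophobic pattern, C is unrelated.
family_a = column_frequencies(["LKVAL", "LRVAL", "IKVAI"])
family_b = column_frequencies(["LKIAL", "MKVAL", "LRVGL"])
family_c = column_frequencies(["DGSPE", "EGTPD", "DNSPE"])

print("A vs B:", round(profile_similarity(family_a, family_b), 3))
print("A vs C:", round(profile_similarity(family_a, family_c), 3))
```

Because a profile records which substitutions a family tolerates at each position, two families can score as similar even when no single pair of their sequences is close, which is what lets these methods reach more distant relationships.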

Modeling based on more distant evolutionary relationships
- Limitations:
  - Correct template may not be identified
  - Alignment of target sequence to template is not trivial
  - A significant fraction of residues will have no structural equivalent in the template; modeling of these regions is hit or miss
  - Although regions are similar, they are not identical, and the greater the difference, the higher the error
- Details are thus not accurate, but overall structure can be useful
- For improvements, must work together with template-free methodologies

Modeling based on non-homologous fold relationships
- Protein “threading”
- In recent CASP experiments, these methods have not been competitive with template-free models

Template-free Modeling



- For sequences where no template is available
- Historically, physics-based approaches were used
- Newer methods focus on substructures
  - While we have not seen all folds, we have probably seen nearly all substructures
- Make use of substructure relationships
  - From a few residues through SSEs to supersecondary structures

Template-free Modeling
- A range of possible conformations is considered
- Most successful package has been ROSETTA
- For proteins of less than ~100 residues, it produces one or several approximately correct structures (4-6 Å RMSD for C-alpha atoms)
- Selecting the most accurate structures from all possibilities is still to be solved; clustering is typically used at present
- Development of atomic models is crucial to further progress
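
The selection-by-clustering step can be sketched as follows. This is a minimal illustration, not ROSETTA's actual procedure: it assumes all candidate models are already superposed in a common frame (real pipelines first find the optimal superposition), represents each model by its C-alpha coordinates only, and picks the model with the most neighbours inside an RMSD cutoff. The function names, the cutoff, and the toy coordinates are all hypothetical.

```python
import math


def ca_rmsd(model_a, model_b):
    """RMSD in angstroms between two equal-length lists of (x, y, z) C-alpha positions."""
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b)
    )
    return math.sqrt(squared / len(model_a))


def pick_cluster_center(models, cutoff):
    """Index of the model with the most other models within cutoff RMSD."""
    best_index, best_count = 0, -1
    for i, model_i in enumerate(models):
        neighbours = sum(
            1
            for j, model_j in enumerate(models)
            if j != i and ca_rmsd(model_i, model_j) <= cutoff
        )
        if neighbours > best_count:
            best_index, best_count = i, neighbours
    return best_index


# Toy three-residue models: three similar candidates and one outlier.
base = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted_up = [(x, y + 0.5, z) for x, y, z in base]
shifted_down = [(x, y - 0.5, z) for x, y, z in base]
outlier = [(x, y, z + 10.0) for x, y, z in base]

models = [outlier, shifted_up, base, shifted_down]
center = pick_cluster_center(models, cutoff=0.75)
print("selected model index:", center)
```

The underlying assumption is that conformations sampled many times sit in a broad low-energy basin and are therefore more likely to be near-native than an isolated candidate.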
CASP Progress