Building 3D models of proteins
‘fold recognition / threading’
Why make a structural model for your protein?
The structure can provide clues to the function through
structural similarity with other proteins
With a structure it is easier to guess the location of active sites
With a structure we can plan more precise experiments in the lab
We can apply docking algorithms to the structures
(both with other proteins and with small molecules)
Protein Modeling Methods
• Ab initio methods:
solution of a protein folding problem
search in conformational space
• Energy-based methods:
energy minimization
molecular simulation
• Knowledge-based methods:
homology modeling
fold recognition / threading
Why do we need Ab Initio Methods?
data taken from PDB
http://www.rcsb.org/pdb/holdings.html
New folds, and sequences with very low sequence
similarity (<15% identity) to proteins of known structure
Protein Modeling Methods
• Ab initio methods:
solution of a protein folding problem
search in conformational space
• Energy-based methods:
energy minimization
molecular simulation
• Knowledge-based methods:
homology modeling
fold recognition
Predicting Protein Structure:
Threading / Fold Recognition
Basis
• It is estimated that there are only around 1,000 to
10,000 stable folds in nature
• Fold recognition is essentially finding the best
fit of a sequence to a set of candidate folds
• Select the best sequence-fold alignment using a
fitness scoring function
The Threading Problem
• Find the best way to “mount” the residue
sequence of one protein on a known
structure taken from another protein
Why is it called threading?
• threading a specific sequence through all
known folds
• for each fold estimate the probability that
the sequence can have that fold
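The idea of passing one sequence through every known fold and scoring each fit can be sketched in a few lines. This is a minimal illustration only, not a real threading program: the fold library, the buried/exposed environment strings and the +1/-1 fitness rule are all hypothetical, and the fit is ungapped with sequence and fold of equal length.

```python
# Hypothetical toy fold library: each fold is a string of environment labels,
# 'b' = buried position, 'e' = exposed position (an assumption for illustration).
FOLD_LIBRARY = {
    "fold_A": "bbeebbee",
    "fold_B": "ebebebeb",
    "fold_C": "eeeebbbb",
}

# Crude hydrophobic/polar split, chosen only for this sketch.
HYDROPHOBIC = set("AVLIMFWC")


def fitness(sequence, environments):
    """+1 when a hydrophobic residue sits in a buried position or a polar
    residue in an exposed one, -1 otherwise (ungapped, position by position)."""
    score = 0
    for residue, env in zip(sequence, environments):
        compatible = (residue in HYDROPHOBIC) == (env == "b")
        score += 1 if compatible else -1
    return score


def rank_folds(sequence, library):
    """Thread the sequence through every fold and sort folds by fitness."""
    scored = [(fitness(sequence, envs), name) for name, envs in library.items()]
    return sorted(scored, reverse=True)


ranking = rank_folds("LVKDAIST", FOLD_LIBRARY)
best_score, best_fold = ranking[0]
```

A real method would turn these raw scores into a probability or significance estimate for each fold and would allow gaps in the alignment.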
Threading: Basic Strategy
[Diagram labels: Query (dhgakdflsdfjaslfkjsdlfjsdfjasd); Library of folds; Scoring & selection; Spatial interactions; Template; Sequence]
Protein Threading
• Conserved Core Segments
[Figure: two structurally similar proteins, Protein A and Protein B, with conserved core segments I, J, K and L. Further panels: spatial adjacencies (interactions); possible threading with a sequence]
Input/Output of Protein Threading
[Diagram: core segments C[1..m], amino acid sequence a[1..n] and a pairwise amino acid scoring function g(…) feed into THREADING]
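The three inputs named here (core segments C[1..m], sequence a[1..n], pairwise function g) are enough to score one threading and, for very small cases, to try all of them. The sketch below assumes that a threading is given by the start position of each core segment in the sequence, that segments keep their order and do not overlap, and that higher scores are better. The contact list format and the toy g are hypothetical; practical threaders use much smarter search than this exhaustive loop.

```python
from itertools import combinations

HYDROPHOBIC = set("AVLIMFWC")  # crude split, for illustration only


def g(residue_a, residue_b):
    """Toy pairwise score (assumption): reward hydrophobic-hydrophobic contacts."""
    return 1 if residue_a in HYDROPHOBIC and residue_b in HYDROPHOBIC else 0


def is_valid(starts, core_lengths, n):
    """Segments must stay in order, not overlap, and fit inside the sequence."""
    end = 0
    for start, length in zip(starts, core_lengths):
        if start < end:
            return False
        end = start + length
    return end <= n


def threading_score(sequence, starts, contacts, pair_score):
    """Sum pair_score over core positions that are spatially adjacent.

    contacts holds pairs ((k, i), (l, j)): position i of core segment k
    touches position j of core segment l (0-based indices).
    """
    def residue(segment, offset):
        return sequence[starts[segment] + offset]

    return sum(pair_score(residue(k, i), residue(l, j)) for (k, i), (l, j) in contacts)


def best_threading(sequence, core_lengths, contacts, pair_score=g):
    """Exhaustively try every valid placement of the core segments."""
    best = None
    for starts in combinations(range(len(sequence)), len(core_lengths)):
        if not is_valid(starts, core_lengths, len(sequence)):
            continue
        score = threading_score(sequence, starts, contacts, pair_score)
        if best is None or score > best[0]:
            best = (score, starts)
    return best
```

For example, `best_threading("KKLKKVKK", [2, 2], [((0, 0), (1, 1))])` places the two segments so that the L and the V end up in contact.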
Fold recognition (Threading)
[Diagram: the sequence MAAGYAVLS + known protein folds → structural model]
Input: sequence
[Figure legend, shown for the input sequence and for the library of folds of known proteins: H-bond donor, H-bond acceptor, glycine, hydrophobic. Example scores shown: S = -2, Z = -1; S = 5, Z = 1.5; S = 20, Z = 5]
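A Z value is usually read as the number of standard deviations by which a raw score S lies above the mean of some background distribution. The slides do not say which background was used, so the sketch below simply assumes that the background is the set of scores obtained over the whole fold library; the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw fit scores {fold: S} into Z-scores {fold: Z}.

    Assumption: the background distribution is the set of raw scores itself
    (population standard deviation).
    """
    values = list(raw_scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return {fold: (score - mu) / sigma for fold, score in raw_scores.items()}
```

A fold whose Z stands well apart from the rest is a more convincing hit than one with a high raw score in a library where every fold scores highly.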
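[Table residue, layout not fully recoverable: a position-specific scoring table with one row per position on the sequence (1, 2, …) and one column per amino acid type (A, N, D, …, C, …, Y), followed by the columns Gop and Gext. Surviving values in the amino acid columns include 10, -50, 101, -24, 87, -99, -80 and 167; the visible rows show Gop = 100 and Gext = 10]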
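A table of this kind is typically used by aligning the query sequence against it with dynamic programming. The sketch below assumes that Gop and Gext are position-specific gap-opening and gap-extension penalties, that a gap of length k costs Gop + (k - 1) × Gext, and that the profile is non-empty; the data layout and function name are hypothetical and are not taken from any particular program.

```python
NEG_INF = float("-inf")


def align_to_profile(sequence, profile):
    """Best global alignment score of `sequence` against `profile`.

    profile[i] = {"scores": {amino_acid: score}, "gop": number, "gext": number}
    Amino acids missing from "scores" count as 0 (assumption).
    """
    m, n = len(profile), len(sequence)
    # match[i][j]:    profile position i aligned with residue j
    # skip_pos[i][j]: profile position i left unmatched (gap in the sequence)
    # skip_res[i][j]: residue j left unmatched (gap in the profile)
    match = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    skip_pos = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    skip_res = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    match[0][0] = 0.0
    for i in range(m + 1):
        # Penalties of the current profile position (row 0 borrows position 1).
        column = profile[max(i, 1) - 1]
        gop, gext = column["gop"], column["gext"]
        for j in range(n + 1):
            if i > 0 and j > 0:
                previous = max(match[i - 1][j - 1],
                               skip_pos[i - 1][j - 1],
                               skip_res[i - 1][j - 1])
                match[i][j] = previous + column["scores"].get(sequence[j - 1], 0.0)
            if i > 0:
                skip_pos[i][j] = max(match[i - 1][j] - gop,
                                     skip_res[i - 1][j] - gop,
                                     skip_pos[i - 1][j] - gext)
            if j > 0:
                skip_res[i][j] = max(match[i][j - 1] - gop,
                                     skip_pos[i][j - 1] - gop,
                                     skip_res[i][j - 1] - gext)
    return max(match[m][n], skip_pos[m][n], skip_res[m][n])
```

With penalties as large as the Gop = 100 seen in the table, a single gap outweighs several well-matched positions, so alignments stay mostly ungapped.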
Fold recognition/ Threading
Disadvantages:
• Threading methods seldom lead to the alignment
quality that is needed for homology modeling.
• Less than 30% of the predicted first hits are true
remote homologues (PredictProtein).
Threading resources
• TOPITS
Heuristic Threader, part of larger structure
prediction system
• 3D-PSSM
Integrated system, does its own MSA and
secondary structure predictions and then
threading
• GenTHREADER
Similar to 3D-PSSM
Side chain construction
In homology modelling, construction of the side chains is
done using the template structures when there is high
similarity between the built protein and the templates
Without such similarity the construction can be done using
rotamer libraries
A compromise between the probability of the rotamer and its
fitness in a specific position determines the score. Comparing
the scores of all the rotamers for a given amino acid determines
the preferred rotamer.
In spite of the huge size of the problem (because each side
chain influences its neighbours), there are quite successful
algorithms for this problem.
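The compromise between rotamer probability and positional fitness can be written as a single score per rotamer. The sketch below is one plausible form, not the scoring used by any specific program: it assumes the score is -ln(library probability) plus a weighted fit energy, with lower values preferred, and it treats one side chain in isolation, so the coupling between neighbouring side chains is ignored. The rotamer names and numbers in the usage note are made up.

```python
import math


def rotamer_score(probability, fit_energy, weight=1.0):
    """Lower is better: rare rotamers and poor positional fits are both penalized."""
    return -math.log(probability) + weight * fit_energy


def preferred_rotamer(candidates, weight=1.0):
    """Pick the best rotamer for one residue.

    candidates: {rotamer_name: (library_probability, fit_energy_at_this_position)}
    """
    return min(candidates, key=lambda name: rotamer_score(*candidates[name], weight))
```

For instance, with `{"g+": (0.6, 5.0), "t": (0.3, 0.5), "g-": (0.1, 0.0)}` the most probable rotamer "g+" loses to "t" because it fits the position badly, while setting `weight=0` falls back to the library probability alone.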
In this work we examined differences in the structures of
amino acid side chains around point mutations.
Conformation: a given set of dihedral angles that defines a structure.
Rotamer: an energetically favourable conformation.
[Figure examples: Asn, Phe]
Ab initio
[Diagram: the sequence MAAGYAVLS → structural model]
Ab initio methods for modelling
This field is of great theoretical interest but, so far, of very
little practical application. It makes no use of sequence
alignments and no direct use of known structures.
The basic idea is to build an empirical function that simulates
the real physical forces and the potentials of chemical contacts.
If we had a perfect function and could scan all
the possible conformations, we would be able to detect the
correct fold.
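The "score every conformation and keep the best" idea can be run literally on a drastically simplified model. The sketch below uses the two-letter HP alphabet on a 2D square lattice, with an energy of -1 for every pair of H residues that are lattice neighbours but not chain neighbours. This is a toy stand-in for a real energy function, and the exhaustive enumeration only works for very short chains because the number of conformations grows exponentially with length.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def energy(sequence, coords):
    """-1 for each H-H pair adjacent on the lattice but not along the chain."""
    index_at = {xy: i for i, xy in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips bonded neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                total -= 1
    return total


def enumerate_folds(sequence):
    """Yield every self-avoiding walk of len(sequence) steps from the origin."""
    def extend(coords):
        if len(coords) == len(sequence):
            yield list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in coords:
                coords.append(nxt)
                yield from extend(coords)
                coords.pop()

    yield from extend([(0, 0)])


def best_fold(sequence):
    """Scan all conformations and return one with the lowest energy."""
    return min(enumerate_folds(sequence), key=lambda c: energy(sequence, c))
```

For the four-residue chain "HPPH" the search finds a U-shaped conformation that brings the two H residues into contact.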
Predicting Protein Structure:
Ab Initio Methods
[Flowchart labels: Sequence; Prediction; Secondary structure; Tertiary structure; Energy minimization; Mean field potentials; Low energy structures; Validation; Predicted structure]
Ab initio Methods
• Simplified models:
simplified alphabet (HP)
simplified representation (lattice)
• Build-up techniques
• Deterministic methods:
quantum mechanics
diffusion equations
• Stochastic searches:
Monte Carlo
genetic algorithms
Rosetta approach
• Rosetta (David Baker): consistently outstanding performer
in the last two CASPs
• Integrated method
– I-Sites: much finer-grained substructures than
secondary structures. A library of all the structures in which
each 9-residue amino acid segment (9-mer) is found (taken from the PDB)
– Heuristic global energy function to estimate the quality of
folds
– Monte Carlo search through assignments of I-Sites to
minimize the energy function
• Also HMMSTR, an HMM-driven method for
assigning I-Sites
Rosetta prediction method
• Define a global scoring function that estimates the probability of
a structure given a sequence
• Generate a version of I-Sites with fixed-length subsequences
(9 amino acids)
– Calculate P(I-Site|sequence) for all sequences and I-Sites
• Generate structures by Monte Carlo sampling of
assignments of fixed-size I-Sites to subsequences
• End up with an ensemble of plausible structures
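The sampling step described above can be illustrated with a toy fragment-insertion Monte Carlo loop. This is not Rosetta's code, scoring function or I-Sites library: the conformation is just a list of (phi, psi) angles, the fragment library and the energy function are hypothetical stand-ins, and the demo in the usage note uses 3-residue fragments instead of 9 to stay small. Fragments are assumed to fit inside the chain at their start position.

```python
import math
import random


def toy_energy(conformation):
    """Hypothetical energy: number of residues outside a helical (phi, psi) box."""
    return sum(
        1
        for phi, psi in conformation
        if not (-90.0 <= phi <= -30.0 and -75.0 <= psi <= -15.0)
    )


def assemble(length, fragments, energy, steps=2000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over fragment insertions.

    fragments: {start_index: [fragment, ...]}, each fragment a list of (phi, psi).
    Returns the lowest-energy conformation seen and its energy.
    """
    rng = random.Random(seed)
    current = [(-180.0, 180.0)] * length  # start from an extended chain
    current_e = energy(current)
    best, best_e = list(current), current_e
    starts = sorted(fragments)
    for _ in range(steps):
        start = rng.choice(starts)
        fragment = rng.choice(fragments[start])
        trial = list(current)
        trial[start:start + len(fragment)] = fragment  # insert the fragment
        delta = energy(trial) - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, current_e + delta
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e
```

A small run could use `helix = [(-60.0, -45.0)] * 3`, `strand = [(-120.0, 130.0)] * 3` and `library = {s: [helix, strand] for s in range(7)}` with `assemble(9, library, toy_energy, temperature=0.1)`. Repeating the run with different seeds gives an ensemble of low-energy structures rather than a single answer.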
Rosetta is way ahead
• CASP 4 results.
• CASP 5 similar, but not as dramatic.
Fully automated predictions
• CAFASP-2
• Meta-servers work best
– Integrate predictions from several other servers
– Significantly better predictions than any
individual approach
• Several public metaservers available:
– http://bioinfo.pl/Meta/ is best all-around
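The slides do not say how a meta-server combines the predictions it collects. One simple possibility, shown here purely as an assumption, is rank-based voting: each server awards points to its top hits and the fold with the most points wins. The server names and fold identifiers in the usage note are hypothetical.

```python
from collections import Counter


def consensus_folds(server_hits, top_n=3):
    """Combine ranked fold predictions from several servers.

    server_hits: {server_name: [fold_id, ...]} with the best hit first.
    Each server gives top_n points to its first hit, top_n - 1 to its second,
    and so on. Returns (fold_id, points) pairs, highest total first.
    """
    votes = Counter()
    for hits in server_hits.values():
        for rank, fold in enumerate(hits[:top_n]):
            votes[fold] += top_n - rank
    return votes.most_common()
```

With `{"server1": ["f1", "f2", "f3"], "server2": ["f2", "f1", "f4"], "server3": ["f2", "f5", "f1"]}` the consensus is "f2", even though one server ranked "f1" first.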