Transcript 03_Krivov

Development of SCWRL4 for improved prediction
of protein side-chain conformations
Georgii Krivov,
Maxim Shapovalov and Roland L. Dunbrack Jr.
In collaboration with
Moscow Engineering & Physics Institute
© George Krivov
Version 3 was written
®
SCWRL program
by Adrian A. Canutescu and
Dr. Roland L. Dunbrack Jr.
SCWRL
ETotal  min
Main assumption:
 There is a finite set of possible conformations
(called rotamers) for each amino acid residue.
pair-wise
interactions
backbone
SCWRL’s sidechains packing algorithm
1. Obtain spatial backbone structure and
aminoacid sequence
PDB
2. For each residue build possible side-chain
conformations (rotamers) using rotamer library
3. Build interaction graph
– each vertex denotes a certain residue
– an edge between vertices indicates that
there is an interaction between some
rotamers of the corresponding residues
Res 1
Res 2
4. Find optimal assignment of side-chain
conformations by graph decomposition and
dynamic programming
5. Save resolved structure into file
Res 3
Res 4
PDB
Res 1
Inproved dynamic programming
Res 2
Res 3
Res 4
a
 based on a tree-decomposition of the interaction graph
A tree-decomposition of a graph G  V , E  is a pair
 T , X  , where
b
c
d
T   I , F  – is a tree with a set of vertices I and a set of edges F
X   X i : X i  V iI
e
– is a family of subsets of the set V,
such that
iI
i
f
associated with the vertices of T,
Xi  V
g
g e
which satisfies the conditions:
l
e d
 (u, v)  E i  I : u, v  X i
k
egh
d c
 v  V a set of vertices
egf
ehi
 i  I : v  Xi 
is connected in T
h
cdb
bca
hk
hl
Hans L. Bodlaender. 1992
Described combinatorial algorithm…
is capable of larger and denser
graphs than one based on
biconnected components
SCWRL4 is capable of significantly
larger proteins than SCWRL3
typically finishes pretty quickly
no coffee-breaks…
resolves global optimum
(avoids stochastics)
accuracy of prediction entirely depends
on rotamer library and energy potentials
However… a better accuracy is desired
Involve more rotamers
 more interaction to evaluate
 more combinations to enumerate
More realistic potentials
 longer interaction range
 more edges in the graph
 less decomposable
combinatorial
complexity
blows-up
quick collision
detection algorithm
hardly feasible
even with new
algorithm
thermodynamic
fluctuations

via
Flexible Rotamers Model
Hierarchies of bounding boxes
 enable efficient search for intersections
between two groups of geometric figures
Given two groups of figures enclosed into k-dops…
2. Check each combination
for overlapping
1. If overlap then split each
3. Disregard boxes
that don’t clash
4. Continue recursively
on each clashing pair
Involve more rotamers
 more interaction to evaluate
 more combinations to enumerate
More realistic potentials
 longer interaction range
 more edges in the graph
 less decomposable
combinatorial
complexity
blows-up
quick collision
detection algorithm
hardly feasible
even with new
algorithm

James T. Klosowski, et.al. 1998
… works best in conjunction with
k-Discrete Oriented Polytopes
–
–
–
–
easy to enclose a ball
easy to merge
easy clash check
almost rotatable
– a class of convex polytopes with 2k planes
any plane is orthogonal to one of
k basic axes which remain fixed
examples:
k=3
k=2
Cubic (k = 3)
Involve more rotamers
 more interaction to evaluate
 more combinations to enumerate
More realistic potentials
 longer interaction range
 more edges in the graph
 less decomposable
combinatorial
complexity
blows-up
hardly feasible
even with new
algorithm

k=4
Tetrahedral (k = 4)
quick collision
detection algorithm
… works best in conjunction with
k-Discrete Oriented Polytopes
 easy to enclose a ball
easy to merge
easy clash check
almost rotatable
– a class of convex polytopes with 2k planes
any plane is orthogonal to one of
k basic axes which remain fixed
examples:
k=3
k=2
i = 1..k
k=4
simple projection
onto all basic axes
Cubic (k = 3)
Involve more rotamers
 more interaction to evaluate
 more combinations to enumerate
More realistic potentials
 longer interaction range
 more edges in the graph
 less decomposable
combinatorial
complexity
blows-up
hardly feasible
even with new
algorithm

Tetrahedral (k = 4)
quick collision
detection algorithm
… works best in conjunction with
k-Discrete Oriented Polytopes
easy to enclose a ball
 easy to merge
easy clash check
almost rotatable
– a class of convex polytopes with 2k planes
any plane is orthogonal to one of
k basic axes which remain fixed
examples:
k=3
k=2
k=4
max i  max  max i 
min i  min  min i 
Involve more rotamers
 more interaction to evaluate
 more combinations to enumerate
More realistic potentials
 longer interaction range
 more edges in the graph
 less decomposable
Cubic (k = 3)
combinatorial
complexity
blows-up
hardly feasible
even with new
algorithm

Tetrahedral (k = 4)
quick collision
detection algorithm
… works best in conjunction with
k-Discrete Oriented Polytopes
easy to enclose a ball
easy to merge
 easy clash check
almost rotatable
– a class of convex polytopes with 2k planes
any plane is orthogonal to one of
k basic axes which remain fixed
examples:
k-DOP A
k-DOP B
k=3
k=2
k=4
?
A doesn’t clash B
if exists axis xi (1≤i ≤ k) such that
minia  maxbi or maxia  minbi
Involve more rotamers
 more interaction to evaluate
 more combinations to enumerate
More realistic potentials
 longer interaction range
 more edges in the graph
 less decomposable
Cubic (k = 3)
combinatorial
complexity
blows-up
hardly feasible
even with new
algorithm

Tetrahedral (k = 4)
quick collision
detection algorithm
Fast anisotropic hydrogen bond potential
e1
O

n

e2
cd  OH  doptimal cO   n, e1   cos max
w  z0 
1  c   c
2
d
H  cO
E O, H   (1  w)  EDefault  w  E0
More realistic potentials
 longer interaction range
 more edges in the graph
 less decomposable
1
z0
H
e0
cH    n, e0   cosmax
  d  1  cos  max   1  cos  max 
E0  B  qH  qO
doptimal  2.1  d  0.65 B  30
cos  max  45
cos  max  35
For more relevant comparison
Amount of sidechains
it make sense to predict a crystal not the ASU
relative surface accessibility (%)
Knowledge of crystal symmetry enables higher accuracy …
Extra percent in average accuracy
due to crystal awareness
14
12
All residues
Crystal contacts
10
8
6
4
2
0
ALL ARG ASN ASP CYS GLN GLU
HIS
ILE
LEU LYS MET PHE PRO SER THR TRP TYR VAL
Tuning parameters of the Flexible Rotamer Model
due to backbone and frame
si 

sample around rotamer
Estatic
Epairwise
library’s conformation
 pij 
due to sidechains’ interaction
Eself  Estatic  T   k  log  probability 
k , T , ci
i   i , i  ci   i 
may be setup
independently
for each type
of amino acid
n
Estatic
1
 T log 
n i1

1
 T log
e

mn i1 j 1
n
Epairwise
search for optimal
values in highdimensional space
from rotamer library
m
s
 i
e T
pij  si  sj
T
  s1  s2 
optimize one amino acid type in a time (and loop for all)
Optimizing expensive function in multidimensional space
arg max f  x  x 
|| x ||  R
n
, f  x   expensive
1. Generate sample of arguments and
evaluate function at these points
2. Assume that second order
approximation works well
3. From the linear regression resolve
coefficients and their covariance
4. Maximization of quadratic form is
relatively simple, provided that we can
resolve eigenvalues and eigenvectors
5. Hence, generate sample of quadratic forms,
maximize each of them and aggregate
robust for
non-convex
functions!
Traces through the optimization of the FRM parameters
Training on 40 proteins
( ~ 2 500 residues )
+ 24 proteins more and continue ( ~ 5 000 residues )
Average conditional accuracy
89.4
88.9
Testing @ H-bonds 1
Testing @ H-bonds 2
88.4
Training @ H-bonds 1
87.9
87.4
Testing on 130 proteins ( ~ 20 000 residues )
86.9
0
10
20
30
40
Iteration number
50
60
70
80
Conditional accuracy (%)
Measurement: Side chains with better electron density are easier to predict
ARG
ASN
GLU
HIS
MET
PHE
ASP
CYS
ILE
LEU
GLN
LYS
PRO
SER
THR
sliding frame - 20%
1
  40
 2 | 1
TRP
TYR
VAL
 3 |  2 & 1
 4 |  3 &  2 & 1
Confidence of side-chain placement (derived from experimental EDS maps)
ALL
Shapovalov
et.al. 2007
all this good with
Improved usability
functionality of SCWRL4 is available as library
(coming soon)
backbone
PDB file
rotamer
library
SCWRL3.exe
class SCWRL
{
…
};
output PDB file
 enables direct manipulation
of the model via C++ API
SCWRL4.DLL
Questions, Comments, Suggestions ?
Acknowledgements
Dr. Roland Dunbrack
Prof. Nickolai Kudryashov
Colleagues:
Adrian Canutescu
Guoli Wang
Maxim Shapovalov
Qiang Wang
Qifang Xu
Thanks for Your Attention!
Have a Nice Day and welcome to our poster!