
DISTANCE MATRIX-BASED
APPROACH TO PROTEIN
STRUCTURE PREDICTION
Andrzej Kloczkowski, Robert L. Jernigan, Zhijun
Wu, Guang Song, Lei Yang - Iowa State
University, USA
Andrzej Kolinski, Piotr Pokarowski - Warsaw
University, Poland
Matrices containing structural
information
• Distance matrix $(d_{ij})$
• Matrix of squared distances $D = (d_{ij}^2)$
• Contact matrix $C = (c_{ij})$
$c_{ij} = 1$ if $d_{ij} < d_{\mathrm{cutoff}}$,
otherwise $c_{ij} = 0$
• Laplacian of C (Kirchhoff matrix)
$L_C = \mathrm{diag}(\sum_j c_{ij}) - C$
$L_C^{-1}$, the generalized inverse of $L_C$, defines
the covariance between fluctuations in
elastic network models
Similarly, we can define the Laplacian of
D, $L_D$, and its generalized inverse $L_D^{-1}$
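The matrices listed above can be built directly from coordinates. The following is a minimal sketch with NumPy, not the authors' code: the function name `structural_matrices`, the 7.0 Å cutoff, and the four-point toy chain are assumptions chosen for illustration.

```python
import numpy as np

def structural_matrices(coords, cutoff=7.0):
    """Return distance, squared-distance, contact and Kirchhoff matrices.

    The cutoff value is an illustrative assumption, not taken from the slides.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))         # distance matrix (d_ij)
    D = d ** 2                                    # matrix of squared distances
    C = (d < cutoff).astype(float)                # contact when closer than the cutoff
    np.fill_diagonal(C, 0.0)                      # no self-contacts
    L_C = np.diag(C.sum(axis=1)) - C              # Laplacian of C (Kirchhoff matrix)
    # Generalized (Moore-Penrose) inverse; L_C is singular because its rows sum to zero
    L_C_pinv = np.linalg.pinv(L_C, rcond=1e-10, hermitian=True)
    return d, D, C, L_C, L_C_pinv

# Toy example: four points on a line, 3.8 apart, so only neighbors are in contact
coords = np.array([[0.0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [11.4, 0, 0]])
d, D, C, L_C, L_C_pinv = structural_matrices(coords, cutoff=7.0)
```

The same construction applied to D instead of C would give $L_D$ and its generalized inverse.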
Spectral decomposition of
structural matrices
$A = \sum_k \lambda_k v_k v_k^T$
is expressed by eigenvalues and
corresponding eigenvectors of A
Spectral decomposition of a
square distance matrix
Spectral decomposition of a square distance
matrix is a complete and simple description of
a system of points. It has at most 5 nonzero,
interpretable terms:
The dominant eigenvector is proportional to $r^2$, the squared distance of points from the center of
mass, and the next three are principal
components of the system of points.
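The "at most 5 nonzero terms" statement can be checked numerically. The sketch below uses a synthetic random point cloud (an assumption; it is not protein data) and counts the eigenvalues of the squared distance matrix that are not numerically zero. It also computes the correlation between the dominant eigenvector and $r^2$ so the relationship can be inspected.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))             # 50 synthetic points in 3D
X -= X.mean(axis=0)                      # put the center of mass at the origin

diff = X[:, None, :] - X[None, :, :]
D = (diff ** 2).sum(axis=-1)             # squared distance matrix

lam, V = np.linalg.eigh(D)               # D is symmetric, so eigh applies
order = np.argsort(-np.abs(lam))         # sort terms by |eigenvalue|
lam, V = lam[order], V[:, order]

# Count eigenvalues that are not numerically zero
n_nonzero = int(np.sum(np.abs(lam) > 1e-8 * np.abs(lam[0])))
D_rebuilt = (V[:, :5] * lam[:5]) @ V[:, :5].T   # sum of the 5 leading terms

r2 = (X ** 2).sum(axis=1)                # squared distance to the center of mass
corr = np.corrcoef(V[:, 0], r2)[0, 1]    # the sign of an eigenvector is arbitrary
print(n_nonzero, abs(corr))
```

The bound follows from writing $D_{ij} = r_i^2 + r_j^2 - 2\, x_i \cdot x_j$ for centered coordinates: two rank-one terms plus a term of rank at most three.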
CN – contact number
PECM – principal eigenvector of the contact
matrix
GNM – fluctuations of residues computed
from the Gaussian Network Model (Bahar
et al. 1997)
SVR – Support Vector Regression – variant of
SVM for continuous variables
B-factor – temperature factor from X-ray
crystallography
B-factor correlates with the squared distance from the
center of mass, $r^2$ – Petsko 1980
Correlation between fluctuations of residues
and the inverse of their contact number –
Halle 2002
Approximation of distance
matrices
• $A = \sum_k \lambda_k v_k v_k^T$
• We used a nonredundant database of 680
structures from the ASTRAL database
• $r^2$ alone approximates structures with
a DRMS of 7.3 Å
• $r^2$ combined with the first principal component
approximates structures with a DRMS of 4.0 Å
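A truncated spectral sum and its DRMS against the full distance matrix could be computed as sketched below. The slides do not spell out how $r^2$ and the first principal component map onto individual spectral terms, so truncating by the k largest-magnitude eigenvalues is an assumption, as are the helper names `truncated_distances` and `drms` and the synthetic coordinates.

```python
import numpy as np

def truncated_distances(D, k):
    """Approximate distances from the k largest-magnitude spectral terms of D."""
    lam, V = np.linalg.eigh(D)
    keep = np.argsort(-np.abs(lam))[:k]
    D_k = (V[:, keep] * lam[keep]) @ V[:, keep].T
    return np.sqrt(np.clip(D_k, 0.0, None))      # clip: truncation can give negative entries

def drms(d_a, d_b):
    """Root mean square difference between two distance matrices over pairs i < j."""
    iu = np.triu_indices_from(d_a, k=1)
    return float(np.sqrt(np.mean((d_a[iu] - d_b[iu]) ** 2)))

rng = np.random.default_rng(1)
X = rng.normal(scale=5.0, size=(60, 3))          # synthetic stand-in for C-alpha coordinates
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
d = np.sqrt(D)

# DRMS of the approximation as more spectral terms are included
errors = {k: drms(d, truncated_distances(D, k)) for k in range(1, 6)}
print(errors)
```

With all five terms the reconstruction is exact up to rounding, so the interesting numbers are the errors for k below 5.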
Current work:
Prediction of $r^2$ from the sequence with
SVR
Prediction of the first structural component
from the sequence
Principal Component Analysis of Multiple
HIV-1 Proteases Structures
• 164 X-ray PDB structures and 28 NMR PDB structures
and 10,000 structures (snapshots) from the Molecular
Dynamics simulations were analysed.
• Principal Component Analysis was performed on each of
these three datasets.
• The results were compared with normal modes computed
from the Anisotropic Network Model – an Elastic Network
Model that considers anisotropy of fluctuations of residues
in protein.
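The PCA step on a structural ensemble can be sketched as follows. This is an illustration under assumptions: the structures are taken to be already superimposed, the function name `ensemble_pca` is hypothetical, and the ensemble is synthetic rather than the HIV-1 protease data.

```python
import numpy as np

def ensemble_pca(ensemble):
    """PCA of an ensemble with shape (n_structures, n_atoms, 3), assumed pre-superimposed."""
    M = ensemble.shape[0]
    flat = ensemble.reshape(M, -1)               # one 3N-vector per structure
    dev = flat - flat.mean(axis=0)               # deviations from the mean structure
    # SVD of the deviation matrix yields the principal components directly
    _, s, Vt = np.linalg.svd(dev, full_matrices=False)
    variances = s ** 2 / (M - 1)                 # eigenvalues of the covariance matrix
    return variances, Vt                         # rows of Vt are the principal components

# Synthetic ensemble: a mean structure displaced along one dominant direction plus noise
rng = np.random.default_rng(2)
mean = rng.normal(size=(20, 3))
direction = rng.normal(size=(20, 3))
direction /= np.linalg.norm(direction)
amplitudes = rng.normal(scale=2.0, size=100)
noise = rng.normal(scale=0.05, size=(100, 20, 3))
ensemble = mean + amplitudes[:, None, None] * direction + noise

variances, pcs = ensemble_pca(ensemble)
overlap = abs(pcs[0] @ direction.ravel())        # near 1 when PC1 recovers the planted motion
```

The principal components returned here are the vectors that would then be compared with normal modes from an elastic network model.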
The α-carbon trace of the HIV-1 structure
Elastic network models
• Rubber elasticity (polymers - Flory)
• Intrinsic motions of structures (Tirion 1996)
• Simple elastic networks of uniform material
• Appropriate for largest, most important domain motions
of proteins - independent of many structure details
• High resolution structures not needed to learn about
important motions
Rubbery Bodies with Well Defined, Highly Controlled Motions
Elastic Network Models
Calculating Protein Position Fluctuations
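$V_{tot}(t) = (\gamma/2)\, \mathrm{tr}[\Delta R(t)^T\, \Gamma\, \Delta R(t)]$
$\langle \Delta R_i \cdot \Delta R_j \rangle = (1/Z_N) \int (\Delta R_i \cdot \Delta R_j) \exp\{-V_{tot}/kT\}\, d\{\Delta R\}$
$= (3kT/\gamma)\, [\Gamma^{-1}]_{ij}$
$\Gamma$ = Kirchhoff matrix of contacts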
Compute Normal Modes for Fluctuations and Correlations
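A minimal sketch of this calculation, assuming a Gaussian Network Model with a 7.0 Å cutoff (the cutoff, the function name `gnm`, and the helix-like synthetic trace are assumptions, not values from the slides). The pseudo-inverse of the Kirchhoff matrix is assembled from its nonzero modes; its diagonal gives the mean-square fluctuations up to the factor 3kT/γ, and its off-diagonal entries give the correlations.

```python
import numpy as np

def gnm(coords, cutoff=7.0):
    """Kirchhoff matrix, its modes, and its pseudo-inverse for a GNM-style network."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contacts = (d < cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = np.diag(contacts.sum(axis=1)) - contacts.astype(float)   # Kirchhoff matrix
    lam, modes = np.linalg.eigh(gamma)           # ascending; lam[0] ~ 0 for a connected network
    nz = lam > 1e-8                              # keep only the nonzero modes
    gamma_inv = (modes[:, nz] / lam[nz]) @ modes[:, nz].T
    return gamma, lam, modes, gamma_inv

# Synthetic helix-like trace (not a real protein)
t = np.arange(30)
coords = np.column_stack([2.3 * np.cos(1.745 * t), 2.3 * np.sin(1.745 * t), 1.5 * t])

gamma, lam, modes, gamma_inv = gnm(coords)
msf = np.diag(gamma_inv)                         # mean-square fluctuations up to 3kT/gamma
corr = gamma_inv / np.sqrt(np.outer(msf, msf))   # normalized cross-correlations
slowest_mode = modes[:, 1]                       # first nonzero (slowest) mode
```

The slowest modes carry the largest weight in the pseudo-inverse because each mode contributes in proportion to the reciprocal of its eigenvalue.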
HIV Reverse Transcriptase
– Slowest Motion
Push-pull Hinge
Modes of Motion – HIV Protease
Mode 1
Mode 2
Three Ways to Open the Flaps
Mode 3
NMR Structures Fit Elastic Networks
Better than X-Ray Structures
HIV Protease
Overlaps between directions of motions
(dot products of vectors)
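The overlap between two directions of motion is the normalized dot product of the corresponding displacement vectors. A short sketch follows; the cumulative overlap is included in its commonly used form (square root of the summed squared overlaps), which is an assumption about the definition used for these slides, and the function names are hypothetical.

```python
import numpy as np

def overlap(a, b):
    """Absolute cosine between two direction vectors (flattened 3N displacements)."""
    a, b = np.ravel(a), np.ravel(b)
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def cumulative_overlap(target, modes):
    """How well a set of orthonormal modes (rows) spans a target direction."""
    return float(np.sqrt(sum(overlap(target, m) ** 2 for m in modes)))

# Toy check with an orthonormal basis of R^6 standing in for normal modes
rng = np.random.default_rng(3)
basis, _ = np.linalg.qr(rng.normal(size=(6, 6)))
modes = basis.T                                  # rows are orthonormal "modes"
target = rng.normal(size=6)                      # stand-in for a principal component

first_three = cumulative_overlap(target, modes[:3])
all_six = cumulative_overlap(target, modes)      # a complete basis spans any direction
```

Taking the absolute value makes the measure independent of the arbitrary sign of each mode.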
Includes Many Drug Bound Structures
Distortions for Drug Binding Are Intrinsic to Protein Structure
Results for 164 X-ray and 28 NMR HIV Protease Structures
Cumulative Overlaps with NMR
Motions
NMR Agreement Better than X-ray
Structural Refinement Using Distribution of
Distances
• We have developed a method of refining NMR structures
using derived distance constraints and mean-force
potentials.
• The original NMR experimental constraints for the
structures were downloaded from BioMagResBank.
• The structures were refined using the default dynamic
simulated annealing protocol implemented in CNS
software (Brunger et al. Yale Univ).
• We also used mean-force potentials $E = -kT \ln P(r)$,
adding them to the energy function of the NMR
modeling software CNS. The structures improved
significantly (in terms of RMSD, energy,
NOEs, etc.) after refinement with the database-derived
mean-force potentials.
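A mean-force potential of this kind can be derived from a histogram of observed distances, using the usual sign convention $E(r) = -kT \ln P(r)$. The sketch below is a simplified illustration, not the protocol used with CNS: the function name, the bin settings, the pseudocount, the value of kT (about 0.59 kcal/mol near room temperature), and the synthetic distance sample are all assumptions, and no reference-state correction is applied.

```python
import numpy as np

def mean_force_potential(distances, bins=20, r_max=20.0, kT=0.59):
    """E(r) = -kT ln P(r) from a sample of observed distances.

    bins, r_max and kT are illustrative defaults, not values from the slides.
    """
    counts, edges = np.histogram(distances, bins=bins, range=(0.0, r_max))
    p = (counts + 1e-6) / (counts.sum() + 1e-6 * bins)   # pseudocount avoids log(0)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, -kT * np.log(p)

# Synthetic distance sample peaked near 6 Å (not real database statistics)
rng = np.random.default_rng(4)
sample = np.clip(rng.normal(loc=6.0, scale=1.0, size=5000), 0.0, 20.0)
r, energy = mean_force_potential(sample)
r_min = r[np.argmin(energy)]                     # the most probable distance has the lowest energy
```

With the negative sign, frequently observed distances become energy minima, which is what lets the term act as a restraint during refinement.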
CASPR 2006
• We have successfully used this method in the CASPR 2006 structure
refinement experiment.
• The figure below shows an application of our method to a model of 1WHZ
(70 residues): a refinement from 2.19 Å to 1.80 Å was obtained.
Distance Intervals
The distances are given with their
possible ranges.
[Figure: atoms i and j]
Find all $x_j$ such that
$l_{i,j} \le \|x_i - x_j\| \le u_{i,j}, \quad (i,j) \in S$
NP-hard!
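Finding coordinates that satisfy all the intervals is the hard part, but checking a candidate solution is simple. The sketch below only measures how far a given set of coordinates is from satisfying the bounds; it does not solve the problem. The function name `bound_violations` and the three-point instance are hypothetical.

```python
import numpy as np

def bound_violations(x, bounds):
    """Total violation of l_ij <= ||x_i - x_j|| <= u_ij over the pairs in S.

    bounds maps (i, j) -> (l_ij, u_ij); returns 0.0 when x satisfies every interval.
    """
    total = 0.0
    for (i, j), (lo, hi) in bounds.items():
        dist = np.linalg.norm(x[i] - x[j])
        total += max(lo - dist, 0.0) + max(dist - hi, 0.0)
    return total

# Toy instance: three points with interval bounds on all pairs
bounds = {(0, 1): (0.9, 1.1), (1, 2): (0.9, 1.1), (0, 2): (1.3, 1.5)}
x_ok = np.array([[0.0, 0, 0], [1.0, 0, 0], [1.0, 1.0, 0]])     # distances 1, 1, sqrt(2)
x_bad = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0]])      # distance between 0 and 2 is 2.0

print(bound_violations(x_ok, bounds), bound_violations(x_bad, bounds))
```

A violation measure like this is also a natural objective for heuristic solvers, which minimize it toward zero.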
A Generalized Distance Geometry Problem
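[Figure: atoms i and j at distance $d_{i,j}$ with fluctuation radii $\Delta r_i$ and $\Delta r_j$; labels: root mean square fluctuations, B-factors]
$\max_{x,r} \sum_{j=1}^{n} \Delta r_j^3$
subject to
$\|x_i - x_j\| + \Delta r_i + \Delta r_j \le u_{i,j}$
$\|x_i - x_j\| - \Delta r_i - \Delta r_j \ge l_{i,j}, \quad (i,j) \in S$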
Protein 1AX8
Data generation:
Original:
$f_i$: the rms fluctuation of atom i.
$S = \{(i,j) : d_{i,j} = \|y_i - y_j\| < 5\,\text{Å}\}$
$l_{i,j} = d_{i,j} - f_i - f_j$
$u_{i,j} = d_{i,j} + f_i + f_j$
Problem solved:
$\Delta r_i$: the fluctuation radius of atom i.
Computed:
$\max_{x,r} \sum_i \Delta r_i^3$
$d_{i,j} = \|x_i - x_j\|$
$l_{i,j} \le d_{i,j} - \Delta r_i - \Delta r_j$
$u_{i,j} \ge d_{i,j} + \Delta r_i + \Delta r_j$,
(i,j) in S
RMSD (x, y) = 3.6 e -07
1017 atoms
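The data-generation step and the constraints of the generalized problem can be sketched as below. This only builds the bounds and checks feasibility of a given pair (x, Δr); it does not perform the maximization reported on the slide. The function names are hypothetical and the coordinates and fluctuations are synthetic, not those of 1AX8.

```python
import numpy as np

def make_bounds(y, f, cutoff=5.0):
    """Build S, l_ij and u_ij from coordinates y and rms fluctuations f."""
    d = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    i, j = np.where(np.triu(d < cutoff, k=1))    # pairs i < j within the cutoff
    lower = d[i, j] - f[i] - f[j]
    upper = d[i, j] + f[i] + f[j]
    return i, j, lower, upper

def is_feasible(x, dr, i, j, lower, upper, tol=1e-9):
    """Check l_ij <= d_ij - dr_i - dr_j and d_ij + dr_i + dr_j <= u_ij on S."""
    d = np.linalg.norm(x[i] - x[j], axis=-1)
    spread = dr[i] + dr[j]
    return bool(np.all(d - spread >= lower - tol) and np.all(d + spread <= upper + tol))

def objective(dr):
    """The quantity maximized in the generalized problem: the sum of dr_i cubed."""
    return float(np.sum(dr ** 3))

# Synthetic data standing in for a structure y with fluctuations f
rng = np.random.default_rng(5)
y = rng.uniform(0.0, 10.0, size=(40, 3))
f = rng.uniform(0.05, 0.25, size=40)
i, j, lower, upper = make_bounds(y, f)

print(is_feasible(y, f, i, j, lower, upper))            # the generating pair (y, f)
print(is_feasible(y, 1.5 * f, i, j, lower, upper))      # inflated radii break the bounds
```

By construction the generating pair (y, f) meets every constraint with equality, which is why a solver that maximizes the radii can be expected to recover fluctuations close to the original ones.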
[Figure: Atomic Fluctuations. Original $f_i$ and computed $\Delta r_i$; vertical axis 0 to 0.25, horizontal axis 0 to 1200]
Acknowledgments:
• NIH support:
• 1R01GM081680-01 (AKlo)
• 1R01GM073095-01A2 (RLJ)
• 1R01GM072014-01 (RLJ)