Transcript Document

Computer simulations of proteins: all-atom
and coarse-grained models
Valeri Barsegov
Department of Chemistry
University of Massachusetts Lowell
YITP, Kyoto University, Japan (2008)
Outline:
I.
Introduction:
•
single molecule spectroscopy of protein unfolding: biological relevance;
pulling experiments (AFM, laser/optical tweezers, force protocols)
single molecule spectroscopy of unbinding: biological relevance;
experimental probes; resolution of forces, lifetimes, and extension
•
II. Molecular simulations of proteins:
•
•
•
proteins: structure, fold types, examples
all-atom Molecular Dynamics (MD) simulations: force fields, examples,
simulations of IR spectra
coarse-grained description of proteins: approximations, examples
III. New direction - computer simulations using graphics cards:
•
•
basic facts, computer architecture, algorithms
applications
I.1 Single-molecule dynamic force spectroscopy of forced unfolding of
proteins: biological relevance
Fact 1:
“mechanically active” proteins perform their biological function in linear tandems of
“head-to-tail” (C-terminal-to-N-terminal) connected protein domains
Examples:
•
Titin contains tandems of immunoglobulin (Ig) domains, separated by short linkers
sequences (muscle function)
•
Actin-crosslinking filamins contain rod-like tandem of ddFLN domains (cellular
locomotion)
•
Fibronectin tandems consist of nonidentical Fn domains (extracellular matrix, cell
elasticity)
•
Ubiquitin is a multimeric protein (Ub)n of n=9 identical Ub repeats (protein degradation,
signaling pathways)
I.2 Single-molecule dynamic force spectroscopy of forced
unfolding of proteins: AFM experiment
fS
0
fS
force-clamp mode
force-ramp mode
f S (t )  rf t
f S  f 0  const
t
J. Brujic, R. Hermans, K. Walther & J.
Fernandez, Nature Phys., 2, 282 (2006); J.
Fernandez & H. Li, Science, 303, 1674 (2004)
0
t
M. Rief, M. Gautel, F. Oesterhelt, J. Fernandez
& H. Gaub, Science, 276, 1109 (1997); R.
Zinober, D. Brockwell, G. Beddard, A. Blake, P.
Olmsted, S. Radford & D. Smith, Protein Sci.,
11, 2759 (2002)
I.3 Single-molecule dynamic force spectroscopy of forced
unbinding of proteins: biological relevance
I.4 Single-molecule dynamic force spectroscopy of forced
unbinding of proteins: leukocyte rolling on endothelium
J.-G. Geng, M. Chen, K.-C. Chou, Curr Med Chem, 11,
2153 (2004); L. M. Coussens, Z. Werb, Nature, 420, 860
(2002); Y. J. Kim, L. Borgis, N. M. Varki, A. Varki,
Proc. Natl. Acad. Sci. USA, 95, 9325 (1998); J. Weisel, H.
Shuman, R. Litvinov, Curr Opin Struct Biol, 13, 227
(2003)
I.5 Single-molecule dynamic force spectroscopy of forced
unfolding of proteins: pulling force AFM experiment
f-constant
t, s
f(t)=rf t
f, pN
J. Weisel, H. Shuman, R. Litvinov, Curr Opin Struct Biol, 13, 227 (2003); M. Schlierf, H. Li, J. Fernandez, PNAS,
101, 7299 (2004); J. Liphardt, D. Smith, C. Bustamante, Curr Opin Struct Biol, 19, 279 (2000); J.-F. Allemand, D.
Bensimon, V. Croquette, ibid, 13, 266 (2003); S. Weiss, Science, 283, 1676 (1999); E. Evans, PNAS, 98, 3784 (2001)
I.6 Single-molecule dynamic force spectroscopy of proteins: experimental
resolution of unfolding forces, times, and distances
Experimental resolution:
•protein extension X ~1 nm;
•stretching force fS  100pN
 5-10pN
•relaxation interval T  10-100s
•force-quench fQ
J. Fernandez & H. Li, Science, 303, 1674 (2004);
I. Schwaiger, M. Schleicher, A. Noegel & M. Rief,
EMBO Reports, 6, 46 (2005); J. Brujic, R. Hermans,
K. Walther & J. Fernandez, Nature Phys., 2, 282
(2006)
II.1 Molecular simulations of proteins: levels of structure of proteins
o Amino acids in proteins (or polypeptides) are joined together by peptide bonds.
o The sequence of R-groups along the chain is called the primary structure.
o Secondary structure refers to the local folding of the polypeptide chain.
o Tertiary structure is the arrangement of secondary structure elements in 3D
o Quaternary structure describes the arrangement of a protein's subunits.
The PDB is the single worldwide
repository of 3D structure data of
proteins and nucleic acids: ~35,000
structures as of August 2005.
(www.rcsb.org/pdb)
Other Web Resources:
1.
NCBI
2.
The European Bioinformatics
Institute (EBI) (www.ebi.ac.uk)
3.
The RNA world (www.imbjena.de/RNA.html)
II.2 Molecular simulations of proteins: secondary and tertiary
structure of proteins
Φ = -57o , Ψ = -47o
right handed alpha-helix
Chain has directionality!
II.3 Molecular simulations of proteins: secondary and tertiary
structure of proteins
Φ = (-110o, -140o), Ψ = (110o, -135o)=> beta-sheet
II.4 Molecular simulations of proteins: quaternary structure of proteins
Alpha-beta folds
Multi-domain proteins
a) Control protein
b) Immunoglobulin (muscles)
c) Fibronectin
d) Growth factor
Knotted proteins
II.5 All-atom classical Molecular Dynamics (MD) simulations: force fields
I. Potential for bonded interactions:
VB  VBL  VBA  VDIH  VSS
VBL-bondlength potential, VBA-bond-angle
potential, VDIH-dihedral angle potential, VSS
– disulfide bond potential
II. Potential for non-bonded interactions:
VNB  VPP  VWW  VWP
VPP- protein-protein interaction potential, VWW- water-water potential, VWP - water-protein interaction
potential
III. Software (open-source):
IV. Water models:
•GROMACS (force field: OPLS and GROMOS )
•NAMD (force fields: CHARMM22, CHARMM27)
•GROMACS (SPC, SPC/E, SPC-fw)
•NAMD (TIP, TIP3P)
GROMACS (Univ. of Groeningen, Netherlands): ftp://ftp.gromacs.org/pub/
NAMD (Univ. of Illinois at Urbana Shampaign, USA): http://www.ks.uiuc.edu/Research/namd/
II.6 All-atom MD simulations of proteins: examples of fibrinogen and
A-knob-a-hole complex of fibrin
Fibrin polymerisation: ~2,400 a.a., ~48nm
•
essential for blood clotting
•
implicated in heart attack and stroke
II.7 All-atom MD simulations of proteins: IR spectroscopy of proteins
- infrared light (vibrations of bonds)
Amide I
Amide I & Amide II are the major bands:
- conformationally sensitive
- localized at individual a.a site
Amide I :
C=O-stretching (90%)+C-N-stretch (10%)
Amide II:
N-H-bending (60%)+C-N-stretch (40%)
Krimm & Bandekar, Adv. Prot. Chem., 38, 181 (1986); Woutersen & Hamm, J. Phys: Cond. Matt. 14, R1035 (2002);
Venyaminov & Kalnin, Biopolymers, 30, 1243 (1990); Chergadze $ Nevskaya, ibid, 15, 637 (1976)
II.8 All-atom MD simulations of proteins: IR spectroscopy of proteins
1. Vibrational exciton Hamiltonian:
H
 b

i i
i
b 
d
 2b
 
i i i
b b bi 
i
Lb
i j

ij i
bj
 ( , )  frequency (  1); d  anharmonicity
Lij ( , )  electrostatic coupling;[bi b j ]  ij
2. Transition dipole coupling (TDC):


( i rij )(  j rij ) 
 i  j
Lij  C
3  3
5

 r
r
ij
 ij

i  dipole moment unit vector
rij  vector connecting dipoles
C  coupling magnitude
3. Linear absorption spectrum:
2
Cheatum et al, JCP, 120, 8201 (2004);
Torii & Tasumi, JCP, 96, 3379 (1992);
S. Mukamel, Principles of Nonlinear Spectroscopy
e
I ( )   Im 
; e  e  i bi g ;   line broadening
e   E e  i
i

e
 sum over one  quantum eigenstates e
II.9 All-atom MD simulations of proteins: IR spectroscopy of proteins
Assumptions used in the vibrational exciton Hamiltonian:
- dynamics of in the near-equilibrium state
- fast bath relaxation (fixed line broadening, )
- fitting parameters (diagonal energies, peak amplitudes, frequency splitting)
- energies/amplitudes are from ab initio maps of N-methylacetamide, glycine dipeptide analogs
- transferability of ab initio maps to larger proteins
Direct calculation of IR spectra of Amide I from MD:
- Amide I  CO-vibration with | q C |  | q O |
- Correction Factor due to assumptions/harmonic force field
i ,CO  qi ,C ri ,C  qi ,O ri ,O  q (ri ,C  ri ,O )  qri ,CO
i ,CO   x i ,CO n x   y i ,CO n y   z i ,CO nz
M CO (t ) 

i ,CO
i
Advantages of correlation functions:
(t ); i runs over all amide CO' s
- IR obtained directly from classical MD
d
- beyond ensemble average
j  M CO 
M CO  classical current
dt
- far-from-equilibrium regime

2
 
i t
I ( ) 
dt
e
j
(
t
)
j
(
0
)

QCF

3VkTn( ) 
1  exp[   ]
V  volume; n( )  refractive index

II.10 All-atom MD simulations of proteins: IR spectroscopy of proteins
Ubiquitin (1UBQ, 76 a.a):
- water box (4,600 TIP3P, 47Å51Å 57Å)
- 8 trajectories (t=4ps, dt=0.1fs, NVE) at T=300K
- Ewald sum method (long range electrostatics)
- 12Å cutoff for L-J forces
A16-22 (3KLVFFAE, 21 a.a):
- water box (2000 TIP3P, 44Å41Å 36Å)
- - Ewald sum method; 12Å cutoff for L-J forces
- 12 trajectories (t=8ps, dt=0.1fs, NVE) at T=300K
Correction Factor=0.985
(CHARMM22)
Chung et al, PNAS, 102, 612 (2005)
Cheatum et al, JCP, 120, 8201 (2004)
II.11 Coarse-grained (CG) descriptions of proteins: building the CG model
I. Coarse-grained model for P-selectin:
Step 1: creating structure file of Ca &
centers of mass of residues from PDB
structure of P-selectin (www.rcsb.org)
•mimicking hydrogen bonds
•modeling S-S bonds
Step 2: computing potential energy of obtained conformation of P-selectin:
VTOT  VBL  VSBC  VBA  VDIH  VHB  VNON  VSS
Step 3: follow Langevin Dynamics
VTOT , f
d
 Xj  
 G j (t )
dt
X j
VTOT , f  VTOT  f  x
K. Dill et al, Protein Sci, 4, 561 (1995);
D. Thirumalai, D. Klimov, PNAS, 97, 2544 (2000);
J. Bryngelson et al, Protein, 21, 167 (1995);
M. Karplus, A. Sali, Curr Opin Struct Biol, 5, 58 (1995);
Kolinski, J. Skolnick, Polymer, 45, 511 (2004)
II.11 Coarse-grained (CG) descriptions of proteins: force field
I. Scales of energy/length/mass/time:
h
m
- hydrophobic interaction (1.25 kcal/mol);
 22
- residue mass ( 5  10 g );  
a
-bond length (3.8 Å)
ma 2 /  h - the timescale (~3ps )
II. Harmonic connectivity potentials:
VBL 
 
i

2
kr
X i 1  X i  a ; VSBC 
2

i


2
ks
X s,i  X b,i  a ; VBA 
2

i
k
2
i  0  ;VSS 

2

i
k SS Cys
2
ri  r0 Cys 

2
kr  100h / a 2 ; k s  200h / a 2 ; k  20h / (rad ) 2 ;0  1050 ; k SS  k s / 5
III. Dihedral angle potential:
VDIH 
  A (1  cos[
i
i

 i 0 ])  Bi (1  cos[3i  3i 0 ])  Ci (1  sin[i  i 0 ])
i
turn
β-sheet
α-helix
. h , Ci  0.6h
. h , Ci  0 Ai  Bi  12
Bi  0.2h , Ai  Ci  0 Ai  Bi  12
II.11 Coarse-grained (CG) descriptions of proteins: force field
IV. Hydrogen bond potential:
VHB  
h
 exp[ a (cos
3
2
 1,i  cos2  2 ,i )]
i
cos 1,i 
( X OH  X i ,i 1 )
X OH X i ,i 1
;cos 2 ,i 
( X OH  X i  3,i  4 )
;
X OH X i  3,i  4
a2
V. Potential for native contacts:
6
5
  X 0  12
 X ij0 
 X ij0  
ij
  2.3
  15
 
VS  4(1  bij )h  
. 
X ij 
X ij  
  X ij 




bij-contact interaction matrix; X ij0-contact distance (Kolinski et al, JCP, 98, 7420 (1993))
VI. Nonbonded potential:
VS  VB  VBS
VII. Unfolding/unbinding trajectories:

8
 h
3
  a  12 
 
 
  X ij  


VTOT , f  VTOT  f  x
 VTOT , f
d
Xj  
 G j (t );  G j (t )  0,  Gi (t )G j (t ' )  6kTij  (t  t ' )
dt
Xj
II.12 Coarse-grained (CG) descriptions of proteins: forced rupture of the
P-selectin-sPSGL noncovalent bond
f
f (t )  r f t
0
N-terminus
of P-selectin
rf  0.028 N / s, t  05
. s
t
f (t )  r f t
C-terminus
of sPSGL-1
III.1 Computer simulations using graphics cards: basic facts
CPU:
Advantages:
• can perform very sophisticated flow control (IF/THEN – cycles, conditionals, etc.)
• single CPU cores are faster (3.0GHz) or faster
• a lot of well-tested (commercial) software is available
Disadvantages:
• has no more than 6 cores (today)
• parallel programming on CPU is difficult
• data exchange b/w nodes in a cluster occurs through relatively slow network
GPU:
Advantages:
• up to 240 cores (GeForce 280, Tesla C1060)
• easy to write parallel codes with CUDA language (extension of C)
• memory bandwidth is high because all cores are local
Disadvantages:
• single core clock is not as fast as CPU core (0.5GHz)
• can’t be used for applications with sophisticated flow control
• not many software available for GPU (started in ~2006)
III.2 Computer simulations using graphics cards: hardware
GPU:
• highly parallel
• multythreaded
• manycore processor
Historically, GPU
• was designed for compute-intensive, highly parallel computation
• more transistors are devoted to data processing rather than data caching and flow control
• well-suited for problems that involve data-parallel computations, i.e. the same program is
executed on many data elements in parallel (MD, coarse-grained simulations).
III.3 Computer simulations using graphics cards: programming mode
CUDA:
• consist of a minimal extension to the C language
• parallel programming model and software environment
• designed to overcome the challenge of creating software that transparently scales on
manycore processors
Example:
• vecAdd() function is called N times on GPU.
• <<<1, N>>> means that the procedure runs in one 1D block with N threads.
• i = threadIdx.x is a way for thread to identify, which element of the vector it
should work with.
III.4 Computer simulations using graphics cards:
software organization
Thread hierarchy:
1.
thread index is a 3D vector, so that threads can
be identified using a 1D, 2D or 3D index
forming 1D, 2D or 3D thread block
2.
multiple blocks can be organized into 1D or
2D grid. Each block can be identified within
grid using 1D, 2D or 3D block index
3.
all threads in one block are doing the same
thing with different data
4.
threads can synchronize and pass data to each
other within block using shared memory
5.
threads can pass the data to the CPU through
the GPU global memory
III.5 Computer simulations using graphics cards:
software organization
Memory hierarchy:
1.
each thread has it own local memory (in
cache) for storing temporary variables
2.
each block has shared memory (in cache)
for synchronizing the threads within block
3.
device has global memory that can be
accessed from any thread on the GPU
4.
local memory and shared memory are much
faster than global, but they are available
only locally and exists only during the
lifetime of a thread or a block.
5.
global memory is relatively slow, and can
be also accessed from CPU.
III.6 Computer simulations using graphics cards: hardware model
Hardware organization:
1.
each device have N multiprocessors
2.
multiprocessors can share data only through
device (global) memory
3.
A multiprocessor have M processors (ALUs)
4.
number of threads that can run at the same
time is equal to NxM; for GeForce 8800GT,
M=8, N=14 (number of processors = 112!!!)
5.
one block can run only on one multiprocessor
so the number of blocks in program should be
at least equal to the number of multiprocessors on the device.
III.7 Computer simulations using graphics cards: applications
MD and CG simulations are suitable for GPU:
• same potential (force field) for all atoms
(beads)
Example:
• Rouse chain model of homopolymer
• Lennard-Jones potential (self-avoidance)
• integration scheme is explicit
• 1,000,000 time steps for each chain
• systems have huge number of atoms (beads)
• Intel Xeon 2GHz Dual Core (CPU,
~$350) vs GeForce 8800 GT (GPU, ~$130)