Motion of bio

Download Report

Transcript Motion of bio

Application of Probabilistic
Roadmaps to the Study of
Protein Motion
Proteins
 Proteins are the workhorses of all living organisms
 They perform many vital functions, e.g:
•
•
•
•
•
•
Catalysis of reactions
Transport of molecules
Building blocks of muscles
Storage of energy
Transmission of signals
Defense against intruders
 They are large molecules (few 100s to several 1000s of
atoms)
 They are made of building blocks (amino acids) drawn
from a small “library” of 20 amino-acids
 They have an unusual kinematic structure: long serial
linkage (backbone) with short side-chains
Protein Sequence
(residue i-1)
O
O
N
N
N
N
O
O
 Long sequence of amino-acids (dozens to thousands),
also called residues
 Dictionary of 20 amino-acids (several billion years old)
Central Dogma
of Molecular Biology
Physiological conditions:
aqueous solution, 37°C, pH 7,
atmospheric pressure
Molecular motion is an essential
process of life
Mad cow disease is
caused by mis-folding
Drug molecules act by
binding to proteins
So, studying molecular motion is of
critical importance in molecular biology
However, few tools are available
Computer simulation:
- Monte Carlo simulation
- Molecular Dynamics
Stanford BioX cluster
NMR spectrometer
Motion occurs at very
different frequencies
HIV-1 protease
Low-frequency motions (diffusive motions) are more
directly related to protein functions
Two Major Drawbacks of
MD and MC Simulation
1) Each simulation run
yields a single pathway,
while molecules tend to
move along many
different pathways
 Interest in ensemble
properties
Unfolded (denatured) state
Intermediate
states
Many pathways
Folded (native) state
Two Major Drawbacks of
MD and MC Simulation
1) Each simulation run
yields a single pathway,
while molecules tend to
move along many
different pathways
2) Each simulation run
tends to waste much
time in local minima
Kinematic Models
 Atomistic model: The position of each atom is
defined by its coordinates in 3-D space
(x5,y5,z5)
(x1,y1,z1)
(x4,y4,z4)
(x2,y2,z2)
(x3,y3,z3)
(x7,y7,z7)
(x6,y6,z6)
(x8,y8,z8)
p atoms  3p parameters
Drawback: The bond structure is not taken into account
Kinematic Models
 Linkage model: The protein consists of atoms
connected by rotatable bonds
Resi
Resi+1
Resi+2
O
C

N
C

N
C’
O


C
C
Resi+3
O
C

C’
N
C

N
C’
O


C
C
C’
Roadmap-Based Representation
 Compact representation of many motion pathways
 Coarse resolution relative to MC and MD simulation
( only low-frequency motions are represented)
 Efficient algorithms for analyzing multiple pathways
Initial Work
A.P. Singh, J.C. Latombe, and D.L. Brutlag.
A Motion Planning Approach to Flexible Ligand Binding.
Proc. 7th ISMB, pp. 252-261, 1999
 Study of ligand-protein binding
 The ligand is a small flexible molecule, but the protein
is assumed rigid
 A fixed coordinate system P is
attached to the protein and a
moving coordinate system L is
defined using three bonded atoms
in the ligand
 A conformation of the ligand is
defined by the position and
orientation of L relative to P
and the torsional angles of the ligand
Roadmap Construction
(Node Generation)
 The nodes of the roadmap are generated by sampling
conformations of the ligand uniformly at random in the
parameter space (around the protein)
 The energy E at each sampled conformation is computed:
Waals
E
=
Einteraction =
Einternal =
Einteraction + Einternal
electrostatic + van der Waals potential
Snon-bonded pairs of atoms electrostatic + van der
Roadmap Construction
(Node Generation)
 The nodes of the roadmap are generated by sampling
conformations of the ligand uniformly at random in the
parameter space (around the protein)
 The energy E at each sampled conformation is computed:
Waals
E
=
Einteraction =
Einternal =
Einteraction + Einternal
electrostatic + van der Waals potential
Snon-bonded pairs of atoms electrostatic + van der
 A sampled conformation is retained as a node of the
roadmap with probability:
P=
0
Emax-E
Emax-Emin
1
if E > Emax
if Emin  E  Emax
if E < Emin
 Denser distribution of nodes in low-energy regions of
conformational space
Roadmap Construction
(Edge Generation)
q
qi
qi+1
q’
 Each node is connected to its closest neighbors by
straight edges
 Each edge is discretized so that between qi and qi+1 no
atom moves by more than some ε (= 1Å)
E
Emax
 If any E(qi) > Emax , then the edge is rejected
Roadmap Construction
(Edge Generation)
q
qi
q’
qi+1
 Any two nodes closer apart than some threshold
distance are connected by a straight edge
 Each edge is discretized so that between qi and qi+1 no
atom moves by more than some ε (= 1Å)
 If all E(qi)  Emax , then the edge is retained and is
assigned two weights w(qq’) and w(q’q)
where:
w(q  q') =
 -ln(P[q  q
i
i
i+1
])
Heuristic measure
of energetic difficulty
or moving from q to q’
e-(Ei+1 -Ei )/kT
P[qi  qi+1 ] = -(Ei+1 -Ei )/kT
e
 e-(Ei-1 -Ei )/kT
(probability that the ligand moves from qi to qi+1 when it
is constrained to move along the edge)
Querying the Roadmap
 For a given goal node qg (e.g., binding conformation),
the Dijkstra’s single-source algorithm computes the
lowest-weight paths from qg to each node (in either
direction) in O(N logN) time, where N = number of
nodes
 Various quantities can then
be easily computed in O(N)
time, e.g., average weights
of all paths entering qg and
of all paths leaving qg
(~ binding and dissociation
rates Kon and Koff)
Protein: Lactate dehydrogenase
Ligand: Oxamate (7 degrees of freedom)
Computation of Potential Binding
Conformations
1) Sample many (several 1000’s) ligand’s
conformations at random around protein
active site
2) Repeat several times:
 Select lowest-energy
conformations that are
close to protein surface
 Resample around them
3) Retain k (~10)
lowest-energy
conformations whose
centers of mass are at
least 5Å apart
lactate dehydrogenase
Experiments on 3 Complexes
1) PDB ID: 1ldm
Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
Ligand: Oxamate (6 atoms, 7 dofs)
2) PDB ID: 4ts1
Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423
atoms, 319 residues)
Ligand: L- leucyl-hydroxylamine (13 atoms, 9 dofs)
3) PDB ID: 1stp
Receptor: Streptavidin (901 atoms, 121 residues)
Ligand: Biotin (16 atoms, 11 dofs)
Results for 1ldm


Some potential binding sites have slightly lower energy than the active site
 Energy is not a discriminating factor
Average path weights (energetic difficulty) to enter and leave binding site
are significantly greater for the active site
 Indicates that the active site is surrounded by an energy barrier that
“traps” the ligand
Energy
Potential
binding
site
Active
site
Potential
binding
site
Conformation
Application of Roadmaps
to Protein Folding
N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map
Protein Folding Landscapes and Analyze Folding Kinetics of Known
Native Structures. J. Comp. Biology, 10(2):239-255, 2003
 Known native state
 Degrees of freedom: φ-ψ angles
 Energy: van der Waals, hydrogen bonds,
hydrophobic effect
 New idea: Sampling strategy
 Application: Finding order of SSE formation
Sampling Strategy
(Node Generation)
 High dimensionality
 non-uniform sampling
 Conformations are sampled
using Gaussian distribution
around native state
 Conformations are sorted
into bins by number of
native contacts (pairs of
C atoms that are close
apart in native structure)
 Sampling ends when all bins
have minimum number of
conformations
 “good” coverage of
conformational space
Application: Order of Formation
of Secondary Structures
 The lowest-weight path is extracted
from each denatured conformation to
the folded one
 The order of formation of SSE’s is
computed along each path
 The formation order that appears the
most often over all paths is considered
the SSE formation order of the protein
Method
1) The contact matrix showing the time
step when each native contact appears
is built
Protein CI2
(1 + 4 )
60
5
Protein CI2
(1 + 4 )
The native contact between residues 5 and 60 appears at step 216
Method
1) The contact matrix showing the time
step when each native contact appears
is built
2) The time step at which a structure
appears is approximated as the average
of the appearance time steps of its
contacts
 forms at time step 122 (II)
3 and 4 come together at 187 (V)
2 and 3 come together at 210 (IV)
1 and 4 come together at 214 (I)
 and 4 come together at 214 (III)
Protein CI2
(1 + 4 )
Method
1) The contact matrix showing the time
step when each native contact appears
is built
2) The time step at which a structure
appears is approximated as the average
of the appearance time steps of its
contacts
Comparison with
Experimental Data
CI2
SSE’s
roadmap size
1+4
5126, 70k
3
1+4
1+5
5471, 104k
7975, 104k
8357, 119k
Stochastic Roadmaps
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma.
Stochastic Roadmap Simulation: An Efficient Representation and Algorithm
for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003
New Idea: Capture the stochastic nature of molecular
motion by assigning probabilities to edges
vi
Pij
vj
Edge probabilities
 exp(-ΔEij/kT)
, if ΔEij >0;

Ni

Follow Metropolis criteria: Pij = 
 1 , otherwise.
 Ni

vi
Self-transition probability:
Pii =1- Pij
Pii
Pij
ji
[Roadmap nodes are sampled uniformly at random and energy profile
along edges is not considered]
vj
Stochastic Roadmap Simulation
Pij
V
Stochastic roadmap simulation and Monte Carlo simulation
converge to the Boltzmann distribution, i.e., the number of
-E/kT
e
dV
times SRS is at a node in V converges toward V
when the number of nodes grows (and they are uniformly
distributed)
Roadmap as Markov Chain
i
Pij
j
 Transition probability Pij depends only on i and j
Example #1:
Probability of Folding pfold
1- pfold
Unfolded state
pfold
Folded state
First-Step Analysis
U: Unfolded state





F: Folded state
One linear equation per node
l
Solution gives pfold for all nodes
No explicit simulation runk
j
Pik
Pil
All pathways are taken into account
Pij
m
Sparse linear system
i
Pim
Pii
Let fi = pfold(i)
After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm
=1
=1
Number of Self-Avoiding Walks
on a 2D Grid
1, 2, 12, 184, 8512, 1262816,
575780564, 789360053252,
3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236 > 1028
http://mathworld.wolfram.com/Self-AvoidingWalk.html
In contrast …
Computing pfold with MC simulation requires:
For every conformation q of interest
 Perform many MC simulation runs from q
 Count number of times F is attained first
Computational Tests
• 1ROP (repressor of
primer)
• 2  helices
• 6 DOF
• 1HDD (Engrailed
homeodomain)
• 3  helices
• 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
Correlation with MC Approach
1ROP
pfold for ß hairpin
Immunoglobin binding protein
(Protein G)
Last 16 amino acids
Cα based representation
Go model energy function
42 DOFs
[Zhou and Karplus, `99]
Computation Times (ß hairpin)
Monte Carlo (30 simulations):
~10 hours of
computer time
Over 107 energy
computations
2000 conformations 23 seconds of
computer time
~50,000 energy
computations
1 conformation
Roadmap:
~6 orders of magnitude speedup!
Using Path Sampling to Construct
Roadmaps
N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build
Better Markovian State Models: Predicting the Folding Rate and
Mechanism of a Tryptophan Zipper Beta Hairpin,
J. Chemical Physics, 121(1):415-425, 2004
New idea:
Paths computed with Molecular Dynamics
simulation techniques are used to create the
nodes of the roadmap
 More pertinent/better distributed nodes
 Edges are labeled with the time needed to
traverse them
Sampling Nodes from Computed
Paths (Path Shooting)
~dt
F
U
Sampling Nodes from Computed
Paths (Path Shooting)
tij
i
j
pij
F
U
Example: Langevin dynamics equation of motion
dx
is Fext -mγ
+R=0 where R is a Gaussian random force
dt
Node Merging
If two nodes are closer apart than some e, they
are merged into one and merging rules are
applied to update edge probabilities and times
1
P12, t12
P14, t14
3
2
3
1
P12’, t12’
2’
4
5
P12’ = P12 + P14
t12’ = P12xt12 + P14xt14
5
Node Merging
If
two
nodes are closer
apart than
some e, they
Approximately
uniform
distribution
are merged into one and merging rules are
of nodes over the reachable subset of
applied to update edge probabilities and times
conformational space
1
P12, t12
P14, t14
3
2
3
1
P12’, t12’
2’
4
5
P12’ = P12 + P14
t12’ = P12xt12 + P14xt14
5
Application: Computation of MFPT
 Mean First Passage Time: the average time
when a protein first reaches its folded state
 First-Step Analysis yields:
 MPFT(i) = Sj Pij x (tij + MPFT(j))
 MPFT(i) = 0 if i  F
 Assuming first-order kinetics, the probability
that a protein folds at time t is:
Pf (t) = 1 - e-rt
where r is the folding rate

 MFPT =
  P (t)  tdt
0
f
=1/r
Computational Test
 12-residue tryptophan zipper beta hairpin (TZ2)
 Folding@Home used to generate trajectories
(fully atomistic simulation) ranging from 10 to
450 ns
 1750 trajectories (14 reaching folded state)
  22,400-node roadmap
 MFPT ~ 2-9 ms, which is similar to experimental
measurements (from fluorescence and IR)
Conclusion
 Probabilistic roadmaps are a recent, but promising tool
for exploring conformational space and computing
ensemble properties of molecular pathways
 Current/future research:
• Better sampling strategies able to handle more
complex molecular models (protein-protein binding)
• More work to include time information in roadmaps
• More thorough experimental validation to compare
computed and measured quantitative properties