Slides - Stanford University

Transcript Slides - Stanford University

Probabilistic Roadmaps:
A Tool for Computing Ensemble
Properties of Molecular Motions
Serkan Apaydin, Doug Brutlag1
Carlos Guestrin, David Hsu2
Jean-Claude Latombe, Chris Varma
Computer Science Department
Stanford University
1 Department of Biochemistry, Stanford University
2 Computer Science Department, University of North Carolina
Goal of our Research
Develop efficient computational representations
and algorithms to study molecular pathways for
protein folding and ligand-protein binding
Protein folding  RECOMB ’02
Ligand-protein binding  ECCB ‘02
Acknowledgements
People:
Leo Guibas
Michael Levitt, Structural Biology
Itay Lotan
Vijay Pande, Chemistry
Fabian Schwarzer
Amit Singh
Rohit Singh
Funding:
NSF-ITR ACI-0086013
Stanford’s Bio-X and Graduate Fellowship programs
Analogy with Robotics
Configuration Space
Approximate the free space
by random sampling
 Probabilistic Roadmaps
Probabilistic Roadmap
free space
[Kavraki, Svetska, Latombe,Overmars, 95]
Probabilistic Completeness
The probability that a roadmap fails to
correctly capture the connectivity of the
free space goes to 0 exponentially in the
number of milestones (~ running time).
 Random sampling is convenient incremental
scheme for approximating the free space
Computed Examples
Biology  Robotics
Energy field, instead of joint control
Continuous energy field, instead of
binary free and in-collision spaces
Multiple pathways, instead of single
collision-free path
Potentially many more degrees of
freedom
Relation to real world is more complex
Initial Work
[Singh, Latombe, Brutlag, 99]
Study of ligand-protein
binding
Probabilistic roadmaps
with edges weighted by
energetic plausibility
Search of most plausible paths
Initial Work
[Singh, Latombe, Brutlag, 99]
Study of ligand-protein
binding
energy
Probabilistic roadmaps
with edges weighted by
energetic plausibility
Search of most plausible paths
Study of energy profiles along such paths
Catalytic
Site
Initial Work
[Singh, Latombe, Brutlag, 99]
Study of ligand-protein
binding
Probabilistic roadmaps
with edges weighted by
energetic plausibility
Search of most plausible paths
Study of energy profiles along such paths
Extensions to protein folding
[Song and Amato, 01] [Apaydin et al., 01]
New Idea:
Capture the stochastic nature of molecular
motion by assigning probabilities to edges
vi
Pij
vj
Why is this a good idea?
1) We can approximate Monte Carlo
simulation as closely as we wish
2) Unlike with MC simulation, we avoid
the local-minima problem
3) We can consider all pathways in the
roadmap at once to compute ensemble
properties
Edge probabilities
 exp(  Eij / k BT )
, if Eij  0;

Ni

Follow Metropolis criteria: Pij  
 1 , otherwise.
 N i
vi
Self-transition probability:
Pii  1   Pij
j i
Pii
Pij
vj
Stochastic Roadmap Simulation
Pij
S
Stochastic simulation on roadmap and Monte Carlo
simulation converge to same Boltzmann distribution
Problems with
Monte Carlo Simulation
 Much time is wasted in local minima
 Each run generates a single pathway
Solution
Pij
Treat roadmap as a Markov chain and use
the First-Step Analysis tool
Example #1:
Probability of Folding pfold
HIV integrase
[Du et al. ‘98]
1- pfold
pfold
“We stress that we do not suggest using pfold as a
transition coordinate for practical purposes as it is
Folded set
Unfolded set very computationally intensive.”
Du, Pande, Grosberg, Tanaka, and Shakhnovich “On the Transition Coordinate
for Protein Folding” Journal of Chemical Physics (1998).
First-Step Analysis
U: Unfolded set





F: Folded set
One linear equation per node
Solution gives pfold for all nodes l
k
No explicit simulation run
j
Pik
Pil
All pathways are taken
Pij into account
m
Pim
Sparse linear system
i
Pii
Let fi = pfold(i)
After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm
=1
=1
In Contrast …
Computing pfold with MC simulation requires:
 Performing many MC simulation runs
 Counting the number of times F is
attained first
for every conformation of interest:
Computational Tests
• 1ROP (repressor of
primer)
• 2 a helices
• 6 DOF
• 1HDD (Engrailed
homeodomain)
• 3 a helices
• 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
Correlation with MC Approach
1ROP
Correlation with MC Approach
1HDD
Computation Times (1ROP)
Monte Carlo:
49 conformations
Over 11 days of
computer time
Over 106 energy
computations
Roadmap:
5000 conformations 1 - 1.5 hours of
computer time
~15,000 energy
computations
~4 orders of magnitude speedup!
Example #2:
Ligand-Protein Interaction
Computation of escape time
from funnels of attraction
around potential binding sites
(funnel = ball of 10A rmsd)
Computing Escape Time with Roadmap
l
k
j
Pil
Pik
Pij
i
Pii
m
Pim
Funnel of Attraction
ti = 1 + Pii ti + Pij tj+ Pik tk + Pil tl + Pim tm
=0
(escape time is measured as number of steps
of stochastic simulation)
Similar Computation Through
Simulation [Sept, Elcock and
McCammon `99]
10K to 30K independent simulations
Applications
1) Distinguishing catalytic site: Given
several potential binding sites, which
one is the catalytic site?
Complexes Studied
ligand
protein # random
nodes
#
DOFs
oxamate
1ldm
8000
7
Streptavidin
1stp
8000
11
Hydroxylamine
4ts1
8000
9
COT
1cjw
8000
21
THK
1aid
8000
14
IPM
1ao5
8000
10
PTI
3tpi
8000
13
Distinction Based on Energy
Protein
Bound
state
Best potential
binding site
1stp
-15.1
-14.6
4ts1
-19.4
-14.6
3tpi
-25.2
-16.0
1ldm
-11.8
-13.6
1cjw
-11.7
-18.0
1aid
-11.2
-22.2
1ao5
-7.5
-13.1
Able to
distinguish
catalytic site
Not able
(kcal/mol)
Distinction Based on Escape Time
Protein
1stp
4ts1
3tpi
1ldm
1cjw
1aid
1ao5
Bound
state
3.4E+9
3.8E+10
1.3E+11
8.1E+5
5.4E+8
9.7E+5
6.6E+7
Best potential
binding site
1.1E+7
1.8E+6
5.9E+5
3.4E+6
4.2E+6
1.6E+8
5.7E+6
Able to
distinguish
catalytic site
Not able
(# steps)
Applications
1) Distinguishing catalytic site
2) Computational mutagenesis
GLN-101
Loop
ARG-106
ASP-195
HIS-193
+
+
CH3
O C
C
ASP-166
NADH
Some amino acids are
deleted entirely,
replaced by other
amino acids, or
sidechains altered
O
O
+
ARG-169
Chemical environment of LDH-NADH-substrate complex (pyruvate)
(catalyzes conversion of pyruvate to lactate in the presence of NADH
Binding of Pyruvate to LDH
GLN-101
Loop
CH3
ARG-106
O C
C
O
ASP-195
O
HIS-193
+
+
THR-245
ASP-166
NADH
+
ARG-169
Results
Mutant
Escape Time Change
Wildtype
3.216E6
GLN-101
N/A
Loop
ARG-106
ASP-195
HIS-193
+
+
CH3
O C
THR-245
C
ASP-166
NADH
O
O
+
ARG-169
Results
Mutant
Escape Time Change
Wildtype
3.216E6
His193  Ala
Arg106  Ala
4.126E2
GLN-101
N/A

Loop
ALA-106
CH3
ASP-195
ALA-193
O C
ASP-166
C
NADH
O
O
+
ARG-169
Results
Mutant
Escape Time
Wildtype
3.216E6
His193  Ala
Arg106  Ala
4.126E2
Change
GLN-101
N/A

Loop
ARG-106
ASP-195
HIS-193
His193  Ala
3.381E3
+
+
CH3
O C
GLY-245

Arg106  Ala
2.550E2

Asp195  Asn
5.221E7

Gln101  Arg
1.669E6
No change
Thr245  Gly
4.607E5

C
ASP-166
NADH
O
O
+
ARG-169
Conclusion
Probabilistic roadmaps are a promising
computational tool for studying ensemble
properties of molecular pathways
Current and future work:
 Better kinetic/energetic models
 Experimentally verifiable tests
 Non-uniform sampling strategies
 Encoding MD simulation
Stochastic Roadmap Simulation
vs
S
vg
Stochastic simulation on a roadmap and MC simulation converge to the
same distribution p (Boltzman):
For any set S, e>0, d>0, g>0, there exists N such that a roadmap with
N milestones has error bounded by:
p ( S )(1  d )  e  pˆ ( S )  p ( S )(1  d )  e
with probability at least 1-g.
Ligand-Protein Modeling
x,y,z

• DOF = 10
–
–
–
–





3 coordinates to position root atom;
2 angles to specify first bond;
Angles for all remaining non-terminal atoms;
Bond angles are assumed constant;
• Protein assumed rigid
[Singh, Latombe and Brutlag `99]
Energy of Interaction
Energy = van der Waals interaction (Ev)
+
electrostatic interaction (Ec)
Ev = 0.2[(R0/Rij)12 - 2(R0/Rij)6 ] Ec = 332 QiQj/(eRij)
Ec
Ev
Rij
Rij
Solvent Effects
Ec = 332 QiQj/(eRij)
• Is only valid for an infinite medium of uniform
dielectric;
• Dielectric discontinuities result in induced surface
charges;
Solution: Poisson-Boltzman equation

[e(r)
. (r)] - e(r)k(r)2sinh([(r)] + 4prf(r)/kT = 0

 Use Delphi [Rocchia et al `01]
 Finite Difference solution is based on discretizing the
workspace into a uniform grid.

Slides - Stanford University

Transcript Slides - Stanford University

Directory