Transcript Document
Computer simulations in
drug design, and the GRID
Dr Jonathan W Essex
University of Southampton
Computer simulations?
• Molecular dynamics
– Solve F = ma for particles in system
– Follow time evolution of system
– Average system properties over time
• Monte Carlo
– Perform random moves on system
– Accept or reject move on the basis of change in
energy
– No time information
Drug action
• Administration
Free energy simulations
– Oral, intravenous…
– pKa, solubility
• Transport and permeation
– Passage through membranes
– Partition coefficient, permeability
• Protein binding
– Binding affinity, specificity
– Allosteric effects
• Metabolism
– P450 enzymes
– Catalysis
Digital filtering
Parallel tempering
Free energy simulations
Free QM/MM
energy simulations
Free energy simulations
Structure generation
Modelling
Three main areas of work
• Protein-ligand binding
– Structure prediction
– Binding affinity prediction
• Membrane modelling and small molecule
permeation
– Bioavailability
– Coarse-grained membrane models
• Protein conformational change
– Induced-fit
– Large-scale conformational change
Small molecule docking
• Flexible protein docking
• Most successful structure
with experiment
(transparent)
• Most successful structure,
experiment, and
isoenergetic mode
Replica-exchange free energy
1
exp EB j EB i E A j E A i rand0,1
kT
• Free energy methods that are applied between
exchanges are the same as normal
• Exchanges require little extra computational cost
Membrane permeation
Z-constraint F( z, t )
z2
Free Energies G (z ) F ( z ) dz
t
z1
z1
z2
z
Drugs: -blockers
alprenolol
atenolol
pindolol
• not “exotic” chemical groups
• not too big
• similar structure, pKa and weight, but different lipophilicity and oral absorption
• much experimental data about Caco-2 permeabilities
Reversible digitally filtered MD
• Conformational change associated with low
frequency vibrations
• Use digital filter applied to atomic velocities to
amplify low frequency motion
RDFMD
• Effect of a quenching filter
Comb-e-Chem
• EPSRC Grid computing project
• Main objectives:
– Equipment on the Grid – high-throughput small
molecule crystallography
– E-lab – pervasive computing, provenance
– Structure modelling and property prediction
• Automatic calculations performed on deposition
of a new structure to the database
• Distributed (Grid) computing
Distributed computing
•
•
•
•
Heterogeneous computers – personal PCs
Rather unstable
Network often slow
Suited for embarrassingly parallel calculations
– SETI@Home, Folding@Home etc.
– E-malaria (JISC)
• Poor-man’s resource
– Cheap
– Large amount of power available
Condor
• Condor project started in 1988 to harness the
spare cycles of desktop computers
• Available as a free download from the
University of Wisconsin-Madison
• Runs on Linux, Unix and Windows. Unix
software can be compiled for Windows using
Cygwin
• For more information see
http://www.cs.wisc.edu/condor/
Distributed computing and replica
exchange
• Run multiple simulations is parallel
• Use different conditions in different
simulations to enhance conformational
change (e.g. temperature)
300 K
320 K
340 K
360 K
300 K
320 K
340 K
360 K
380 K
400 K
300 K
1
320 K
340 K
360 K
380 K
400 K
300 K
320 K
340 K
360 K
380 K
400 K
1
1
1
1
1
1
300 K
320 K
340 K
360 K
380 K
400 K
2
2
2
2
2
2
300 K
320 K
340 K
360 K
380 K
400 K
3
3
3
2
3
2
300 K
320 K
340 K
360 K
380 K
400 K
5
4
3
2
3
4
Catch-up
cluster
300 K
320 K
340 K
360 K
380 K
400 K
5
4
3
2
3
4
Catch-up
cluster
300 K
320 K
340 K
360 K
380 K
400 K
5
4
4
4
4
4
Activity of nodes over the last day
Red spots
Yellow
showBlue
bars
when
showrepresent a
bars
the owner
whenofthe
the
job
node
isrunning
moved an even
node
Green
bars represent a
isover
usingto
itthe catchup
iteration
node
running
an odd
(interrupting our
cluster
job)
iteration
Activity of nodes since the start of the
simulation
University network
Condor master server failed!
crashed!
Catchup cluster
Catchup cluster was a
redesignated to six
large collection of nonfast, dedicated nodes.
One slow or dedicated nodes.
interrupted iteration
delays large parts of
the simulation
Current paradigm for
biomolecular simulation
• Target selection: literature based; interesting
protein/problem
• System preparation: highly interactive, slow,
idiosyncratic
• Simulation: diversity of protocols
• Analysis: highly interactive, slow,
idiosyncratic
• Dissemination: traditional – papers, talks,
posters
• Archival: archive data… and then lose the
tape!
Managing MD data: BioSimGRID
Application
Distributed Query
2nd Level Metadata –
describing the results of
generic analyses
Analyse Data
1st Level Metadata –
describing the simulation
data
Simulation Data
Distributed Raw Data
York
Nottingham
Birmingham
Oxford
Bristol
London
Southampton
www.biosimgrid.org
Distributed database environment
Software tools for interrogation and data-mining
Generic analysis tools
Annotation of simulation data
Comparative simulations
• Increase significance of results
– Effect of force field
– Simulation protocol
• Long simulations and multiple simulations
• Biology emerges from the comparisons
– Very easy to over-interpret protein simulations
– What’s noise, and what’s systematic?
Test Application: Comparison of Active
Site Dynamics
•
OMPLA – bacterial outer membrane lipase; GROMACS;
Oxford
•
AChE – neurotransmitter degradation at synapses; NWChem;
UCSD (courtesy of Andrew McCammon)
•
Both have catalytic triad at active site – compare
conformational dynamics
BioSimGRID prototype
WEB
HTTP(S)
Web Portal Environment
Python Environment
Apache/SSL
Python
TCP/IP
Applications Server
SQL
Editor
AAA
Module
HTML
Analysis
Generator Tool
Traj Query
Tool
Video/Img
Generator
Other
Database Access: DBI / PythonDB2 / PortalLib
TCP/IP
Prototype: Summer 2003
(UK e-science All-Hands
meeting)
DB2 Cluster
Revised structure
BioSimGrid
Config. Files
Hybrid Storage
Analysis Toolkit
RMSD
Simulation
RMSF
database
Distances
Result
Files
Internal Angles
Volume
Flat file
Surface
4. Analysis
User
input
3. Data- on-demand Query
5. View Result
Metadata
2. Generation of Metadata
Trajectory
1. Submission of trajectory
Visualisation
Tools
Example of script use (distance matrix)
FC = FrameCollection(‘2,5-8’)
myDistanceMatrix = DistanceMatrix(FC)
myDistanceMatrix.createPNG()
myDistanceMatrix.createAGR()
•Script Calculates Distance Matrix
•User has requested result as PNG
•Grace project file was also produced
www.biosimgrid.org
Future directions: Multiscale
biomolecular simulations
QM
drug binding
Bristol
Southampton
protein motions
Oxford
drug diffusion
London
Membrane bound enzymes – major drug targets (cf. ibruprofen, anti-depressants,
endocannabinoids); gated access to active site coupled to membrane fluctuations
Complex multi-scale problem: QM/MM; ligand binding; membrane/protein
fluctuations; diffusive motion of substrates/drugs in multiple phases
Need for integrated simulations on GRID-enabled HPC resources
Computational
challenges
Linux
cluster
HPCx
IntBioSim
BioSimGRID database
www.biosimgrid.org
Need to integrate HPC, cluster & database resources
Funding: bid to BBSRC under consideration…
Acknowledgements
My group:
Stephen Phillips
Richard Taylor
Daniele Bemporad
Christopher Woods
Robert Gledhill
Stuart Murdock
My funding:
BBSRC, EPSRC, Royal Society,
Celltech, AstraZeneca, Aventis,
GlaxoSmithKline
My collaborators:
Mark Sansom, Adrian Mulholland, David
Moss, Oliver Smart, Leo Caves, Charlie
Laughton, Jeremy Frey, Peter Coveney,
Hans Fangohr, Muan Hong Ng, Kaihsu
Tai, Bing Wu, Steven Johnston, Mike
King, Phil Jewsbury, Claude Luttmann,
Colin Edge