The Basic Technology Research Programme
A new paradigm for virtual
screening
A Research Council’s Basic
Technology Research Programme
Background
• Cross research council endeavour
– administered by EPSRC
• Funding for research to create a new
technology
• Change the way we do science
• Underpin the future industrial base
Atom based modelling
QSAR & QSPR
• Almost all modelling techniques are based
on atomistic descriptions of molecules
• Although these techniques have been
successful over several decades, they have
disadvantages
– poor scaling characteristics
– lack of a solid physical justification, e.g. scoring functions
– interpretation difficult due to abstract nature of many descriptors
– tendency to produce high dimensional models
Improved
molecular modelling?
• Can we define a more parsimonious and
explicit description of molecules than has so
far been achieved using atomistic models?
– leading to better prediction AND a clearer
understanding of the properties of molecules
and how they arise
A non-atom based approach
• We are developing an alternative approach
in which molecules are described by their
surfaces
Benzodiazepine analogues
A non-atom based approach
• The approach is based on calculation of a
set of local properties at or near the
molecular surface
• the local molecular electrostatic potential (MEP)
• the local ionisation energy (LIE, IE_L)
• the local electron affinity (LEA, EA_L)
• the local polarisability (LP, α_L)
The local surface properties

Local ionization energy:

$$\mathrm{IE}_L = \frac{\sum_{i=1}^{\mathrm{HOMO}} -\rho_i\,\varepsilon_i}{\sum_{i=1}^{\mathrm{HOMO}} \rho_i}$$

Local electron affinity:

$$\mathrm{EA}_L = \frac{\sum_{i=\mathrm{LUMO}}^{\mathrm{norbs}} -\rho_i\,\varepsilon_i}{\sum_{i=\mathrm{LUMO}}^{\mathrm{norbs}} \rho_i}$$

Local polarizability:

$$\alpha_L = \frac{\sum_{j=1}^{\mathrm{NAOs}} \rho_j^1\,q_j\,\bar{\alpha}_j}{\sum_{j=1}^{\mathrm{NAOs}} \rho_j^1\,q_j}$$

• $\rho_j^1$ = density due to a singly occupied atomic orbital j
• $q_j$ = Coulson population of atomic orbital j
• $\bar{\alpha}_j$ = mean polarizability calculated for atomic orbital j

Computer-Chemie-Centrum, Universität Erlangen-Nürnberg
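A minimal sketch of how the local ionisation energy could be evaluated at one surface point, assuming the per-orbital densities at that point and the orbital energies are already available from the semi-empirical calculation. The function and argument names are hypothetical and are not part of ParaSurf. The same density-weighted average would give the local electron affinity if the virtual orbitals (LUMO to norbs) were passed instead of the occupied ones.

```python
def local_ionization_energy(densities, energies):
    """Density-weighted mean of -epsilon_i over the supplied orbitals.

    densities: density of each occupied orbital i at one surface point
    energies:  orbital energies epsilon_i (same order, same length)
    """
    if len(densities) != len(energies):
        raise ValueError("need one energy per orbital density")
    total = sum(densities)
    if total <= 0.0:
        raise ValueError("total density must be positive")
    return sum(-rho * eps for rho, eps in zip(densities, energies)) / total


# Two occupied orbitals with made-up densities and energies (eV)
ie_l = local_ionization_energy([0.02, 0.01], [-12.0, -9.0])
```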
Calculation of the
surface properties
• Molecules defined as isodensity surfaces
– using semi-empirical AM1 electron density
– can also be defined using a shrink-wrap or a
marching cube algorithm
• Fitted to a spherical harmonic expansion
– the shape of the shrink-wrapped surface, or
– the four local properties
• MEP, LIE, LEA & LP
Describing surface shape:
spherical harmonic expansion
• The accuracy of the surface description is a
function of the order N of the expansion
• The greater N, the larger the computational
penalty
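For reference, a radial surface expanded in spherical harmonics up to order N is commonly written in the generic form below. The symbols r, a_{lm} and Y_l^m are assumed here and are not taken from the slide.

```latex
r(\theta, \phi) \approx \sum_{l=0}^{N} \sum_{m=-l}^{l} a_{lm}\, Y_l^m(\theta, \phi)
```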
Advantages of this approach
• This gives a completely analytical description of
the molecule’s shape & the 4 local properties
– intermolecular binding properties & chemical reactivity
• Spherical harmonics can be truncated at low
orders for fast QSAR scans (HTS), fast
superposition of molecules & rapid calculation of
similarity indices
– for ligands (MW < 750), N = 6-8
– for peptides & proteins (MW > 5,000), N = 25-30
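To make the cost of the orders quoted above concrete, the sketch below counts the expansion coefficients for a given N, assuming every term with 0 ≤ l ≤ N and −l ≤ m ≤ l is kept. The count depends only on N, not on the number of atoms in the molecule.

```python
def n_coefficients(order):
    """Number of (l, m) terms with 0 <= l <= order and -l <= m <= l."""
    return sum(2 * l + 1 for l in range(order + 1))  # equals (order + 1) ** 2


for order in (6, 8, 25, 30):
    print(order, n_coefficients(order))
```

Under this assumption a ligand at N = 6 needs 49 coefficients per surface or property, while a protein at N = 25 needs 676.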
Putative resolutions for
in silico screening
• For ligands: N = 6
• For receptors: N = 25
[Figure: example surfaces; panel labels: MEP & LIE, MEP, IEL]
Application to QSAR & QSPR
• Several classes of QSAR/QSPR descriptors can be
derived from the local properties, including:
– the spherical harmonics coefficients for constant order N
• the number of coefficients is invariant of the number of atoms
in a molecule
– the critical points for each surface property
• maxima, minima & saddle points
– the distribution of field intensities at the molecular surface
• four fields with local intensities varying between molecules
• sample using grid points?
– the surface integrals for each field
Public domain datasets
• Small
– Consensus set of 74 drug molecules (diverse)
– QSAR set (31 CoMFA steroids)
• Medium
– WDI subset (2,400 compounds)
– Harvard ChemBank dataset (2,000 compounds)
• Large
– WDI (50,000 compounds)
– Maybridge (50,000 compounds)
Small molecule showing
tessellated surface
An example grid of
surface points
A grid is placed on this molecular surface in order to
reduce the number of surface points from 4038 to 55
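The slide does not say how the grid is applied. One simple way to thin a dense set of surface points, shown here only as a sketch under that assumption, is to bin the points into cubic cells and keep one representative per cell. The function name and the cell size are hypothetical.

```python
import math


def grid_downsample(points, cell):
    """Keep one representative point per cubic grid cell of edge `cell`.

    The first point that falls into a cell is kept; later ones are dropped.
    """
    kept = {}
    for p in points:
        key = tuple(math.floor(c / cell) for c in p)
        kept.setdefault(key, p)
    return list(kept.values())


# Illustrative points on a unit sphere (not a real molecular surface)
points = [
    (math.sin(t) * math.cos(f), math.sin(t) * math.sin(f), math.cos(t))
    for t in (0.1 * i for i in range(1, 32))
    for f in (0.1 * j for j in range(63))
]
reduced = grid_downsample(points, cell=0.5)
```

A coarser cell gives fewer retained points, which is the same trade-off as the reduction from 4038 to 55 points described above.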
Gradient flows & molecular
surface property graphs
• Characterize the behaviour of a
property f : S → ℝ on a molecular
surface S, in terms of a directed graph
G on S derived from the gradient
vector field ẋ = grad f(x)
• The molecular surface property graph
G is defined by
– Vertices (G) = fixed points of grad f
= critical points of f
– Edges (G) = stable and unstable
manifolds of the saddle points
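The slides do not state how the critical points are located on the tessellated surface. A common discrete approach, sketched here as an assumption, classifies each mesh vertex by counting how often the sign of f(neighbour) − f(vertex) changes around its ring of neighbours: no change means an extremum, two changes a regular point, four or more a saddle. The octahedron and the field values are purely illustrative.

```python
def classify_vertex(f, v, ring):
    """Classify vertex v from the sign pattern of f around its neighbour ring.

    f:    mapping vertex -> property value (no ties with neighbours assumed)
    ring: neighbours of v in cyclic order around v
    """
    signs = [1 if f[u] > f[v] else -1 for u in ring]
    changes = sum(a != b for a, b in zip(signs, signs[1:] + signs[:1]))
    if changes == 0:
        return "minimum" if signs[0] > 0 else "maximum"
    if changes == 2:
        return "regular"
    return "saddle"  # 4 sign changes: simple saddle; more: degenerate saddle


# Octahedron: vertices +x, -x, +y, -y, +z, -z with their neighbour rings
rings = {
    "+x": ["+y", "+z", "-y", "-z"], "-x": ["+y", "+z", "-y", "-z"],
    "+y": ["+x", "+z", "-x", "-z"], "-y": ["+x", "+z", "-x", "-z"],
    "+z": ["+x", "+y", "-x", "-y"], "-z": ["+x", "+y", "-x", "-y"],
}
# Illustrative saddle-shaped field, slightly perturbed to avoid ties
f = {"+x": 1.0, "-x": 0.9, "+y": -1.0, "-y": -0.9, "+z": 0.05, "-z": -0.05}
kinds = {v: classify_vertex(f, v, ring) for v, ring in rings.items()}
```

The vertices classified as maxima, minima and saddles would become the vertices of the surface property graph; tracing the edges along the gradient is not shown.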
Example Molecule
Allopurinol
Allopurinol RGB Surfaces
LIE encoded on Red channel
LEA encoded on Green Channel
LP or MEP encoded on Blue Channel
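A minimal sketch of the channel encoding, assuming each property is scaled linearly from a chosen value range onto 0 to 255. The slides do not give the scaling, so the ranges, the function names and the sample values are all hypothetical.

```python
def to_channel(value, lo, hi):
    """Linearly map value from [lo, hi] to an integer in 0..255 (clamped)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (value - lo) / (hi - lo)
    return round(255 * min(1.0, max(0.0, t)))


def encode_rgb(lie, lea, blue, ranges):
    """Red = LIE, green = LEA, blue = LP or MEP, each scaled by its range."""
    return (
        to_channel(lie, *ranges["lie"]),
        to_channel(lea, *ranges["lea"]),
        to_channel(blue, *ranges["blue"]),
    )


# Made-up value ranges and one made-up surface point
ranges = {"lie": (8.0, 16.0), "lea": (-6.0, 2.0), "blue": (-50.0, 50.0)}
colour = encode_rgb(10.0, 2.0, -50.0, ranges)
```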
Critical points of allopurinol
8 maxima
7 minima
13 saddles
No. of maxima – no. of saddles + no. of
minima = Euler characteristic (S) = 2
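As a worked check of the counts above, using the standard relation χ = 2 − 2g for a closed orientable surface of genus g (the symbols n and g are introduced here for illustration):

```latex
\chi(S) = n_{\max} - n_{\text{saddle}} + n_{\min} = 8 - 13 + 7 = 2 = 2 - 2g
\quad\Rightarrow\quad g = 0
```

The critical point counts for allopurinol are therefore consistent with a sphere-like surface with no holes.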
Distribution based descriptors
34 descriptors were measured, including
• maximum field intensity
• minimum field intensity
• mean field intensity
• range of field intensities
• variance of field intensities
The principal components of the descriptors were calculated to provide a set of orthogonal descriptors derived from the local properties at the molecular surface
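The five named statistics can be computed for one property sampled over the surface points as in the sketch below. The use of the population variance and the function name are assumptions, and the principal component step is not shown.

```python
from statistics import fmean, pvariance


def distribution_descriptors(values):
    """Five summary descriptors of one property sampled over the surface."""
    lo, hi = min(values), max(values)
    return {
        "max": hi,
        "min": lo,
        "mean": fmean(values),
        "range": hi - lo,
        "variance": pvariance(values),  # population variance (assumption)
    }


# Made-up MEP values at five surface points (arbitrary units)
d = distribution_descriptors([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Applying the same function to each of the four local properties would give 20 of the descriptors; the slide does not list the remaining ones.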
Distribution of Allopurinol
Local Properties
Other distribution
based descriptors
Moments
1st – Mean
2nd – Variance
3rd – Skewness
4th – Kurtosis
> 4th – Higher moments as required
Overlapping Gaussians
Kernel density procedure
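A short sketch of both ideas, assuming standardized central moments for skewness and kurtosis and a fixed-bandwidth Gaussian kernel for the density estimate. The bandwidth and the sample values are made up, and the slides do not say which variants were used.

```python
import math


def standardized_moment(values, k):
    """k-th standardized central moment (k=3 skewness, k=4 kurtosis)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    if var == 0.0:
        raise ValueError("moment undefined for constant data")
    return sum((x - mean) ** k for x in values) / n / var ** (k / 2)


def gaussian_kde(values, bandwidth):
    """Return a density estimate built from one Gaussian per sample."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew = standardized_moment(sample, 3)
kurt = standardized_moment(sample, 4)
pdf = gaussian_kde(sample, bandwidth=0.5)
```

The overlapping Gaussians give a smooth curve whose values at chosen positions could serve as descriptors in place of, or alongside, the moments.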
Correlation Matrix for
properties of allopurinol
        LIE     LEA     LP      MEP
LIE     1       0.44    0.26    0.39
LEA     0.44    1       0.58    0.47
LP      0.26    0.58    1       -0.1
MEP     0.39    0.47    -0.1    1
Correlations of Local
Properties: Maybridge db
        MEP     LIE     LEA     LP
MEP     1
LIE     0.15    1
LEA     -0.12   0.18    1
LP      0.29    0.19    0.51    1
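Matrices of this kind can be produced by taking the Pearson correlation between each pair of properties over the same set of surface points. The sketch below assumes that definition; the property values are made up and are not taken from the slides.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: {name: values}; returns {(name_a, name_b): r}."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a in columns
        for b in columns
    }


# Made-up values at four surface points
props = {
    "LIE": [1.0, 2.0, 3.0, 4.0],
    "LEA": [2.0, 4.0, 6.0, 8.0],
    "MEP": [4.0, 3.0, 2.0, 1.0],
}
r = correlation_matrix(props)
```

Low off-diagonal values, as in the two tables, indicate that the four properties carry largely independent information.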
Physical-Property Mapping
• Maybridge used as the “chemistry” dataset
• Use the top six principal components to
train a 100 × 100 Kohonen net
(unsupervised training)
• 2,105 compounds selected from the World
Drug Index as real drugs used as the drug
dataset
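A minimal sketch of unsupervised Kohonen training on six-component input vectors. It uses a 5 × 5 map and synthetic data so that it runs quickly in plain Python; the linear decay of the learning rate and neighbourhood radius, the function names and the data are all assumptions rather than the settings used for the 100 × 100 map.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position (row, col) of the node whose weights are closest to x."""
    best, best_d2 = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((a - b) ** 2 for a, b in zip(w, x))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(data, rows, cols, iterations=2000, seed=0):
    """Train a small Kohonen map; returns weights[r][c] as lists of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for t in range(iterations):
        frac = t / iterations
        rate = 0.5 * (1.0 - frac)  # learning rate decays linearly
        sigma = max(0.5, (max(rows, cols) / 2) * (1.0 - frac))  # radius decays
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for r in range(rows):
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = rate * math.exp(-d2 / (2.0 * sigma ** 2))
                w = weights[r][c]
                for k in range(dim):
                    w[k] += h * (x[k] - w[k])
    return weights


# Synthetic stand-in for the "chemistry" set: two clusters in six dimensions
rng = random.Random(1)
chem = [[rng.gauss(m, 0.05) for _ in range(6)]
        for m in (0.0, 1.0) for _ in range(50)]
som = train_som(chem, rows=5, cols=5)
cell_a = best_matching_unit(som, [0.0] * 6)
cell_b = best_matching_unit(som, [1.0] * 6)
```

Once the map is trained on the chemistry set, a second set such as the drug compounds can be projected onto it by looking up the best matching unit of each compound.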
Physical Property Map
[Diagram labels: Train Kohonen net; “chemistry”; “Drugs”]
Physical Property Map: Drugs
Physical Property Map:
steroid hormones
Surface-integral models
$$P = \sum_{i=1}^{\mathrm{ntri}} f\left(V^i, \mathrm{IE}_L^i, \mathrm{EA}_L^i, \alpha_L^i, \ldots\right) A_i$$
• P = target property
• A_i = area of triangle i
• ntri = number of triangles
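A sketch of the summation over a triangulated surface, assuming each triangle carries its own local property values. The model function f is the fitted part of such a model and is not given in the slides, so a made-up placeholder is used here; all names are hypothetical.

```python
import math


def triangle_area(a, b, c):
    """Area of a 3-D triangle from half the cross-product norm."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def surface_integral(triangles, f):
    """P = sum_i f(properties_i) * A_i over all triangles.

    triangles: list of (vertices, properties) with vertices = (a, b, c)
    f:         placeholder model function of the per-triangle properties
    """
    return sum(f(props) * triangle_area(*verts) for verts, props in triangles)


# Unit square split into two triangles, with made-up property values
tris = [
    (((0, 0, 0), (1, 0, 0), (1, 1, 0)), {"V": 2.0}),
    (((0, 0, 0), (1, 1, 0), (0, 1, 0)), {"V": 4.0}),
]
area = surface_integral(tris, lambda p: 1.0)
p_val = surface_integral(tris, lambda p: 0.5 * p["V"])
```

With f fixed at 1 the sum reduces to the total surface area, which is a convenient sanity check on the tessellation.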
Free energies & enthalpies of hydration, free
energies of solvation for n-octanol & chloroform
[Figure: four scatter plots of calculated vs. experimental values (kcal mol-1): Gsolv(H2O), Hsolv(H2O), Gsolv(C8H18) and Gsolv(CHCl3). Fit statistics shown on the plots: MSE = 0.00 in every panel; water panels: MUE = 1.18, RMSD = 1.69 and MUE = 1.74, RMSD = 2.10; organic solvent panels: MUE = 0.76, RMSD = 1.00 and MUE = 0.48, RMSD = 0.74]
Surface comparison
Two different approaches:
1. Using spherical harmonic molecular
surfaces [J. Comp. Chem. 20(4) 383-395; Ritchie and
Kemp 1999; University of Aberdeen].
2. Partial molecular alignment via local
structure analysis [J. Chem. Inf. Comput. Sci.
40(2) 503-512; Robinson, Lyne and Richards 2000;
University of Oxford].
Voting pairs provide possible
local alignments
Try all possible voting pairs to produce a large number of
alignments. The choice of voting pairs can have a critical effect on
the quality of the surface alignment.
Example alignments
[Figure: four example alignments, labelled 1 to 4]
Pattern matching of surface
properties: RMSD = 0.75
[Figure: panels A and B]
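The slide does not state which quantities the RMSD is taken over. As a generic sketch, assuming two surfaces have been aligned and a property has been sampled at matched points on each, the root-mean-square deviation would be computed as follows (names and values are illustrative).

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally long value lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Made-up property values at four matched surface points
score = rmsd([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```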
ParaSurf v1.0
• Surfaces
– Isodensity surfaces
– Shrink wrap
– Marching cube
– Surfaces fit to spherical harmonics
• Properties
– MEP, LIE, LEA and LP
– Encoded at points on the surface
– Encoded as spherical harmonic expansions
GRID Computing
• ParaSurf compiled on
– SGI IRIX
– Windows
– Linux (SUSE)
– IBM AIX
• Future platforms
– SUN Solaris
• GRID enabling at Portsmouth, Southampton
and Oxford
Summary
Critical features
• Molecular surfaces (Portsmouth)
• Pattern matching on surfaces (Southampton/Oxford, Aberdeen)
• QM properties on surface (Erlangen)
• Data reduction and QSAR (Portsmouth)
• Spherical harmonic representation (Aberdeen)
• Compound screening
Conclusions
• Properties can be calculated at the surface of
molecules
• These properties can be RGB encoded
• The properties are local
• Descriptor sets derived from these properties can be
used for robust QSPR & QSAR models
• The algorithms will soon be available commercially
for use in virtual high throughput screening
ParaSurf – in silico Screening
Technology
• Basic Technology Funding for October
2003 to September 2004
– Proof of concept studies
– Consortium building and networking
• Academic partners
– University of Portsmouth
– University of Erlangen
– University of Southampton
– University of Aberdeen
– University of Oxford