Targeting Druglike properties in
Chemical Libraries
David Winkler, Frank Burden, Mitchell Polley
Centre for Complexity in Drug Design
CSIRO Molecular Science and
Chemistry Department, Monash University
VICS
Complexity in Drug Design Group
Prof. Frank Burden - Scimetrics Ltd -
consultant to CSIRO
Dr. Mitchell Polley - CSS postdoctoral
fellow
Darryl Jones - CSS PhD top-up student Flinders University (Physics)
Prof. Dave Winkler - CSIRO Molecular
Science and Monash University/VICS
Overview of Project
Aims to develop a method for evolving a chemical
library of heterogeneous agents (molecules) using
'drug-like' fitness functions
Chemical space is vast (>10^80 possibilities)
Method must explore drug-like chemical space and
identify islands of activity and novelty
Application in the discovery of novel bioactive
agents such as drugs, crop care products
Methodology applicable to design of new materials
and nanomachines using different fitness functions
Overview of Project
Steps…
Devise sparse, informative mathematical
representations of molecules
Devise sparse methods of selecting these for models
Use agent-based methods (Bayesian neural nets) to
map representations to properties and use models as
fitness functions
Develop methods for evolving chemicals using
mutation operators so that maximum chemical space
can be traversed
Evolve chemical libraries using drug-like fitness
functions
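The last two steps amount to a select-and-mutate loop driven by a fitness function. The sketch below is illustrative only: the tuple-of-integers "molecule", `toy_fitness` and `toy_mutate` are hypothetical stand-ins, not the project's molecular representation, property models or mutation operators.

```python
import random

def evolve(population, fitness, mutate, generations=20, keep=0.5, seed=0):
    """Elitist evolution: keep the fittest fraction, refill with mutated copies."""
    rng = random.Random(seed)
    pop = list(population)
    n = len(pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: max(1, int(n * keep))]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(n - len(survivors))]
        pop = survivors + children
    return sorted(pop, key=fitness, reverse=True)

# Hypothetical toy problem: a "molecule" is a tuple of integer fragment codes,
# and fitness is highest when it matches an arbitrary target pattern.
TARGET = (3, 1, 4, 1, 5)

def toy_fitness(mol):
    return -sum(abs(a - b) for a, b in zip(mol, TARGET))

def toy_mutate(mol, rng):
    # Point mutation: replace one randomly chosen position.
    i = rng.randrange(len(mol))
    child = list(mol)
    child[i] = rng.randrange(10)
    return tuple(child)

start = [tuple([0] * 5) for _ in range(20)]
best = evolve(start, toy_fitness, toy_mutate, generations=200)[0]
print(best, toy_fitness(best))
```

In a real setting the fitness callable would be one of the trained property models, and the mutation operator would have to produce chemically valid structures.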
Highlights
Representations
Novel charge fingerprint descriptor
devised and tested
Theory of eigenvalue descriptors cracked
Momentum-space descriptor work started
Novel selectivity index developed
Sparse Descriptors
Many thousands of descriptors have been devised
(e.g. CoMFA fields, DRAGON)
Many are highly correlated with other descriptors
- contain the same information
Some (e.g. molecular weight) are information-poor
Models using sparse descriptors can be more
predictive
We work to the premise that it is possible to devise
sparse, information-rich descriptors from which
suitable subsets could be drawn for a wide variety
of modelling problems
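One common way to see how much redundancy a descriptor pool carries is to drop any descriptor that is almost perfectly correlated with one already kept. The sketch below is a generic greedy filter, not the sparse selection method developed in this project; the function name `prune_correlated` and the 0.95 threshold are assumptions chosen for illustration.

```python
import numpy as np

def prune_correlated(X, threshold=0.95):
    """Return indices of columns kept after greedy correlation pruning.

    A column is kept only if its absolute Pearson correlation with every
    previously kept column is at or below the threshold.
    Assumes no column is constant (constant columns give NaN correlations).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Synthetic descriptors: column 1 is nearly a rescaled copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
X = np.column_stack([a, 2 * a + 1e-3 * rng.normal(size=50), b])
kept = prune_correlated(X)
print(kept)
```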
Charge fingerprints
These are widely applicable, easily computed
descriptors calculated by binning charges on atoms in
different environments
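A minimal sketch of the binning idea follows. It assumes partial charges have already been computed by some external method, reduces the "environment" of an atom to its element symbol, and uses 20 bins of width 0.1 between -1.0 and +1.0 with out-of-range charges clipped into the end bins. All of these choices, and the function name `charge_fingerprint`, are assumptions for illustration rather than the published descriptor definition.

```python
import numpy as np

def charge_fingerprint(charges, atom_types, types=("C", "N", "O"),
                       lo=-1.0, hi=1.0, n_bins=20):
    """Concatenate per-atom-type histograms of partial charges."""
    edges = np.linspace(lo, hi, n_bins + 1)
    fingerprint = []
    for t in types:
        q = [c for c, a in zip(charges, atom_types) if a == t]
        q = np.clip(q, lo, hi)            # fold extreme charges into end bins
        counts, _ = np.histogram(q, bins=edges)
        fingerprint.extend(counts)
    return np.array(fingerprint)

# Made-up charges for four heavy atoms (two carbons, two oxygens).
charges = [-0.25, 0.65, -0.75, -0.72]
atom_types = ["C", "C", "O", "O"]
fp = charge_fingerprint(charges, atom_types)
print(fp.reshape(3, 20))
```

Each molecule yields a vector of the same length regardless of its size, which is what makes the fingerprint usable as model input.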
[Figure: charge fingerprint bar chart. x-axis: charge bins from >= +1.0 down to >= 0.0 and from <= 0.0 down to <= -1.0, in steps of 0.1; y-axis: 0.0 to 0.8; legend labels: GM C, EEM TOP, EEM SEP, C, 1C, 6C]
EEM-based property descriptors
Density Functional Theory (DFT) proposes that knowledge of
electron density allows computation of many other properties
The electronegativity equalization method (EEM; Mortier, Bultinck and
others) is a rapid, approximate DFT method
All work to date has concentrated on charges or a few other
‘observables’.
Main strength will probably lie with calculation of other
molecular properties, when method is generalized and
parameterized for more atom types
Generalized eigenvalue matrices
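Example: acetate (CH3COO-), heavy atoms numbered 1 to 4 (n = 4), with atom 2 bonded to the other three. Connectivity form:

\[
\begin{pmatrix}
a_{11} & 1 & 0 & 0 \\
1 & a_{22} & 1 & 1 \\
0 & 1 & a_{33} & 0 \\
0 & 1 & 0 & a_{nn}
\end{pmatrix}
\]

General form:

\[
\begin{pmatrix}
a_{11} & \dfrac{b_{12}}{r_{12}^{m}} & \dfrac{b_{13}}{r_{13}^{m}} & \cdots & \dfrac{b_{1n}}{r_{1n}^{m}} \\
\dfrac{b_{21}}{r_{21}^{m}} & a_{22} & \dfrac{b_{23}}{r_{23}^{m}} & \cdots & \dfrac{b_{2n}}{r_{2n}^{m}} \\
\dfrac{b_{31}}{r_{31}^{m}} & \dfrac{b_{32}}{r_{32}^{m}} & a_{33} & \cdots & \dfrac{b_{3n}}{r_{3n}^{m}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\dfrac{b_{n1}}{r_{n1}^{m}} & \dfrac{b_{n2}}{r_{n2}^{m}} & \dfrac{b_{n3}}{r_{n3}^{m}} & \cdots & a_{nn}
\end{pmatrix}
\]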
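The generalized matrix can be built and diagonalized in a few lines. In the sketch below the diagonal values (atomic numbers), the planar coordinates and the choice of the lowest and highest eigenvalues as descriptors are all assumptions made for illustration. Setting m = 0 with b_ij equal to the bond indicator recovers the connectivity form.

```python
import numpy as np

def generalized_matrix(a, b, coords, m=1):
    """Matrix with a_ii on the diagonal and b_ij / r_ij**m off the diagonal."""
    coords = np.asarray(coords, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(r, 1.0)              # placeholder, avoids dividing by zero
    A = np.asarray(b, dtype=float) / r**m
    np.fill_diagonal(A, a)
    return A

def eigenvalue_descriptors(A):
    """Lowest and highest eigenvalue of a symmetric matrix."""
    w = np.linalg.eigvalsh(A)             # ascending order
    return w[0], w[-1]

# Acetate heavy atoms: 0 = methyl C, 1 = carboxylate C, 2 and 3 = O.
a = [6.0, 6.0, 8.0, 8.0]                  # hypothetical diagonal: atomic numbers
b = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float) # bond indicator
coords = [[-1.5, 0.0, 0.0], [0.0, 0.0, 0.0],
          [0.6, 1.1, 0.0], [0.6, -1.1, 0.0]]  # rough, made-up geometry

A_conn = generalized_matrix(a, b, coords, m=0)   # connectivity form
A_dist = generalized_matrix(a, b, coords, m=1)   # distance-weighted form
lo_conn, hi_conn = eigenvalue_descriptors(A_conn)
print(lo_conn, hi_conn, eigenvalue_descriptors(A_dist))
```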
Why do eigenvalue descriptors work?
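Eigenvalue matrix:

\[
\begin{pmatrix}
a_{11} & \dfrac{1}{r_{12}} & \cdots & \dfrac{1}{r_{1n}} \\
\dfrac{1}{r_{21}} & a_{22} & \cdots & \dfrac{1}{r_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{1}{r_{n1}} & \dfrac{1}{r_{n2}} & \cdots & a_{nn}
\end{pmatrix}
\]

EEM matrix:

\[
\begin{pmatrix}
2\eta_{1} & \dfrac{1}{r_{12}} & \cdots & \dfrac{1}{r_{1n}} & 1 \\
\dfrac{1}{r_{21}} & 2\eta_{2} & \cdots & \dfrac{1}{r_{2n}} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\dfrac{1}{r_{n1}} & \dfrac{1}{r_{n2}} & \cdots & 2\eta_{n} & 1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix}
\]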
\[ A = T \Lambda T', \qquad A T = T \Lambda \]
\[ A^{-1} = T \Lambda^{-1} T' \quad \text{since } T' = T^{-1} \text{ for an orthogonal transformation} \]
i.e. the inverse of A is related to the eigenvalues
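The relation between the inverse and the eigendecomposition can be checked numerically. The sketch below uses a random symmetric matrix bordered by a row and column of ones with a zero corner, mimicking the shape of the EEM matrix; the matrix entries are arbitrary test values, not EEM parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.normal(size=(n, n))
H = M @ M.T + 4.0 * np.eye(n)     # symmetric positive definite block

# Border H with ones and a zero corner, as in the EEM matrix layout.
A = np.zeros((n + 1, n + 1))
A[:n, :n] = H
A[:n, n] = 1.0
A[n, :n] = 1.0

lam, T = np.linalg.eigh(A)        # A = T diag(lam) T'
A_inv = T @ np.diag(1.0 / lam) @ T.T

print(np.allclose(A_inv, np.linalg.inv(A)))
```

Solving the EEM equations requires the inverse of the EEM matrix, so the eigenvalues and eigenvectors carry the same information that the charge calculation uses.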
Momentum space descriptors
The part of the electron density
distribution most relevant to
biological activity lies near the
k-space origin. The corresponding
r-space density distribution is
associated with the outermost
valence regions of the
molecule
k-space descriptions of
electron density are more
compact and simpler
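The link between diffuse r-space density and density near the k-space origin follows from the Fourier transform. For a one-dimensional Gaussian exp(-alpha r^2) the transform is proportional to exp(-k^2 / (4 alpha)), so a more diffuse function (smaller alpha) is more tightly concentrated around k = 0. The sketch below checks this numerically; the 1D Gaussian model, the grids and the function name `momentum_width` are illustrative assumptions, not the descriptor calculation itself.

```python
import numpy as np

def momentum_width(alpha, r_max=40.0, n_r=4001, k_max=10.0, n_k=801):
    """RMS width of the k-space density for psi(r) = exp(-alpha * r**2)."""
    r = np.linspace(-r_max, r_max, n_r)
    dr = r[1] - r[0]
    psi = np.exp(-alpha * r**2)
    k = np.linspace(-k_max, k_max, n_k)
    # psi is even, so its Fourier transform reduces to a cosine transform.
    phi = (np.cos(np.outer(k, r)) * psi).sum(axis=1) * dr
    density = phi**2
    density /= density.sum()
    return np.sqrt((density * k**2).sum())

w_diffuse = momentum_width(0.1)   # diffuse in r-space
w_compact = momentum_width(2.0)   # compact in r-space
print(w_diffuse, w_compact)
```

For this model the k-space density has variance alpha, so the two widths should be close to sqrt(0.1) and sqrt(2.0).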
Optimum Selectivity Index So
Highlights
Sparse feature selection
Automatic Relevance Determination (ARD) method
refined
Sparse Bayesian feature detection theory mastered
Linear sparse feature detection using an EM algorithm
and a Jeffreys prior
Nonlinear Bayesian feature detection achieved but
needs more work
Novel variable selection when number of descriptors
is much larger than the number of molecules in the
data set
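The slides do not give the update equations for the linear case. One published formulation of sparse linear regression with an EM algorithm and a Jeffreys hyperprior is Figueiredo's adaptive-sparseness scheme, and the sketch below follows that style on the assumption that something similar is meant. The function name `jeffreys_em`, the least-squares initialisation and the synthetic data are illustrative choices.

```python
import numpy as np

def jeffreys_em(X, y, n_iter=100):
    """Sparse linear regression via EM with a Jeffreys hyperprior.

    Each iteration re-estimates the noise variance, then solves a
    reweighted ridge problem whose penalty on coefficient i is
    proportional to 1 / beta_i**2, written in a form that stays
    stable as coefficients shrink to zero.
    """
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from least squares
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        resid = y - X @ beta
        sigma2 = resid @ resid / n
        U = np.diag(np.abs(beta))
        beta = U @ np.linalg.solve(sigma2 * np.eye(p) + U @ XtX @ U, U @ Xty)
    return beta

# Synthetic data: 20 descriptors, only the first three carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_beta = np.zeros(20)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.normal(size=100)

beta = jeffreys_em(X, y)
print(np.round(beta, 3))
```

The least-squares start requires more molecules than descriptors; the case where descriptors greatly outnumber molecules would need a different initialisation.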
[Figure: Sparse Bayesian variable selection; axis label: Descriptor]
Highlights
Optimum nonlinear modelling
Bayesian regularized neural networks working well
Linear sparse feature detection and modelling
Nonlinear Bayesian feature detection and modelling
using radial basis function regression
Use of sparse Bayesian methods in neural networks
under study
Highlights
Models built
Blood-brain barrier partitioning
Drug intestinal absorption
Acute toxicity
Phase II metabolism - substrates and
inhibitors (Flinders medical school
collaboration) - SVM
Several drug target models - e.g. farnesyl
transferase
Blood-brain barrier model
Topological descriptors, 3 hidden nodes
Training set 85 compounds, test set 21 compounds
Intestinal absorption QSAR model
Property-based descriptors, 5 hidden nodes (optimum model)
Acute toxicity model
Burden index/binned charge descriptors, 8 hidden nodes
Training set 450 compounds, external test set 53 compounds
Using SVM and EEM descriptors
to model phase II metabolism
UGT       Number of      Percent      Percent of test set predicted correctly
isoform   chemicals in   substrates   All chemicals   Substrates   Non-substrates
          dataset
1A1       174            39           85              81           88
1A3       156            76           89              94           67
1A4       156            55           83              78           94
1A6       161            41           67              72           64
1A7       65             40           79              57           92
1A8       104            78           77              95           40
1A9       176            65           80              86           67
1A10      147            50           80              86           74
2B4       131            31           83              75           87
2B7       196            65           64              73           36
2B15      125            42           67              60           71
2B17      53             45           80              70           100
COX 1 and 2 QSAR and selectivity
Built QSAR models for cyclooxygenase 1 and 2
and for the selectivity index So, using a large
data set from Tom Stockfisch at Accelrys
(454 compounds obtained from
http://www.accelrys.com/references/datasets/)
Used atomistic (A), Burden eigenvalue (B) and
charge fingerprint (C) descriptors together with
a Bayesian regularized neural net to build
the models
Compared MLR with a Bayesian neural net
with 3 nodes in the hidden layer
COX 1 and 2 QSAR and selectivity
Selectivity of cyclooxygenase 1 and 2 inhibitors
Selectivity Index So QSAR Model
MLR: R^2 = 0.77, Q^2 = 0.69
BRANN (3 nodes): R^2 = 0.92, Q^2 = 0.74
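The two statistics reported for each model measure different things: R^2 describes the fit to the training data, while Q^2 describes cross-validated prediction. The sketch below computes both for a multiple linear regression, assuming Q^2 is the leave-one-out cross-validated value (the slides do not say which cross-validation scheme was used). The function name `r2_and_q2` and the synthetic data are illustrative.

```python
import numpy as np

def r2_and_q2(X, y):
    """R^2 of an MLR fit and leave-one-out Q^2 = 1 - PRESS / SS_total."""
    X1 = np.column_stack([np.ones(len(y)), X])    # add intercept column

    def fit_predict(X_train, y_train, X_test):
        coef = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
        return X_test @ coef

    ss_total = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum((y - fit_predict(X1, y, X1)) ** 2) / ss_total

    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i             # leave molecule i out
        pred = fit_predict(X1[mask], y[mask], X1[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    q2 = 1.0 - press / ss_total
    return r2, q2

# Synthetic example: two descriptors, 40 "molecules".
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.normal(size=40)
r2, q2 = r2_and_q2(X, y)
print(round(r2, 3), round(q2, 3))
```

For a least-squares model Q^2 computed this way never exceeds R^2, and the gap between them gives a rough indication of overfitting.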