
Targeting Druglike properties in
Chemical Libraries
David Winkler, Frank Burden, Mitchell Polley
Centre for Complexity in Drug Design
CSIRO Molecular Science and
Chemistry Department, Monash University
VICS
Complexity in Drug Design Group
 Prof. Frank Burden - Scimetrics Ltd -
consultant to CSIRO
 Dr. Mitchell Polley - CSS postdoctoral
fellow
 Darryl Jones - CSS PhD top-up student Flinders University (Physics)
 Prof. Dave Winkler - CSIRO Molecular
Science and Monash University/VICS
Overview of Project
 Aims to develop a method for evolving a chemical
library of heterogeneous agents (molecules) using
'drug-like' fitness functions
 Chemical space is vast (>10^80 possibilities)
 Method must explore drug-like chemical space and
identify islands of activity and novelty
 Application in the discovery of novel bioactive
agents such as drugs, crop care products
 Methodology applicable to design of new materials
and nanomachines using different fitness functions
Overview of Project
Steps…
 Devise sparse, informative mathematical
representations of molecules
 Devise sparse methods of selecting these for models
 Use agent-based methods (Bayesian neural nets) to
map representations to properties and use models as
fitness functions
 Develop methods for evolving chemicals using
mutation operators so that maximum chemical space
can be traversed
 Evolve chemical libraries using drug-like fitness
functions
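The evolve-and-select loop outlined in these steps can be sketched in a few lines. This is only a toy illustration under stated assumptions: molecules are represented as tuples of fragment labels, `FRAGMENTS`, `toy_fitness`, `mutate` and `evolve` are hypothetical names, and the fitness function is an arbitrary stand-in for a trained drug-likeness model, not the models described in this talk.

```python
import random

# Hypothetical building blocks; a real system would use chemically valid fragments.
FRAGMENTS = ["CH3", "NH2", "OH", "Ph", "F", "C=O"]

def toy_fitness(mol):
    # Arbitrary stand-in for a 'drug-like' model: rewards fragment diversity
    # and penalises sizes far from six fragments.
    return len(set(mol)) - abs(len(mol) - 6)

def mutate(mol, rng):
    # Apply one random mutation operator: swap, add or delete a fragment.
    mol = list(mol)
    op = rng.choice(("swap", "add", "delete"))
    i = rng.randrange(len(mol))
    if op == "swap":
        mol[i] = rng.choice(FRAGMENTS)
    elif op == "add":
        mol.insert(i, rng.choice(FRAGMENTS))
    elif len(mol) > 1:
        del mol[i]
    return tuple(mol)

def evolve(population, fitness, rng, generations=20, keep=5, children_per_parent=3):
    # Elitist evolution: the fittest 'keep' molecules survive each generation
    # and are joined by mutated copies of randomly chosen survivors.
    for _ in range(generations):
        survivors = sorted(population, key=fitness, reverse=True)[:keep]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(keep * children_per_parent)]
        population = survivors + children
    return sorted(population, key=fitness, reverse=True)

initial = [("CH3",), ("CH3", "OH"), ("Ph", "F"), ("NH2",), ("C=O", "CH3")]
library = evolve(initial, toy_fitness, random.Random(0))
```

Because survivors are carried over unchanged, the best fitness in `library` can never fall below the best fitness in `initial`; swapping in a different fitness function is the only change needed to target other properties.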
Highlights
Representations
 Novel charge fingerprint descriptor
devised and tested
 Theory of eigenvalue descriptors cracked
 Momentum space descriptor work started
 Novel selectivity index developed
Sparse Descriptors
 Many thousands of descriptors have been devised
(e.g. CoMFA fields, DRAGON)
 Many are highly correlated with other descriptors
- contain the same information
 Some (e.g. molecular weight) are information-poor
 Models using sparse descriptors can be more
predictive
 We work to the premise that it is possible to devise
sparse, information-rich descriptors from which
suitable subsets could be drawn for a wide variety
of modelling problems
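As a minimal sketch of why correlated descriptors add little, the snippet below drops any descriptor whose absolute Pearson correlation with an already-kept descriptor exceeds a threshold. This simple filter is an illustration only, not the sparse Bayesian selection described later; the function names, the 0.95 threshold and the descriptor values are all assumptions.

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(descriptors, threshold=0.95):
    # Greedy filter: keep a descriptor only if it is not highly correlated
    # with any descriptor that has already been kept.
    kept = []
    for name, column in descriptors.items():
        if all(abs(pearson(column, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical descriptor table: one list of values per descriptor.
table = {
    "mol_weight": [100, 150, 200, 250],
    "heavy_atoms": [7, 10, 14, 17],      # nearly proportional to mol_weight
    "logp": [1.2, -0.3, 2.5, 0.1],
}
selected = drop_correlated(table)
```

Here `heavy_atoms` is discarded because it carries essentially the same information as `mol_weight`.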
Charge fingerprints
 These are widely applicable, easily computed
descriptors calculated by binning charges on atoms in
different environments
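A minimal sketch of the binning idea follows. It assumes that partial charges have already been computed by some method, that the "environment" is simply a label such as the element symbol, and that bins are 0.1 wide between -1.0 and +1.0 with one open-ended bin at each end; the function name and these choices are illustrative assumptions.

```python
import math

def charge_fingerprint(atoms, lo=-1.0, hi=1.0, width=0.1):
    # atoms: iterable of (environment_label, partial_charge) pairs.
    # Returns {environment_label: list of counts per charge bin}.
    n_inner = int(round((hi - lo) / width))   # bins strictly between lo and hi
    n_bins = n_inner + 2                      # plus two open-ended end bins
    fingerprint = {}
    for env, q in atoms:
        if q <= lo:
            idx = 0                           # charge <= lo
        elif q >= hi:
            idx = n_bins - 1                  # charge >= hi
        else:
            idx = 1 + min(int(math.floor((q - lo) / width)), n_inner - 1)
        fingerprint.setdefault(env, [0] * n_bins)[idx] += 1
    return fingerprint

# Hypothetical charges for a handful of atoms.
fp = charge_fingerprint([("C", -0.35), ("C", 0.12), ("O", -0.55),
                         ("O", -1.2), ("N", 1.5)])
```

Concatenating the per-environment count vectors gives a fixed-length descriptor regardless of molecule size.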
[Figure: charge fingerprint histogram. Legend labels: GM C, EEM TOP, EEM SEP, 6C, 1C, C. x-axis: charge bins from >= +1.0 down to <= -1.0 in steps of 0.1; y-axis: 0.0 to 0.8]
EEM-based property descriptors
 Density Functional Theory (DFT) proposes that knowledge of
electron density allows computation of many other properties
 The electronegativity equalization method (EEM; Mortier, Bultinck and
others) is a rapid, approximate DFT method
 All work to date has concentrated on charges or a few other
‘observables’.
 Main strength will probably lie with calculation of other
molecular properties, when method is generalized and
parameterized for more atom types
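To make the EEM idea concrete, the sketch below solves the standard EEM linear system for atomic charges: for each atom i, chi_i + 2*eta_i*q_i + sum over j of q_j/r_ij equals a common electronegativity, and the charges sum to the total molecular charge. The parameter values are hypothetical, units are arbitrary and unit-conversion factors are omitted; the small Gaussian-elimination solver is included only to keep the example self-contained.

```python
def solve_linear(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def eem_charges(chi, eta, dist, total_charge=0.0):
    # chi, eta: per-atom electronegativity and hardness parameters.
    # dist[i][j]: interatomic distance. The extra unknown is minus the
    # equalized electronegativity, so the border of the matrix is all ones.
    n = len(chi)
    a = [[(2.0 * eta[i] if i == j else 1.0 / dist[i][j]) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [-c for c in chi] + [total_charge]
    return solve_linear(a, b)[:n]

# Hypothetical diatomic: atom 2 is the more electronegative one.
q = eem_charges(chi=[2.0, 3.0], eta=[1.5, 2.0], dist=[[0.0, 1.0], [1.0, 0.0]])
```

For this diatomic the result agrees with the closed form q1 = (chi2 - chi1) / (2 (eta1 + eta2 - 1/r)), and the more electronegative atom ends up with the negative charge.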

Generalized eigenvalue matrices
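Example: a four-atom fragment (H3C, C, O, O-) with atoms numbered 1 to 4. Connectivity-based matrix:

$$
\begin{pmatrix}
a_{11} & 1 & 0 & 0 \\
1 & a_{22} & 1 & 1 \\
0 & 1 & a_{33} & 0 \\
0 & 1 & 0 & a_{nn}
\end{pmatrix}
$$

Generalized matrix:

$$
\begin{pmatrix}
a_{11} & \dfrac{b_{12}}{r_{12}^{m}} & \dfrac{b_{13}}{r_{13}^{m}} & \cdots & \dfrac{b_{1n}}{r_{1n}^{m}} \\
\dfrac{b_{21}}{r_{21}^{m}} & a_{22} & \dfrac{b_{23}}{r_{23}^{m}} & \cdots & \dfrac{b_{2n}}{r_{2n}^{m}} \\
\dfrac{b_{31}}{r_{31}^{m}} & \dfrac{b_{32}}{r_{32}^{m}} & a_{33} & \cdots & \dfrac{b_{3n}}{r_{3n}^{m}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\dfrac{b_{n1}}{r_{n1}^{m}} & \dfrac{b_{n2}}{r_{n2}^{m}} & \dfrac{b_{n3}}{r_{n3}^{m}} & \cdots & a_{nn}
\end{pmatrix}
$$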
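A minimal sketch of how such a generalized matrix might be assembled from atomic coordinates is given below. The function name, the default of b_ij = 1 for every pair and the example values are assumptions made for illustration; the slides do not specify how a_ii and b_ij are chosen.

```python
import math

def generalized_matrix(diag, coords, b=lambda i, j: 1.0, m=1):
    # diag[i]   : diagonal element a_ii for atom i (e.g. an atomic property)
    # coords[i] : Cartesian coordinates of atom i
    # b(i, j)   : pair weight b_ij (hypothetical default: 1 for all pairs)
    # m         : exponent applied to the interatomic distance r_ij
    n = len(diag)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mat[i][i] = float(diag[i])
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            mat[i][j] = mat[j][i] = b(i, j) / r ** m   # symmetric off-diagonal
    return mat

# Hypothetical three-atom example with inverse-square distance weighting.
example = generalized_matrix(
    diag=[6.0, 8.0, 1.0],
    coords=[(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
    m=2,
)
```

The eigenvalues of a matrix built this way would then serve as the descriptors; setting `b` to return 1 for bonded pairs and 0 otherwise recovers a connectivity-style matrix.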


Why do eigenvalue descriptors work?
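Eigenvalue matrix:

$$
\begin{pmatrix}
a_{11} & \dfrac{1}{r_{12}} & \cdots & \dfrac{1}{r_{1n}} \\
\dfrac{1}{r_{21}} & a_{22} & \cdots & \dfrac{1}{r_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{1}{r_{n1}} & \dfrac{1}{r_{n2}} & \cdots & a_{nn}
\end{pmatrix}
$$

EEM matrix:

$$
\begin{pmatrix}
2\eta_{1} & \dfrac{1}{r_{12}} & \cdots & \dfrac{1}{r_{1n}} & 1 \\
\dfrac{1}{r_{21}} & 2\eta_{2} & \cdots & \dfrac{1}{r_{2n}} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\dfrac{1}{r_{n1}} & \dfrac{1}{r_{n2}} & \cdots & 2\eta_{n} & 1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix}
$$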

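 $A = T \Lambda T'$, so $AT = T \Lambda$, where $\Lambda$ is the diagonal matrix of eigenvalues of $A$ and the columns of $T$ are its eigenvectors
$A^{-1} = T \Lambda^{-1} T'$, since $T' = T^{-1}$ for an orthogonal transformation
i.e. the inverse of A is related to the eigenvalues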
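The relation between the inverse and the eigenvalues can be checked numerically. The sketch below uses a 2x2 symmetric matrix, for which the orthogonal matrix of eigenvectors is a plain rotation, and compares the inverse rebuilt from the eigen-decomposition with the directly computed inverse. The function names and the example matrix are illustrative assumptions.

```python
import math

def eig_sym_2x2(a, b, d):
    # Eigen-decomposition of the symmetric matrix [[a, b], [b, d]].
    # T is the rotation that diagonalizes it; its columns are eigenvectors.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c
    T = [[c, -s], [s, c]]
    return [lam1, lam2], T

def inverse_via_eigen(a, b, d):
    # Rebuild the inverse as T * diag(1/lambda) * T'.
    lams, T = eig_sym_2x2(a, b, d)
    return [[sum(T[i][k] * T[j][k] / lams[k] for k in range(2))
             for j in range(2)] for i in range(2)]

def inverse_direct(a, b, d):
    # Closed-form inverse of a 2x2 symmetric matrix.
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]

inv_eig = inverse_via_eigen(2.0, 1.0, 3.0)
inv_dir = inverse_direct(2.0, 1.0, 3.0)
```

Both routes give the same matrix, which is the sense in which solving an EEM-style system (a matrix inversion) and computing eigenvalue descriptors draw on the same information.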
Momentum space descriptors
 The part of the electron density distribution
that matters most for biological activity is
located near the k-space origin.
The corresponding r-space density distribution is
associated with the outermost
valence regions of the
molecule
 k-space descriptions of
electron density are more
compact and simpler
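A small worked example, not taken from the slides, may help show why diffuse outer regions map to the neighbourhood of the k-space origin. Assuming a single normalized Gaussian orbital with exponent $\alpha$ (an illustrative choice), its Fourier transform is again a Gaussian:

$$
\psi(\mathbf{r}) = \left(\frac{2\alpha}{\pi}\right)^{3/4} e^{-\alpha r^{2}}
\quad\Longrightarrow\quad
\tilde{\psi}(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int \psi(\mathbf{r})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^{3}r
= (2\pi\alpha)^{-3/4}\, e^{-k^{2}/(4\alpha)}
$$

A diffuse orbital (small $\alpha$), which extends far from the nuclei in r-space, therefore has a momentum density $|\tilde{\psi}(\mathbf{k})|^{2}$ concentrated near $k = 0$, while a tightly bound core-like orbital (large $\alpha$) is spread over large $k$.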
Optimum Selectivity Index So
Highlights
Sparse feature selection
 Automatic Relevance Determination (ARD) method
refined
 Sparse Bayesian feature detection theory mastered
 Linear sparse feature detection using an EM algorithm
and Jeffreys prior
 Nonlinear Bayesian feature detection achieved but
needs more work
 Novel variable selection method for cases where the number of
descriptors is much larger than the number of molecules in the
data set
[Figure: sparse Bayesian variable selection; axis label: descriptor]
Highlights
Optimum nonlinear modelling
 Bayesian regularized neural networks working well
 Linear sparse feature detection and modelling
 Nonlinear Bayesian feature detection and modelling
using radial basis function regression
 Use of sparse Bayesian methods in neural networks
under study
Highlights
Models built
 Blood-brain barrier partitioning
 Drug intestinal absorption
 Acute toxicity
 Phase II metabolism - substrates and
inhibitors (Flinders medical school
collaboration) - SVM
 Several drug target models - e.g. farnesyl
transferase
Blood-brain barrier model
Topological descriptors, 3 hidden nodes
Training set 85 compounds, test set 21 compounds
Intestinal absorption QSAR model
Property-based descriptors, 5 hidden nodes (optimum model)
Acute toxicity model
Burden index/binned charge descriptors, 8 hidden nodes
Training set 450 compounds, external test set 53 compounds
Using SVM and EEM descriptors
to model phase II metabolism
UGT       Number of      Percent       Percent of test set predicted correctly
isoform   chemicals      substrates    All chemicals   Substrates   Non-substrates
          in dataset
1A1       174            39            85              81           88
1A3       156            76            89              94           67
1A4       156            55            83              78           94
1A6       161            41            67              72           64
1A7       65             40            79              57           92
1A8       104            78            77              95           40
1A9       176            65            80              86           67
1A10      147            50            80              86           74
2B4       131            31            83              75           87
2B7       196            65            64              73           36
2B15      125            42            67              60           71
2B17      53             45            80              70           100
COX 1 and 2 QSAR and selectivity
 Built QSAR models for cyclooxygenase 1 and
2, and for So, using a large data set from Tom
Stockfisch at Accelrys (454 compounds
obtained from
http://www.accelrys.com/references/datasets/)
 Used atomistic (A), Burden eigenvalue (B) and
charge fingerprint (C) descriptors together with
a Bayesian regularized neural net to build the
models
 Compared MLR with a Bayesian neural net
with 3 nodes in the hidden layer
COX 1 and 2 QSAR and selectivity
Selectivity of cyclooxygenase 1 and 2 inhibitors
Selectivity Index So QSAR Model
MLR: R² = 0.77, Q² = 0.69
BRANN (3 nodes): R² = 0.92, Q² = 0.74
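For readers unfamiliar with the two statistics, the sketch below computes R² (fit to the training data) and a cross-validated Q² for a one-descriptor least-squares line. It assumes Q² is defined by leave-one-out cross-validation as 1 - PRESS/SS_tot, which is a common convention but is not stated on the slides; the data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def total_sum_of_squares(ys):
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)

def r_squared(xs, ys):
    # Fraction of variance explained when the model is fitted to all data.
    slope, icpt = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)

def q_squared_loo(xs, ys):
    # Leave-one-out Q2: each point is predicted by a model fitted without it.
    press = 0.0
    for i in range(len(xs)):
        slope, icpt = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + icpt)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)

# Hypothetical descriptor values and activities.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared(xs, ys)
q2 = q_squared_loo(xs, ys)
```

For a least-squares model Q² can never exceed R², so a large gap between the two is a warning sign of overfitting.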