Advances in Electronic Property-Encoded
Molecular Shape Descriptors
C. Matthew Sundling and Curt M. Breneman*
Department of Chemistry
Rensselaer Polytechnic Institute, Cogswell Lab 319A, 110 8th St., Troy,
NY 12180, Fax: 518 276-4887
{sundlm,brenec}@rpi.edu
A Progress Report from NSF Project DDASSL
(Drug Design and Semi-Supervised Learning)
(NSF/KDI IIS-9979860)
Project DDASSL: Drug Design and Semi-Supervised Learning (NSF/KDI IIS-9979860)
Collaborative effort between three research groups in different
departments at RPI.
Goals include the rapid development of bioactivity screening
and QSAR models through data mining of large databases.
To achieve these goals, the DDASSL group is developing novel
descriptors, feature selection techniques and classification
methods.
Methods are currently being tested on smaller datasets to
determine their reliability before proceeding to larger datasets.
www.drugmining.com
The RECON/TAE Reconstruction Method: TAE
“Whole-Molecule” Descriptors
TAE Descriptors: Molecular surfaces and surface property
distributions without explicit shape information.
– C.M. Breneman and M. Rhem, J. Comp. Chem., 1997, 18(2), 182-197
Bader’s AIM approach is used to generate a large library of
structurally-distinct atom types.
A connectivity-based selection algorithm is used to assign atom
types to 2-D connection tables.
RECON molecular properties may be used to rapidly generate
Electron Density-Derived molecular TAE descriptors and
Wavelet Coefficient Descriptors (WCDs) for large datasets.
Electron Density-Derived TAE Descriptors
1) Surface properties are encoded on the 0.002 e/au³ electron density isosurface
C.M. Breneman and M. Rhem, J. Comp. Chem., 1997, 18(2), 182-197
2) Histograms or wavelet encodings of surface properties give
TAE property descriptors
Histograms
PIP (Local Ionization Potential)
Wavelet Coefficients
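The two encodings named above can be sketched as follows. This is a minimal illustration, not the RECON/TAE implementation: the function names, the area weighting, the bin count, and the choice of the Haar wavelet are all assumptions, since the slides do not specify them.

```python
import numpy as np

def surface_histogram(values, areas, n_bins=8, value_range=None):
    """Area-weighted histogram of a property sampled on surface elements.

    Each bin holds the fraction of total surface area whose property value
    falls in that bin (hypothetical encoding, assumed for illustration).
    """
    values = np.asarray(values, dtype=float)
    areas = np.asarray(areas, dtype=float)
    hist, edges = np.histogram(values, bins=n_bins, range=value_range,
                               weights=areas)
    return hist / areas.sum(), edges

def haar_coefficients(signal):
    """Full orthonormal Haar decomposition of a power-of-two-length signal.

    Returns [approximation, coarsest detail, ..., finest details].
    """
    x = np.asarray(signal, dtype=float)
    if x.size == 0 or x.size & (x.size - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        x = (even + odd) / np.sqrt(2.0)              # smoothed signal
    return np.concatenate([x] + details[::-1])

# Example: encode a property on four surface elements, then compress it.
hist, _ = surface_histogram([0.1, 0.2, 0.6, 0.9], [1, 1, 2, 4],
                            n_bins=4, value_range=(0.0, 1.0))
wcd = haar_coefficients(hist)
```

Because the Haar transform used here is orthonormal, a small number of large coefficients can stand in for the whole histogram, which is the usual motivation for wavelet-based descriptors.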
QSAR Modeling with 3D Electronic Property Descriptors
3D Molecular descriptors: To Align or not to Align?
– Pro: Alignment allows the capture of the spatial orientation of
electronic properties. (Pharmacophore information)
– Con: Alignment impedes automated 3D descriptor generation.
• CoMFA alignment rules do not perform well in unsupervised
operations.
• 3D whole-molecule descriptors can be effective in QSAR/QSPR
modeling without utilizing explicit shape information. (TAE, MolConnZ)
An Alternative Approach: Molecular “Shape Signature” encoding
– R.J. Zauhar and W.J. Welsh (2000). Application of the "shape signatures" approach
to ligand- and receptor-based drug design. 220th ACS Meeting, Washington, D.C.
– P.G. Mezey (1993). "Shape in Chemistry: An Introduction to Molecular Shape and
Topology", VCH Publishers, New York. (Molecular Shape Analysis - MSA)
Molecular Shape Encoding - “Shape Signatures”
Zauhar’s “Shape Signatures” provide encoded molecular shape
fingerprints based on internal reflection ray-tracing within a
molecular envelope.
– Ray length distribution
– Reflection Angle distribution
“Shape Signatures” have been used effectively by Zauhar for
molecular classification.
– Estrogenic compound identification (reported at 220th ACS Meeting
in Washington D.C.)
Can more be done?
– Coupling of TAE/RECON electronic property reconstruction with
“Shape Signature” technology
– Generation of shape-aware electronic QSAR descriptors
Internal Molecular Ray Collision Detection
Begin with property-encoded TAE molecular surface
– Use 0.002 e/au³ electron density isosurface (10⁴ to 10⁵ elements)
‘Bounce’ rays throughout the volume of a molecule
– Random starting location and direction
– Ray collision detection technology
Pre-melding
PIP encoded isosurface
Post-melding
PIP encoded isosurface with portion removed
Internal Molecular Ray Collision Detection
Single ray collision with primary plane
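A single ray-plane collision and the resulting bounce can be sketched with two small functions. This assumes a planar surface element and specular (mirror-like) reflection; the function names are hypothetical and the slides do not state which reflection rule is used.

```python
import numpy as np

def intersect_plane(origin, direction, plane_point, plane_normal, eps=1e-12):
    """Distance t along the ray to the plane, or None if there is no hit.

    The ray is origin + t * direction; only hits in front of the origin
    (t > eps) are reported.
    """
    denom = np.dot(direction, plane_normal)
    if abs(denom) < eps:          # ray runs parallel to the plane
        return None
    t = np.dot(plane_point - origin, plane_normal) / denom
    return t if t > eps else None

def reflect(direction, normal):
    """Specular reflection of a direction vector about a surface normal."""
    n = normal / np.linalg.norm(normal)
    return direction - 2.0 * np.dot(direction, n) * n

# Example: a ray travelling along +z hits the plane z = 2 and bounces back.
origin = np.array([0.0, 0.0, 0.0])
direction = np.array([0.0, 0.0, 1.0])
t = intersect_plane(origin, direction,
                    np.array([0.0, 0.0, 2.0]), np.array([0.0, 0.0, 1.0]))
bounced = reflect(direction, np.array([0.0, 0.0, 1.0]))
```

The distance `t` is the segment length, and `origin + t * direction` is the point of incidence at which a surface property would be read.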
Implementation: Surface Property-Encoded Ray Tracing
Begin with property-encoded TAE molecular surface
– Use 0.002 e/au³ electron density isosurface (10⁴ to 10⁵ elements)
‘Bounce’ rays throughout the volume of a molecule
– Random starting location and direction
– Ray collision detection technology
Determining point-of-incidence is difficult
– Use Binary-Space-Partitioning Tree
Implementation: Surface Property-Encoded Ray Tracing
BSP Tree - Graphical representation
Isosurface with division planes
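The idea behind the partitioning tree can be sketched in a few lines: recursively split the surface elements with division planes so that a ray segment only has to be tested against elements in the cells it passes through. The version below is a simplified assumption, not the presented implementation: it uses axis-aligned planes (a special case of a BSP tree), represents each surface element by a single point such as its centroid, and uses hypothetical names throughout. A production version would also have to handle elements that straddle a division plane.

```python
import numpy as np

class BSPNode:
    """Interior node (axis, offset, back, front) or leaf (items)."""
    def __init__(self, axis=None, offset=None, back=None, front=None, items=None):
        self.axis, self.offset = axis, offset
        self.back, self.front = back, front
        self.items = items

def build_bsp(points, indices=None, leaf_size=8):
    """Split element points by the median plane along their widest axis."""
    points = np.asarray(points, dtype=float)
    if indices is None:
        indices = np.arange(len(points))
    if len(indices) <= leaf_size:
        return BSPNode(items=indices)
    subset = points[indices]
    axis = int(np.argmax(subset.max(axis=0) - subset.min(axis=0)))
    offset = float(np.median(subset[:, axis]))
    mask = subset[:, axis] <= offset
    if mask.all() or not mask.any():      # degenerate split: stop here
        return BSPNode(items=indices)
    return BSPNode(axis, offset,
                   back=build_bsp(points, indices[mask], leaf_size),
                   front=build_bsp(points, indices[~mask], leaf_size))

def segment_candidates(node, a, b):
    """Indices of elements in every cell touched by the segment a -> b."""
    if node.items is not None:
        return [int(i) for i in node.items]
    da = a[node.axis] - node.offset
    db = b[node.axis] - node.offset
    found = []
    if da <= 0 or db <= 0:                # segment reaches the back side
        found += segment_candidates(node.back, a, b)
    if da > 0 or db > 0:                  # segment reaches the front side
        found += segment_candidates(node.front, a, b)
    return found

# Example: 16 elements on a line; a short segment touches only one cell.
pts = np.array([[i, 0.0, 0.0] for i in range(16)])
tree = build_bsp(pts, leaf_size=4)
near = segment_candidates(tree, np.array([0.5, 0, 0]), np.array([2.5, 0, 0]))
```

With 10⁴ to 10⁵ surface elements, pruning of this kind is what keeps the per-bounce intersection test from scanning the whole surface.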
Implementation: Surface Property-Encoded Ray Tracing
TAE Internal Ray Reflection - low resolution scan
Isosurface (portion removed) with 750 segments
Implementation: Surface Property-Encoded Ray Tracing
TAE Internal Ray Reflection - high resolution scan
Isosurface (portion removed) with 4000 segments
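The bounce loop behind such a scan can be sketched with a sphere standing in for the molecular envelope, so that the point of incidence has a closed-form solution. Everything here is an illustrative assumption (the sphere, the specular reflection rule, the function name and parameters); a real run would intersect the ray with the property-encoded isosurface instead.

```python
import numpy as np

def trace_in_sphere(n_segments, radius=1.0, start=None, direction=None, seed=0):
    """Bounce one ray inside a sphere; return segment lengths and
    incidence angles (radians, measured from the surface normal)."""
    rng = np.random.default_rng(seed)
    if start is None:                       # random interior start point
        v = rng.normal(size=3)
        start = v / np.linalg.norm(v) * radius * 0.5 * rng.random()
    if direction is None:                   # random initial direction
        direction = rng.normal(size=3)
    p = np.asarray(start, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    lengths, angles = [], []
    for _ in range(n_segments):
        # Solve |p + t d|^2 = radius^2 for the positive root t.
        b = np.dot(p, d)
        c = np.dot(p, p) - radius ** 2
        t = -b + np.sqrt(max(b * b - c, 0.0))
        hit = p + t * d
        n = hit / np.linalg.norm(hit)       # outward unit normal at the hit
        lengths.append(t)
        angles.append(np.arccos(min(abs(np.dot(d, n)), 1.0)))
        d = d - 2.0 * np.dot(d, n) * n      # specular reflection
        d = d / np.linalg.norm(d)
        p = radius * n                      # snap back onto the surface
    return np.array(lengths), np.array(angles)

# Example: a low-resolution scan of 750 segments.
lengths, angles = trace_in_sphere(750, start=(0.2, 0.0, 0.0),
                                  direction=(0.0, 1.0, 0.0))
```

In a perfect sphere every segment after the first has the same length, so the resulting distribution is a single spike; it is the irregular shape of a real molecular surface that spreads the distribution into a usable signature.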
TAE Electron Density-Derived Properties
Scalar Properties
– Property Extrema
– Integral Average of Property
– Surface Histogram of a Property
Scalarized Vector Properties
– Property Extrema
– Integral Average of Property
– Surface Histogram of a Property
Types of Properties
– Electrostatic Potential (EP)
– Electronic Kinetic Energy Density
– Electron Density Gradients (DRN)
– Laplacian of the Electron Density (LAPL)
– Local Average Ionization Potential (PIP)
– Bare Nuclear Potential (BNP)
TAE-Derived Hybrid Shape-Property Distribution
Continuous distribution can provide a 6 x 6 descriptor grid
Continuous
6 x 6 Descriptor Grid
Shape-Aware Molecular Descriptors from
Property/Segment- Length Distributions
Segment length and the property value at the point of incidence form a 2D histogram
Each bin of the 2D histogram becomes a hybrid descriptor
– 6 x 6 bins give 36 descriptors per length-property pair
PIP vs Segment Length
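The binning step can be sketched with NumPy's 2D histogram. The function name, the normalization to fractions, and the explicit bin ranges are assumptions made for illustration; only the 6 x 6 grid and the pairing of segment length with the property value at the point of incidence come from the slides.

```python
import numpy as np

def hybrid_descriptors(lengths, prop_values, length_range, prop_range, bins=6):
    """Flatten a bins x bins length/property histogram into descriptors.

    Fixed ranges keep the bins comparable across molecules. Descriptor
    k = i * bins + j counts segments in length bin i and property bin j,
    expressed as a fraction of all segments.
    """
    hist, _, _ = np.histogram2d(lengths, prop_values, bins=bins,
                                range=[length_range, prop_range])
    total = hist.sum()
    if total > 0:
        hist = hist / total
    return hist.ravel()

# Example: three ray segments with their point-of-incidence property values.
desc = hybrid_descriptors([0.5, 0.5, 5.5], [0.5, 0.5, 5.5],
                          length_range=(0.0, 6.0), prop_range=(0.0, 6.0))
```

Each of the 36 values can then be used as an ordinary column in a QSAR descriptor table, one set per property (PIP, EP, BNP, and so on).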
HIV Reverse-Transcriptase Inhibitor Dataset
64 molecules with EC50 values in MT-4 cells
5 structural classes
– TIBO class (13)
– HEPT class (26)
– Thiadiazole class (7)
– TSAO class (11)
– Triazoline class (7)
R. Garg, S.P. Gupta, H. Gao, M.S. Babu, A.K. Debnath, and C. Hansch, Chem. Rev., 1999, 99, 3525-3601
Shape/Property Distributions (PIP)
PIP vs Length Property Distribution - Five structural classes
Shape/Property Distributions (EP)
EP vs Length Property Distribution - Five structural classes
Shape/Property Distributions (BNP)
BNP vs Length Property Distribution - Five structural classes
Shape/Property Distributions (Laplacian)
Laplacian vs Length Property Distribution - Five structural
classes
Shape/Property Distributions (Rho Gradient)
Rho Gradient vs Length Property Distribution - Five structural
classes
Shape/Property Distributions (Bounce Angle)
Bounce Angle vs Length Property Distribution - Five structural
classes
HIV Reverse-Transcriptase Inhibitor modeling
Scatterplot: Predicted Response vs. Observed Response (both axes 0 to 1, points labeled by molecule number)
q2 = 0.2394
Q2 = 0.2407
RMSE = 0.1182
Summary
Shape-aware TAE descriptors encode both molecular electronic
surface properties and their spatial distribution.
The TAE/RECON implementation of shape-encoded descriptors
has the potential for high-throughput screening and
classification.
Models built using the new descriptors produce better results
than those constructed from a combination of TAE, 3D and
topological (MolConnZ) descriptors.
Acknowledgements
Members of the DDASSL group
– Breneman Research Group
• With special acknowledgements to:
– Larry Lockwood
– Sukumar Nagamani
– Embrechts Research Group
– Bennett Research Group
NSF/KDI (IIS-9979860)
Chemical Computing Group (CCG)
– CCG Excellence Travel Award and MOE License
See: www.drugmining.com for more details.
Reserve Slides
Shape/Property Distributions (Fukui)
Fukui vs Length Property Distribution - Five structural classes
Shape/Property Distributions (Electronic KE K)
KE (K) vs Length Property Distribution - Five structural classes
Shape/Property Distributions (G Gradient)
G Gradient vs Length Property Distribution - Five structural
classes
Shape/Property Distributions (Electronic KE G)
KE (G) vs Length Property Distribution - Five structural classes
Shape/Property Distributions (K Gradient)
K Gradient vs Length Property Distribution - Five structural
classes