Advances in Electronic Property-Encoded
Molecular Shape Descriptors
C. Matthew Sundling and Curt M. Breneman*
Department of Chemistry
Rensselaer Polytechnic Institute, Cogswell Lab 319A, 110 8th St., Troy,
NY 12180, Fax: 518 276-4887
{sundlm,brenec}@rpi.edu
A Progress Report from NSF Project DDASSL
(Drug Design and Semi-Supervised Learning)
(KDD/ IIS-9979860)
Project DDASSL – Drug Design And Semi-Supervised Learning (NSF/KDD IIS-9979860)

Collaborative effort between three research groups in different
departments at RPI.

Goals include the rapid development of bioactivity screening
and QSAR models through data mining of large databases.

To achieve these goals, the DDASSL group is developing novel
descriptors, feature selection techniques and classification
methods.

Methods are currently being tested on smaller datasets to
determine their reliability before proceeding to larger datasets.
www.drugmining.com
The RECON/TAE Reconstruction Method: TAE
“Whole-Molecule” Descriptors

TAE Descriptors: Molecular surfaces and surface property
distributions without explicit shape information.
– C.M. Breneman and M. Rhem, J. Comp. Chem., 1997, 18(2), 182-197

Bader’s AIM approach is used to generate a large library of
structurally-distinct atom types.

A connectivity-based selection algorithm is used to assign atom
types to 2-D connection tables.

RECON molecular properties may be used to rapidly generate
Electron Density-Derived molecular TAE descriptors and
Wavelet Coefficient Descriptors (WCDs) for large datasets.
Electron Density-Derived TAE Descriptors
1) Surface properties are encoded on the 0.002 e/au³ electron density isosurface
Breneman, C.M. and Rhem, M., J. Comp. Chem., 1997, 18(2), p. 182-197
2) Histograms or wavelet encodings of surface properties give
TAE property descriptors
Histograms
PIP (Local Ionization Potential)
Wavelet Coefficients
QSAR Modeling with 3D Electronic Property Descriptors

3D Molecular descriptors: To Align or not to Align?
– Pro: Alignment allows the capture of the spatial orientation of
electronic properties. (Pharmacophore information)
– Con: Alignment impedes automated 3D descriptor generation.

• CoMFA alignment rules do not perform well in unsupervised operations.
• 3D whole-molecule descriptors can be effective in QSAR/QSPR modeling without utilizing explicit shape information. (TAE, MolconnZ)
An Alternative Approach: Molecular “Shape Signature” encoding
– Zauhar, R. J. and W. J. Welsh (2000). Application of the "shape signatures" approach to ligand- and receptor-based drug design. 220th ACS National Meeting, Washington, DC.
– P.G. Mezey, 1993. "Shape in Chemistry: An Introduction to Molecular Shape and Topology", VCH Publishers, New York. (Molecular Shape Analysis - MSA)
Molecular Shape Encoding - “Shape Signatures”

Zauhar’s “Shape Signatures” provide encoded molecular shape
fingerprints based on internal reflection ray-tracing within a
molecular envelope.
– Ray length distribution
– Reflection Angle distribution

“Shape Signatures” have been used effectively by Zauhar for
molecular classification.
– Estrogenic compound identification (reported at 220th ACS Meeting
in Washington D.C.)

Can more be done?
– Coupling of TAE/RECON electronic property reconstruction with
“Shape Signature” technology
– Generation of shape-aware electronic QSAR descriptors
Internal Molecular Ray Collision Detection

Begin with property-encoded TAE molecular surface
– Use 0.002 e/au³ electron density isosurface (10⁴-10⁵ elements)

‘Bounce’ rays throughout the volume of a molecule
– Random starting location and direction
– Ray collision detection technology
Pre-melding
PIP encoded isosurface
Post-melding
PIP encoded isosurface with portion removed
Internal Molecular Ray Collision Detection
Single ray collision with primary plane
Implementation: Surface Property-Encoded Ray Tracing

Begin with property-encoded TAE molecular surface
– Use 0.002 e/au³ electron density isosurface (10⁴-10⁵ elements)

‘Bounce’ rays throughout the volume of a molecule
– Random starting location and direction
– Ray collision detection technology

Determining point-of-incidence is difficult
– Use Binary-Space-Partitioning Tree
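
The ray-bouncing step can be illustrated with a minimal sketch. This is not the DDASSL code: it assumes a triangulated surface, specular reflection, and a brute-force Möller-Trumbore test against every triangle in place of the BSP tree named on the slide. The names `ray_triangle_hits`, `trace_ray` and `octahedron` are hypothetical, and the octahedron is only a toy stand-in for a molecular isosurface.

```python
import numpy as np

def ray_triangle_hits(origin, direction, tris, eps=1e-9):
    """Moller-Trumbore test of one ray against all triangles (brute force).
    tris has shape (n, 3, 3). Returns hit distances, np.inf where missed."""
    v0, v1, v2 = tris[:, 0], tris[:, 1], tris[:, 2]
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.einsum("ij,ij->i", e1, p)
    ok = np.abs(det) > eps                      # skip triangles parallel to the ray
    inv = np.where(ok, 1.0 / np.where(ok, det, 1.0), 0.0)
    s = origin - v0
    u = np.einsum("ij,ij->i", s, p) * inv
    q = np.cross(s, e1)
    v = (q @ direction) * inv
    t = np.einsum("ij,ij->i", e2, q) * inv
    # t > 1e-7 ignores the surface element the ray has just left
    hit = ok & (u >= -eps) & (v >= -eps) & (u + v <= 1 + eps) & (t > 1e-7)
    return np.where(hit, t, np.inf)

def trace_ray(origin, direction, tris, n_bounces):
    """Bounce one ray inside a closed mesh (assumed specular reflection).
    Returns segment lengths, incidence angles and hit triangle indices."""
    normals = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    lengths, angles, hit_ids = [], [], []
    for _ in range(n_bounces):
        t = ray_triangle_hits(o, d, tris)
        k = int(np.argmin(t))
        if not np.isfinite(t[k]):
            break                               # ray escaped (open or degenerate mesh)
        n = normals[k]
        lengths.append(t[k])
        angles.append(np.arccos(min(1.0, abs(d @ n))))  # angle to the surface normal
        hit_ids.append(k)
        o = o + t[k] * d                        # move to the point of incidence
        d = d - 2.0 * (d @ n) * n               # mirror the direction about the normal
    return np.array(lengths), np.array(angles), np.array(hit_ids)

def octahedron():
    """Toy closed envelope: 8 triangles with vertices on the unit axes."""
    tris = [[(sx, 0, 0), (0, sy, 0), (0, 0, sz)]
            for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
    return np.array(tris, dtype=float)
```

The returned triangle indices are what link shape to electronic properties: a per-element property array (PIP, EP, and so on) indexed by `hit_ids` gives the property value at each point of incidence. The brute-force search costs one test per surface element per bounce, which is the cost a spatial index such as a BSP tree is meant to reduce on surfaces with 10⁴-10⁵ elements.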
Implementation: Surface Property-Encoded Ray Tracing

BSP Tree - Graphical representation
Isosurface with division planes
Implementation: Surface Property-Encoded Ray Tracing

TAE Internal Ray Reflection - low resolution scan
Isosurface (portion removed) with 750 segments
Implementation: Surface Property-Encoded Ray Tracing

TAE Internal Ray Reflection - high resolution scan
Isosurface (portion removed) with 4000 segments
TAE Electron Density-Derived Properties

Scalar Properties
– Property Extrema
– Integral Average of Property
– Surface Histogram of a Property

Scalarized Vector Properties
– Property Extrema
– Integral Average of Property
– Surface Histogram of a Property

Types of Properties
– Electrostatic Potential (EP)
– Electronic Kinetic Energy Density
– Electron Density Gradients (DRN)
– Laplacian of the Electron Density (LAPL)
– Local Average Ionization Potential (PIP)
– Bare Nuclear Potential (BNP)
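
As a rough illustration of the three descriptor kinds listed above (extrema, integral average, surface histogram), the sketch below computes them from per-element property values on a discretized surface. Weighting by element area, the bin count, and the function name `scalar_surface_descriptors` are assumptions made for this example; the slides do not specify the binning scheme.

```python
import numpy as np

def scalar_surface_descriptors(values, areas, n_bins=10, value_range=None):
    """Extrema, area-weighted average and area-weighted histogram of one
    surface property. values[i] is the property on surface element i and
    areas[i] is that element's area (assumed inputs)."""
    values = np.asarray(values, dtype=float)
    areas = np.asarray(areas, dtype=float)
    total = areas.sum()
    hist, edges = np.histogram(values, bins=n_bins, range=value_range,
                               weights=areas)
    return {
        "min": values.min(),
        "max": values.max(),
        # discrete form of (1/A) * integral of the property over the surface
        "integral_average": (values * areas).sum() / total,
        # fraction of the total surface area falling in each property bin
        "histogram": hist / total,
        "bin_edges": edges,
    }
```

Passing the same `value_range` for every molecule keeps the histogram bins comparable across a dataset, which matters when each bin is used as a column in a QSAR model.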
TAE-Derived Hybrid Shape-Property Distribution

Continuous distribution can provide a 6 x 6 descriptor grid
[Figure panels: Continuous distribution; 6 x 6 Descriptor Grid]
Shape-Aware Molecular Descriptors from
Property/Segment- Length Distributions


Segment length and point-of-incidence property value form a 2D histogram
Each bin of the 2D histogram becomes a hybrid descriptor
– 36 descriptors (6 x 6 bins) per length-property pair
PIP vs Segment Length
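
A minimal sketch of the binning step, assuming the ray trace has already produced one segment length and one point-of-incidence property value per bounce. The function name `hybrid_descriptors`, the fixed ranges, and the normalization to fractions are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def hybrid_descriptors(lengths, props, length_range, prop_range, n_bins=6):
    """Bin (segment length, property value) pairs into an n_bins x n_bins
    grid and flatten it into a descriptor vector (36 values for n_bins=6)."""
    hist, _, _ = np.histogram2d(lengths, props, bins=n_bins,
                                range=[length_range, prop_range])
    total = hist.sum()
    if total > 0:
        hist = hist / total   # assumed normalization: fraction of segments per bin
    return hist.ravel()       # row-major: index = length_bin * n_bins + prop_bin
```

For example, `hybrid_descriptors(lengths, pip_values, (0.0, 12.0), (0.2, 0.8))` would give 36 PIP-length descriptors for one molecule; the numeric ranges here are placeholders. Using the same ranges for all molecules keeps each of the 36 columns aligned across the dataset.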
HIV Reverse-Transcriptase Inhibitor Dataset


64 molecules with EC50 values in MT-4 cells
5 structural classes
– TIBO class (13)
– HEPT class (26)
– Thiadiazole class (7)
– TSAO class (11)
– Triazoline class (7)
[Structure diagrams for the five classes not reproduced]
R. Garg, S.P. Gupta, H. Gao, M.S. Babu, A.K. Debnath, and C. Hansch, Chem. Rev., 1999, 99, 3525-3601
Shape/Property Distributions (PIP)

PIP vs Length Property Distribution - Five structural classes
Shape/Property Distributions (EP)

EP vs Length Property Distribution - Five structural classes
Shape/Property Distributions (BNP)

BNP vs Length Property Distribution - Five structural classes
Shape/Property Distributions (Laplacian)

Laplacian vs Length Property Distribution - Five structural
classes
Shape/Property Distributions (Rho Gradient)

Rho Gradient vs Length Property Distribution - Five structural
classes
Shape/Property Distributions (Bounce Angle)

Bounce Angle vs Length Property Distribution - Five structural
classes
HIV Reverse-Transcriptase Inhibitor modeling
[Scatterplot: Predicted Response vs Observed Response, both axes 0 to 1, points labeled by molecule number]
q2 = 0.2394
Q2 = 0.2407
RMSE = 0.1182
Summary

Shape-aware TAE descriptors encode both molecular electronic
surface properties and their spatial distribution.

The TAE/RECON implementation of shape-encoded descriptors
has the potential for high-throughput screening and
classification.

Models built using the new descriptors produce better results
than those constructed from a combination of TAE, 3D and
topological (MolConnZ) descriptors.
Acknowledgements

Members of the DDASSL group
– Breneman Research Group
• With special acknowledgements to:
– Larry Lockwood
– Sukumar Nagamani
– Embrechts Research Group
– Bennett Research Group

NSF/KDI (IIS-9979860)

Chemical Computing Group (CCG)
– CCG Excellence Travel Award and MOE License

See: www.drugmining.com for more details.
Reserve Slides
Shape/Property Distributions (Fukui)

Fukui vs Length Property Distribution - Five structural classes
Shape/Property Distributions (Electronic KE K)

KE (K) vs Length Property Distribution - Five structural classes
Shape/Property Distributions (G Gradient)

G Gradient vs Length Property Distribution - Five structural
classes
Shape/Property Distributions (Electronic KE G)

KE (G) vs Length Property Distribution - Five structural classes
Shape/Property Distributions (K Gradient)

K Gradient vs Length Property Distribution - Five structural
classes