
QSAR



Quantitative Structure-Activity Relationships
Can one predict activity (or properties in
QSPR) simply on the basis of knowledge of
the structure of the molecule?
In other words, if one systematically changes
a component, will it have a systematic effect
on the activity?
Choice of Model

The problem can be approached from two directions:


Simple to complex model
Complex to simple model
Simplest Model

Linear relationship between x and y


$y = mx + b$
Minimize the error by least squares:
$$\sum_i (y_i - y'_i)^2 = \sum_i \left[y_i - (m x_i + b)\right]^2$$
where $y'_i$ is the predicted value
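The minimization has a closed-form answer: $m = \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2$ and $b = \bar{y} - m\bar{x}$. A minimal pure-Python sketch (the function names `fit_line` and `sum_squared_error` are our own, not from any package):

```python
def fit_line(xs, ys):
    """Return slope m and intercept b minimizing sum((y - (m*x + b))**2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


def sum_squared_error(xs, ys, m, b):
    """The quantity that least squares minimizes."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # lies exactly on y = 2x + 1
m, b = fit_line(xs, ys)
print(m, b)                              # 2.0 1.0
print(sum_squared_error(xs, ys, m, b))   # 0.0
```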
Least Squares
Correlation coefficient
$-1 \le r \le 1$
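A short sketch of the correlation coefficient, $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, built from the same centered sums as the least-squares slope. For a straight-line fit with an intercept, $r^2$ equals the $R^2$ of that fit. The function name is illustrative:

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect increasing line)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect decreasing line)
```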
Another test
Is the line better than the mean?
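One common way to make this test concrete: treat "always predict $\bar{y}$" as the baseline model and compare its squared error, $SS_{tot}$, with the line's residual error, $SS_{res}$. Then $R^2 = 1 - SS_{res}/SS_{tot}$ is near 0 when the line is no better than the mean (an F-test on the same sums is another option). A sketch with a small made-up data set; the function name is our own:

```python
def compare_with_mean(xs, ys, m, b):
    """Compare the fitted line y = m*x + b with the trivial model y = mean(y)."""
    y_mean = sum(ys) / len(ys)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)                   # error of the mean
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # error of the line
    r_squared = 1.0 - ss_res / ss_tot
    return ss_tot, ss_res, r_squared


xs = [0, 1, 2, 3]
ys = [1, 2, 2, 3]
# m = 0.6, b = 1.1 is the least-squares line for these four points
ss_tot, ss_res, r2 = compare_with_mean(xs, ys, 0.6, 1.1)
print(round(ss_tot, 3), round(ss_res, 3), round(r2, 3))  # 2.0 0.2 0.9
```

Here the line removes 90% of the scatter around the mean, so it is clearly better than the mean; a value like the 0.0045 seen for circular data would mean it is not.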
[Figure: four data sets, each fit with a least-squares line (equation and R² shown), in panels titled "A circle", "2 lines", "One bad point", and "Wrong model"; the R² values range from 0.0045 to 0.978]
Multiple Regression


$Y = f(X_1, X_2, \ldots, X_n)$
Problems:




Choice of model – linear, polynomial, etc.
Visualization
Interpretation
Computationally demanding
Variable reduction
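A sketch of the linear case, the simplest answer to the choice-of-model problem, assuming NumPy is available. Each row of `X` holds one molecule's descriptors; the data are invented so that the true coefficients are known in advance:

```python
import numpy as np


def fit_multiple_linear(X, y):
    """Least-squares coefficients [c0, c1, ..., cn] for y = c0 + c1*x1 + ... + cn*xn."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical descriptors (two columns) generated from y = 1 + 2*x1 - 0.5*x2
X = [[0, 0], [1, 0], [0, 2], [1, 1], [2, 3]]
y = [1 + 2 * a - 0.5 * b for a, b in X]
coef = fit_multiple_linear(X, y)
print(coef)  # approximately [1, 2, -0.5]
```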

Principal Component Analysis
Principal Component




$PC_1 = a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n$
$PC_2 = a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n$
Keep only those components that
possess the largest variance
The PCs are orthogonal to each other
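A sketch of PCA via the singular value decomposition, assuming NumPy and mean-centered descriptors (scaling each column to unit variance is a common extra step, omitted here). Row k of `loadings` holds the coefficients $a_{k,1}, \ldots, a_{k,n}$. The synthetic data make the first two columns nearly collinear, so one component should carry almost all the variance:

```python
import numpy as np


def principal_components(X, n_keep):
    """Return loadings (n_keep x n), scores (m x n_keep), and all PC variances."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each descriptor column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (len(X) - 1)          # variance along each PC, largest first
    loadings = Vt[:n_keep]                     # orthonormal rows: the a_{k,j}
    scores = Xc @ loadings.T                   # PC values for each molecule
    return loadings, scores, variances


rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=50), rng.normal(scale=0.1, size=50)])
loadings, scores, variances = principal_components(X, n_keep=1)
print(variances / variances.sum())  # first entry close to 1
```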
Exploring QSAR

Pick up the NONLIN program from




http://www.trinity.edu/sbachrac/drugdesign2007/
Unzip and install it on your computer
Read the Read.Me and Nonlin.doc
documentation
Look at the HeatForm.NLR file with any
word processor
Running NONLIN


Start an MS-DOS window
Change to the directory where the code is


cd /d d:\nonlin
Execute the program with the data file

nonlin heatForm > output
Assignment

Propose a QSAR scheme to predict the
heat of formation (ΔHf) of the alkanes
Early Examples

Hammett (1930s-1940s)
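Ionization of benzoic acid (X = H, constant K0) and of its para- and meta-substituted derivatives (constants Kp and Km):
X-C6H4-COOH ⇌ X-C6H4-COO⁻ + H⁺
$$\sigma_{para} = \log_{10}\frac{K_p}{K_0} \qquad \sigma_{meta} = \log_{10}\frac{K_m}{K_0}$$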
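Because $\log_{10}(K_X/K_0) = pK_{a,0} - pK_{a,X}$, σ can be computed from either equilibrium constants or pKa values; a positive σ means the substituent strengthens the acid relative to benzoic acid. A minimal sketch with hypothetical constants (function names are our own):

```python
import math


def hammett_sigma(k_substituted, k_parent):
    """sigma = log10(K_X / K_0) for the benzoic acid ionization."""
    return math.log10(k_substituted / k_parent)


def sigma_from_pka(pka_substituted, pka_parent):
    """Equivalent form, since pKa = -log10(Ka): sigma = pKa_0 - pKa_X."""
    return pka_parent - pka_substituted


# Hypothetical constants: a substituent that makes the acid 10x stronger
print(hammett_sigma(1.0e-3, 1.0e-4))  # approximately 1.0
print(sigma_from_pka(3.2, 4.2))       # approximately 1.0
```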
Hammett (cont.)

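Now suppose we have a related series, the substituted phenylacetic acids:
X-C6H4-CH2COOH ⇌ X-C6H4-CH2COO⁻ + H⁺ (constant K'x)
$$\log_{10}\frac{K'_x}{K'_0} = \rho\sigma$$
σ reflects the sensitivity to the substituent
ρ reflects the sensitivity of the different reaction system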
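Given σ values and measured $\log_{10}(K'_x/K'_0)$ for the new series, ρ is the least-squares slope of a line through the origin (X = H has σ = 0 and a log ratio of 0 by definition), $\rho = \sum_i \sigma_i y_i / \sum_i \sigma_i^2$. A sketch with invented data generated with ρ = 0.5; the function name is illustrative:

```python
def fit_rho(sigmas, log_ratios):
    """Least-squares slope through the origin for log10(K'_x/K'_0) = rho * sigma."""
    num = sum(s * y for s, y in zip(sigmas, log_ratios))
    den = sum(s * s for s in sigmas)
    return num / den


sigmas = [-0.2, 0.0, 0.2, 0.4, 0.8]          # hypothetical substituent constants
log_ratios = [0.5 * s for s in sigmas]       # hypothetical data with rho = 0.5
print(fit_rho(sigmas, log_ratios))           # approximately 0.5
```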
Hammett (cont.)

Linear Free Energy Relationship
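Since
$$\Delta G = -2.303RT\log_{10}K$$
it follows that
$$\Delta G - \Delta G_0 = -2.303RT\,\sigma$$
and
$$\Delta G' - \Delta G'_0 = -2.303RT\,\rho\sigma$$
Therefore
$$\Delta G' - \Delta G'_0 = \rho\,(\Delta G - \Delta G_0)$$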
Free-Wilson Analysis

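$$\log\frac{1}{C} = \sum_i a_i + \mu$$
where C is the concentration producing the biological response (log 1/C is the predicted activity),
$a_i$ is the contribution of each substituent group present, and
μ is the activity of the reference compound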
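The group contributions and the reference activity are usually obtained by least squares over a series of analogs, coding each substituent as a 0/1 indicator variable with H as the reference. A minimal sketch, assuming NumPy; the substituents, contributions, and activities below are invented so the fit can be checked against known values:

```python
import numpy as np

GROUPS = ["m-Cl", "m-Me", "p-Cl", "p-Me"]  # hypothetical substituent columns


def free_wilson_fit(compounds, activities, groups=GROUPS):
    """Fit log(1/C) = sum_i a_i * [group i present] + mu by least squares."""
    # One 0/1 column per group plus a column of ones for mu (reference activity)
    X = np.array([[1.0 if g in c else 0.0 for g in groups] + [1.0] for c in compounds])
    coef, *_ = np.linalg.lstsq(X, np.asarray(activities, dtype=float), rcond=None)
    contributions = {g: float(v) for g, v in zip(groups, coef[:-1])}
    mu = float(coef[-1])
    return contributions, mu


# Made-up training set: parent (H at both positions), singles, and a few doubles
compounds = [set(), {"m-Cl"}, {"m-Me"}, {"p-Cl"}, {"p-Me"},
             {"m-Cl", "p-Cl"}, {"m-Me", "p-Me"}, {"m-Cl", "p-Me"}]
true_a = {"m-Cl": 0.2, "m-Me": 0.4, "p-Cl": 0.7, "p-Me": 1.1}  # invented values
true_mu = 7.8
activities = [true_mu + sum(true_a[g] for g in c) for c in compounds]

a, mu = free_wilson_fit(compounds, activities)
print(a, mu)
```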
Free-Wilson example
[Figure: parent structure containing Br and N (as the HCl salt) with variable substituents X and Y; activity of analogs]
Log 1/C = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br]
+ 0.58 [m-I] + 0.45 [m-Me] + 0.34 [p-F] + 0.77 [p-Cl]
+ 1.02 [p-Br] + 1.43 [p-I] + 1.26 [p-Me] + 7.82
Limitations: at least two substituent positions are necessary, and the method can only predict new combinations of the
substituents used in the analysis.
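Applying the fitted equation is just a sum. This sketch uses the coefficients from the example equation above; asking for a group outside that table raises an error, which reflects the second limitation. The function name is our own:

```python
# Group contributions to log(1/C) from the Free-Wilson example equation
CONTRIB = {
    "m-F": -0.30, "m-Cl": 0.21, "m-Br": 0.43, "m-I": 0.58, "m-Me": 0.45,
    "p-F": 0.34, "p-Cl": 0.77, "p-Br": 1.02, "p-I": 1.43, "p-Me": 1.26,
}
MU = 7.82  # activity of the reference compound


def predict_log_inv_c(substituents):
    """Predicted log(1/C) for an analog carrying the given substituents."""
    unknown = [s for s in substituents if s not in CONTRIB]
    if unknown:
        # Free-Wilson cannot extrapolate to groups absent from the training set
        raise KeyError(f"no contribution fitted for: {unknown}")
    return MU + sum(CONTRIB[s] for s in substituents)


print(round(predict_log_inv_c(["m-Br", "p-Cl"]), 2))  # 9.02
```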
Hansch Analysis
$$\log\frac{1}{C} = a\pi + b\sigma + c$$
where
$$\pi_X = \log P_{RX} - \log P_{RH}$$
P is the octanol/water partition coefficient, and σ is the Hammett substituent constant
This is also a linear free energy relationship
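A sketch of both formulas. The log P values, σ, and the coefficients a, b, c below are placeholders; in practice a, b, c come from regressing measured log(1/C) on π and σ for a training series. Function names are illustrative:

```python
def pi_constant(log_p_rx, log_p_rh):
    """Hydrophobic substituent constant: pi_X = log P(RX) - log P(RH)."""
    return log_p_rx - log_p_rh


def hansch_log_inv_c(pi, sigma, a, b, c):
    """Linear Hansch model: log(1/C) = a*pi + b*sigma + c."""
    return a * pi + b * sigma + c


# Hypothetical inputs: log P of the substituted and parent compounds, and sigma
pi_x = pi_constant(log_p_rx=2.5, log_p_rh=2.0)                 # 0.5
pred = hansch_log_inv_c(pi_x, sigma=0.2, a=1.0, b=0.5, c=3.0)
print(round(pred, 2))  # 3.6
```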
Molecular Descriptors

Simple rules for describing some aspect of a molecule





[Diagram: Structure → Property]
2D descriptors only use the atoms and connection
information of the molecule
Internal 3D descriptors use 3D coordinate
information about each molecule; however, they are
invariant to rotations and translations of the
conformation
External 3D descriptors also use 3D coordinate
information but also require an absolute frame of
reference (e.g., molecules docked into the same
receptor).
Descriptor examples

Physical Properties





MW
log P (octanol/water partition coefficient)
bp, mp
Dipole moment
Solubility
Descriptor examples

Structural descriptors

2D

Atom/Bond counts






Number non-H atoms
Number of rotatable bonds
Number of each functional group
2C chains, 3C chains, 4C chains, 5C chains, etc.
Rings and their size
3D


Number of accessible conformations
Surface area
Topological Descriptors

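Wiener Path Index
Distance Matrix
[Figure: seven-carbon skeleton with atoms numbered 1 to 7; the bonded pairs ($d_{ij} = 1$) are 1-2, 2-3, 3-4, 4-5, 2-6, and 3-7]
$$w = \sum_i \sum_{j>i} d_{ij}$$
where $d_{ij}$ is the number of bonds on the shortest path between atoms i and j
$$D = \begin{pmatrix}
0&1&2&3&4&2&3\\
1&0&1&2&3&1&2\\
2&1&0&1&2&2&1\\
3&2&1&0&1&3&2\\
4&3&2&1&0&4&3\\
2&1&2&3&4&0&3\\
3&2&1&2&3&3&0
\end{pmatrix}$$
w = 46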
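The index can be computed without writing out the matrix by running a breadth-first search from every atom of the hydrogen-suppressed graph. The sketch below takes its bond list from the $d_{ij} = 1$ entries of the distance matrix (1-2, 2-3, 3-4, 4-5, 2-6, 3-7); function names are illustrative:

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, start):
    """Shortest-path bond counts from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(edges):
    """Sum of d_ij over all atom pairs i < j."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    dist = {atom: bfs_distances(adj, atom) for atom in adj}
    return sum(dist[i][j] for i, j in combinations(sorted(adj), 2))


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
print(wiener_index(edges))  # 46
```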
Topological Descriptors

Randic Index
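[Figure: carbon skeleton with each vertex labeled by its valence, i.e., the number of carbons bonded to it: 1, 2, 3, 1, 1, 3, 1]
Bond values, as the product of the two vertex valences: 3, 3, 9, 2, 6, 3
Edge terms, as the reciprocal of the square root of each bond value: 0.577, 0.577, 0.333, 0.707, 0.408, 0.577
Sum of edge terms: $\chi = \sum_{\text{bonds}} 1/\sqrt{\delta_i \delta_j} \approx 3.18$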
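A sketch of the same calculation. Our reading of the figure is that its vertex valences (two atoms of valence 3, one of valence 2, four of valence 1) describe the same seven-carbon skeleton as the Wiener example, since that bond list reproduces the bond values 3, 3, 9, 2, 6, 3; treat that identification as an assumption:

```python
import math


def randic_index(edges):
    """Sum of 1/sqrt(d_i * d_j) over all bonds, with d = vertex degree."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
print(round(randic_index(edges), 3))  # 3.181
```

The unrounded sum is about 3.1807; adding the three-decimal edge terms instead gives 3.179.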
Predict bp of alkanes
[Figure: bp vs. Wiener index for alkanes; least-squares fit y = 1.5225x + 7.2917, R² = 0.9547]
3D Molecular Descriptors




Potential energy
Solvation energy
Water accessible surface area
Water accessible surface area of all
atoms with positive (negative) partial
charge
Pharmacophore


Specification of the spatial arrangement
of a small number of atoms or
functional groups
With the model in hand, search
databases for molecules that fit this
spatial environment
Creating a Pharmacophore
[Figure: example structures containing O and OH groups used to build the pharmacophore]
3D Pharmacophore searching


With the pharmacophore in hand, search databases
of 3D molecular structures for molecules that fit
These “hits” can be ranked using the scoring
systems described later
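A toy version of the matching step, assuming a pharmacophore is stored as a list of feature types plus target distances (Å) between feature pairs, and each database molecule as typed feature points from one 3D conformer. The representation, names, and 1.0 Å tolerance are assumptions; real search tools also handle multiple conformers and use indexing rather than this brute-force enumeration:

```python
import math
from itertools import permutations


def matches_pharmacophore(pharm_types, pharm_dists, mol_features, tol=1.0):
    """
    pharm_types:  feature type for each pharmacophore point, e.g. ["donor", "acceptor"]
    pharm_dists:  {(i, j): target distance in angstroms} between pharmacophore points
    mol_features: [(type, (x, y, z)), ...] for one conformer of one molecule
    Returns the first tuple of molecule-feature indices that fits, else None.
    """
    n = len(pharm_types)
    for combo in permutations(range(len(mol_features)), n):
        # Each pharmacophore point must be covered by a feature of the same type
        if any(mol_features[k][0] != pharm_types[p] for p, k in enumerate(combo)):
            continue
        # Every constrained distance must be within tolerance
        if all(
            abs(math.dist(mol_features[combo[i]][1], mol_features[combo[j]][1]) - d) <= tol
            for (i, j), d in pharm_dists.items()
        ):
            return combo
    return None


pharm = ["donor", "acceptor"]
dists = {(0, 1): 5.0}
mol = [("acceptor", (0.0, 0.0, 0.0)), ("hydrophobe", (1.0, 1.0, 1.0)), ("donor", (5.2, 0.0, 0.0))]
print(matches_pharmacophore(pharm, dists, mol))  # (2, 0)
```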
Pharmacophore Descriptors





Number of acidic atoms
Number of basic atoms
Number of hydrogen bond donor atoms
Number of hydrophobic atoms
Sum of VDW surface areas of hydrophobic atoms
Lipinski’s Rule of 5

Potential drug candidates should:




Have 5 or fewer H-bond donors (expressed as the
sum of OHs and NHs)
Have a MW < 500
Have a log P less than 5
Have 10 or fewer H-bond acceptors (expressed as
the sum of Ns and Os)
Adv. Drug Delivery Rev., 1997, 23, 3
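A sketch of the four criteria exactly as listed above. The dataclass and function names are hypothetical; computing the donor and acceptor counts, MW, and log P from a structure is left to a cheminformatics toolkit:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MoleculeProps:
    h_donors: int      # number of OH + NH groups
    h_acceptors: int   # number of N + O atoms
    mw: float          # molecular weight
    log_p: float       # octanol/water log P


def rule_of_five_violations(p: MoleculeProps) -> List[str]:
    """Return the rule-of-5 criteria (as stated on the slide) that p fails."""
    violations = []
    if p.h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if p.mw >= 500:
        violations.append("MW not below 500")
    if p.log_p >= 5:
        violations.append("log P not below 5")
    if p.h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


print(rule_of_five_violations(MoleculeProps(2, 5, 350.0, 2.1)))   # []
print(rule_of_five_violations(MoleculeProps(6, 12, 650.0, 6.0)))  # all four criteria fail
```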
Docking


Model the interaction of a ligand with a receptor
This requires the following steps:





A) select appropriate ligands
B) select appropriate conformation of receptor
C) select appropriate conformations of ligands
D) combine the ligand and receptor (docking)
E) evaluate these combinations and rank order
them
Selection of Ligands

Want drug-like molecules



250 < MW < 500
Lipinski’s rules
Search through databases




Available Chemicals Directory (ACD)
World Drug Index
NCI Drug database
In-house databases
Receptor Conformation


The receptor is usually assumed to be static
Get the structure from X-ray or NMR
experiments

Protein Data Bank (http://www.rcsb.org/pdb/)
41,385 structures
Ligand Conformation



Rigid or flexible
If rigid, optimize the structure, then use it
throughout the docking procedure
If flexible, one can:


A) create a set of low energy conformations and
then use this set as a collection of rigid structures
in docking
B) optimize the structure within the active site of the receptor,
i.e., dock and optimize together
Docking


Place ligand in appropriate location for
interacting with the receptor
Methodological problems:


1) No best method for defining shape
2) No general solution for packing irregular
objects (the knapsack problem)
Docking Algorithmic Components

Receptor and Ligand Description (keep in mind
relative errors of structures, etc.)
Bind the Ligand to Receptor
(configuration/conformation search)



Geometric search (match ligand and receptor site
descriptions)
Search for minimum energy: molecular dynamics
(MD) or Monte Carlo (MC)
Evaluation of the dock ($\Delta G_{bind}$), also called
scoring
Descriptor Matching Method
DOCK program
1) Generate a molecular surface for the receptor

2) Generate spheres to fill the active site
(usually 30-50 spheres)

3) Match sphere centers to the ligand atoms
(originally just the lowest-energy conformer, now multiple
conformers, but each still rigid) – generates 10K orientations per
ligand – Shape-driven!

4) Score the interaction
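Once sphere centers have been matched to ligand atoms, the ligand still has to be rotated and translated onto those centers. A least-squares superposition (the Kabsch algorithm) is one standard way to do that; this NumPy sketch assumes the atom-to-sphere pairing is already known and is not taken from the DOCK source:

```python
import numpy as np


def superpose(ligand_xyz, sphere_xyz):
    """Rotation R and translation t minimizing sum ||R @ p_i + t - q_i||^2, plus the RMSD."""
    P = np.asarray(ligand_xyz, dtype=float)
    Q = np.asarray(sphere_xyz, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                   # 3x3 covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - R @ p0
    rmsd = np.sqrt(np.mean(np.sum((P @ R.T + t - Q) ** 2, axis=1)))
    return R, t, rmsd


# Hypothetical matched pairs: spheres are the ligand rotated 90 deg about z and shifted
ligand = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
spheres = ligand @ Rz.T + np.array([10.0, 5.0, -2.0])
R, t, rmsd = superpose(ligand, spheres)
print(round(float(rmsd), 6))
```

A small RMSD means the ligand's shape fits that set of sphere centers; the resulting pose would then be passed to step 4 for scoring.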
Fragment-Joining Method
FlexX, LUDI
Place base fragments into microstates
of the active site (Fragments can be small
molecules like benzene, formaldehyde,
formamide, naphthol, etc.)


Optimize the position of the base fragment
Join fragments with small connecting
chains made of CH2, CO, CONH, etc.
Scoring (evaluation of the dock)

Want to quickly evaluate the strength of
the interaction between ligand and
receptor

Full free energy computation



Expensive
Requires excellent force fields
Empirical method


Fast and cheap
Requires fitting to a broad set of ligand/receptor
complexes
Empirical Scoring

Method of Bohm (LUDI, FlexX, etc.)
$$\Delta G_{bind} = \Delta G_0 + \sum_{h\text{-bonds}} \Delta G_{hb}\, f(\Delta R, \Delta\alpha) + \sum_{ionic} \Delta G_{ion}\, f(\Delta R, \Delta\alpha) + \Delta G_{lipo}\, A_{lipo} + \Delta G_{rot}\, N_{ROT}$$
$\Delta G_0$ reduction in binding energy due to loss of
rotation and translation of ligand
$\Delta G_{hb}$ contribution from ideal hydrogen bond
$\Delta G_{ion}$ contribution from ionic interactions
$\Delta G_{lipo}$ contribution from lipophilic interactions
$\Delta G_{rot}$ contribution from freezing rotations within ligand
These come from empirical fits.
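A sketch of the sum, taking each interaction's already-evaluated penalty $f(\Delta R, \Delta\alpha)$ as input. Because $\Delta G_{hb}$ and $\Delta G_{ion}$ are constants, they factor out of their sums. The coefficient values here are made-up placeholders; the real ones come from Bohm's empirical fit:

```python
def bohm_score(hbond_penalties, ionic_penalties, a_lipo, n_rot, coef):
    """
    dG_bind = dG0 + dGhb*sum(f_hb) + dGion*sum(f_ion) + dGlipo*A_lipo + dGrot*N_rot
    hbond_penalties / ionic_penalties: one f(dR, dalpha) value in [0, 1] per interaction
    coef: fitted coefficients (placeholders below, not the published values)
    """
    return (coef["dG0"]
            + coef["dGhb"] * sum(hbond_penalties)
            + coef["dGion"] * sum(ionic_penalties)
            + coef["dGlipo"] * a_lipo
            + coef["dGrot"] * n_rot)


PLACEHOLDER_COEF = {"dG0": 5.0, "dGhb": -5.0, "dGion": -8.0, "dGlipo": -0.2, "dGrot": 1.5}
score = bohm_score([1.0, 0.5], [1.0], a_lipo=100.0, n_rot=4, coef=PLACEHOLDER_COEF)
print(round(score, 2))  # -24.5
```

More negative scores indicate stronger predicted binding, so hits would be rank-ordered by ascending score.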
Bohm Method (cont.)

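$f(\Delta R, \Delta\alpha)$ is a penalty function for non-ideal
interactions: distances too short or too long, angles
not linear
$$f(\Delta R, \Delta\alpha) = f_1(\Delta R)\, f_2(\Delta\alpha)$$
$$f_1(\Delta R) = \begin{cases} 1, & \Delta R < 0.2\,\text{\AA} \\ 1 - (\Delta R - 0.2)/0.4, & \Delta R < 0.6\,\text{\AA} \\ 0, & \Delta R > 0.6\,\text{\AA} \end{cases}$$
$$f_2(\Delta\alpha) = \begin{cases} 1, & \Delta\alpha < 30^\circ \\ 1 - (\Delta\alpha - 30)/50, & \Delta\alpha < 80^\circ \\ 0, & \Delta\alpha > 80^\circ \end{cases}$$
$\Delta R$ is the deviation from the ideal H···O/N distance of 1.9 Å
$\Delta\alpha$ is the deviation from the ideal N/O-H···O/N angle of 180°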
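The penalty functions translate directly into code. This sketch takes the raw H···acceptor distance (Å) and donor-H···acceptor angle (degrees) and forms the deviations from 1.9 Å and 180°; treating too-short and too-long distances alike by using the absolute deviation is our reading of the definition, and the function names are our own:

```python
def f1(dr):
    """Distance penalty; dr = |deviation| from the ideal 1.9 A H...O/N distance."""
    if dr <= 0.2:
        return 1.0
    if dr <= 0.6:
        return 1.0 - (dr - 0.2) / 0.4
    return 0.0


def f2(da):
    """Angle penalty; da = deviation in degrees from the ideal 180 degree angle."""
    if da <= 30.0:
        return 1.0
    if da <= 80.0:
        return 1.0 - (da - 30.0) / 50.0
    return 0.0


def hbond_penalty(distance, angle):
    """f(dR, dalpha) = f1(dR) * f2(dalpha) for one hydrogen bond."""
    return f1(abs(distance - 1.9)) * f2(abs(180.0 - angle))


print(hbond_penalty(1.9, 180.0))  # 1.0 (ideal geometry)
print(hbond_penalty(1.9, 125.0))  # 0.5 (55 degrees off linear)
print(hbond_penalty(2.6, 180.0))  # 0.0 (0.7 A too long)
```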
Bohm Method (cont.)


$A_{lipo}$ is the lipophilic contact surface,
evaluated by a coarse grid of boxes
$N_{ROT}$ is the number of rotatable bonds
(acyclic sp3-sp3, sp3-sp2, and sp2-sp2).
Terminal groups and ring flexibility are not
included.
H.-J. Bohm, J. Comput.-Aided Mol. Des., 1994, 8, 243-256
Scoring alternatives

Many variations on the Bohm scheme



Buried polar term, desolvation term, different
forms for the lipophilic term, include metal
bonding, etc.
Combine scoring functions, i.e. QSAR with
scoring functions as variables
Use the empirical score to select a set of hits, then
refine them with free energy minimization