Transcript QSAR
QSAR
Quantitative Structure-Activity Relationships
Can one predict activity (or properties in
QSPR) simply on the basis of knowledge of
the structure of the molecule?
In other words, if one systematically changes
a component, will it have a systematic effect
on the activity?
Choice of Model
Can approach in two directions:
Simple to complex model
Complex to simple model
Simplest Model
Linear relationship between x and y
$Y = mX + b$
Minimize error by least squares:
$\sum_i (Y_i - Y'_i)^2 = \sum_i [Y_i - (mX_i + b)]^2$
$Y'_i$ is the predicted value
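The least-squares fit above has a closed-form solution for m and b. The following is a minimal sketch in plain Python; the function name `least_squares_line` and the sample data are illustrative assumptions, not part of the original slides.

```python
def least_squares_line(xs, ys):
    """Return (m, b) minimizing sum_i [y_i - (m*x_i + b)]^2."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = least_squares_line(xs, ys)
print(m, b)
```

For real data the points will not lie on the line, and the residuals Y_i - Y'_i are what the fit minimizes.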
Least Squares
Correlation coefficient
$-1 \le r \le 1$
Another test
Is the line better than the mean?
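One common way to ask whether the line is better than the mean is the coefficient of determination, R² = 1 - SS_res/SS_tot, which compares the squared error about the line with the squared error about the mean of Y. The sketch below assumes that definition; the function name `r_squared` and the data are illustrative. For a least-squares line with an intercept, R² equals the square of the correlation coefficient r.

```python
def r_squared(xs, ys, m, b):
    """Compare the line y = m*x + b against simply predicting the mean of ys."""
    y_mean = sum(ys) / len(ys)
    # Squared error of the line's predictions.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Squared error of the mean-only "model".
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
print(r_squared(xs, ys, 2.0, 1.0))  # line passes through every point
print(r_squared(xs, ys, 0.0, 3.0))  # flat line at the mean: no better than the mean
```

A value near 1 means the line explains most of the variation; a value near 0 means it does no better than the mean.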
[Figure: four scatter plots with least-squares fits. "2 lines": y = 2.9562x - 0.2597, R² = 0.8686. "A circle": y = 0.0676x - 0.3882, R² = 0.0045. "One bad point": y = 2.8515x - 31.647, R² = 0.9179. "Wrong model": y = 0.0008x + 275.11, R² = 0.978.]
Multiple Regression
$Y = f(X_1, X_2, \ldots, X_n)$
Problems:
Choice of model – linear, polynomial, etc.
Visualization
Interpretation
Computationally demanding
Variable reduction
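For the linear choice of model, Y = w_1 X_1 + ... + w_n X_n + b, the fit can be sketched with NumPy's least-squares solver. The function name `fit_multiple_linear` and the toy data are assumptions for illustration only; a polynomial or other model would need a different design matrix.

```python
import numpy as np


def fit_multiple_linear(X, y):
    """Fit y ~ X @ w + b by least squares; return (w, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the intercept b is fitted with the weights.
    A = np.column_stack([X, np.ones(len(X))])
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


# Made-up descriptors (two per compound) generated from y = 2*x1 - x2 + 0.5.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [2.5, -0.5, 1.5, 3.5, -0.5]
weights, intercept = fit_multiple_linear(X, y)
print(weights, intercept)
```

With many descriptors and few compounds the fit becomes ill-conditioned, which is one motivation for variable reduction.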
Principal Component Analysis
Principal Component
$PC_1 = a_{1,1}x_1 + a_{1,2}x_2 + \ldots + a_{1,n}x_n$
$PC_2 = a_{2,1}x_1 + a_{2,2}x_2 + \ldots + a_{2,n}x_n$
Keep only those components that
possess the largest variance
PCs are orthogonal to each other
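The components can be sketched with a singular value decomposition of the mean-centered descriptor matrix. This is one standard route to PCA, not necessarily the one used in the course; the function name `principal_components` and the data are illustrative assumptions.

```python
import numpy as np


def principal_components(X, n_keep):
    """Return (loadings, scores, variances) for the n_keep largest-variance PCs."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each descriptor column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (len(X) - 1)  # variance along each PC, largest first
    loadings = Vt[:n_keep]             # rows hold the coefficients a_{k,1..n}
    scores = Xc @ loadings.T           # PC values for each compound
    return loadings, scores, variances[:n_keep]


# Made-up data: the second column is exactly twice the first (redundant descriptor).
X = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1], [5, 10, 0]]
loadings, scores, variances = principal_components(X, n_keep=2)
print(variances)
print(loadings @ loadings.T)  # close to the identity: the PCs are orthogonal
```

Because two of the three columns are collinear, two components carry all of the variance here, which is the kind of reduction the slide describes.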
Exploring QSAR
Pick up the NONLIN program
http://www.trinity.edu/sbachrac/drugdesign2007/
Unzip and install it on your computer
Read the Read.Me and Nonlin.doc
documentation
Look at the HeatForm.NLR file with any
word processor
Running NONLIN
Start an MS-DOS window
Change to the directory where the code is
Cd /d d:\nonlin
Execute the program with data file
Nonlin heatForm > output
Assignment
Propose a QSAR scheme to predict the
$\Delta H_f$ of the alkanes
Early Examples
Hammett (1930s-1940s)
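[Scheme: ionization of benzoic acid and its X-substituted analogs, ArCOOH ⇌ ArCOO⁻ + H⁺, with equilibrium constants K0 (unsubstituted), Kp (para-X), and Km (meta-X)]
$\sigma_{para} = \log_{10}(K_p / K_0)$
$\sigma_{meta} = \log_{10}(K_m / K_0)$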
Hammett (cont.)
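Now suppose we have a related series
[Scheme: ionization of the X-substituted CH2COOH analogs, X-Ar-CH2COOH ⇌ X-Ar-CH2COO⁻ + H⁺, with equilibrium constant K'x]
$\log_{10}(K'_x / K'_0) = \rho\sigma$
$\sigma$ reflects sensitivity to the substituent
$\rho$ reflects sensitivity to the different system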
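Given substituent constants from the benzoic acid series and measured ratios K'x/K'0 for the related series, the reaction constant (rho) is the slope of a line through the origin. The sketch below assumes exactly that; the function names and every numerical value are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def sigma_from_K(K_x, K_0):
    """Substituent constant: log10 of the substituted over the unsubstituted K."""
    return math.log10(K_x / K_0)


def fit_rho(sigmas, log_ratios):
    """Least-squares slope through the origin for log10(K'x/K'0) = rho * sigma."""
    denom = sum(s * s for s in sigmas)
    if denom == 0:
        raise ValueError("need at least one nonzero sigma")
    return sum(s * y for s, y in zip(sigmas, log_ratios)) / denom


# Hypothetical sigma values and hypothetical log10(K'x/K'0) measurements.
sigmas = [-0.2, 0.0, 0.3, 0.7]
log_ratios = [-0.1, 0.0, 0.15, 0.35]
rho = fit_rho(sigmas, log_ratios)
print(rho)
```

A rho below 1, as in this made-up data, would mean the related series is less sensitive to the substituent than the benzoic acids are.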
Hammett (cont.)
Linear Free Energy Relationship
$\Delta G = -2.303RT\log_{10}K$
So
$\Delta G - \Delta G_0 = -2.303RT\sigma$
and
$\Delta G' - \Delta G'_0 = -2.303RT\rho\sigma$
Therefore
$\Delta G' - \Delta G'_0 = \rho(\Delta G - \Delta G_0)$
Free-Wilson Analysis
$\log(1/C) = \sum_i a_i + \mu$
where C = predicted activity,
$a_i$ = contribution per group, and
$\mu$ = activity of reference
Free-Wilson example
[Structure: parent compound with variable substituents X and Y (drawing labels: Br, N, HCl)]
Activity of analogs
Log 1/C = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br]
+ 0.58 [m-I] + 0.45 [m-Me] + 0.34 [p-F] + 0.77 [p-Cl]
+ 1.02 [p-Br] + 1.43 [p-I] + 1.26 [p-Me] + 7.82
Problems: at least two substituent positions are
necessary, and the model can only predict new combinations of the
substituents used in the analysis.
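The Free-Wilson equation above can be applied as a lookup and a sum. The sketch below uses the coefficients from the slide; the function name `predict_log_inv_c` is an illustrative assumption, as is the reading that an unsubstituted position contributes nothing beyond the reference term 7.82.

```python
# Group contributions a_i and the reference term, taken from the equation above.
CONTRIBUTIONS = {
    "m-F": -0.30, "m-Cl": 0.21, "m-Br": 0.43, "m-I": 0.58, "m-Me": 0.45,
    "p-F": 0.34, "p-Cl": 0.77, "p-Br": 1.02, "p-I": 1.43, "p-Me": 1.26,
}
REFERENCE = 7.82


def predict_log_inv_c(substituents):
    """Sum the contributions of the substituents present and add the reference."""
    unknown = [s for s in substituents if s not in CONTRIBUTIONS]
    if unknown:
        # The model cannot extrapolate to groups absent from the analysis.
        raise KeyError(f"no fitted contribution for: {unknown}")
    return REFERENCE + sum(CONTRIBUTIONS[s] for s in substituents)


# A new combination of groups that each appear in the analysis.
print(round(predict_log_inv_c(["m-Cl", "p-Br"]), 2))
```

The `KeyError` branch mirrors the limitation noted above: only new combinations of substituents already in the analysis can be predicted.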
Hansch Analysis
$\log(1/C) = a\pi + b\sigma + c$
where
$\pi(X) = \log P_{RX} - \log P_{RH}$
and $\log P$ is the octanol/water partition coefficient
This is also a linear free energy relation
Molecular Descriptors
Simple rules for describing some aspect of a molecule
Structure
Property
2D descriptors only use the atoms and connection
information of the molecule
Internal 3D descriptors use 3D coordinate
information about each molecule; however, they are
invariant to rotations and translations of the
conformation
External 3D descriptors also use 3D coordinate
information but also require an absolute frame of
reference (e.g., molecules docked into the same
receptor).
Descriptor examples
Physical Properties
MW
log P (octanol/water partition)
bp, mp
Dipole moment
solubility
Descriptor examples
Structural descriptors
2D
Atom/Bond counts
Number non-H atoms
Number of rotatable bonds
Number of each functional group
2C chains, 3C chains, 4C chains, 5C chains, etc.
Rings and their size
3D
Number of accessible conformations
Surface area
Topological Descriptors
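Wiener Path Index
[Figure: carbon skeleton of a seven-carbon alkane (2,3-dimethylpentane), atoms numbered 1-7]
$w = \sum_i \sum_{j>i} d_{ij}$
Distance Matrix
0 1 2 3 4 2 3
1 0 1 2 3 1 2
2 1 0 1 2 2 1
3 2 1 0 1 3 2
4 3 2 1 0 4 3
2 1 2 3 4 0 3
3 2 1 2 3 3 0
w = 46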
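The Wiener index can be computed from the bond list by finding shortest path lengths between all atom pairs. The sketch below uses breadth-first search. The adjacency list is an assumption: it encodes a 2,3-dimethylpentane skeleton (chain 1-2-3-4-5 with atom 6 on atom 2 and atom 7 on atom 3), which reproduces the w = 46 reported above.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path (bond-count) distances over all atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives the distance from `start` to every atom.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    # Each pair (i, j) was counted from both ends.
    return total // 2


# Assumed hydrogen-suppressed skeleton of 2,3-dimethylpentane.
SKELETON = {1: [2], 2: [1, 3, 6], 3: [2, 4, 7], 4: [3, 5], 5: [4], 6: [2], 7: [3]}
print(wiener_index(SKELETON))  # 46
```

More compact (more branched) isomers give smaller values, which is why the index tracks properties such as boiling point.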
Topological Descriptors
Randic Index
[Figure: seven-carbon alkane skeleton annotated with the values below]
Valence at vertex: 1, 2, 3, 1, 1, 3, 1
Bond values as product of above: 3, 3, 9, 2, 6, 3
Edge term as reciprocal of square root of above bond values: .577, .577, .333, .707, .408, .577
Sum of edge terms: 3.179
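The Randic index follows the same three steps in code: count the valence (degree) of each vertex, multiply the degrees across each bond, and sum the reciprocal square roots. The edge list below is an assumption: a 2,3-dimethylpentane skeleton, which matches the valences and bond values listed above.

```python
import math


def randic_index(edges):
    """Sum over bonds of 1 / sqrt(degree(a) * degree(b))."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in edges)


# Assumed skeleton: chain 1-2-3-4-5 with methyl carbons 6 (on 2) and 7 (on 3).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
chi = randic_index(EDGES)
print(round(chi, 3))
```

Summing the unrounded edge terms gives about 3.181; adding the three-decimal values shown on the slide gives 3.179.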
Predict bp of alkanes
[Figure: boiling point (bp) plotted against Wiener index, with linear fit y = 1.5225x + 7.2917, R² = 0.9547]
3D Molecular Descriptors
Potential energy
Solvation energy
Water accessible surface area
Water accessible surface area of all
atoms with positive (negative) partial
charge
Pharmacophore
Specification of the spatial arrangement
of a small number of atoms or
functional groups
With the model in hand, search
databases for molecules that fit this
spatial environment
Creating a Pharmacophore
[Figure: example structures bearing O and OH groups]
3D Pharmacophore searching
With the pharmacophore in hand,
search databases containing 3-D
structures of molecules for molecules
that fit
Can rank these “hits” using the scoring
system described later
Pharmacophore Descriptors
Number of acidic atoms
Number of basic atoms
Number of hydrogen bond donor atoms
Number of hydrophobic atoms
Sum of VDW surface areas of hydrophobic atoms
Lipinski’s Rule of 5
Potential drug candidates should
Have 5 or fewer H-bond donors (expressed as the
sum of OHs and NHs)
Have a MW < 500
Have a log P less than 5
Have 10 or fewer H-bond acceptors (expressed as
the sum of Ns and Os)
Adv. Drug Delivery Rev., 1997, 23, 3
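The four criteria above translate directly into a filter. The sketch below takes descriptor values that have already been computed elsewhere; the function name `lipinski_violations` and the two sets of input values are hypothetical and do not describe real compounds.

```python
def lipinski_violations(h_donors, mol_weight, log_p, h_acceptors):
    """Return the list of Rule-of-5 criteria that a compound fails."""
    violations = []
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if mol_weight >= 500:
        violations.append("MW not below 500")
    if log_p >= 5:
        violations.append("log P not below 5")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


# Hypothetical descriptor values for two made-up candidates.
print(lipinski_violations(h_donors=2, mol_weight=350.4, log_p=3.1, h_acceptors=5))
print(lipinski_violations(h_donors=6, mol_weight=612.0, log_p=5.6, h_acceptors=12))
```

Returning the failed criteria, rather than a single pass/fail flag, makes it easy to apply whatever tolerance a screening protocol chooses.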
Docking
Interact a ligand with a receptor
Need to do the following
A) select appropriate ligands
B) select appropriate conformation of receptor
C) select appropriate conformations of ligands
D) combine the ligand and receptor (docking)
E) evaluate these combinations and rank order
them
Selection of Ligands
Want drug-like molecules
250< MW < 500
Lipinski’s rules
Search through databases
Available Chemicals Directory (ACD)
World Drug Index
NCI Drug database
In-house databases
Receptor Conformation
Usually the receptor is assumed to be static
Get structure from X-ray or NMR
experiment
Protein Data Bank (http://www.rcsb.org/pdb/)
41385 Structures
Ligand Conformation
Rigid or flexible
If rigid, optimize the structure then use it
throughout the docking procedure
If flexible, can
A) create a set of low energy conformations and
then use this set as a collection of rigid structures
in docking
B) optimize structure within active site of receptor,
i.e. dock and optimize together
Docking
Place ligand in appropriate location for
interacting with the receptor
Methodological problem:
1) No best method for defining shape
2) No general solution for packing irregular
objects (the knapsack problem)
Docking Algorithmic
Components
Receptor and Ligand Description (keep in mind
relative errors of structures, etc.)
Bind the Ligand to Receptor
(configuration/conformation search)
Geometric search (match ligand and receptor site
descriptions)
Search for minimum energy - molecular dynamics
(MD) or Monte Carlo (MC)
Evaluation of the dock ($\Delta G_{bind}$), also called
scoring
Descriptor Matching Method
DOCK program
1) Generate molecular surface for receptor
2) Generate spheres to fill the active site
(usually 30-50 spheres)
3) Match sphere centers to the ligand atoms
(originally just lowest E conformer, now use multiple
conformers, but still rigid) – generates 10K orientations per
ligand – Shape-driven!
4) Score the interaction
Fragment-Joining Method
FlexX, LUDI
Place base fragments into microstates
of the active site (Fragments can be small
molecules like benzene, formaldehyde,
formamide, naphthol, etc.)
Optimize position of the Base fragment
Join fragments with small connecting
chains made of CH2, CO, CONH, etc.
Scoring (evaluation of the dock)
Want to quickly evaluate the strength of
the interaction between ligand and
receptor
Full free energy computation
Expensive
Requires excellent force fields
Empirical method
Fast and cheap
Requires fitting to a broad set of ligand/receptor
complexes
Empirical Scoring
Method of Bohm (LUDI, FlexX, etc.)
$\Delta G_{bind} = \Delta G_0 + \sum_{h\text{-bonds}} \Delta G_{hb}\, f(\Delta R, \Delta\alpha) + \sum_{ion} \Delta G_{ion}\, f(\Delta R, \Delta\alpha) + \Delta G_{lipo} A_{lipo} + \Delta G_{rot} N_{ROT}$
$\Delta G_0$: reduction in binding energy due to loss of
rotation and translation of ligand
$\Delta G_{hb}$: contribution from ideal hydrogen bond
$\Delta G_{ion}$: contribution from ionic interactions
$\Delta G_{lipo}$: contribution from lipophilic interactions
$\Delta G_{rot}$: contribution from freezing rotations within ligand
These come from empirical fits.
Bohm Method (cont.)
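$f(\Delta R, \Delta\alpha)$ are penalty functions for non-ideal
interactions: distances too short/long, angles
not linear
$f(\Delta R, \Delta\alpha) = f_1(\Delta R)\, f_2(\Delta\alpha)$
$f_1(\Delta R) = 1$ for $\Delta R \le 0.2$ Å; $1 - (\Delta R - 0.2)/0.4$ for $0.2 < \Delta R \le 0.6$ Å; $0$ for $\Delta R > 0.6$ Å
$f_2(\Delta\alpha) = 1$ for $\Delta\alpha \le 30^\circ$; $1 - (\Delta\alpha - 30)/50$ for $30^\circ < \Delta\alpha \le 80^\circ$; $0$ for $\Delta\alpha > 80^\circ$
$\Delta R$ is the deviation from the ideal H...O/N distance of 1.9 Å
$\Delta\alpha$ is the deviation from the ideal N/O-H...O/N angle of 180°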
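The two penalty functions are simple piecewise-linear ramps. The sketch below implements them as stated above; the helper `hbond_penalty` and its use of absolute deviations from the ideal distance and angle are assumptions made for illustration.

```python
def f1(delta_r):
    """Distance penalty; delta_r is the deviation in angstroms."""
    if delta_r <= 0.2:
        return 1.0
    if delta_r <= 0.6:
        return 1.0 - (delta_r - 0.2) / 0.4
    return 0.0


def f2(delta_alpha):
    """Angle penalty; delta_alpha is the deviation in degrees."""
    if delta_alpha <= 30.0:
        return 1.0
    if delta_alpha <= 80.0:
        return 1.0 - (delta_alpha - 30.0) / 50.0
    return 0.0


def hbond_penalty(distance, angle, ideal_distance=1.9, ideal_angle=180.0):
    """f = f1 * f2, using absolute deviations from the ideal geometry (assumed)."""
    return f1(abs(distance - ideal_distance)) * f2(abs(angle - ideal_angle))


print(hbond_penalty(1.9, 180.0))  # ideal geometry, no penalty
print(hbond_penalty(2.3, 125.0))  # both ramps about half way down
```

Each function is 1 inside a tolerance window, falls linearly, and reaches 0 beyond the cutoff, so a badly distorted hydrogen bond contributes nothing to the score.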
Bohm Method (cont.)
$A_{lipo}$ is the lipophilic contact surface,
evaluated by a coarse grid of boxes
$N_{ROT}$ is the number of rotatable bonds:
acyclic sp3-sp3, sp3-sp2 and sp2-sp2.
No terminal groups or flexibility of rings
incorporated.
H.-J. Bohm, J. Comput.-Aided Mol. Des., 1994, 8, 243-256
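Putting the terms together, the empirical score is a weighted sum. The sketch below takes penalty values f that have already been evaluated for each hydrogen bond and ionic contact. The function name `bohm_score`, the dictionary keys, and especially the coefficient values are placeholders chosen for easy arithmetic; they are not the fitted parameters from the cited paper.

```python
def bohm_score(hbond_penalties, ionic_penalties, a_lipo, n_rot, coeff):
    """Empirical binding free energy estimate as a weighted sum of terms."""
    return (
        coeff["dG0"]
        + coeff["dG_hb"] * sum(hbond_penalties)    # one f value per hydrogen bond
        + coeff["dG_ion"] * sum(ionic_penalties)   # one f value per ionic contact
        + coeff["dG_lipo"] * a_lipo                # lipophilic contact surface
        + coeff["dG_rot"] * n_rot                  # frozen rotatable bonds
    )


# Placeholder coefficients (NOT fitted values); negative terms favor binding.
PLACEHOLDER_COEFF = {
    "dG0": 5.0, "dG_hb": -5.0, "dG_ion": -8.0, "dG_lipo": -0.2, "dG_rot": 1.5,
}

score = bohm_score(
    hbond_penalties=[1.0, 0.5],  # one ideal and one distorted hydrogen bond
    ionic_penalties=[1.0],
    a_lipo=100.0,
    n_rot=4,
    coeff=PLACEHOLDER_COEFF,
)
print(score)
```

Passing the coefficients in as an argument keeps the functional form separate from any particular empirical fit, so variants of the scheme can reuse the same sum.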
Scoring alternatives
Many variations on Bohm scheme
Buried Polar term, desolvation term, different
forms for the lipophilic term, include metal
bonding, etc.
Combine scoring functions, i.e. QSAR with
scoring functions as variables
Use empirical score to select set of hits, then
refine with free energy minimization