Calculating Molecular Properties

Quantitative Structure-Activity Relationships (QSAR)
Rationale for QSAR Studies
• In drug design, in vitro potency addresses only part of the need; a successful drug must also be able to reach its target in the body while still in its active form.
• The in vivo activity of a substance is a composite of many factors, including the intrinsic reactivity of the drug, its solubility in water, its ability to pass the blood-brain barrier, its nonreactivity with non-target molecules that it may encounter on its way to the target, and others.
Rationale for QSAR Studies...
• A quantitative structure-activity relationship (QSAR) correlates measurable or calculable physical or molecular properties with a specific biological activity by means of an equation.
• Once a valid QSAR has been determined, it should be possible to predict the biological activity of related drug candidates before they are put through expensive and time-consuming biological testing. In some cases, only computed values are needed to make an assessment.
History of QSAR
• The first application of QSAR is attributed to Hansch (1969), who developed an equation that related biological activity to certain electronic characteristics and the hydrophobicity of a set of related structures:

$$\log(1/C) = k_1 \log P - k_2 (\log P)^2 + k_3 \sigma + k_4$$

where:
C = minimum effective dose
P = octanol-water partition coefficient
σ = Hammett substituent constant
k1 to k4 = constants derived from regression analysis
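Because the Hansch equation is quadratic in log P, activity passes through a maximum at an optimum lipophilicity: setting the derivative k1 - 2·k2·log P to zero gives log P0 = k1/(2·k2). A minimal Python sketch of that calculation; the coefficients K1 to K4 are hypothetical values chosen only for illustration, not the result of any published regression.

```python
def hansch_log_inv_c(log_p, sigma, k1, k2, k3, k4):
    """Evaluate log(1/C) = k1*logP - k2*(logP)**2 + k3*sigma + k4."""
    return k1 * log_p - k2 * log_p ** 2 + k3 * sigma + k4


def optimal_log_p(k1, k2):
    """log P at the top of the parabola: k1 - 2*k2*logP = 0 (requires k2 > 0)."""
    return k1 / (2.0 * k2)


# Hypothetical coefficients for illustration only; not from any published fit.
K1, K2, K3, K4 = 1.0, 0.25, 0.5, 2.0

best = optimal_log_p(K1, K2)  # 1.0 / 0.5 = 2.0
for log_p in (0.0, 1.0, best, 3.0, 4.0):
    activity = hansch_log_inv_c(log_p, 0.0, K1, K2, K3, K4)
    print(f"log P = {log_p:.1f}  ->  log(1/C) = {activity:.2f}")
```

With these made-up coefficients the printed activity rises from 2.00 to a maximum of 3.00 at log P = 2.0 and falls back symmetrically, which is the "too lipophilic is as bad as too hydrophilic" behavior the squared term encodes.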
Hansch’s Approach
• Log P is a measure of the drug's hydrophobicity, which was selected as a measure of its ability to pass through cell membranes.
• The log P (or log P(o/w)) value reflects the relative solubility of the drug in octanol (representing the lipid bilayer of a cell membrane) and water (the fluid within the cell and in blood).
• Log P values may be measured experimentally or, more commonly, calculated.
Calculating Log P
$$\log P = \log K_{o/w} = \log\left(\frac{[X]_{\text{octanol}}}{[X]_{\text{water}}}\right)$$

• Most programs use a group additivity approach. Example: benzyl bromide (C6H5-CH2-Br)

Fragment              Contribution
1 aromatic ring          0.780
7 H's on carbon          1.589
1 C-Br bond             -0.120
1 alkyl C                0.195
Sum = 2.924 = calc. log P

• Some programs use more complicated algorithms that include factors such as dipole moment, molecular size, and shape.
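Both the definition of log P and the group-additivity idea fit in a few lines of Python. This is a sketch under stated assumptions: the fragment names and contribution values in `FRAGMENTS` are hypothetical placeholders, not the table of any real log P program (each program uses its own fitted fragment library).

```python
import math


def log_p_from_concentrations(c_octanol, c_water):
    """log P = log10([X]octanol / [X]water), measured at equilibrium."""
    return math.log10(c_octanol / c_water)


def group_additive_log_p(counts, contributions):
    """Group additivity: sum of (number of fragments) * (fragment contribution)."""
    return sum(n * contributions[frag] for frag, n in counts.items())


# HYPOTHETICAL fragment contributions, for illustration only.
FRAGMENTS = {"aromatic_ring": 1.0, "H_on_C": 0.2, "alkyl_C": 0.1, "C_Br": 0.5}

# Fragment counts for a benzyl-bromide-like molecule, as in the example above.
molecule = {"aromatic_ring": 1, "H_on_C": 7, "alkyl_C": 1, "C_Br": 1}
print(group_additive_log_p(molecule, FRAGMENTS))    # approx. 3.0 with these made-up values
print(log_p_from_concentrations(1.0e-2, 1.0e-5))     # a 1000:1 ratio gives approx. 3.0
```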
Hansch’s Approach...
• The Hammett substituent constant (σ) reflects the drug molecule's intrinsic reactivity, as governed by the electronic effects of its aromatic ring substituents.
• In chemical reactions, aromatic ring substituents can alter the rate of reaction by up to 6 orders of magnitude.
• For example, the rate of the reaction below is ~10^5 times slower when X = NO2 than when X = CH3:

[Scheme: methanolysis of a ring-substituted benzylic chloride, X-Ar-CH-Cl + CH3OH → X-Ar-CH-OCH3 + HCl]
Hammett Equation
• Hammett observed a linear free-energy relationship between the log of the relative rate constants for ester hydrolysis and the log of the relative acid ionization (equilibrium) constants for a series of substituted benzoic esters and acids:

$$\log(k_X/k_H) = \rho\sigma \quad \text{(for rates)}$$
or:
$$\log(K_X/K_H) = \rho\sigma \quad \text{(for equilibria)}$$
Hammett Equation, cont’d
• Hammett arbitrarily assigned a value of 1 to ρ, the reaction constant, for the acid ionization of benzoic acids. Thus:

$$\log(K_X/K_H) = \rho\sigma = (1)\sigma$$
$$\sigma = \log(K_X/K_H) = \log K_X - \log K_H$$
$$\log K_X = \sigma + \log K_H$$

• This allows substituent constants (σ) for various substituents to be determined as the log of the ratio of the acid ionization constants of the substituted and unsubstituted benzoic acids.
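Since pKa = -log Ka, the definition σ = log K_X - log K_H can equally be written σ = pKa(H) - pKa(X). A small Python sketch of both forms; the pKa inputs are illustrative assumptions, not data taken from these slides.

```python
import math


def sigma_from_ka(ka_x, ka_h):
    """Hammett sigma with rho = 1: sigma = log10(Ka_X) - log10(Ka_H)."""
    return math.log10(ka_x) - math.log10(ka_h)


def sigma_from_pka(pka_x, pka_h):
    """Same quantity from pKa values, since pKa = -log10(Ka)."""
    return pka_h - pka_x


# Illustrative pKa values (assumed inputs).
pka_unsubstituted, pka_substituted = 4.20, 3.44
print(round(sigma_from_pka(pka_substituted, pka_unsubstituted), 2))  # 0.76 -> electron-withdrawing
print(round(sigma_from_ka(10 ** -pka_substituted, 10 ** -pka_unsubstituted), 2))  # same value
```

A positive σ means the substituent makes the acid stronger (electron-withdrawing); a negative σ means it makes the acid weaker (electron-donating).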
Definition of Hammett ρ
[Scheme: ionization of substituted benzoic acids, X-C6H4-COOH ⇌ X-C6H4-COO⁻ + H⁺]

Substituent (X)   σp      Eq. constant (K)   log K
-NH2             -0.66    0.00000554         -5.25649
-OCH3            -0.27    0.000015           -4.82391
-CH3             -0.17    0.000023           -4.63827
-H                0.00    0.000034           -4.46852
-Cl               0.23    0.000055           -4.25964
-COCH3            0.50    0.000088           -4.05552
-CN               0.66    0.000128           -3.89279
-NO2              0.78    0.000166           -3.77989
Hammett Plot
[Figure: log K vs. σp for the substituted benzoic acids above; best-fit line y = 0.9992x - 4.5305, R² = 0.9907]
These σp values are obtained from the best-fit line having a slope (ρ) of 1.
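The best-fit line quoted for the Hammett plot can be reproduced from the table above with an ordinary least-squares fit of log K against σp. A dependency-free Python sketch; the `least_squares` helper is written here for illustration and is not part of any particular package.

```python
import math

# (substituent, sigma_p, K) as listed in the table above
DATA = [
    ("NH2", -0.66, 0.00000554),
    ("OCH3", -0.27, 0.000015),
    ("CH3", -0.17, 0.000023),
    ("H", 0.00, 0.000034),
    ("Cl", 0.23, 0.000055),
    ("COCH3", 0.50, 0.000088),
    ("CN", 0.66, 0.000128),
    ("NO2", 0.78, 0.000166),
]


def least_squares(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept, plus R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


sigmas = [s for _, s, _ in DATA]
log_k = [math.log10(k) for _, _, k in DATA]
slope, intercept, r2 = least_squares(sigmas, log_k)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R^2 = {r2:.4f}")
```

Run as written, it should give a slope of about 0.9992, an intercept of about -4.5305 and R² of about 0.9907, matching the plot; the slope of ~1 is what makes this reaction the reference (ρ = 1) series.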
Hammett Plot
• Aryl substituent constants (σ) were determined by measuring the effect of a substituent on an equilibrium constant (Keq) or a reaction rate constant (k). These values are tabulated and are approximately constant across widely different reactions.
• Reaction constants (ρ) for other reactions may also be determined by comparing the relative rates (or Keq values) of two differently substituted reactants, using the substituent constants described above.
• Some of these values (σ and ρ) are listed on the following slide.
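Subtracting the Hammett equations for two substituents X and Y gives log(k_X/k_Y) = ρ(σ_X - σ_Y), so ρ follows from a single measured rate ratio. A Python sketch using the ~10^5 rate ratio quoted earlier for the methanolysis reaction and the σp values for NO2 (0.78) and CH3 (-0.17) from the table above. Because the rate ratio is only approximate, the result is an order-of-magnitude check rather than a precise ρ.

```python
import math


def rho_from_pair(k_x, k_y, sigma_x, sigma_y):
    """rho = log10(k_X / k_Y) / (sigma_X - sigma_Y), from log(k/k_H) = rho * sigma."""
    return math.log10(k_x / k_y) / (sigma_x - sigma_y)


# k(NO2)/k(CH3) ~ 1e-5 for the methanolysis example; sigma_p values from the table.
rho = rho_from_pair(k_x=1e-5, k_y=1.0, sigma_x=0.78, sigma_y=-0.17)
print(round(rho, 2))  # -5.26
```

The estimate of about -5.3 is close to the ρ = -5.0 listed for this reaction on the following slide; the large negative value signals a reaction that builds up positive charge at the benzylic carbon, so electron-donating groups speed it up.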
Hammett Rho & Sigma Values
Reaction (Rho) Values, ρ
[Scheme: alkaline hydrolysis of a ring-substituted methyl arylacetate, X-C6H4-CH2CO2CH3 + OH⁻ → X-C6H4-CH2CO2⁻ + CH3OH]   ρ = +2.4
[Scheme: methanolysis of a ring-substituted benzylic chloride, X-Ar-CH-Cl + CH3OH → X-Ar-CH-OCH3 + HCl]   ρ = -5.0

Substituent (Sigma) Values, σ (the electronic effect of the substituent; negative values are electron donating)
p-NH2    -0.66      p-Cl      0.23
p-OCH3   -0.27      p-COCH3   0.50
p-CH3    -0.17      p-CN      0.66
m-CH3    -0.07      p-NO2     0.78
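With ρ and σ in hand, log(k_X/k_H) = ρσ predicts relative rates directly. A Python sketch that applies the two ρ values above to the tabulated σ values; the dictionary simply transcribes the list above.

```python
# sigma values transcribed from the list above
SIGMA = {"p-NH2": -0.66, "p-OCH3": -0.27, "p-CH3": -0.17, "m-CH3": -0.07,
         "p-Cl": 0.23, "p-COCH3": 0.50, "p-CN": 0.66, "p-NO2": 0.78}


def relative_rate(rho, sigma):
    """k_X / k_H = 10 ** (rho * sigma), from log(k_X / k_H) = rho * sigma."""
    return 10 ** (rho * sigma)


for rho in (+2.4, -5.0):
    print(f"rho = {rho:+.1f}")
    for sub, sigma in SIGMA.items():
        print(f"  {sub:>8}: k_X/k_H = {relative_rate(rho, sigma):.3g}")

# For the rho = -5.0 methanolysis: how much faster is p-CH3 than p-NO2?
ratio = relative_rate(-5.0, SIGMA["p-CH3"]) / relative_rate(-5.0, SIGMA["p-NO2"])
print(f"{ratio:.2g}")  # prints 5.6e+04
```

The ratio of about 5.6 × 10^4 is consistent with the "~10^5 times slower" figure quoted for X = NO2 vs. X = CH3. Note how the sign of ρ flips the ordering: the same electron-withdrawing group that accelerates the ρ = +2.4 hydrolysis retards the ρ = -5.0 methanolysis.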
Molecular Properties in QSAR
• Many other molecular properties have been incorporated into QSAR studies; some of these are measurable physical properties, such as:
  - density
  - pKa
  - ionization energy
  - boiling point
  - ΔH(vaporization)
  - refractive index
  - molecular weight
  - dipole moment (μ)
  - ΔH(hydration)
  - reduction potential
  - lipophilicity parameter, $\pi = \log P_X - \log P_H$ (the log P of the substituted compound minus that of the unsubstituted parent)
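The lipophilicity parameter π is just a difference of two log P values, and in Hansch-type work π values are commonly treated as approximately additive across substituents. A Python sketch; the log P inputs are illustrative assumptions, and the additivity in `estimate_log_p` is an approximation rather than an exact law.

```python
def pi_substituent(log_p_x, log_p_h):
    """Hansch pi constant: pi_X = log P_X - log P_H."""
    return log_p_x - log_p_h


def estimate_log_p(log_p_parent, pis):
    """Additivity assumption: log P of an analog ~ parent log P + sum of substituent pi values."""
    return log_p_parent + sum(pis)


# Illustrative inputs (assumed): parent log P 2.13, monosubstituted analog 2.84.
pi_x = pi_substituent(2.84, 2.13)
print(round(pi_x, 2))                                 # 0.71
print(round(estimate_log_p(2.13, [pi_x, pi_x]), 2))   # 3.55, if the two contributions simply add
```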
Molecular Properties in QSAR
• Other molecular properties (descriptors) that have been incorporated into QSAR studies include calculated properties, such as:
  - ovality
  - HOMO energy
  - polarizability
  - molecular volume
  - van der Waals (vdW) surface area
  - molar refractivity
  - hydration energy
  - surface area
  - LUMO energy
  - charges on individual atoms
  - solvent-accessible surface area
  - maximum positive and negative charge
  - hardness
  - Taft's steric parameter
QSAR Methodology
• Several descriptors are often correlated; that is, they describe closely related observables, such as MW and boiling point in a homologous series.
• Statistical analysis is used to determine which variables best describe (correlate with) the observed biological activity and which are cross-correlated. The final QSAR involves only the 3 to 5 most important descriptors, eliminating those with high cross-correlation.
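A first pass at spotting cross-correlated descriptors is to compute the Pearson r for every pair and flag pairs above a cutoff. A Python sketch; the descriptor values are made-up illustrative numbers loosely shaped like a homologous series, and the 0.9 cutoff is an assumed threshold, not a universal rule.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cross_correlated_pairs(descriptors, threshold=0.9):
    """Return (name_a, name_b, r) for descriptor pairs with |r| above the threshold."""
    flagged = []
    for a, b in combinations(descriptors, 2):
        r = pearson_r(descriptors[a], descriptors[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up illustrative values: MW and boiling point rise together; "charge" does not.
descriptors = {
    "MW":     [58.0, 72.0, 86.0, 100.0, 114.0],
    "bp":     [0.0, 36.0, 69.0, 98.0, 126.0],
    "charge": [0.10, -0.20, 0.05, 0.15, -0.10],
}
print(cross_correlated_pairs(descriptors))  # flags only the (MW, bp) pair
```

In practice one member of each flagged pair would be dropped (usually the one that correlates less well with activity) before building the regression.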
Limit to the Number of Descriptors
• The data set should contain at least 5 times as many compounds as there are descriptors in the QSAR.
• The reason is that too few compounds relative to the number of descriptors will give a falsely high correlation:
  - 2 points exactly determine a line (2 compounds, 2 properties)
  - 3 points exactly determine a plane (and so on)
• A data set of drug candidates that is similar in size to the number of descriptors will therefore give a high (and meaningless) correlation.
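The "falsely high correlation" is easy to demonstrate: fit pure random noise with a multiple linear regression. With one more compound than descriptors the fit is exact (R² = 1) no matter what the data are; with 5 times as many compounds the same noise typically gives a much lower R². A dependency-free Python sketch; `solve` and `fit_r2` are helpers written here for illustration (a real analysis would use a statistics package).

```python
import random


def solve(a, b):
    """Solve the square linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_r2(X, y):
    """Least-squares fit y ~ b0 + b1*x1 + ... via the normal equations; returns R^2."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - yhat) ** 2 for yi, yhat in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
n_desc = 3
for n_compounds in (n_desc + 1, 5 * n_desc):
    # Pure noise: the "descriptors" and the "activity" are unrelated random numbers.
    X = [[rng.random() for _ in range(n_desc)] for _ in range(n_compounds)]
    y = [rng.random() for _ in range(n_compounds)]
    print(f"{n_compounds:2d} compounds, {n_desc} descriptors: R^2 = {fit_r2(X, y):.3f}")
```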
Statistical Analysis of Data
• Multiple linear regression analysis can be performed with standard statistical software, typically incorporated into sophisticated (and expensive) drug design software packages, such as MSI's Cerius2 (academic price over $20K).
• An inexpensive statistical analysis package, StatMost (academic price $39), works just fine.
• To discover correlated variables and determine which descriptors correlate best, a partial least squares (PLS) or principal component analysis (PCA) is performed.
Example of a QSAR
[Structure: N,N-dimethyl-α-bromophenethylamine core bearing ring substituents X (meta) and Y (para)]
Anti-adrenergic Activity and Physicochemical Properties of 3,4-Disubstituted N,N-Dimethyl-α-bromophenethylamines
π = lipophilicity parameter
σ+ = Hammett sigma-plus (for benzylic cations)
Es(meta) = Taft's steric parameter (for the meta substituent)
Example of a QSAR...
m-X  p-Y   π      σ+     Es(meta)  log(1/C) obs  Calc. log(1/C) (a)  Calc. log(1/C) (b)
H    H    0.00   0.00    1.24     7.46          7.82                7.88
F    H    0.13   0.35    0.78     7.52          7.45                7.43
H    F    0.15  -0.07    1.24     8.16          8.09                8.17
Cl   H    0.76   0.40    0.27     8.16          8.11                8.05
Cl   F    0.91   0.33    0.27     8.19          8.38                8.34
Br   H    0.94   0.41    0.08     8.30          8.30                8.22
I    H    1.15   0.36   -0.16     8.40          8.61                8.51
Me   H    0.51  -0.07    0.00     8.46          8.51                8.36
Br   F    1.09   0.34    0.08     8.57          8.57                8.51
H    Cl   0.70   0.11    1.24     8.68          8.46                8.60
Me   F    0.66  -0.14    0.00     8.82          8.78                8.65
H    Br   1.02   0.15    1.24     8.89          8.77                8.94
Cl   Cl   1.46   0.51    0.27     8.89          8.75                8.77
Br   Cl   1.64   0.52    0.08     8.92          8.94                8.94
Me   Cl   1.21   0.04    0.00     8.96          9.15                9.08
Cl   Br   1.78   0.55    0.27     9.00          9.06                9.11
Me   Br   1.53   0.08    0.00     9.22          9.46                9.43
H    I    1.26   0.14    1.24     9.25          9.06                9.26
H    Me   0.52  -0.31    1.24     9.30          8.87                8.98
Me   Me   1.03  -0.38    0.00     9.30          9.56                9.47
Br   Br   1.96   0.56    0.08     9.35          9.25                9.29
Br   Me   1.46   0.10    0.08     9.52          9.35                9.33
Example of a QSAR...
QSAR Equation a (using 2 variables):
$$\log(1/C) = 1.151\,\pi - 1.464\,\sigma^{+} + 7.817 \qquad (n = 22;\ r = 0.945)$$
QSAR Equation b (using 3 variables):
$$\log(1/C) = 1.259\,\pi - 1.460\,\sigma^{+} + 0.208\,E_s(\text{meta}) + 7.619 \qquad (n = 22;\ r = 0.959)$$
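Applying a fitted QSAR is just evaluating the equation for each compound. A Python sketch that transcribes the 22 compounds from the table above, evaluates equations a and b, and reports the RMS residual of each (the `rms` helper is written here for illustration). The calculated columns it prints should agree with the table's "Calc." columns to within rounding.

```python
import math

# (m-X, p-Y, pi, sigma_plus, Es_meta, observed log(1/C)), transcribed from the table above
DATA = [
    ("H", "H", 0.00, 0.00, 1.24, 7.46), ("F", "H", 0.13, 0.35, 0.78, 7.52),
    ("H", "F", 0.15, -0.07, 1.24, 8.16), ("Cl", "H", 0.76, 0.40, 0.27, 8.16),
    ("Cl", "F", 0.91, 0.33, 0.27, 8.19), ("Br", "H", 0.94, 0.41, 0.08, 8.30),
    ("I", "H", 1.15, 0.36, -0.16, 8.40), ("Me", "H", 0.51, -0.07, 0.00, 8.46),
    ("Br", "F", 1.09, 0.34, 0.08, 8.57), ("H", "Cl", 0.70, 0.11, 1.24, 8.68),
    ("Me", "F", 0.66, -0.14, 0.00, 8.82), ("H", "Br", 1.02, 0.15, 1.24, 8.89),
    ("Cl", "Cl", 1.46, 0.51, 0.27, 8.89), ("Br", "Cl", 1.64, 0.52, 0.08, 8.92),
    ("Me", "Cl", 1.21, 0.04, 0.00, 8.96), ("Cl", "Br", 1.78, 0.55, 0.27, 9.00),
    ("Me", "Br", 1.53, 0.08, 0.00, 9.22), ("H", "I", 1.26, 0.14, 1.24, 9.25),
    ("H", "Me", 0.52, -0.31, 1.24, 9.30), ("Me", "Me", 1.03, -0.38, 0.00, 9.30),
    ("Br", "Br", 1.96, 0.56, 0.08, 9.35), ("Br", "Me", 1.46, 0.10, 0.08, 9.52),
]


def eq_a(pi, sigma_plus):
    """QSAR equation a: log(1/C) = 1.151*pi - 1.464*sigma+ + 7.817"""
    return 1.151 * pi - 1.464 * sigma_plus + 7.817


def eq_b(pi, sigma_plus, es_meta):
    """QSAR equation b: log(1/C) = 1.259*pi - 1.460*sigma+ + 0.208*Es(meta) + 7.619"""
    return 1.259 * pi - 1.460 * sigma_plus + 0.208 * es_meta + 7.619


def rms(residuals):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


res_a, res_b = [], []
for m_x, p_y, pi, sp, es, obs in DATA:
    a, b = eq_a(pi, sp), eq_b(pi, sp, es)
    res_a.append(obs - a)
    res_b.append(obs - b)
    print(f"{m_x:>2}/{p_y:<2}  obs {obs:.2f}  calc(a) {a:.2f}  calc(b) {b:.2f}")

print(f"RMS residual, equation a: {rms(res_a):.3f}")
print(f"RMS residual, equation b: {rms(res_b):.3f}")
```

Comparing the two RMS residuals gives a rough feel for how much the third descriptor, Es(meta), adds beyond π and σ+.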
QSAR of Antifungal Neolignans
• The PM3 semi-empirical method was employed to calculate a set of molecular properties (descriptors) for 18 neolignan compounds with activity against Epidermophyton floccosum, a highly susceptible dermatophyte species. The correlation between biological activity and structural properties was obtained by multiple linear regression. The QSAR showed not only statistical significance but also predictive ability. The significant molecular descriptors related to antifungal activity were the hydration energy (HE) and the charge on the C1' carbon atom (Q1'). The model was then applied to a set of 10 new compounds derived from neolignans; five of them showed promising biological activity against E. floccosum.
Neolignans
Descriptors Used
• Log P: values were obtained from the hydrophobic parameters of the substituents;
• surface area (A), molecular volume (V), log of the partition coefficient (log P), and hydration energy (HE): evaluated with the molecular modeling package HyperChem 5.0;
• partial atomic charges (Qn) and bond orders (Ln), derived from the electrostatic potential;
• energies of the HOMO (H) and LUMO (L) frontier orbitals;
• hardness (η), obtained from the equation $\eta = (E_{\text{LUMO}} - E_{\text{HOMO}})/2$;
• Mulliken electronegativity (χ), calculated from the equation $\chi = -(E_{\text{HOMO}} + E_{\text{LUMO}})/2$;
• other electronic properties: total energy (ET), heat of formation (ΔHf), ionization potential (IP), dipole moment (μ), and polarizability (POL), whose values were obtained from the molecular orbital program Ampac 5.0.
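The two frontier-orbital descriptors defined above are simple arithmetic on the HOMO and LUMO energies that a semi-empirical calculation reports. A Python sketch; the orbital energies are hypothetical values in eV, not results from the neolignan study.

```python
def hardness(e_homo, e_lumo):
    """Chemical hardness: eta = (E_LUMO - E_HOMO) / 2."""
    return (e_lumo - e_homo) / 2.0


def mulliken_electronegativity(e_homo, e_lumo):
    """Mulliken electronegativity: chi = -(E_HOMO + E_LUMO) / 2."""
    return -(e_homo + e_lumo) / 2.0


# Hypothetical frontier-orbital energies in eV (illustrative only).
e_homo, e_lumo = -9.0, -0.5
print(hardness(e_homo, e_lumo))                    # 4.25
print(mulliken_electronegativity(e_homo, e_lumo))  # 4.75
```

A large HOMO-LUMO gap gives a "hard", less polarizable molecule; both quantities come out in the same energy units as the input orbital energies.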
Two Most Important Descriptors
Antifungal QSAR
$$\log(1/C) = -2.85 - 0.38\,\mathrm{HE} - 1.45\,Q_{1'}$$
F = 29.63, R² = 0.86, Q² = 0.80, SEP = 0.
where:
F is the Fisher F-test statistic for the significance of the equation,
R² is the squared correlation coefficient,
Q² is the cross-validated correlation coefficient (predictive capability), and
SEP is the standard error of prediction.
A.A.C. Pinheiro, R.S. Borges, L.S. Santos, C.N. Alves, Journal of Molecular Structure: THEOCHEM, Vol. 672, pp. 215-219 (2004).
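R², Q² and SEP can all be computed from one regression plus a leave-one-out loop: each compound is removed in turn, the model is refit, and the held-out compound is predicted; the squared prediction errors sum to PRESS. A Python sketch for a one-descriptor model (a simplification of the two-descriptor equation above); the data are hypothetical, and SEP = sqrt(PRESS/n) is one common definition, since some authors divide by n - 1 or n - p - 1 instead.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r2_q2_sep(xs, ys):
    """R^2 of the full fit, leave-one-out Q^2, and SEP = sqrt(PRESS / n)."""
    n = len(ys)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        # refit without compound i, then predict compound i
        s_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (b_i + s_i * xs[i])) ** 2
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot, math.sqrt(press / n)


# Hypothetical one-descriptor data set (descriptor value, log(1/C)); not from the paper.
desc = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
act = [4.1, 4.6, 4.5, 5.2, 5.4, 6.1, 6.0, 6.6]
r2, q2, sep = r2_q2_sep(desc, act)
print(f"R^2 = {r2:.3f}, Q^2 = {q2:.3f}, SEP = {sep:.3f}")
```

For a least-squares fit Q² can never exceed R², because every held-out residual is at least as large as the corresponding fitted residual; a Q² close to R² (as with 0.80 vs. 0.86 above) suggests the model is not merely fitting noise.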
QSAR-Calculated Antifungal Activity
New Neolignans
Log P vs. Polar Surface Area
[Figure: log P plotted against polar surface area, with regions marked "Bioavailable" and "CNS active" (both values can be calculated)]
Example of a Pharmacophore
2D Hypothesis and Alignment
3-Dimensional QSAR Methods
• Important regions of bioactive molecules are "mapped" in 3D space: the molecules are superimposed so that their regions of hydrophobicity, hydrophilicity, H-bond acceptors, H-bond donors, π-donors, etc. overlap, and a general 3D pattern of the functionally significant regions of a drug is determined.
[Figure: testosterone]
• CoMFA (Comparative Molecular Field Analysis) is one such approach:
CoMFA of Testosterone
Blue means electronegative groups enhance binding; red means electronegative groups reduce binding.
Green means bulky groups enhance binding; yellow means bulky groups reduce binding.
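In CoMFA-style analyses, each aligned molecule is placed in a lattice of grid points, and a probe's steric and electrostatic interaction energies are computed at every point; those energies become the descriptor columns that are then correlated with activity (typically by PLS), and the contour maps above are a visualization of that fit. A simplified Python sketch of the field-calculation step only, assuming a +1 point-charge probe, a plain Coulomb term, a Lennard-Jones steric term, and an energy cap. The atoms, parameters, and cap are all hypothetical illustration values, not CoMFA's actual settings.

```python
import math
from itertools import product

# Hypothetical aligned "molecule": ((x, y, z) in angstroms, partial charge in e). Not testosterone.
ATOMS = [((0.0, 0.0, 0.0), -0.40), ((1.2, 0.0, 0.0), 0.40), ((-1.0, 1.0, 0.0), 0.00)]

PROBE_CHARGE = 1.0   # assumed +1 point-charge probe
COULOMB_K = 332.06   # kcal*angstrom/(mol*e^2)
EPSILON = 0.1        # assumed Lennard-Jones well depth, kcal/mol
R_MIN = 3.5          # assumed Lennard-Jones minimum-energy distance, angstroms
CAP = 30.0           # assumed cap (kcal/mol) so points inside atoms do not dominate
R_FLOOR = 0.5        # avoid dividing by ~0 when a grid point sits on an atom


def fields_at(point):
    """Steric (Lennard-Jones) and electrostatic (Coulomb) probe energies at one grid point."""
    steric = electro = 0.0
    for pos, charge in ATOMS:
        r = max(math.dist(point, pos), R_FLOOR)
        steric += EPSILON * ((R_MIN / r) ** 12 - 2.0 * (R_MIN / r) ** 6)
        electro += COULOMB_K * PROBE_CHARGE * charge / r
    return min(steric, CAP), max(-CAP, min(electro, CAP))


def grid_fields(extent=4.0, spacing=2.0):
    """Evaluate both fields on a cubic lattice; each point yields two descriptor columns."""
    steps = int(round(2 * extent / spacing)) + 1
    axis = [-extent + i * spacing for i in range(steps)]
    return {p: fields_at(p) for p in product(axis, repeat=3)}


fields = grid_fields()
print(len(fields), "grid points,", 2 * len(fields), "field descriptors per molecule")
```

Even this coarse 5 × 5 × 5 lattice yields 250 descriptors per molecule, far more than the number of compounds in a typical series, which is why CoMFA relies on PLS rather than ordinary multiple regression.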