a medicinal chemist's view of diversity - UK-QSAR
medicinal chemistry design
challenges
chemically intelligent data mining
multiparameter optimisation for medicinal chemists
how to handle petabytes of data – Google Chemistry!
activity prediction
Dr Tony Wood
VP, Head of Worldwide Medicinal Chemistry
Pfizer Global Research and Development
the challenge for design?
2002-2005 the primary causes of
attrition were safety and
pharmacology
in vivo toxicity
results of an analysis of 349 studies on 315 compounds covering 90 targets at 985 doses with >10,000 organ evaluations in 4 species
PK known for all cases with strong correlation between AUC and Cmax
compound set has similar diversity to Pfizer file
exposures are ranked from low to high concentration and bounded by two values:
CmaxLowTox: minimum exposure with observed toxicity; set to an arbitrarily high number if no toxic event is observed at any dose
CmaxHiCln: maximum exposure without observed toxicity; set to zero if toxicity was observed at all doses assessed
exposures above CmaxLowTox are toxic, exposures below CmaxHiCln are clean, and the range between the two is uncertain
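The two exposure bounds can be sketched in a few lines of Python. This is only an illustration: the function names, the use of infinity as the "arbitrarily high number", and the rule for comparing the bounds with a threshold (toxic takes priority over clean) are assumptions, not details given on the slide.

```python
from typing import List, Tuple

# Assumption: infinity stands in for the "arbitrarily high number" on the slide.
NO_TOX_SENTINEL = float("inf")


def exposure_bounds(observations: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (CmaxLowTox, CmaxHiCln) from (Cmax in uM, toxicity observed) pairs."""
    toxic = [cmax for cmax, tox in observations if tox]
    clean = [cmax for cmax, tox in observations if not tox]
    # Minimum exposure with observed toxicity, or the sentinel if none was seen.
    cmax_low_tox = min(toxic) if toxic else NO_TOX_SENTINEL
    # Maximum exposure without observed toxicity, or zero if every dose was toxic.
    cmax_hi_cln = max(clean) if clean else 0.0
    return cmax_low_tox, cmax_hi_cln


def classify(observations: List[Tuple[float, bool]], threshold_um: float = 10.0) -> str:
    """Label a compound relative to an exposure threshold (hypothetical rule)."""
    cmax_low_tox, cmax_hi_cln = exposure_bounds(observations)
    if cmax_low_tox < threshold_um:
        return "Toxic"      # toxicity seen below the threshold
    if cmax_hi_cln > threshold_um:
        return "Clean"      # no toxicity seen even above the threshold
    return "Uncertain"      # the threshold falls between the two bounds


if __name__ == "__main__":
    print(classify([(2.0, False), (8.0, True)]))    # toxicity at 8 uM
    print(classify([(5.0, False), (30.0, False)]))  # clean up to 30 uM
    print(classify([(3.0, False), (40.0, True)]))   # nothing known near 10 uM
```

The default threshold of 10 uM matches the total-drug threshold quoted in the deck; a free-drug analysis would pass 1 uM instead.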
toxicity threshold selection
exposure thresholds were chosen to obtain a balance of toxicity/non-toxicity.
the total-drug threshold was set to 10 uM, with approx 40% of evaluations above threshold & 40% below.
a similar analysis for free drug levels gives a threshold of 1 uM.
[Figure: number of Clean, Uncertain and Toxic evaluations (0 to 300) versus Threshold (Total Drug), from 100 nM to 1000 uM]
TPSA and clogP are key
the y-axis here is a generalized odds, i.e., the ratio of the probability of a compound
with a given parameter value being toxic to the probability of it not being toxic
toxicity odds
combining low TPSA and high cLogP exacerbates the risk
(numbers in parentheses indicate number of outcomes in database)
holds for both free-drug or total-drug thresholds
ratio of toxic to non-toxic outcomes

             Total-Drug               Free-Drug
             TPSA>75     TPSA<75      TPSA>75     TPSA<75
ClogP<3      0.39 (57)   1.08 (27)    0.38 (44)   0.5 (27)
ClogP>3      0.41 (38)   2.4 (85)     0.81 (29)   2.59 (61)
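An odds value is easier to read once converted to a probability. The sketch below does that conversion and also back-calculates the toxic and non-toxic counts that would be consistent with a quoted odds value, assuming the number in parentheses is the total number of outcomes in that cell. The function names are illustrative and the recovered counts are an inference, not data from the slide.

```python
from typing import Tuple


def odds_to_probability(odds: float) -> float:
    """Convert odds (toxic : non-toxic) into the probability of a toxic outcome."""
    return odds / (1.0 + odds)


def implied_counts(odds: float, n_outcomes: int) -> Tuple[int, int]:
    """Estimate (toxic, non-toxic) counts consistent with an odds value.

    Assumes n_outcomes is the total number of outcomes in the cell.
    """
    n_toxic = round(n_outcomes * odds_to_probability(odds))
    return n_toxic, n_outcomes - n_toxic


if __name__ == "__main__":
    # Total-drug cell with ClogP>3 and TPSA<75: odds 2.4 from 85 outcomes.
    print(round(odds_to_probability(2.4), 3))
    print(implied_counts(2.4, 85))
    # Total-drug cell with ClogP<3 and TPSA>75: odds 0.39 from 57 outcomes.
    print(round(odds_to_probability(0.39), 3))
    print(implied_counts(0.39, 57))
```

Read this way, odds of 2.4 correspond to roughly a 70% chance of a toxic outcome in the low-TPSA, high-ClogP cell, against roughly 28% in the high-TPSA, low-ClogP cell.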
toxicity and promiscuity
ratio of promiscuous to nonpromiscuous compounds

             TPSA>75     TPSA<75
ClogP<3      0.25 (25)   0.80 (18)
ClogP>3      0.44 (13)   6.25 (29)

promiscuity defined as >50% activity in >2 Bioprint assays out of a set of 48 (selected for data coverage only)
clogP and organ toxicity
does a good cell viability profile increase the probability of a compound being a CNS CAN without organ tox in the risky clogP group (clogP>3)?
[Figure: pie charts of Organ Tox versus No Organ Tox for the attrition CNS CANs set, split into ClogP > 3 and ClogP < 3 and binned by THLE Cv (x < 25 uM, 25 < x < 100 uM, x > 100 uM)]
DEREK
“a place to store toxicological knowledge”
knowledge-based expert system
broad range of toxicity endpoints covered
identifies structural alerts
provides literature-based rationale for prediction
qualitative or semi-quantitative predictions
now has an API for integration into 3rd party software products
what’s in DEREK?
main strengths are mutagenicity, chromosome damage, carcinogenicity and skin sensitization
some recent efforts in hepatotoxicity and teratogenicity
[Figure: bar chart of No. of active alerts (0 to 100) per Endpoint]
endpoints covered:
alpha-2-mu-Globulin nephropathy
Anaphylaxis
Bladder urothelial hyperplasia
Carcinogenicity
Cardiotoxicity
Cerebral oedema
Chloracne
Cholinesterase inhibition
Chromosome damage
Cumulative effect on white cell count
Cyanide-type effects
Developmental toxicity
Genotoxicity
Hepatotoxicity
HERG channel inhibition
High acute toxicity
Irritation (of the eye)
Irritation (of the gastrointestinal tract)
Irritation (of the respiratory tract)
Irritation (of the skin)
Lachrymation
Methaemoglobinaemia
Mutagenicity
Nephrotoxicity
Neurotoxicity
Occupational asthma
Ocular toxicity
Oestrogenicity
Peroxisome proliferation
Phospholipidosis
Photoallergenicity
Photocarcinogenicity
Photogenotoxicity
Photo-induced chromosome damage
Photomutagenicity
Phototoxicity
Pulmonary toxicity
Respiratory sensitisation
Skin sensitisation
Teratogenicity
Testicular toxicity
Thyroid toxicity
Uncoupler of oxidative phosphorylation
challenge #1
these relationships were determined using a small, well-characterised data set
much more data lies in non-curated data sets with no structure keys
we need chemically intelligent data mining to derive knowledge, including SAR, from this resource
properties of CNS drugs
median values for Drugs and CANs:

Property     Drugs    CANs
LE           0.52     0.47
LLE          6.2      6.3
ClogP        2.9      3.4
ClogD        1.8      2.3
TPSA         47       53
MW           305      360
HBD          1.0      1.0
pKa          8.4      8.4

LE: 90% ≥ 0.36
[Figure: distributions with median and 95% range for Drugs and CANs of LE, LipE, ClogP, LogD 7.4, TPSA, MW, HBD count and basic pKa]
CNS MPO summary
for design (prospective, accurate and constant)
increasing CNS MPO enhances the probability of candidate survival and of successfully aligning in-vitro attributes
CNS MPO Desirability is built from six parameters: ClogP, ClogD, MW, TPSA, HBD and pKa
attributes influenced: Permeability (including efflux) (P), Clearance (C), Safety (including high risk space) (S)
[Figure: desirability functions (0 to 1) for cLogP, LogD 7.4, MW, PSA, HBD donor count and pKa, each tagged with the attributes it influences, overlaid on binned Drugs and CANs]
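A desirability-based score of this kind can be sketched as a sum of six piecewise-linear functions, one per parameter, each returning a value between 0 and 1. The inflection points below are assumptions (the values commonly quoted for the CNS MPO score); they are not read from the slide, and the function names are illustrative.

```python
def falling(x: float, full: float, zero: float) -> float:
    """Desirability of 1 at or below `full`, 0 at or above `zero`, linear between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def hump(x: float, zero_lo: float, full_lo: float, full_hi: float, zero_hi: float) -> float:
    """Desirability of 1 inside [full_lo, full_hi], falling linearly to 0 on both sides."""
    if x <= zero_lo or x >= zero_hi:
        return 0.0
    if x < full_lo:
        return (x - zero_lo) / (full_lo - zero_lo)
    if x <= full_hi:
        return 1.0
    return (zero_hi - x) / (zero_hi - full_hi)


def cns_mpo(clogp: float, clogd: float, mw: float, tpsa: float, hbd: float, pka: float) -> float:
    """Sum of six desirabilities (0 to 6). All inflection points are assumed values."""
    return (
        falling(clogp, 3.0, 5.0)
        + falling(clogd, 2.0, 4.0)
        + falling(mw, 360.0, 500.0)
        + hump(tpsa, 20.0, 40.0, 90.0, 120.0)
        + falling(hbd, 0.5, 3.5)
        + falling(pka, 8.0, 10.0)
    )


if __name__ == "__main__":
    # Median property values as read from the "properties of CNS drugs" slide.
    print(round(cns_mpo(2.9, 1.8, 305, 47, 1.0, 8.4), 2))  # Drugs
    print(round(cns_mpo(3.4, 2.3, 360, 53, 1.0, 8.4), 2))  # CANs
```

Keeping each parameter as its own small function is one way to keep the score transparent: a chemist can see which property is costing a compound desirability rather than reading a single opaque number.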
challenge #2
design now rests on a probabilistic basis using complex MPO relationships
we need transparent, easy-to-construct and easy-to-understand methods to perform multiparameter optimisation
chemoinformatic predictions
data sources: Pfizer in house, Inpharmatica StARLITe, Thomson IDDB, Cerep BioPrint
unified db:
4.8 M structures
275k active compounds
600k activities (IC50, etc)
3k targets
800 human targets
[Figure: target network (node: target, edge: compound) grouped by class: Serine proteases, Cysteine proteases, Aspartyl proteases, Metalloproteases, Ion Channels, Kinases, Phosphodiesterases, Aminergic GPCRs, Peptide GPCRs, GPCRs (others: classes A, B & C), Nuclear hormone receptors, Miscellaneous Enzymes (hydrolases, transferases, oxidoreductases & others)]
Bayesian learning
Rev Thomas Bayes, ca 1702 - 1761
data set (assay data): “good” actives and “bad” inactives
fingerprint bits ~ substructures
Bayesian model:
fingerprints are calculated for each molecule
check how often each fingerprint bit is observed and how often in a “good” compound
assign a weighting factor taking into account both activity ratio and sampling size
for instance: a “good”/total ratio of 90/100 is statistically more relevant than 9/10
model distinguishes “good” from “bad”
predict the likelihood that a molecule is “good”
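The weighting step can be illustrated with a Laplacian-corrected estimate, a form often described for fingerprint-based Bayesian models. Whether the models in this talk used exactly this formula is an assumption. Fingerprints are represented here as plain sets of integer bit identifiers, and all names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def laplacian_weight(n_good_with_bit: int, n_with_bit: int, base_rate: float) -> float:
    """Weight for one fingerprint bit; shrinks towards 0 when the bit is rarely seen."""
    return math.log((n_good_with_bit + 1.0) / (n_with_bit * base_rate + 1.0))


def train(fingerprints: List[Set[int]], labels: List[bool]) -> Dict[int, float]:
    """Count how often each bit occurs overall and in "good" molecules, then weight it."""
    base_rate = sum(labels) / len(labels)
    seen: Counter = Counter()
    seen_good: Counter = Counter()
    for bits, good in zip(fingerprints, labels):
        for bit in bits:
            seen[bit] += 1
            if good:
                seen_good[bit] += 1
    return {bit: laplacian_weight(seen_good[bit], seen[bit], base_rate) for bit in seen}


def score(weights: Dict[int, float], bits: Iterable[int]) -> float:
    """Sum the weights of the bits present; unseen bits contribute nothing."""
    return sum(weights.get(bit, 0.0) for bit in set(bits))


if __name__ == "__main__":
    # Same 90% "good" ratio, different sample sizes, assumed 10% base rate.
    print(round(laplacian_weight(90, 100, 0.1), 3))
    print(round(laplacian_weight(9, 10, 0.1), 3))

    # Toy data set: two "good" and two "bad" molecules.
    model = train([{1, 2}, {1, 3}, {4, 5}, {4, 6}], [True, True, False, False])
    print(score(model, {1, 2}) > 0, score(model, {4, 5}) < 0)
```

With the same 90% ratio, the bit seen in 100 molecules receives a larger weight than the bit seen in 10, which is the sampling-size effect described above.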
mining large data sets (HTS)
[Figure: confirmed measurement, LE HTA+ versus LE HTA (both 0.05 to 0.45). Where HTA+ > HTA: false positive HTA+ or false negative HTA predictions; all false negative HTA colored by Bayesian score (red: high confidence, blue: low confidence). Where HTA > HTA+: false positive HTA or false negative HTA+ predictions (red: false HTA+ negative, blue: false HTA positive)]
predicting promiscuity
238k
actives
( 10 uM) human target
mw < 1000
pass reactivity filter
10 actives / target
3870 compounds with
10,806 predictions
90% / 214k
FCFP_6
698 models
Bayesian score
searching virtual space
BIG LEAP: searching the Pfizer liquid and virtual compound collections
Pfizer global virtual library: ~10^12 compounds
real: 0.000025%
liquid screening file
5000 libraries
2.5M compounds
1.2M singletons
derive Bayesian model that distinguishes library 1 from 2, from 3, etc
predict 16 libraries to which compound could belong
search only these libraries, in real and virtual compound space
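The routing idea (score a query against one model per library, keep the best few libraries, and search only those) can be sketched as follows. The per-library models are taken to be dictionaries of fingerprint-bit weights, and the final ranking uses Tanimoto similarity; both choices, and every name in the code, are assumptions made for illustration rather than a description of BIG LEAP itself.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared bits divided by total distinct bits (0 when both sets are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_libraries(query: Set[int], models: Dict[str, Dict[int, float]], k: int = 16) -> List[str]:
    """Rank libraries by the summed weights of the query's bits and keep the best k."""
    def library_score(name: str) -> float:
        return sum(models[name].get(bit, 0.0) for bit in query)
    return sorted(models, key=library_score, reverse=True)[:k]


def search(
    query: Set[int],
    members: Dict[str, Dict[str, Set[int]]],
    models: Dict[str, Dict[int, float]],
    k: int = 16,
    n_hits: int = 10,
) -> List[Tuple[float, str, str]]:
    """Compare the query only with compounds from the k predicted libraries."""
    hits = []
    for library in top_libraries(query, models, k):
        for compound_id, bits in members[library].items():
            hits.append((tanimoto(query, bits), library, compound_id))
    hits.sort(reverse=True)
    return hits[:n_hits]


if __name__ == "__main__":
    # Hypothetical toy data: three libraries with one or two members each.
    models = {"amides": {1: 1.0, 2: 0.5}, "ureas": {3: 1.0}, "sulfonamides": {4: 1.0}}
    members = {
        "amides": {"a1": {1, 2, 5}, "a2": {1, 6}},
        "ureas": {"u1": {3, 5}},
        "sulfonamides": {"s1": {4}},
    }
    for hit in search({1, 2, 5}, members, models, k=1):
        print(hit)
```

The point of the design is cost: only the members of the predicted libraries are ever compared with the query, so the bulk of the virtual space is never enumerated.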
BIG LEAP
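[Figure: grid of library products formed from Acids (A1, A2, ...) and Amines (B1, B2, B4, ...); synthesized compounds are marked “X”, example virtual compounds are marked “?”, “O” and “*”, and two regions are labelled 1 and 2]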
model is built from synthesized compounds (yellow squares)
nearly all fingerprint features of any virtual compound (square marked with “?”) are shared by at least one compound from the training set (squares marked with “X”)
virtual products in area 1 share at least one monomer with a compound from the training set; for compound “O”, the new monomer B2 is very close to the previously used B1
compounds from area 2 can be considered outside the scope of the model because they have few fingerprint features in common with the existing products, as shown for compound “*”, where monomers A2 and B4 are unlike previously used monomers
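The scope argument above amounts to asking what fraction of a virtual compound's fingerprint features have already been seen in the training set. A minimal sketch follows, with fingerprints as sets of integer bits; the 0.9 cut-off is a hypothetical value, not one given in the talk.

```python
from typing import List, Set


def feature_coverage(query: Set[int], training: List[Set[int]]) -> float:
    """Fraction of the query's features present in at least one training compound."""
    if not query:
        return 0.0
    seen = set().union(*training)  # every feature observed anywhere in training
    return len(query & seen) / len(query)


def in_scope(query: Set[int], training: List[Set[int]], min_coverage: float = 0.9) -> bool:
    """True when enough of the query's features are known to the model.

    min_coverage is an illustrative cut-off.
    """
    return feature_coverage(query, training) >= min_coverage


if __name__ == "__main__":
    training = [{1, 2, 3}, {3, 4, 5}]
    print(feature_coverage({1, 4, 5}, training), in_scope({1, 4, 5}, training))
    print(feature_coverage({1, 9, 10, 11}, training), in_scope({1, 9, 10, 11}, training))
```

A compound like “*”, built from two unfamiliar monomers, would score low on this measure and be flagged as outside the model's scope.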
a new series for PRA
[Figure: structures labelled acidic, ex-PR library (four times) and new]
CCT services
a framework for computational scientists to publish services (protocols, models) that can be immediately leveraged by project teams
a knowledge repository for Computational Scientists to capture and share their best practices
when protocols are published they are automatically wrapped as a new PLP component
ligand idea generators
uncharted chemical space
challenge #3
we are not short of idea generators!
easy to construct vast virtual libraries
we need ways of rapidly scoring and
searching petabytes of data
HERG binding model
training set 98,155 compounds (80%)
validation set 19,577 compounds (20%)
test set 9,241 compounds
training: Kappa: 0.61, Concordance 80%
training: Sensitivity 81%, Selectivity 80%
test: Kappa: 0.46, Concordance 74%
test: Sensitivity 75%, Selectivity 74%
[Figure: histogram of compound counts (0 to 2000) versus model score (-80 to 40, running from No Dofetilide to Dofetilide) for Inactives and Actives, with confidence bands of >60%, >70%, >85% and >95% and a central “Grey zone” of uncertain prediction]
prediction is checked
against activity of at least
3 nearest neighbours to
generate additional
confidence measure
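The statistics quoted for the HERG model can all be derived from a 2x2 confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical and chosen only to reproduce the training-set figures on a balanced set, and the quantity the slide calls selectivity is computed here as the true-negative rate (usually called specificity), which is an assumption about the intended meaning.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, concordance and Cohen's kappa from confusion counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)      # actives correctly predicted active
    specificity = tn / (tn + fp)      # inactives correctly predicted inactive
    concordance = (tp + tn) / n       # overall agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (concordance - expected) / (1.0 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "concordance": concordance,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical balanced set: 100 actives and 100 inactives.
    for name, value in binary_metrics(tp=81, fp=20, tn=80, fn=19).items():
        print(f"{name}: {value:.2f}")
```

Kappa corrects concordance for chance agreement, which is why it is much lower than the raw concordance.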
HLM stability model
statistical fingerprint-based model (FCFP-6, Scitegic)
unstable well predicted, stable not
[Figure: Prediction (Stable, Unstable) versus Experiment (Stable, Moderately stable, Unstable) for the HLM stability experiment]
V1a: design for stability
[Figure: In-vitro Clearance (0 to 100; short t1/2 at the top, long t1/2 at the bottom) versus CLOGP (-1.5 to 2.5) for Series 1 and Series 2]
synthetic effort weighted by desirability
most compounds stable within this cLogP range
stable but likely to be poorly absorbed orally
LiPE and LE are quality indicators
LE (Ligand Efficiency) = -1.4 log (IC50) / n Heavy Atoms
“how efficient each heavy atom is”
Class dependent…0.3 – 0.5
LiPE (Lipophilic Efficiency) = -log (IC50) - cLogP
“how efficient each lipophilic fragment is”

V1a Ki 28 nM, MW 441, cLogP 5.5, t1/2 HLM (Human Liver Microsomes) 6 mins, LE: 0.31, LiPE: 1.9
changes needed: linker replacement, aryl switch, side chain deletion/replacement
V1a Ki 780 nM, MW 334, cLogP 1.0, t1/2 HLM 120 mins, LE: 0.32, LiPE: 4.8
[Figure: chemical structures of the two compounds]
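Both indices are simple to compute once potency is expressed in molar units. The sketch below applies the two definitions to the V1a values quoted on the slide, using Ki in place of IC50. That substitution is an assumption and gives LiPE values close to, but not identical with, the slide figures. The heavy-atom count of 25 is hypothetical, since the slide gives only molecular weights.

```python
import math


def p_activity(conc_molar: float) -> float:
    """Negative log10 of a potency value given in mol/L (pIC50 or pKi)."""
    return -math.log10(conc_molar)


def lipe(conc_molar: float, clogp: float) -> float:
    """Lipophilic efficiency: -log(IC50) - cLogP."""
    return p_activity(conc_molar) - clogp


def ligand_efficiency(conc_molar: float, n_heavy_atoms: int) -> float:
    """Ligand efficiency: -1.4 log(IC50) / number of heavy atoms."""
    return 1.4 * p_activity(conc_molar) / n_heavy_atoms


if __name__ == "__main__":
    # Ki 780 nM, cLogP 1.0 (the stable compound on the slide).
    print(round(lipe(780e-9, 1.0), 2))
    # Ki 28 nM, cLogP 5.5 (the potent but unstable compound).
    print(round(lipe(28e-9, 5.5), 2))
    # Heavy-atom count of 25 is a hypothetical value for illustration.
    print(round(ligand_efficiency(780e-9, 25), 2))
```

The comparison shows why LiPE is useful here: the less potent compound is the more efficient one once its much lower lipophilicity is taken into account.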
using LiPE to view SAR
LiPE = -log (IC50) - cLogP
[Figure: pIC50 from 5 to 8 (labelled 1 uM, 100 nM and 10 nM at 6, 7 and 8) versus cLogP (1 to 5), with diagonal lines of constant LiPE = 2, 3, 4, 5 and 6]
predicting activity
[Figure: plot of experimental affinity, ΔG(expt) (0 to -10 kcal/mol), versus calculated enthalpy, Δ<U+W> (0 to -40 kcal/mol)]
for reference:
2 kcals = 26-fold off
4.2 kcals = 1000-fold off
"Improving Accuracy in Protein-Ligand Affinity Calculations", Paper #104, ACS meeting in Philadelphia (Aug 2004), Michael K. Gilson, Center for Advanced Research in Biotechnology, Rockville, MD
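The "for reference" conversions follow from the relation between free energy and the binding constant: an error of ΔΔG in the predicted free energy changes the predicted affinity by a factor of exp(ΔΔG / RT). The sketch below assumes a temperature of 310 K, which the slide does not state, and reproduces the quoted figures approximately.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def fold_error(ddg_kcal_per_mol: float, temperature_k: float = 310.0) -> float:
    """Factor by which affinity is mispredicted for a given free-energy error.

    The default temperature of 310 K is an assumption.
    """
    return math.exp(abs(ddg_kcal_per_mol) / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    print(round(fold_error(2.0)))   # compare with "26-fold off"
    print(round(fold_error(4.2)))   # compare with "1000-fold off"
```

Because the relation is exponential, even a modest error in a calculated energy is enough to scramble the rank order of compounds within a series.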
the source of the problem?
ΔG° = Δ<U+W> - TΔS°config
ΔG°: -5 to -10 kcals
Δ<U+W>: 15 to -25 kcal
TΔS°config: 15 to -25 kcal
we usually focus on the interactions Δ<U+W>
potential energy: force field (CHARMM, AMBER, etc.)
  van der Waals
  Coulombic
  Hydrogen-bonding
solvation
  surface area term: Hydrophobicity/organophilicity
  generalized Born/Poisson-Boltzmann: Desolvation of polar groups, Coulomb screening
we always neglect TΔS°config
flexibility, entropy terms
  sampling/Sum over energy wells
  Preorganization/Strain
  Entropy losses on binding (rotational, translational, conformational)
we count on cancellation of errors within series, or other corrections, which leads to scattered data.
challenge #4
we are not short of idea generators!
easy to construct vast virtual libraries
we need more accurate activity prediction
to allow filtering and selection
knowledge management
data access tools
learning culture
web2 technologies
build project teams around Sharepoint/OneNote
implement an RSS strategy around Newsgator
create a literature knowledge sharing culture
use wiki-type technology to share knowledge
Pfizerpedia
thanks to
BSA
David Price
Simon Bailey
Julian Blagg
Nigel Greene
CNS MPO
Patrick Verhoest
Travis Wager
Anabella Villalobos
Spiros Liras
web 2
Jerry Lanfear
activity prediction
Marcel de Groot
Martin Edwards
Alex Alex
Jeff Howe
Ben Burke
VLS
Giai Paolini
Willem Van Hoorn
Enoch Huang
Jeff Howe