Lecture 6: Drug Design Methods I: QSAR - BIDD
CZ3253: Computer Aided Drug design
Lecture 6: QSAR part II
Prof. Chen Yu Zong
Tel: 6874-6877
Email: [email protected]
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1,
National University of Singapore
Examples of QSAR Applications:
Application of in silico technology to
screen out potentially toxic compounds
using expert and QSAR models
2
Commercial Software
Commercial toxicity estimation packages can predict a variety of
toxic endpoints, including mutagenicity, carcinogenicity,
teratogenicity, skin and eye irritation, and acute toxicity:
• DEREK (Deductive Estimation of Risk from Existing Knowledge) - www.chem.leeds.ac.uk/luk
• HazardExpert - www.compudrug.com/hazard
• CASE (Computer Automated Structure Evaluation) - www.multicase.com
• TOPKAT (Toxicity Prediction by Computer Assisted Technology) - www.accelrys.com/products/topkat
• OncoLogic - www.logichem.com
3
Pharma Algorithms
Providers of Databases, Predictors and Development Tools
Property                        N
LogP                            10,000
DMSO Solubility                 22,000
pKa                             8,000
Stability at pH < 2             20,000
Aqueous Solubility              5,500
Permeability (HIA)              1,000
Active Transport                500
Pgp Transport                   1,000
Oral Bioavailability (Human)    900
LD50 Intraperitoneal            36,000
...                             ...
4
Pharma Algorithms Development Tools
Algorithm Builder development platform:
• Data storage and manipulation
• Generation of fragmental descriptors
• Statistical procedures: MLR, PLS, PCA, Recursive Partitioning, HCA
• Tools for predictive algorithm development
5
Generation of Descriptors
[Figure: descriptor matrix. Rows: Structure1, Structure2, ..., StructureN. Columns: the activity Y and the fragment descriptors F1, F2, F3, ..., FM.]
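A structure-by-fragment matrix of this kind can be sketched in a few lines. The example below is only an illustration, assuming toy molecules written as atom labels plus bonds (hydrogens omitted) and assuming that a "fragment" is a linear atom chain of up to three atoms; the molecule set, the function names and the canonical naming of chains are all hypothetical and are not taken from Algorithm Builder.

```python
from collections import Counter

# Hypothetical toy molecules: (atom labels, bonds as index pairs), hydrogens omitted.
MOLECULES = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "acetonitrile": (["C", "C", "N"], [(0, 1), (1, 2)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
}


def chain_fragments(atoms, bonds, max_len=3):
    """Count linear atom chains of 1..max_len atoms in one molecule."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    counts = Counter()

    def extend(path):
        # A chain of 2+ atoms is reached from both ends; count it only once.
        if len(path) == 1 or path[0] < path[-1]:
            forward = "-".join(atoms[i] for i in path)
            backward = "-".join(atoms[i] for i in reversed(path))
            counts[min(forward, backward)] += 1  # direction-independent name
        if len(path) < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return counts


def descriptor_matrix(molecules, max_len=3):
    """Return (fragment names F1..FM, {structure name: row of fragment counts})."""
    per_mol = {name: chain_fragments(a, b, max_len) for name, (a, b) in molecules.items()}
    columns = sorted(set().union(*per_mol.values()))
    rows = {name: [c[f] for f in columns] for name, c in per_mol.items()}
    return columns, rows


columns, rows = descriptor_matrix(MOLECULES)
print(columns)
for name, row in rows.items():
    print(name, row)
```

Each row is one structure and each column one fragment count; the activity column Y would be stored alongside the rows when a model is fitted.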
6
“Causal” Descriptors
[Figure: atom-chain fragments of increasing size, with axes Specificity and Fragment Size. Example structure drawings are lost in extraction; only the labels COOH and CONH survive.]
Atom chains                      Activity effects
One-atom ("topological")         Non-specific (size, PSA)
Three-atom                       Ionization, H-bonding
Five-atom                        Reactivity, internal interactions
Larger chains, ring scaffolds    Similarity to natural compounds
7
Algorithm Development
• Graphical interface provides easy-to-use tools for programming complex algorithms
• Combine fragmental, descriptor and similarity based methods
• Use logical expressions, conditions and equations based on descriptors, sub-fragments, internal interactions or any other chemical criteria
• Combine multiple sub-algorithms into general algorithms
• Rapidly develop ‘custom’ filters incorporating ‘expert’ in-house or project specific rules
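The idea of combining sub-algorithms through logical expressions can be illustrated with a small rule combinator. This is a sketch only: the descriptor names (`reactive_groups`, `logP`, `mw`), the thresholds and the rule names are hypothetical, and the code does not reproduce the Algorithm Builder interface.

```python
from typing import Callable, Dict, List, Tuple

Compound = Dict[str, float]          # hypothetical descriptor record
Rule = Callable[[Compound], bool]    # a sub-algorithm: compound -> flag


def all_of(*rules: Rule) -> Rule:
    """Logical AND of several sub-algorithms."""
    return lambda c: all(r(c) for r in rules)


def any_of(*rules: Rule) -> Rule:
    """Logical OR of several sub-algorithms."""
    return lambda c: any(r(c) for r in rules)


# Hypothetical sub-algorithms with illustrative thresholds.
has_reactive_group: Rule = lambda c: c["reactive_groups"] >= 1
too_lipophilic: Rule = lambda c: c["logP"] > 5.0
too_large: Rule = lambda c: c["mw"] > 500.0

# A 'custom' filter: named rules built from the sub-algorithms above.
CUSTOM_FILTER: List[Tuple[str, Rule]] = [
    ("reactive group present", has_reactive_group),
    ("large and lipophilic", all_of(too_lipophilic, too_large)),
]


def flags(compound: Compound) -> List[str]:
    """Return the names of all rules that the compound triggers."""
    return [name for name, rule in CUSTOM_FILTER if rule(compound)]


print(flags({"reactive_groups": 0, "logP": 5.6, "mw": 540.0}))
print(flags({"reactive_groups": 2, "logP": 1.0, "mw": 200.0}))
```

Keeping each rule as a plain function makes it easy to add project-specific rules without touching the general filter.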
8
Tox Effects in Drug Design
Tox Effect                 Programs
Acute (LD50)               Topkat, AB/LD50
Organ-specific effects     AB/Tox* (next version)
Mutagenicity               Many programs, AB/Tox
Reproductive effects       Many programs, AB/Tox*
Carcinogenicity            Many programs, AB/Tox*
[Slide annotation: "Our focus"]
9
Existing Programs
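[Table flattened in extraction. Column labels on the slide: ADME, LD50, Other. Basis labels on the slide: Descriptors, “Manually” derived skeletons, “Statistical” skeletons, Mixed.]
QSAR: QikProp, TopKat
Expert: COMPACT, DEREK, HAZARD (“manually” derived skeletons)
C-SAR: META, M-CASE (“statistical” skeletons)
AB programs: AB/Tox, Combined AB/Oral %F, AB/LD50 (combinations of above)
Will consider these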
10
What Is LD50
The dose that kills 50% of the test animals within 24 hrs
In drug design, used at pre-clinical stage
In early stages, replaced with “reductionist” considerations
Some scientists question its utility
11
Complexity of LD50
[Figure: factors behind oral LD50. Labels on the slide: Oral %F, Distribution, Excretion, and Tox Effects ("Basal", Organ-specific) with CNS, PNS, Alkylation, ATP Synthesis, "Narcosis", Krebs Cycle and Other targets. Specialists named: Informatics, Toxicologists, PK Specialists. Methods named: “Reactivity + log P”, empirical knowledge, empirical knowledge + simulations.]
12
Acute Tox in Drug Design
Lead Selection: no tests performed; reactive groups discarded
Lead Optimization: basal cytotoxicity tested; intra-cellular effects considered
Pre-clinical Stage: animal tests are required; ADME effects considered
Is this good enough?
13
Acute Tox in Drug Design
An LD50 Model for mouse (intraperitoneal administration)
was developed using data from the RTECS database
(35,000 compounds)
14
Distribution of Acute Effects
RTECS DB: mouse, intraperitoneal administration
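[Figure: two pie charts of effect classes (Narcosis, Nervous systems (hydrophobic bases), Reactivity, Natural toxins, Other), one for all compounds (N ~ 35,000) and one for LD50 < 50 mg/kg (N = 4,099). Slice labels that survive extraction: Reactivity 6% and 38%, Other 7% and 23%. The values 55%, 25%, 32% and 14% are not attached to a label.]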
Extra-cellular effects - may be “invisible” in cytotoxic assays
15
In Vivo vs. In Vitro
[Figure: plot with axis labels -Log LD50 and Log IC50. Regions are labelled Intra-cellular, Extra-cellular, Natural toxins and ADME Factors (intestinal permeability, 1st pass metabolism). Example structures, lost in extraction, carry the labels LD50 = 51 mg/kg, LD50 = 0.008 mg/kg and LD50 = 750 mg/kg.]
IC50 cannot model LD50 when extra-cellular effects occur
16
How to Predict These Effects?
LD50 involves much more than “log P + reactivity”
“Reductionist” QSARs do not work
Quality of Predictions = Knowledge of Specific Effects
How much knowledge do we get?
17
How Much Knowledge?
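[Figure: three approaches placed on axes Knowledge and Struct. Space: QSAR Model, Expert Deduction, and C-SAR + Deduction, with the labels "Little Knowledge" and "More Knowledge". Example chloro- and cyano-substituted structures, lost in extraction, are marked Active or Inactive.]
QSAR Model: $\log(1/\mathrm{LD}_{50}) = \sum_i a_i x_i$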
18
C-SAR + Deduction
[Figure: recursive-partitioning tree. Splits shown on the slide, each with Yes/No branches: F81 >= 1, F44 >= 1, F36 >= 1, F7 >= 1, F56 >= 1. Fragment drawings are lost in extraction.]
Node     n       avg.     sd
(root)   7588    1.048    0.641
N0       7165    0.999    0.583
N1       423     1.878    0.943
N00      6844    0.978    0.562
N01      321     1.46     0.805
N10      184     1.61     0.86
N11      239     2.085    0.953
N000     5918    0.936    0.526
N001     926     1.245    0.694
N010     169     1.221    0.737
N011     152     1.725    0.797
LD50 values are split into groups using fragmental descriptors from AB
The most significant skeletons are “potential toxicophores”
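Splitting activity values into groups on fragment presence can be sketched as a small recursive-partitioning routine. The version below is an assumption-laden illustration: it picks, at each node, the "fragment count >= 1" test that most reduces the sum of squared deviations, uses the population standard deviation, and runs on made-up data. The fragment names F81 and F7 are borrowed from the slide only as labels; their increments and the data are hypothetical.

```python
from statistics import mean, pstdev


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, fragments, min_size=2):
    """Pick the fragment whose presence test gives the largest SSE reduction."""
    values = [y for _, y in rows]
    best = None
    for f in fragments:
        yes = [y for x, y in rows if x.get(f, 0) >= 1]
        no = [y for x, y in rows if x.get(f, 0) < 1]
        if len(yes) < min_size or len(no) < min_size:
            continue
        gain = sse(values) - sse(yes) - sse(no)
        if best is None or gain > best[0]:
            best = (gain, f)
    return best


def grow(rows, fragments, name="N", depth=2):
    """Build a tree of nodes holding n, avg and sd; 'No' child gets suffix 0, 'Yes' gets 1."""
    values = [y for _, y in rows]
    node = {"name": name, "n": len(values), "avg": mean(values), "sd": pstdev(values)}
    split = best_split(rows, fragments) if depth > 0 else None
    if split is not None:
        f = split[1]
        node["split"] = f"{f} >= 1"
        node["no"] = grow([r for r in rows if r[0].get(f, 0) < 1], fragments, name + "0", depth - 1)
        node["yes"] = grow([r for r in rows if r[0].get(f, 0) >= 1], fragments, name + "1", depth - 1)
    return node


# Hypothetical data: (fragment counts, activity value) per compound.
ROWS = [
    ({"F81": 1}, 2.0), ({"F81": 1}, 1.8), ({"F81": 1, "F7": 1}, 2.2),
    ({}, 0.9), ({}, 1.0), ({"F7": 1}, 1.1), ({"F7": 1}, 1.2), ({}, 0.8),
]

tree = grow(ROWS, ["F81", "F7"])
print(tree["split"], tree["no"]["n"], tree["yes"]["n"])
```

Fragments that end up defining high-average leaves are the candidates that the slide calls "potential toxicophores".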
19
Specific Effects in AB/LD50
> 33,000 Compounds with LD50 from RTECS DB
[Figure: toxicity classes, each with example structures marked Active or Inactive: Cholinesterase, DNA Alkylation, ATP Synthesis, Natural toxins. Structure drawings are lost in extraction.]
20
Low-Specific Effects
Arrows denote increasing toxicity
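[Figure: two panels. Narcosis: axes MW and Log P (values shown: MW 300 and 230, Log P 3.2 and 1.2). Nervous syst.: axis Base pKa (values shown: 3.5, 7.0, 8.5). Example structures, lost in extraction, are labelled LD50 = 750 mg/kg ("narcosis") and LD50 = 51 mg/kg (CNS effect).]
Small non-bases are least toxic.
Hydrophobic amines are most toxic.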
21
Efficacy Comparison
[Figure: C-SAR + Deduction, Expert Deduction and QSAR Model placed on axes Knowledge and Struct. Diversity.]
To get new knowledge, statistics must help deduction.
To use QSAR models, they must work in narrow structural spaces.
22
QSAR Models in AB/LD50
$-\log \mathrm{LD}_{50} = \sum_i a_i F_i$
[Figure: a class of compounds sharing a reactive skeleton, described by five-atom chains. Structure drawings are lost in extraction.]
1. Narrow struct. spaces
2. Dynamic fragmentation
3. “Causal” parameters
Class                        pLD50            N        R
S-1 Specific toxins          +1.0 ... +6.5    260      ---*
S-2 Organometallics          -0.5 ... +2.5    120      ---
S-3 Covalent cations         -0.5 ... +4.5    1,300    0.86
S-4 Cholinesterase           -1.5 ... +4.0    1,100    0.89
S-5 Alkylating agents        -0.5 ... +2.5    800      0.82
S-6 ATP Synthesis            +0.0 ... +2.4    600      0.79
L-1 Lipophilic bases         -0.5 ... +1.5    4,000    0.75
L-2 Non-lipophilic bases     -1.0 ... +1.0    3,800    0.75
L-3 Weak bases               -1.0 ... +0.9    4,600    0.75
L-4 Hydrophilic bases        -1.2 ... +0.8    3,000    0.82
N-1 Large non-bases          -1.5 ... +1.0    3,300    0.84
N-2 Very weak bases          -1.5 ... +0.8    2,800    0.83
N-3 Mid-size non-bases       -1.5 ... +0.5    4,300    0.80
N-4 Small non-bases          -2.0 ... +0.5    3,700    0.76
All compounds                -2.0 ... +6.5    33,680   0.83
* Similarity algorithm based on MACCS II keys
S - Specific effects, L - Low-specific, N - Non-specific.
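A class-specific additive model of the form -log LD50 = sum of a_i F_i can be sketched as a lookup of per-class fragment increments. Everything numeric below is hypothetical: the fragment names and increment values are invented for illustration, and only the class labels come from the table. Fragments without an increment in the chosen class contribute nothing in this sketch.

```python
from typing import Dict

# Hypothetical increments a_i per class; not values from AB/LD50.
CLASS_INCREMENTS: Dict[str, Dict[str, float]] = {
    "S-4 Cholinesterase": {"P-O": 0.9, "C-O": 0.2},
    "N-4 Small non-bases": {"C-O": 0.1, "C-C": -0.05},
}


def predict_pld50(class_name: str, fragment_counts: Dict[str, int]) -> float:
    """Return sum_i a_i * F_i using the increments of the compound's class."""
    increments = CLASS_INCREMENTS[class_name]
    return sum(a * fragment_counts.get(fragment, 0) for fragment, a in increments.items())


# A made-up compound described by its fragment counts F_i.
counts = {"P-O": 3, "C-O": 2, "C-C": 4}
print(round(predict_pld50("S-4 Cholinesterase", counts), 3))
print(round(predict_pld50("N-4 Small non-bases", counts), 3))
```

The same fragment counts give different predictions in different classes, which is the point of fitting models in narrow structural spaces.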
23
What is novel?
The novel features of the Pharma Algorithms approach are:
• Combination of approaches used separately in earlier software, i.e. Expert Rules (e.g. DEREK), C-SAR (e.g. CASE) and QSAR (e.g. TOPKAT)
• Reliable Confidence Intervals are generated from QSAR models (class specific and global) that are derived using an automated multi-step process:
1. Chain fragmentation and PLS with multiple bootstrapping
2. Selection of best fragments with ‘stable’ increments
3. Derivation of multiple models from subsets of the training set to produce ranges of predictions
4. Selection of the best model to use for a particular compound by comparison of the different ranges
5. Calculation of the confidence interval from the range of predictions produced by the most appropriate model
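The step from "multiple models" to "a range of predictions" to "a confidence interval" can be illustrated with plain bootstrapping. This sketch makes several simplifying assumptions: ordinary least squares on a single descriptor stands in for PLS on many fragments, the interval is a simple percentile range, model selection (step 4) is left out, and the training data are invented.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:                      # degenerate resample: all x equal
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_interval(xs, ys, x_new, n_models=200, level=0.9, seed=0):
    """Refit on resampled training sets and return a percentile range of predictions."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    predictions.sort()
    lower = predictions[int((1 - level) / 2 * (n_models - 1))]
    upper = predictions[int((1 + level) / 2 * (n_models - 1))]
    return lower, upper


# Hypothetical training set: one descriptor value and one activity per compound.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
YS = [0.4, 0.9, 1.1, 1.7, 1.9, 2.6, 2.8, 3.3]

low, high = bootstrap_interval(XS, YS, x_new=2.2)
print(f"prediction range: {low:.2f} .. {high:.2f}")
```

A wide range signals that the resampled models disagree for that compound, which is the information a confidence interval is meant to carry.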
24
Screening the Specs DB
Specs is a supplier of diverse compound screening collections
A set (N = 14,902) was randomly selected (from > 200,000) and
screened using the AB/LD50 toxicity predictor.
Calculation of LD50 for the set takes about 30 min on a standard
Windows laptop
Compounds were deemed “Toxic” if LD50 < 50 mg/kg
Results:
Overall only 2.7% were “toxic” (i.e. 310 of 14,902)
As expected, a higher proportion (3.9%) of the bases (i.e.
alkylamines) were toxic (i.e. 92 of 2,351)
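The screening rule and the reported fractions are easy to recompute. The sketch below uses only the cutoff and counts quoted on the slide; the function names are hypothetical. Recomputed from the counts, 310 of 14,902 is about 2.1% and 92 of 2,351 is about 3.9%, so the quoted 2.7% may correspond to a different count; the slide does not say which figure is intended.

```python
TOXIC_CUTOFF_MG_PER_KG = 50.0   # cutoff quoted on the slide


def is_toxic(ld50_mg_per_kg: float) -> bool:
    """Apply the screening rule: 'toxic' if predicted LD50 is below the cutoff."""
    return ld50_mg_per_kg < TOXIC_CUTOFF_MG_PER_KG


def percent(hits: int, total: int) -> float:
    """Share of hits in percent."""
    return 100.0 * hits / total


overall = percent(310, 14902)                       # all screened compounds
bases = percent(92, 2351)                           # bases (alkylamines)
non_bases = percent(310 - 92, 14902 - 2351)         # remainder, derived from the counts

print(f"overall:   {overall:.1f}%")
print(f"bases:     {bases:.1f}%")
print(f"non-bases: {non_bases:.1f}%")
print(is_toxic(51.0), is_toxic(0.008))
```

The non-base figure is derived here by subtraction and does not appear in the source.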
25
Toxic Skeletons
[Figure: skeleton drawings are lost in extraction. Percentages with counts, and the labels next to them, as given on the slide:]
100% (4/4)      Natural toxin?
37% (6/16)      Alkylation, oxidation
31% (16/52)     Cyanide release
15% (4/26)      Cholinesterase
Artefacts?
68% (86/127)    MW > 340, most significant
38% (60/158)
25% (6/24)
67% (4/6)
30% (6/20)
30% (3/9)       MW < 260
21% (7/33)
Exp. verification required
26
What We Have Learned So Far
Screening for basal cytotoxicity is not enough
The “C-SAR + Deductive” method opens new possibilities
The extra-cellular effects can be estimated in silico
Can we model in vivo toxicity?
27
Administration vs. ADME
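[Figure: routes of administration and the compartments a compound passes before toxic action. Compartments shown: Stomach, Intestine, Liver, Vein, Tissue/organs, Toxic action. ADME effects: dissolution, permeation, hydrolysis, metabolism.]
OR – Oral
Sc – Subcutaneous
IP – Intraperitoneal
IV – Intravenous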
28
Complexity of ADME
[Figure: factors behind oral %F. Labels on the slide: Absorption, Solubility, Permeability, Transporters, Gut 1st Pass, Liver 1st Pass. Approaches named: Informatics (“Simple descriptors”) and ADME Specialists (“Simulations”).]
“Simple descriptors” disregard many factors.
Can we simulate them in HT mode?
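One common way to combine absorption with gut and liver first-pass loss is the standard pharmacokinetic product F = fa x (1 - Eg) x (1 - Eh). This relation is not stated on the slide and is given here only as an assumed illustration of why "simple descriptors" that capture one factor can miss the others; the input values are invented.

```python
def oral_bioavailability(fraction_absorbed: float,
                         gut_extraction: float,
                         hepatic_extraction: float) -> float:
    """Return oral %F as fa * (1 - Eg) * (1 - Eh) * 100, all inputs in 0..1."""
    for value in (fraction_absorbed, gut_extraction, hepatic_extraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie between 0 and 1")
    return 100.0 * fraction_absorbed * (1.0 - gut_extraction) * (1.0 - hepatic_extraction)


# Hypothetical compound: well absorbed, but half is lost in the liver.
print(round(oral_bioavailability(0.8, 0.1, 0.5), 1))
```

Because the factors multiply, a compound with good solubility and permeability can still show low %F if either first-pass term is large.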
29
Oral %F Prediction in HT Mode
Non-Batch Interface:
Reliability validated by the consistency of independent predictions
30
Cost/Benefit Considerations
In silico Bioavailability and Toxicity predictions for compound
collections are inexpensive to perform
The value of predictions is variable: decisions still need to be
made by expert scientists in a project context
In silico tools can assist the expert in a detailed evaluation of
‘hits’, ‘leads’ and ‘candidates’, but there is a need for:
1. Predictions for a range of toxicity types:
   LD50 (oral, i.v., s.c.)
   Genotoxicity and carcinogenicity
   Organ-specific effects (e.g. hepatotoxicity)
2. Integration of the prediction software with databases
containing the training data so that the availability and
behaviour of similar compounds can be checked
31
Drug Design
General Principles
Aim for low logP
Aim for low M.Wt.
C. Hansch et al., ‘The Principle of Minimal Hydrophobicity in
Drug Design’, J. Pharm. Sci., 1987, 76, 663
M.C. Wenlock et al., ‘Comparison of Physicochemical Property
Profiles of Development and Marketed Oral Drugs’, J. Med.
Chem., 2003, 46, 1250
32
Simulations in HT Screening
HT Simulations aim at:
High Activity = High %F + Low Tox
“Reductionist” Methods:
High Activity = Low %F + High Tox
Very rough estimations, assuming that activity
increases with increasing log P and MWt
[Figures: two sketches with axes %F and Activity.]
33