Transcript Powerpoint

ABCD
2nd Joint Sheffield Conference on Chemoinformatics:
Computational Tools for Lead Discovery
Flexsim-R: A new 3D descriptor for combinatorial
library design and in-silico screening
Outline
• Introduction
• The Flexsim-R Methodology
• Validation
• Conclusion and Outlook
Introduction
What is Flexsim-R?
Flexsim-R calculates 3D descriptors for reagents,
based on the virtual affinity fingerprint idea
Motivation to develop Flexsim-R
• Reagent-based descriptors are important for
– combinatorial library design
– virtual screening experiments
– bioisosteric replacements
– rational augmentation of the in-house reagent pool
• For large combinatorial libraries, product-based descriptor
calculation is often not feasible
-> possible solution: reagent-based product selection (e.g. by a genetic algorithm, GA)
• Descriptor calculation should be fast and automatable
• Descriptor should be related to experimental affinity data
• Encouragement by virtual affinity fingerprint methods
In-vitro Affinity Fingerprints
Terrapin's Affinity Fingerprint Approach:
(Kauvar et al., Chemistry & Biology, 1995, 2, 107-118)
Molecular similarity is defined by in-vitro binding patterns
("Affinity Fingerprints") of a ligand set (L) in reference binding
assays (A)
[Figure: fingerprint matrix with ligands L1 to L6 as rows and assays A1 to A8 as columns]
Virtual Affinity Fingerprints (VAF)
Terrapin's in-vitro screening in diverse reference assays is simulated
• by Computational Docking into a reference panel of protein
pockets (Docksim, Flexsim-X)
• by Computational Fitting onto a reference panel of small molecules
(Flexsim-S)
(Briem and Lessel, Perspectives in Drug Discovery and Design, 20 (2000) 231-244)
The Flexsim-R Method
[Figure: scheme with panels "Rgroups", "Core" and "Products"; the core carries an attachment point X for the Rgroup R]
The Flexsim-R Method
Problems with Rgroups in conventional VAF approaches:
• Rgroups tend to be smaller than "drug-like" molecules
• The alignment rule given by the common core attachment point is lost
[Figure: protein pocket]
Solution: Core-constrained multiple-site docking
The Flexsim-R Method
Components of core-constrained multiple-site docking:
1. Rgroup Set
2. Common Core
3. Protein Binding Pockets
The Flexsim-R Method
First step:
• Docking of common core group with FlexX
• Multiple (e.g. 50 best) solutions are stored
• RMS threshold can be applied to prevent clustering
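The storage of multiple, non-clustered core placements can be pictured as a greedy filter over ranked poses. The sketch below is only an illustration under stated assumptions: poses are lists of (x, y, z) atom coordinates in the pocket frame, a lower score is taken to be better, and `select_diverse_poses` is a hypothetical helper. How FlexX itself applies its RMS threshold is not described on the slides.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two poses with identical atom order."""
    sq = sum((p - q) ** 2 for atom_a, atom_b in zip(a, b) for p, q in zip(atom_a, atom_b))
    return math.sqrt(sq / len(a))

def select_diverse_poses(poses, scores, n_keep=50, rms_threshold=2.0):
    """Greedily keep the best-scoring poses that differ by at least rms_threshold."""
    # Assumption: lower (more negative) docking score means a better pose
    order = sorted(range(len(poses)), key=lambda i: scores[i])
    kept = []
    for i in order:
        if all(rmsd(poses[i], poses[j]) >= rms_threshold for j in kept):
            kept.append(i)
        if len(kept) == n_keep:
            break
    return kept

# Toy example: three single-atom "poses"; the second lies close to the first
poses = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
scores = [-12.0, -11.0, -9.0]
selected = select_diverse_poses(poses, scores, n_keep=50, rms_threshold=2.0)
```

In the toy data the second pose is dropped because it lies within 2.0 of the better-scoring first pose, so the stored positions stay spread over the pocket.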
The Flexsim-R Method
Example: Thrombin active site with 50 best FlexX solutions of hydantoin
(RMS threshold = 2.0)
The Flexsim-R Method
Second step:
• Docking of core group + rgroup with FlexX
• Pre-stored core positions serve as reference
• FlexX scores are stored in descriptor matrix
Descriptor Matrix (one protein pocket)

        Core Pos1   Core Pos2   ...
R1      15.5        15.7        ...
R2      11.2        22.0        ...
R3      21.7        13.5        ...
...     ...         ...         ...
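The second step fills one row per Rgroup and one column per stored core position. A minimal sketch, assuming a hypothetical `dock_score(rgroup, core_position)` callable that stands in for a core-constrained docking run (it is not a FlexX API call; here it only looks up the toy values from the matrix on this slide):

```python
def build_descriptor_matrix(rgroups, core_positions, dock_score):
    """Return {rgroup: [score at core position 1, score at core position 2, ...]}."""
    return {
        r: [dock_score(r, pos) for pos in core_positions]
        for r in rgroups
    }

# Toy scores taken from the example matrix on the slide
toy_scores = {
    ("R1", "Core Pos1"): 15.5, ("R1", "Core Pos2"): 15.7,
    ("R2", "Core Pos1"): 11.2, ("R2", "Core Pos2"): 22.0,
    ("R3", "Core Pos1"): 21.7, ("R3", "Core Pos2"): 13.5,
}

def dock_score(rgroup, core_position):
    # Hypothetical stand-in for docking core + Rgroup onto a pre-stored core position
    return toy_scores[(rgroup, core_position)]

matrix = build_descriptor_matrix(
    ["R1", "R2", "R3"], ["Core Pos1", "Core Pos2"], dock_score
)
```

Each row of `matrix` is the affinity profile of one Rgroup across the stored core positions of a single pocket.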
The Flexsim-R Method
Affinity Profiles for Ala and Gly
[Plot: docking score (5 to -15) versus core position (0 to 50) for A and G]
The Flexsim-R Method
Affinity Profiles for Asp and Glu
[Plot: docking score (5 to -15) versus core position (0 to 50) for D and E]
The Flexsim-R Method
Affinity Profiles for A/G and D/E
[Plot: docking score (5 to -15) versus core position (0 to 50) for A, G, D and E]
The Flexsim-R Method
Multiple protein pockets -> concatenated descriptor matrix

        Pocket 1        Pocket 2        Pocket 3
        C1  C2  C3      C1  C2  C3      C1  C2  C3
R1
R2
R3
...
The Flexsim-R Method
Multiple core attachment points -> concatenated descriptor matrix
[Figure: core structures showing the attachment points X1, X2, X3 and X4]

        X1              X2              X3              X4
        C1  C2  C3      C1  C2  C3      C1  C2  C3      C1  C2  C3
R1
R2
R3
...
The Flexsim-R Method
Example: Hydantoin Core
[Structure: hydantoin core with attachment points X1, X2, X3 and X4]
4 attachment points * 7 protein pockets * 50 FlexX solutions
-> descriptor vector length = 1,400
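The vector length follows directly from concatenating one block of scores per attachment point and pocket. A small sketch with placeholder values (all zeros; real entries would be docking scores, and `concatenate_blocks` is a hypothetical helper):

```python
def concatenate_blocks(blocks):
    """Join per-(attachment point, pocket) score lists into one descriptor vector."""
    vector = []
    for block in blocks:
        vector.extend(block)
    return vector

attachment_points = ["X1", "X2", "X3", "X4"]
pockets = ["1dwc", "1eed", "1pop", "2tsc", "3cla", "3dfr", "5ht2"]
n_solutions = 50  # stored core placements per pocket

# One block of placeholder scores per (attachment point, pocket) combination
blocks = [[0.0] * n_solutions for _ in attachment_points for _ in pockets]
descriptor = concatenate_blocks(blocks)
```

With 4 x 7 = 28 blocks of 50 scores each, `descriptor` has the 1,400 entries quoted on the slide.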
The Flexsim-R Method
Test set for method development and evaluation:
• Rgroups: 20 natural amino acids
• Core groups: Hydantoin, Benzimidazole, Pyrimidopyrimidine, Phenol
[Figure: structures of the four cores with their attachment points (X1, X2, ...)]
• 7 protein pockets:
1dwc, 1eed, 1pop, 2tsc, 3cla, 3dfr, 5ht2 (model)
Correlation Analysis
• Analyses were performed to check correlation between
– different protein pockets
– different cores
– different attachment points
• Analyses are based on Euclidean distance matrices for all 190 pairwise
amino acid vector combinations
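One way to read this analysis: for each descriptor block (e.g. one pocket), compute the Euclidean distance for every pair of amino acids, then correlate the resulting distance lists between blocks. The sketch below uses random toy descriptors and a Pearson coefficient; the slides do not name the coefficient, so that choice is an assumption.

```python
import math
import random
from itertools import combinations

def pairwise_distances(vectors):
    """Euclidean distances for all unordered pairs, in a fixed pair order."""
    names = sorted(vectors)
    return [math.dist(vectors[a], vectors[b]) for a, b in combinations(names, 2)]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

amino_acids = list("ACDEFGHIKLMNPQRSTVWY")  # 20 natural amino acids
rng = random.Random(0)

# Toy descriptor blocks for two pockets (random numbers, not docking scores)
pocket_a = {aa: [rng.uniform(-15, 5) for _ in range(50)] for aa in amino_acids}
pocket_b = {aa: [rng.uniform(-15, 5) for _ in range(50)] for aa in amino_acids}

dist_a = pairwise_distances(pocket_a)
dist_b = pairwise_distances(pocket_b)
r = pearson(dist_a, dist_b)
```

Twenty amino acids give 20 x 19 / 2 = 190 pairs, which is where the 190 combinations come from. A coefficient near 1 would mean that the two blocks rank amino acid pairs as similar or dissimilar in nearly the same way.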
Correlation Analysis
• Correlation matrix of protein pockets:
(hydantoin core, all 4 attachment points)

Protein  1dwc   1eed   1pop   2tsc   3cla   3dfr
1eed     0.922
1pop     0.917  0.937
2tsc     0.852  0.784  0.853
3cla     0.794  0.726  0.740  0.723
3dfr     0.889  0.863  0.940  0.924  0.795
5ht2     0.826  0.811  0.894  0.864  0.838  0.932
Correlation Analysis
• Correlation matrix of core groups:
(all 7 protein pockets, all attachment points)
Cores compared: Hydantoin, Phenol, Pyrimidopyrimidine, Benzimidazole
Hydantoin / Phenol: 0.954
Remaining pairwise coefficients (in slide order): 0.971, 0.978, 0.963, 0.973, 0.987
Correlation Analysis
• Correlation matrix of attachment points:
(hydantoin core, all 7 protein pockets)

Position  X1     X2     X3     X4
X2        0.985
X3        0.981  0.964
X4        0.988  0.994  0.967
All       0.995  0.995  0.983  0.995
Correlation Analysis
Reduction of descriptor vector length (dimensionality):
• no PCA was performed, since we want to get information about the
most uncorrelated descriptor columns
• instead, an elimination method has been applied:
– the complete pairwise correlation matrix is calculated
– all pairs of columns with a correlation coefficient (r) above a user-defined threshold (e.g. 0.7) are considered for elimination
– from each correlating pair, the column that is better described by multiple linear regression of the remaining descriptors is eliminated
– the resulting matrix does not contain pairs of columns with a correlation coefficient above the threshold
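The elimination steps above can be sketched as a greedy loop. This is not the authors' implementation: to keep the example short and dependency-free, the multiple-linear-regression criterion is replaced by a simpler stand-in (the column with the higher mean absolute correlation to the other remaining columns is dropped), and the use of |r| rather than signed r is also an assumption. `eliminate_correlated` is a hypothetical name.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / var if var else 0.0

def eliminate_correlated(columns, threshold=0.7):
    """columns: {name: list of values}. Returns the names kept after elimination."""
    kept = list(columns)
    while True:
        # Recompute the pairwise |r| matrix for the columns still present
        corr = {(a, b): abs(pearson(columns[a], columns[b]))
                for i, a in enumerate(kept) for b in kept[i + 1:]}
        pairs = [p for p, r in corr.items() if r > threshold]
        if not pairs:
            return kept
        a, b = max(pairs, key=corr.get)  # handle the most correlated pair first

        def redundancy(name):
            # Stand-in criterion (assumption): mean |r| with all other kept columns
            others = [c for c in kept if c != name]
            return sum(corr.get((name, o), corr.get((o, name), 0.0))
                       for o in others) / len(others)

        kept.remove(a if redundancy(a) >= redundancy(b) else b)

# Toy data: c2 is nearly a copy of c1, c3 is only weakly correlated with both
columns = {
    "c1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "c2": [1.1, 2.0, 2.9, 4.2, 5.1],
    "c3": [2.0, -1.0, 0.5, -1.5, 1.0],
}
kept = eliminate_correlated(columns, threshold=0.7)
```

Unlike PCA, the surviving columns are original descriptor columns, so each one can still be traced back to a specific pocket, attachment point and core position.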
Correlation Analysis
Example: hydantoin core, all 7 proteins, all 4 attachment points
[Plot: descriptor vector length (0 to 1200) versus r threshold (0 to 1); data labels 7, 15, 20, 54, 130, 443 and 1100; descriptor sets 1, 2 and 3 are marked]
Correlation Analysis
Thrombin with three most information-rich core positions
Descriptor Validation
• Five peptide datasets, taken from literature
(Refs. in Matter, H., J. Peptide Res. 52 (1998) 305-314)
• Product descriptors are generated by concatenation of
respective reagent descriptors
• Validation by PLS Analysis
• leave-one-out (LOO) and leave-random-groups-out (LRGO)
cross-validation
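The validation set-up can be illustrated with a small, dependency-free sketch: product descriptors are built by concatenating reagent descriptors, and a leave-one-out q2 is computed as 1 - PRESS / SS. The model here is a one-component PLS1 fit written out by hand; the number of components, the software used by the authors and the toy numbers are all assumptions, and LRGO is not shown.

```python
import math

def product_descriptor(peptide, reagent_descriptors):
    """Concatenate the reagent (amino acid) descriptors in sequence order."""
    vector = []
    for residue in peptide:
        vector.extend(reagent_descriptors[residue])
    return vector

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return a predict(x) function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector proportional to X'y, normalised to unit length
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    # Scores t = Xw and regression of y on t
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t)
    q = sum(ti * yi for ti, yi in zip(t, yc)) / tt if tt else 0.0
    return lambda x: y_mean + q * sum((x[j] - x_mean[j]) * w[j] for j in range(p))

def loo_q2(X, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS (SS about the overall mean)."""
    y_mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        model = pls1_one_component(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - model(X[i])) ** 2
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss

# Toy reagent descriptors (two values per amino acid) and a dipeptide product
reagent_descriptors = {"A": [0.5, -1.0], "G": [1.5, 2.0]}
x_ag = product_descriptor("AG", reagent_descriptors)

# Toy activity data that depend linearly on a single descriptor
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
q2 = loo_q2(X, y)
```

For a dipeptide the product vector is twice as long as a reagent vector, for a nonapeptide nine times as long, which is why column elimination at the reagent level matters before PLS.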
Descriptor Validation
• Datasets:

Dataset  Activity                  N    Peptide length
ACE      ACE inhibitors            58   2
BIT      Bitter-tasting            48   2
BRA      Bradykinin-potentiating   29   5
ENK      Enkephalin analogs        19   5
BR9      Bradykinin analogs        26   9
Descriptor Validation: Results
Leave-random-groups-out (LRGO) results:
[Bar chart: q2 (0 to 1) for the datasets ACE, BIT, BRA, ENK and BR9 at r thresholds 1.0, 0.7 and 0.5]
Summary
• Flexsim-R is a novel virtual affinity fingerprint method
that calculates meaningful 3D descriptors for reagents
• High correlation between different cores and attachment points
• For 3 out of 5 validation sets, significant cross-validated
q2 values could be obtained
• Rgroup alignment problem is tackled inherently
• Flexsim-R calculations are fast and can be automated easily:
• only clipped reagent structures are required
• core positions need to be calculated only once
Outlook
• More validation sets have to be tested (e.g. a "real-life"
combichem dataset)
• Is there a set of descriptors that works well across different
datasets?
• Integration in Boehringer Ingelheim library design and virtual
screening workflow
Acknowledgements
• Alexander Weber (Boehringer Ingelheim/University of Marburg)
• Andreas Teckentrup (Boehringer Ingelheim)
• Hans Matter (Aventis)
• BMBF for financial support