Chemical Space Navigation using Spotfire DecisionSite

Chemical Space Navigation using SpotFire DecisionSite
Anne Marie Munk Jørgensen,
Morten Langgård
& Jacob Flemming Hansen
H. Lundbeck A/S
PogoProp System Overview
SDfile -> Calculate descriptors -> Structures and parameters -> Do statistics
Compound library profiling
• 10 years ago: Diversity
• 1997: Lipinski’s rule of 5 + diversity
• Now: a very strong focus on how biologically relevant the screening collection is.
• Computational methods to predict drug-likeness, CNS-likeness, ...
High throughput is not enough to get high output.
Data collection
CNS compounds from ChemIndex, MDDR, Beilstein, Chemfinder, PharmSub
Corina -> QikProp -> CNS database
CNS database: 373 compounds with known CNS activity; structures and calculated parameters
CNS model
12 descriptors -> 3 components, R2X = 0.71
[Scatter plot "CNS World": blue dots define the CNS drug space]
Spotfire as chemical space navigator
• A chemical 3D principal component plot defines the CNS "world", and we use Spotfire to navigate in this world.
• Chem GPS (Oprea & Gottfries, J. Comb. Chem. 2001)
Some technical issues
PCA calculation
• The PCA was done with SIMCA (Umetrics).
• The PCA tool in Spotfire has limitations:
- Important parameters such as dist2Model and HotellingT2 are missing.
- It is not possible to save the model and use it for later predictions (perhaps in a template).
- The loading plot is missing.
• The formulas were extracted from SIMCA and are applied automatically to new data.
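To illustrate what "saving the model and using it for later predictions" involves, the sketch below projects one new compound onto a stored PCA model and computes its scores, Hotelling's T2, and a simplified distance to model. The function name `apply_pca_model`, the toy numbers, and the residual-standard-deviation form of the distance are assumptions made for illustration; SIMCA's own distance-to-model statistic is normalized differently, and the formulas actually extracted from SIMCA are not shown on the slides.

```python
import math

def apply_pca_model(x, mean, sd, loadings, score_var):
    """Project one compound onto a stored PCA model.

    x, mean, sd: lists of K descriptor values, training means, training SDs.
    loadings:    list of A loading vectors, each of length K (assumed orthonormal).
    score_var:   variance of the training scores for each of the A components.
    """
    # Autoscale with the training means and standard deviations
    z = [(xi - m) / s for xi, m, s in zip(x, mean, sd)]
    # Score for each component: dot product of scaled data and loading vector
    scores = [sum(p * zi for p, zi in zip(pa, z)) for pa in loadings]
    # Hotelling's T2: squared scores weighted by the training score variances
    t2 = sum(t * t / v for t, v in zip(scores, score_var))
    # Residual: the part of z that the A components do not reproduce
    z_hat = [sum(t * pa[k] for t, pa in zip(scores, loadings))
             for k in range(len(z))]
    resid = [zi - zh for zi, zh in zip(z, z_hat)]
    # Simplified distance to model: residual standard deviation of this compound
    dist = math.sqrt(sum(e * e for e in resid) / (len(z) - len(loadings)))
    return scores, t2, dist

# Toy model with K = 3 descriptors and A = 1 component (made-up numbers)
mean = [1.0, 2.0, 3.0]
sd = [1.0, 2.0, 0.5]
loadings = [[1.0, 0.0, 0.0]]
score_var = [4.0]
scores, t2, dist = apply_pca_model([3.0, 2.0, 3.5], mean, sd, loadings, score_var)
print(scores, t2, dist)
```

Because only the means, standard deviations, loadings, and score variances are needed, a model stored this way can be applied to new compounds without refitting the PCA.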
Statistics
• The statistics were done using a Perl script.
• "New column by expression" works fine, but only for relatively short function definitions.
(((((((((((( 0.279851 *( "COL3" - 0.874500 )/0.597500)
 - ( 0.151222 *( "COL5" - 0.081360 )/0.273800))
 + ( 0.289006 *( "COL8" - 3.942000 )/2.580000))
 + ( 0.424097 *( "COL2" - 396.400000 )/88.470000))
 + ( 0.050571 *( "COL3" - 559.400000 )/118.300000))
 + ( 0.290386 *( "COL4" - 236.100000 )/117.800000))
 - ( 0.156653 *( "COL8" - 90.130000 )/57.770000))
 + ( 0.782491 *( "COL11" - 193.500000 )/103.400000))
 + ( 0.109637 *( "COL12" - 31.640000 )/45.550000))
 + ( 0.454716 *( "COL15" - 944.500000 )/289.300000))
 - ( 0.176048 *( "COL16" - 1.066000 )/1.071000))
 + ( 0.214576 *( "COL19" - 4.940000 )/2.264000))
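An expression of this shape is a sum of loading * (value - mean) / standard deviation over the descriptors, so it can be generated mechanically rather than typed by hand. The following is a minimal Python sketch of that idea, not the Perl script used in the talk (which is not shown). The function names are hypothetical, and the three example terms reuse the first three terms of the slide, reading the second term as "COL5" minus 0.081360.

```python
def build_score_expression(terms):
    """Build a left-nested expression string for one PCA score.

    terms: list of (column, loading, mean, sd) tuples.
    """
    def term(column, loading, mean, sd):
        return f'( {abs(loading):.6f} *( "{column}" - {mean:.6f} )/{sd:.6f})'

    first = terms[0]
    expr = term(*first)
    if first[1] < 0:
        expr = "-" + expr
    for column, loading, mean, sd in terms[1:]:
        sign = "-" if loading < 0 else "+"
        expr = f"({expr} {sign} {term(column, loading, mean, sd)})"
    return expr

def evaluate_score(terms, row):
    """Evaluate the same score numerically for one row (dict: column -> value)."""
    return sum(loading * (row[column] - mean) / sd
               for column, loading, mean, sd in terms)

# First three terms of the slide's expression (column, loading, mean, sd)
terms = [
    ("COL3", 0.279851, 0.8745, 0.5975),
    ("COL5", -0.151222, 0.08136, 0.2738),
    ("COL8", 0.289006, 3.942, 2.58),
]
expression = build_score_expression(terms)
print(expression)

# A compound sitting exactly at the training means has a score of zero
at_means = {"COL3": 0.8745, "COL5": 0.08136, "COL8": 3.942}
score_at_means = evaluate_score(terms, at_means)
```

Generating the string from a table of loadings, means, and standard deviations keeps the expression consistent with the model when descriptors are added or the model is refitted.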
User Interface
• Intranet based
• You select a user name and a job name, browse to the SDfile, and enter the name of the field containing the structure identifier -> Go
• You get an email when the job has finished.
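The reason the interface asks for the field name is that an SDfile stores each compound as a molblock followed by named data items, and the job needs to know which item holds the identifier. The sketch below shows one way to read those records in plain Python. The function names, the field name `LU_ID`, and the sample records are hypothetical and are not taken from the Lundbeck system.

```python
def parse_record(record, id_field):
    """Split one SD record (list of lines) into identifier, molblock, data items."""
    # The molblock runs up to and including the "M  END" line
    end = next((i for i, line in enumerate(record) if line.startswith("M  END")),
               len(record) - 1)
    molblock = "\n".join(record[:end + 1])
    data, name = {}, None
    for line in record[end + 1:]:
        if line.startswith(">"):
            # Data header such as "> <LU_ID>": take the text between < and >
            start, stop = line.find("<"), line.rfind(">")
            name = line[start + 1:stop] if 0 <= start < stop else None
            if name is not None:
                data[name] = []
        elif line.strip() == "":
            name = None                      # a blank line ends the data item
        elif name is not None:
            data[name].append(line)
    data = {key: "\n".join(value) for key, value in data.items()}
    return data.get(id_field), molblock, data

def read_sd_records(lines, id_field):
    """Yield (identifier, molblock, data) for each record terminated by $$$$."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("$$$$"):
            if record:
                yield parse_record(record, id_field)
            record = []
        else:
            record.append(line)

# Two made-up records with empty molblocks, for illustration only
sample = """\
compound one
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <LU_ID>
LU-0001

> <MW>
324.4

$$$$
compound two
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <LU_ID>
LU-0002

$$$$
"""
records = list(read_sd_records(sample.splitlines(), id_field="LU_ID"))
print([identifier for identifier, _, _ in records])
```

A record that is not closed by a `$$$$` line is skipped by this sketch; a production reader would normally rely on a cheminformatics toolkit instead.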
Spotfire callback function
With this function, you can select the compounds you want to synthesize or buy and communicate the selection back to the database for further use. This makes it easy to compile a list of compounds to send to the external provider or the technicians.
This function was created by our IT team, in almost the same way as the textbook example.
Use of the setup and the CNS model
Input Structures
Input structures can come from several sources:
• Compounds we are considering for purchase
• Compounds which we plan to synthesize
Structure Property Calculations
Structures from ISIS -> Corina (2D->3D) -> Qikprop (PSA desc. +) -> Tabulated output
[Structure drawing of the example compound]
CN(C)CCCC1(OCc2cc(C#N)ccc12)c3[c]cc(F)cc3
Descriptors:
PSAs: SASA, FOSA, WPSA, ...
Counts: MW, #HBD, #HBA, #ROT, #...
Models for:
logP, logS, CNS, Caco2, ...
Merge with CNS DB of launched drugs (Chem GPS)
Principal Component Analysis (PCA)
[Scatter plot; x-axis: PC1]
How could this loading plot be included in Spotfire?
Distance to model
[Scatter plot; x-axis: HOTELLINGT2; label: Xtreme; structure drawings of individual compounds]
Non CNS compounds from WDI
[Scatter plot; x-axis: PC1]
Distance to model, example
[Structure drawing of the example compound]
Why is HotT2 large? This calls for a user-defined "details on demand" facility.
SpotFire Guides
• Lipinski filtering on defined columns
• Make histograms of selected descriptors
[Histogram; x-axis: Binned MW(2), bins of width 50 from 50-100 to 600-...]
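The two guide steps, Lipinski filtering on chosen columns and a histogram of a selected descriptor, can be sketched in a few lines of Python. The column names (`MW`, `logP`, `HBD`, `HBA`), the example compounds, and the choice to tolerate at most one violation are assumptions made for illustration; the actual Spotfire guides are not shown on the slides.

```python
from collections import Counter

def lipinski_violations(row, mw="MW", logp="logP", hbd="HBD", hba="HBA"):
    """Count rule-of-5 violations; the column names are passed in by the caller."""
    return sum([
        row[mw] > 500,     # molecular weight above 500
        row[logp] > 5,     # logP above 5
        row[hbd] > 5,      # more than 5 hydrogen-bond donors
        row[hba] > 10,     # more than 10 hydrogen-bond acceptors
    ])

def mw_bin(value, width=50):
    """Label the bin containing value, e.g. 324.4 -> '300-350'."""
    low = int(value // width) * width
    return f"{low}-{low + width}"

# Made-up compounds, for illustration only
compounds = [
    {"id": "A", "MW": 324.4, "logP": 3.5, "HBD": 0, "HBA": 3},
    {"id": "B", "MW": 612.8, "logP": 6.2, "HBD": 2, "HBA": 8},
    {"id": "C", "MW": 349.9, "logP": 4.1, "HBD": 1, "HBA": 4},
]

# Assumption: a compound passes if it has at most one violation
passed = [c for c in compounds if lipinski_violations(c) <= 1]
histogram = Counter(mw_bin(c["MW"]) for c in passed)
print([c["id"] for c in passed], dict(histogram))
```

Passing the column names as parameters mirrors the "defined columns" idea: the same filter can be pointed at whichever descriptor columns a given data set uses.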
Clustering of structures
• Diversity selection -> structural clustering works fine in Spotfire
• Get one structure from each cluster, preferably the one in the center of each cluster -> not possible in Spotfire
[Bar chart; x-axis: kiaFragment5_cl10 (cluster), clusters 1 to 12]
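Picking the structure "in the center" of each cluster can be done outside Spotfire once cluster labels are exported. A minimal sketch, assuming each compound is represented by a numeric vector and that the center is taken as the medoid (the member with the smallest summed distance to the other members of its cluster), follows. The function name, the Euclidean distance, and the sample points are illustrative assumptions; a structural clustering would more likely use a fingerprint similarity measure.

```python
import math
from collections import defaultdict

def cluster_centers(points, labels):
    """Return {cluster label: index of the medoid of that cluster}."""
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)
    centers = {}
    for label, idxs in members.items():
        # The medoid minimizes the summed distance to all members of its cluster
        centers[label] = min(
            idxs,
            key=lambda i: sum(math.dist(points[i], points[j]) for j in idxs),
        )
    return centers

# Made-up 2D points in two clusters, for illustration only
points = [(0, 0), (1, 0), (2, 0), (10, 10), (10, 11)]
labels = [1, 1, 1, 2, 2]
centers = cluster_centers(points, labels)
print(centers)
```

When two members tie, as in the second cluster here, the first one encountered is returned.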
Suggestions for development in Spotfire
• More advanced statistical tools wanted:
- It should be possible to save a model as a template and use it for prediction
- A loading plot should be included
- The 99/95% ellipse should be shown
• An easier way of handling expressions
• In clustering analysis, it should be possible to get the center structure from each cluster
• A user-defined "details on demand" function