Compound Set Enrichment

A novel approach to analysis
of primary HTS data
Thibault Varin
Ansgar Schuffenhauer
Gubler, H., Parker, C., Zhang, JH., Raman, P., Ertl, P.
INTRODUCTION
2 | Compound Set Enrichment | Thibault Varin | 10/07/14
Introduction
- Active series identification: can relevant SAR be extracted from primary HTS data?
- Are activity data binary or continuous?
Introduction
Active series identification
Hypothesis 1:
Within primary HTS screening data, structure-activity relationships (SAR) are apparent and can be used to help select active compound classes.
Introduction
Are the activity data binary or continuous?
[Figure: compounds of Scaffold 1 and Scaffold 2 placed along an activity axis; legend: active compound (binary), inactive compound (binary)]
Binary activity:
- 1 active / 5 inactives
- Scaffold 1 = Scaffold 2
Continuous activity:
- Scaffold 1 > Scaffold 2
Introduction
Are the activity data binary or continuous?
[Figure: compounds of a scaffold along the activity axis with two cut-offs, Threshold 1 and Threshold 2; legend: active compound (binary), inactive compound (binary)]
Binary scaffold activity differs according to the threshold.
Hypothesis 2:
Methods based on an activity cut-off distort the activity information, leading to the incorrect assignment of active series of compounds.
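The threshold effect can be illustrated with a small sketch. The activity values, scaffold names and cut-offs below are hypothetical and chosen only to show the behaviour; they do not come from the screening data discussed in the slides.

```python
from statistics import median

# Hypothetical primary-screen activities (e.g. % inhibition) for two scaffolds.
activities = {
    "scaffold_A": [35.0, 38.0, 41.0, 44.0, 47.0, 52.0],
    "scaffold_B": [5.0, 8.0, 10.0, 12.0, 15.0, 90.0],
}

def count_actives(values, threshold):
    """Number of compounds called active at a given activity cut-off."""
    return sum(v >= threshold for v in values)

def binary_summary(threshold):
    """Active counts per scaffold for one cut-off."""
    return {name: count_actives(vals, threshold) for name, vals in activities.items()}

# With a cut-off of 50 both scaffolds have 1 active and 5 inactives.
summary_50 = binary_summary(50.0)
# Lowering the cut-off to 40 changes the binary picture for scaffold_A only.
summary_40 = binary_summary(40.0)
# The continuous view (here the median) separates the scaffolds without any cut-off.
medians = {name: median(vals) for name, vals in activities.items()}

print(summary_50)
print(summary_40)
print(medians)
```

In this toy case the two scaffolds look identical at one cut-off and different at another, while the continuous values consistently rank scaffold_A above scaffold_B.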
METHODS
Methods
The Scaffold Tree classification
The Scaffold Tree – Visualization of the Scaffold Universe by Hierarchical Scaffold Classification
A. Schuffenhauer, P. Ertl et al. J. Chem. Inf. Model., 47, 47, 2007
Methods
Datasets
- 7 PubChem bioassays
- Ranging from 9389 to 263679 compounds
- Ranging from 0.03 to 26.29% active compounds
[Figure: workflow; Hypothesis 1: simulation of the primary screening data; PubChem annotation from CRC]
Methods
Single hypothesis test: summary procedure
1. State the null and the alternative hypotheses
   - H0: "the scaffold is inactive"
   - H1: "the scaffold is active"
2. Specify a significance level: α = 0.01
3. Compute the test statistic and the p-value
   → p-value = probability, if the scaffold is inactive (H0), of observing data at least as extreme as those measured
4. Decision step:
   - p-value > α: H0 is not rejected
   - p-value < α: H0 is rejected and H1 is accepted: "the scaffold is active"
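The four steps can be sketched for binary data with an exact binomial test. This is a minimal illustration, not the authors' implementation: the one-sided tail, the function names and the example counts (20 compounds, a 5% background active rate) are assumptions made here.

```python
from math import comb

def binomial_tail_p_value(k, n, p0):
    """One-sided exact p-value: P(X >= k) for X ~ Binomial(n, p0).

    k  : number of active compounds carrying the scaffold
    n  : number of compounds carrying the scaffold
    p0 : proportion of actives in the full dataset (H0 rate)
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def scaffold_is_active(k, n, p0, alpha=0.01):
    """Steps 3 and 4: compute the p-value, then compare it with alpha."""
    p_value = binomial_tail_p_value(k, n, p0)
    return p_value < alpha, p_value

# Hypothetical scaffold with 5 actives out of 20 compounds, background rate 5%.
active_5, p_5 = scaffold_is_active(5, 20, 0.05)
# Hypothetical scaffold with 2 actives out of 20 compounds, same background.
active_2, p_2 = scaffold_is_active(2, 20, 0.05)

print(active_5, round(p_5, 4))
print(active_2, round(p_2, 4))
```

With these made-up counts, H0 is rejected for the first scaffold and not rejected for the second.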
Methods
The KS and the Binomial hypothesis tests
Continuous data: KS test
H0: there is no difference between the activity distribution defined by the compounds having the scaffold S3-2 and the background distribution.
Binary data: Binomial test
H0: there is no difference between the proportion of active compounds among the compounds having the scaffold S3-2 and the proportion of active compounds for the full dataset.
[Table: actives and inactives for the bioassay and for the scaffold]
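For continuous data, a two-sample Kolmogorov-Smirnov comparison can be sketched in pure Python. This is a rough illustration under stated assumptions: the p-value uses the asymptotic Kolmogorov series (approximate for small samples), the test is two-sided, and the activity values are hypothetical.

```python
import math
from bisect import bisect_right

def ks_statistic(sample, background):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample), sorted(background)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of scaffold values <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of background values <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_p_value(d, n, m):
    """Asymptotic two-sided p-value for a KS statistic d with sample sizes n and m."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series is numerically 1 in this range
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, 101))
    return max(0.0, min(1.0, 2.0 * total))

# Hypothetical activities: 10 compounds sharing a scaffold vs. a 100-compound background.
scaffold_activities = [70, 72, 75, 78, 80, 83, 85, 88, 90, 95]
background_activities = [float(i) for i in range(100)]

d = ks_statistic(scaffold_activities, background_activities)
p = ks_p_value(d, len(scaffold_activities), len(background_activities))
print(round(d, 3), p < 0.01)
```

A two-sided test only says that the scaffold distribution differs from the background; checking that the shift is towards higher activity would be a separate step.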
Methods
Multiple hypothesis tests: Bonferroni correction
- Problem of false positives
  • α = probability of identifying an inactive scaffold as active (for each test performed)
  • With 100 inactive scaffolds, the probability of identifying at least one as "active" by chance is 63% (1 - 0.99^100)
- The Bonferroni correction tests each scaffold at a critical significance level equal to α = 0.01 / number of scaffolds
- Assumes that the individual tests are independent
- Each level of the Scaffold Tree has been corrected separately
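The arithmetic behind the 63% figure and the corrected threshold can be checked with a short sketch. The function names and the p-values per scaffold are hypothetical; the per-level grouping mirrors the statement that each Scaffold Tree level is corrected separately.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

def significant_scaffolds(p_values, alpha=0.01):
    """Scaffolds whose p-value passes the corrected threshold for one tree level."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p < threshold]

# 100 inactive scaffolds tested at alpha = 0.01: about 63% chance of a false "active".
fwer = familywise_error_rate(0.01, 100)

# Hypothetical p-values, grouped by Scaffold Tree level and corrected per level.
p_values_by_level = {
    1: {"S1-1": 0.0001, "S1-2": 0.004, "S1-3": 0.2},
    2: {"S2-1": 0.003, "S2-2": 0.6},
}
hits_by_level = {level: significant_scaffolds(pv) for level, pv in p_values_by_level.items()}

print(round(fwer, 3))
print(hits_by_level)
```

In this toy input, S1-2 would pass an uncorrected α of 0.01 but fails the corrected threshold of 0.01 / 3, which is the intended effect of the correction.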
Methods
Determining the activity of classes
Hypothesis 1
Hypothesis 2
Scaffold activity evaluation
Multiple hypothesis test correction (Bonferroni)
Comparison of results
RESULTS
Results
Comparison of KSP and BTP predictions
Bioassay                      Total              BPCA significantly active    BPCA not significantly active
                              KSP   BTP   Δ      BPCA  KSP   BTP   Δ          KSP   BTP   Δ
Hydroxysteroid dehydrogenase  330   231   +99    199   183   168   +15        147   63    +84
Caspase-1                     331   114   +217   5     2     2     0          329   112   +217
PK                            12    4     +8     12    3     3     0          9     1     +8
Luciferase                    67    12    +55    15    13    11    +2         54    1     +53
Luciferase                    178   48    +130   41    32    35    -3         146   13    +133
CYP450 2C9                    58    33    +25    34    34    31    +3         24    2     +22
CYP450 3A4                    121   64    +57    60    60    53    +7         61    11    +50
With:
- KSP: KS Prediction
- BTP: Binomial Threshold Prediction
- Δ: KSP - BTP
- BPCA: Binomial PubChem Annotation
- Number of active classes: KSP > BTP
- Both KSP and BTP retrieve BPCA significantly active classes
- Most of the new KSP active classes are BPCA not significantly active
Results
KSP significantly active scaffolds that are inactive in PubChem
[Figure: example scaffold structures, annotated "WA", "Inconclusive?" and "Inconclusives?"]
Results
Prioritize nodes instead of individual scaffolds
[Figure legend]
Compound activity (PubChem Annotation): active, inconclusive, inactive
Scaffold activity (KS Prediction / Bonferroni): not significantly active, significantly active
Results
Visualization tool (Peter Ertl)
CONCLUSION
Conclusion
Compound Set Enrichment
- Validation of the initial hypotheses
- A method to mine HTS data and identify active series of compounds
  • Chemical classification: Scaffold Tree
  • Statistical analysis: Kolmogorov-Smirnov hypothesis test
  • Multiple hypothesis test correction: Bonferroni correction
- Uses all primary data
- No activity cut-off
- Identification of new active scaffolds not necessarily represented by very active compounds (latent hits) during the primary screen
Acknowledgments
With many thanks to
Primary mentor:
- Ansgar Schuffenhauer
Help: MLI group
Scientific advisers:
- Christian Parker
- Hanspeter Gubler
- Ji-Hu Zhang
- Peter Ertl
- Edgar Jacoby
Fellowship: Education office
Discussions:
- Martin Beibel
- Sebastian Bergling
- Meir Glick
- Alain Dietrich
- Marie-Cecile Didiot
Questions?