Transcript Document

ChemAxon User Group Meeting, June 2006
A Novel SAR-Driven Approach for Identifying True High-Throughput Screening Hits
S. Frank Yan, Hayk Asatryan, Jing Li, Kaisheng Chen, and Yingyao Zhou
Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, CA 92121, USA
Modern drug discovery relies heavily on large-scale high-throughput screening (HTS) to identify potential starting points for medicinal chemistry optimization. The typical “top X” activity cutoff method used to generate hits from a large amount of raw HTS data is intrinsically error-prone because of the noisy nature of single-dose HTS, and it often yields a large number of false positives. Here we propose a novel knowledge-based, SAR-driven statistical approach for primary HTS hit generation that uses ChemAxon technology for clustering and chemical fingerprints; the method is also implemented in SciTegic Pipeline Pilot. In a proof-of-concept study on an in-house HTS campaign, the new approach proved more effective at identifying confirmed active compounds in diverse chemical scaffolds containing valuable SAR information, as shown by a significantly improved confirmation rate compared with the traditional “top X” cutoff method.
The Hit-to-Lead Paradigm
Two important milestones that have fundamental, far-reaching effects
Ontology-Based Pattern Identification* in Hit Selection
[Figure: “guilt by association”; axis: structure–activity relationship (SAR); label: scaffolds with okay activity but good SAR]
To automatically determine, for each cluster/scaffold, a subset of compounds that share not only similar structure but also similarly high HTS activity
•Step 1: Cluster all tested, QC-ed compounds (>1,000,000) from an HTS campaign and rank them by activity
Bleicher et al. Nat. Rev. Drug Discov. 2, 369, 2003
[Figure labels: scaffolds with very bad SAR (likely false positives); scaffolds with good activity and good SAR; scaffolds with good activity but okay SAR; highly active singletons; axis: HTS assay activity; a traditional cutoff compared with per-scaffold cutoffs]
•Step 2: For a given cluster, select progressively more compounds by lowering the activity cutoff, and compute the corresponding hypergeometric P-value at each cutoff
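A minimal sketch of step 2 in Python, assuming the score is the one-sided hypergeometric tail P(X ≥ m’), with N compounds in the analysis set, n in the cluster, m above the current cutoff, and m’ of the cluster among those m. The poster names the test but not its exact form, and the helper names `hypergeom_tail` and `pvalue_trace`, as well as the input format of 1-based global activity ranks, are hypothetical.

```python
from __future__ import annotations

from math import comb


def hypergeom_tail(N: int, n: int, m: int, m_prime: int) -> float:
    """One-sided hypergeometric P-value P(X >= m_prime).

    N: compounds in the whole analysis set
    n: compounds in the cluster being scored
    m: compounds (from any cluster) above the current activity cutoff
    m_prime: members of this cluster among those m compounds
    """
    upper = min(n, m)
    favorable = sum(comb(n, i) * comb(N - n, m - i) for i in range(m_prime, upper + 1))
    # Exact integer arithmetic; only the final ratio is converted to float
    return favorable / comb(N, m)


def pvalue_trace(member_ranks: list[int], N: int) -> list[tuple[int, int, float]]:
    """Lower the cutoff past one cluster member at a time and record (m, m', P).

    member_ranks: 1-based positions of the cluster's members in the global
    activity ranking (1 = most active). Between two members m' stays fixed
    while m grows, which can only raise P, so these cutoffs suffice.
    """
    n = len(member_ranks)
    trace = []
    for m_prime, m in enumerate(sorted(member_ranks), start=1):
        trace.append((m, m_prime, hypergeom_tail(N, n, m, m_prime)))
    return trace


if __name__ == "__main__":
    # Hypothetical cluster of 5 compounds within 100 ranked compounds
    for m, m_prime, p in pvalue_trace([1, 2, 3, 50, 90], N=100):
        print(f"m={m:3d}  m'={m_prime}  P={p:.3g}")
```

For very strong clusters the final ratio can underflow to 0.0; working in log space, or calling SciPy's `scipy.stats.hypergeom.sf(m_prime - 1, N, n, m)`, are alternatives.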
The HTS Approach
•HTS data from an internal project were used, with results from secondary experiments serving as the benchmark. The 50,000 most active compounds were selected for analysis (HTS activity < ~0.76)
•Compound clusters and fingerprints were generated using ChemAxon software.
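The poster does not say which ChemAxon clustering method or fingerprint settings were used. As a rough, library-independent illustration of what clustering on fingerprints involves, the sketch below swaps in Tanimoto similarity on binary fingerprints (stored as sets of on-bit indices) and a simple greedy leader clustering. It is not ChemAxon's algorithm or API; `tanimoto`, `leader_cluster`, and the 0.7 default threshold are hypothetical.

```python
from __future__ import annotations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as on-bit index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # two empty fingerprints: treated as identical


def leader_cluster(fps: dict[str, frozenset[int]], threshold: float = 0.7) -> dict[str, str]:
    """Greedy leader clustering (hypothetical stand-in for the real tool).

    Each compound joins the first existing leader whose similarity is at least
    `threshold`; otherwise it becomes a new leader. Returns compound -> leader.
    """
    leaders: list[str] = []
    assignment: dict[str, str] = {}
    for cid, fp in fps.items():
        for leader in leaders:
            if tanimoto(fp, fps[leader]) >= threshold:
                assignment[cid] = leader
                break
        else:
            leaders.append(cid)
            assignment[cid] = cid
    return assignment


if __name__ == "__main__":
    # Toy fingerprints; real ones would have hundreds or thousands of bits
    fps = {
        "c1": frozenset({1, 2, 3, 4}),
        "c2": frozenset({1, 2, 3, 5}),
        "c3": frozenset({10, 11}),
    }
    print(leader_cluster(fps, threshold=0.5))
```

The result depends on input order, which is acceptable for a sketch but worth keeping in mind when comparing against a dedicated clustering tool.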
[Figure: confirmation rate of the selected compounds by compound group, OPI approach vs. “Top X” method; callout: great improvement over the traditional “Top X” method]
•Step 3: The cutoff for this cluster is set where the P-value reaches its minimum, P0; member compounds whose activities are higher than the cutoff are selected as potential hits and assigned the score P0
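A minimal sketch of step 3, assuming the one-sided hypergeometric tail as the cluster score and compounds identified by 1-based global activity ranks (1 = most active). `cluster_cutoff` is a hypothetical helper; ties in activity are ignored.

```python
from __future__ import annotations

from math import comb


def hypergeom_tail(N: int, n: int, m: int, m_prime: int) -> float:
    """One-sided hypergeometric P-value P(X >= m_prime)."""
    upper = min(n, m)
    favorable = sum(comb(n, i) * comb(N - n, m - i) for i in range(m_prime, upper + 1))
    return favorable / comb(N, m)


def cluster_cutoff(member_ranks: list[int], N: int) -> tuple[float, list[int]]:
    """Return (P0, ranks of the members selected at the minimizing cutoff)."""
    ranks = sorted(member_ranks)
    n = len(ranks)
    best_p, best_k = 1.0, 0
    for k, m in enumerate(ranks, start=1):
        # Cutoff placed at the k-th most active member: m compounds pass, k from this cluster
        p = hypergeom_tail(N, n, m, k)
        if p < best_p:
            best_p, best_k = p, k
    # Members at or above the minimizing cutoff become potential hits, all scored P0
    return best_p, ranks[:best_k]


if __name__ == "__main__":
    p0, selected = cluster_cutoff([1, 2, 3, 50, 90], N=100)
    print(f"P0 = {p0:.3g}, selected ranks = {selected}")
```

In this toy cluster the three top-ranked members are selected and the two weak members are left out, which is the individualized, per-cluster threshold the method relies on.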
A Proof-of-Concept Study
Valuable SAR Is Immediately Caught for This Scaffold
•Step 4: Repeat steps 2 and 3 for all clusters
•Step 5: Rank/select hits based on score P0 and HTS activity
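Steps 4 and 5 can be sketched end to end in Python. This is a minimal sketch, assuming a one-sided hypergeometric score and compounds identified by their 1-based global activity rank; `pick_hits`, `cluster_cutoff`, and the `max_p` selection threshold are hypothetical, since the poster states only that hits are ranked/selected by P0 and HTS activity.

```python
from __future__ import annotations

from math import comb


def hypergeom_tail(N: int, n: int, m: int, m_prime: int) -> float:
    """One-sided hypergeometric P-value P(X >= m_prime)."""
    upper = min(n, m)
    favorable = sum(comb(n, i) * comb(N - n, m - i) for i in range(m_prime, upper + 1))
    return favorable / comb(N, m)


def cluster_cutoff(member_ranks: list[int], N: int) -> tuple[float, list[int]]:
    """Steps 2-3 for one cluster: return (P0, ranks of the selected members)."""
    ranks = sorted(member_ranks)
    best_p, best_k = 1.0, 0
    for k, m in enumerate(ranks, start=1):
        p = hypergeom_tail(N, len(ranks), m, k)
        if p < best_p:
            best_p, best_k = p, k
    return best_p, ranks[:best_k]


def pick_hits(clusters: dict[str, list[int]], N: int, max_p: float) -> list[tuple[str, int, float]]:
    """Steps 4-5: score every cluster, keep those with P0 <= max_p, and rank the
    selected members by P0 first and global activity rank second."""
    hits = []
    for cluster_id, ranks in clusters.items():
        p0, selected = cluster_cutoff(ranks, N)
        if p0 <= max_p:
            hits.extend((cluster_id, rank, p0) for rank in selected)
    hits.sort(key=lambda hit: (hit[2], hit[1]))
    return hits


if __name__ == "__main__":
    # Hypothetical clusters given as 1-based global activity ranks among N = 100 compounds
    clusters = {"A": [1, 2, 3, 50, 90], "B": [4, 60], "C": [5], "D": [6, 7]}
    for cluster_id, rank, p0 in pick_hits(clusters, N=100, max_p=0.01):
        print(f"cluster {cluster_id}: rank {rank}, P0 = {p0:.2g}")
```

Working with ranks rather than raw readouts keeps the sketch independent of the direction and scale of the assay's activity values.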
[Figure: OPI schematic; labels: N compounds from HTS; increasingly select m compounds by lowering the activity cutoff]
[Figure: hit-to-lead funnel; initial HTS campaign (>1,000,000 compounds), quality control (1,000,000), primary hit selection (1,000), hit validation (100)]
Cluster probability score: P0 = min P(N, n, m, m’); the m’ compounds at P = P0 are selected as potential hits for this compound cluster/scaffold
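Written out, assuming P(N, n, m, m’) is the one-sided hypergeometric tail (the poster gives only the function's arguments), with N the compounds in the analysis set, n the compounds in the cluster, m the compounds above the current cutoff, and m’ the cluster members among them:

```latex
P(N, n, m, m') = \sum_{i=m'}^{\min(n,\,m)} \frac{\binom{n}{i}\binom{N-n}{m-i}}{\binom{N}{m}},
\qquad
P_0 = \min_{\text{cutoffs}} P(N, n, m, m')
```

Lowering the cutoff increases m (and possibly m’); P0 is the smallest tail probability reached along the way.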
A new approach to more effectively select primary hits is urgently needed!
•SIDXXX598: 8 compounds selected, 7/7 confirmed active (mean = 0.05, stdev. = 0.18)
•SIDXXX414: 8 compounds selected, 5/6 confirmed active (mean = 0.05, stdev. = 0.46)
•SIDXXXX645: 28 compounds selected, 12/28 confirmed active (mean = 0.11, stdev. = 0.30)
•SIDXXXX000: 57 compounds selected, 31/36 confirmed active (mean = 0.31, stdev. = 0.09)
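The “x/y confirmed active” figures translate directly into per-scaffold confirmation rates. The short sketch below does that arithmetic; reading “x/y” as x confirmed out of y compounds retested is an interpretation, and `confirmation_rate` is a hypothetical helper.

```python
# (compounds selected, confirmed active, compounds retested), transcribed from the
# scaffold list; treating "x/y" as confirmed/retested is an interpretation.
clusters = {
    "SIDXXX598": (8, 7, 7),
    "SIDXXX414": (8, 5, 6),
    "SIDXXXX645": (28, 12, 28),
    "SIDXXXX000": (57, 31, 36),
}


def confirmation_rate(confirmed: int, retested: int) -> float:
    """Fraction of retested compounds that confirmed as active."""
    return confirmed / retested


if __name__ == "__main__":
    for sid, (selected, confirmed, retested) in clusters.items():
        rate = confirmation_rate(confirmed, retested)
        print(f"{sid}: {selected} selected, {confirmed}/{retested} confirmed ({rate:.0%})")
```

This gives 100%, 83%, 43%, and 86% for the four scaffolds in the order listed.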
Significant Structural Diversity in the Selected Hits
Some Scaffolds Picked by OPI
[Figure: selected hits]
Implementation Using Pipeline Pilot
Advantages of OPI Hit-picking
•An individualized activity threshold for every cluster/scaffold instead of a one-size-fits-all cutoff
•Effective in eliminating experimental artifacts (particularly those in the high-activity region)
•Improved hit confirmation rate (85% for OPI vs. 55% for the “top X” method)
[Figure: activity scale from high activity to low activity; label: ~100 to ~5000]
[Figure labels: a cluster of n compounds; an arbitrary activity cutoff]
In many real cases, the confirmation rate is low
“Cherry-Pick” the HTS Hits
[Figure: number of compounds selected as the activity cutoff is lowered (lower activity, more compounds); example scaffold: imidazopyridine]
Scaffold-based Probability Score Alone Is Sufficient to Prioritize Hits
*S. Yan et al. Novel Statistical Approach for Primary High-Throughput Screening Hit Selection. J. Chem. Inf. Model. 45(6), 1784-1790, 2005
Y. Zhou et al. In silico gene function prediction using ontology-based pattern identification. Bioinformatics 21(7), 1237-1245, 2005
•Hits are inherently analyzed on a cluster/scaffold basis, so SAR information can be readily extracted, facilitating the hit-to-lead process
•Some level of library redundancy is required
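One way to see why some redundancy matters, assuming the cluster score is a one-sided hypergeometric tail: a singleton contributes a single member, so the best score it can reach is its own rank divided by N, whereas a scaffold with several active members can accumulate much stronger evidence. The 6-member scaffold below is made up for illustration; N = 50,000 is the analysis-set size from the proof-of-concept study.

```python
from math import comb


def hypergeom_tail(N: int, n: int, m: int, m_prime: int) -> float:
    """One-sided hypergeometric P-value P(X >= m_prime)."""
    upper = min(n, m)
    favorable = sum(comb(n, i) * comb(N - n, m - i) for i in range(m_prime, upper + 1))
    return favorable / comb(N, m)


N = 50_000  # size of the analysis set in the proof-of-concept study

# A singleton (n = 1) ranked r-th overall can do no better than P = r / N.
singleton_p = hypergeom_tail(N, n=1, m=10, m_prime=1)

# Hypothetical 6-member scaffold with 5 members inside the top 500.
cluster_p = hypergeom_tail(N, n=6, m=500, m_prime=5)

if __name__ == "__main__":
    print(f"singleton ranked 10th: P = {singleton_p:.1e}")
    print(f"6-member cluster, 5 in top 500: P = {cluster_p:.1e}")
```

Even though none of the cluster's members need to rank as high as the singleton, their combined presence near the top yields a far smaller P-value.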