Drug prioritization using rank/score fusion methods


Late fusion methods and performance metrics for the effective prioritization of drug candidates
Author: Gábor Csizmadia
Supervisor: Péter Antal
Abstract
There are many different ways to assess similarity between compounds:
• Structure-based
• Chemical property-based
• Biological effect-based
• Literature-based
Combining several of these methods should yield more accurate results -> data fusion
Implemented software: rank and score fusion methods, performance metrics
Overview
1. Drug prioritization
2. Data fusion approaches
3. Rank/score fusion
4. Performance metrics
5. Implemented software
6. Future plans
Drug prioritization
Given a list of known active compounds for a specific condition, assess their similarities to other compounds, then predict which compounds in the unknown set are active.
Data fusion approaches
1. Early: data vectors are concatenated
2. Intermediate: similarity matrices are combined
3. Late: rankings or scorings are combined -> rank and score fusion
Rank and score fusion
Learning to rank
Rank fusion methods:
• Borda fusion
• Rank vote
• Pareto ranking
• Parallel selection
Score fusion methods:
• Sum score
Borda fusion
1. Each ranking assigns a certain number of points to the ranked compounds based on their rank
2. The points are then summed to get the score of each compound (a Java sketch follows the table)

             Ranking 1   Ranking 2   Ranking 3   Borda count
Compound 1   4           3           3           10
Compound 2   3           2           4           9
Compound 3   2           4           2           8
Compound 4   1           0           1           2
Compound 5   0           1           0           1
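
As an illustration of the scheme above, here is a minimal Java sketch of Borda fusion over the three rankings from the table. The class and method names are hypothetical, not those of the implemented fuser module.

import java.util.*;

// Hypothetical sketch: each ranking of n compounds awards n-1 points to its
// top compound, n-2 to the next, and so on down to 0; points are then summed.
class BordaFusion {
    static Map<String, Integer> fuse(List<List<String>> rankings) {
        Map<String, Integer> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int i = 0; i < n; i++)
                scores.merge(ranking.get(i), n - 1 - i, Integer::sum); // rank i earns n-1-i points
        }
        return scores;
    }

    public static void main(String[] args) {
        // The three rankings from the table, best compound first
        List<List<String>> rankings = List.of(
            List.of("Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"),
            List.of("Compound 3", "Compound 1", "Compound 2", "Compound 5", "Compound 4"),
            List.of("Compound 2", "Compound 1", "Compound 3", "Compound 4", "Compound 5"));
        // Compounds 1..5 get 10, 9, 8, 2, 1 (map print order may vary)
        System.out.println(fuse(rankings));
    }
}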
Rank vote
1. Each ranking votes for its top n compounds
2. The ranking is based on how many votes a compound received (a sketch follows the table)

(n = 2)
             Ranking 1   Ranking 2   Ranking 3   Votes
Compound 1   1           1           1           3
Compound 2   1           0           1           2
Compound 3   0           1           0           1
Compound 4   0           0           0           0
Compound 5   0           0           0           0
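
The same toy data run through a rank vote sketch, with n = 2 as in the table; again, all names are hypothetical.

import java.util.*;

// Hypothetical sketch: each ranking casts one vote for each of its top n
// compounds; compounds are then ordered by total votes received.
class RankVote {
    static Map<String, Integer> fuse(List<List<String>> rankings, int n) {
        Map<String, Integer> votes = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (String c : ranking) votes.putIfAbsent(c, 0); // unvoted compounds get 0
            for (int i = 0; i < n && i < ranking.size(); i++)
                votes.merge(ranking.get(i), 1, Integer::sum);
        }
        return votes;
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
            List.of("Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"),
            List.of("Compound 3", "Compound 1", "Compound 2", "Compound 5", "Compound 4"),
            List.of("Compound 2", "Compound 1", "Compound 3", "Compound 4", "Compound 5"));
        // Compounds 1..5 get 3, 2, 1, 0, 0 votes (map print order may vary)
        System.out.println(fuse(rankings, 2));
    }
}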
Pareto ranking
Each compound is ranked based on the number of compounds better in all rankings (a sketch follows the table).

             Ranking 1   Ranking 2   Ranking 3   Better in all
Compound 1   1           2           2           0
Compound 2   2           3           1           0
Compound 3   3           1           3           0
Compound 4   4           5           4           3
Compound 5   5           4           5           3

Here the Ranking columns show rank positions (1 = best), not points.
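
A sketch of the "better in all" count, taking rank positions (1 = best) as input; the representation and names are assumptions.

import java.util.*;

// Hypothetical sketch: for each compound, count how many other compounds are
// ranked strictly better in every input ranking (i.e., Pareto-dominate it);
// fewer dominators means a better fused rank.
class ParetoRanking {
    static Map<String, Integer> fuse(List<Map<String, Integer>> rankPositions) {
        Set<String> compounds = rankPositions.get(0).keySet();
        Map<String, Integer> betterInAll = new HashMap<>();
        for (String c : compounds) {
            int count = 0;
            for (String other : compounds) {
                if (other.equals(c)) continue;
                boolean dominates = true;
                for (Map<String, Integer> ranks : rankPositions)
                    if (ranks.get(other) >= ranks.get(c)) { dominates = false; break; }
                if (dominates) count++;
            }
            betterInAll.put(c, count);
        }
        return betterInAll;
    }

    public static void main(String[] args) {
        // Rank positions from the table (1 = best)
        List<Map<String, Integer>> ranks = List.of(
            Map.of("Compound 1", 1, "Compound 2", 2, "Compound 3", 3, "Compound 4", 4, "Compound 5", 5),
            Map.of("Compound 1", 2, "Compound 2", 3, "Compound 3", 1, "Compound 4", 5, "Compound 5", 4),
            Map.of("Compound 1", 2, "Compound 2", 1, "Compound 3", 3, "Compound 4", 4, "Compound 5", 5));
        // Compounds 1..5 get counts 0, 0, 0, 3, 3
        System.out.println(fuse(ranks));
    }
}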
Parallel selection
• Compounds are selected from each ranking in turn
• If a compound that would be selected has already been selected before, the next compound from that ranking is selected instead (a sketch follows the table)
Rank   Ranking 1    Ranking 2    Ranking 3    Fused ranking
1      Compound 1   Compound 3   Compound 2   Compound 1
2      Compound 2   Compound 1   Compound 1   Compound 3
3      Compound 3   Compound 2   Compound 3   Compound 2
4      Compound 4   Compound 5   Compound 4   Compound 4
5      Compound 5   Compound 4   Compound 5   Compound 5
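
A round-robin sketch of parallel selection over the three rankings; the representation is again a hypothetical one.

import java.util.*;

// Hypothetical sketch: take the next not-yet-selected compound from each
// ranking in turn until every compound has been placed in the fused ranking.
class ParallelSelection {
    static List<String> fuse(List<List<String>> rankings) {
        Set<String> fused = new LinkedHashSet<>(); // preserves selection order
        List<Iterator<String>> iters = new ArrayList<>();
        for (List<String> r : rankings) iters.add(r.iterator());
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Iterator<String> it : iters) {
                // skip compounds already selected from another ranking
                while (it.hasNext()) {
                    if (fused.add(it.next())) { progress = true; break; }
                }
            }
        }
        return new ArrayList<>(fused);
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
            List.of("Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"),
            List.of("Compound 3", "Compound 1", "Compound 2", "Compound 5", "Compound 4"),
            List.of("Compound 2", "Compound 1", "Compound 3", "Compound 4", "Compound 5"));
        // [Compound 1, Compound 3, Compound 2, Compound 4, Compound 5]
        System.out.println(fuse(rankings));
    }
}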
Sum score
The normalized scores of each ranking are summed to get the fused score of a compound (a sketch follows the table).

             Ranking 1   Ranking 2   Ranking 3   Sum score
Compound 1   1           0.9         0.7         2.6
Compound 2   0.8         0.5         1           2.3
Compound 3   0.7         1           0.5         2.2
Compound 4   0.2         0           0.1         0.3
Compound 5   0           0.3         0           0.3
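
A sum score sketch with min-max normalization of each source (the identity here, since the table's scores already span [0, 1]); names are hypothetical.

import java.util.*;

// Hypothetical sketch: scores from each source are min-max normalized to
// [0, 1] and then summed per compound to give the fused score.
// Assumes each source has at least two distinct score values.
class SumScore {
    static Map<String, Double> fuse(List<Map<String, Double>> scorings) {
        Map<String, Double> fused = new HashMap<>();
        for (Map<String, Double> s : scorings) {
            double min = Collections.min(s.values()), max = Collections.max(s.values());
            for (Map.Entry<String, Double> e : s.entrySet())
                fused.merge(e.getKey(), (e.getValue() - min) / (max - min), Double::sum);
        }
        return fused;
    }

    public static void main(String[] args) {
        // The (already normalized) scores from the table
        List<Map<String, Double>> scorings = List.of(
            Map.of("Compound 1", 1.0, "Compound 2", 0.8, "Compound 3", 0.7, "Compound 4", 0.2, "Compound 5", 0.0),
            Map.of("Compound 1", 0.9, "Compound 2", 0.5, "Compound 3", 1.0, "Compound 4", 0.0, "Compound 5", 0.3),
            Map.of("Compound 1", 0.7, "Compound 2", 1.0, "Compound 3", 0.5, "Compound 4", 0.1, "Compound 5", 0.0));
        // Compounds 1..5 get ≈ 2.6, 2.3, 2.2, 0.3, 0.3 (up to floating-point rounding)
        System.out.println(fuse(scorings));
    }
}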
Performance metrics 1.
The performance of a ranking (how early it ranks the actives) can be measured in various ways, e.g. by area under curve (AUC) values for the following curves:
• AC (Accumulation Curve): plots the true positive rate as a function of the fraction of data classified as positive
• ROC (Receiver Operating Characteristic): plots the true positive rate as a function of the false positive rate
• CAC (Centralized AC)
• CROC (Centralized ROC)
[Figure: ROC curve, source: Wikipedia]
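
To make the AUC idea concrete, here is a minimal sketch computing ROC AUC directly from its pair-counting interpretation: the probability that a randomly chosen active is ranked above a randomly chosen inactive. The input format is an assumption, and ties are not handled.

import java.util.*;

// Hypothetical sketch: ROC AUC via pair counting. Walking the ranking from
// worst to best, each active encountered after k inactives is ranked above
// exactly those k inactives, so summing k over the actives counts the
// correctly ordered active/inactive pairs.
class RocAuc {
    static double auc(List<String> rankingBestFirst, Set<String> actives) {
        long correctPairs = 0, inactivesSeen = 0;
        for (int i = rankingBestFirst.size() - 1; i >= 0; i--) {
            if (actives.contains(rankingBestFirst.get(i))) correctPairs += inactivesSeen;
            else inactivesSeen++;
        }
        long numActives = rankingBestFirst.size() - inactivesSeen;
        return (double) correctPairs / (numActives * inactivesSeen);
    }

    public static void main(String[] args) {
        List<String> ranking = List.of("Compound 1", "Compound 3", "Compound 2", "Compound 4", "Compound 5");
        Set<String> actives = Set.of("Compound 1", "Compound 2");
        System.out.println(auc(ranking, actives)); // 5 of 6 pairs correct ≈ 0.833
    }
}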
Performance metrics 2.
Implemented software
• Java language
• command-line interface
• 2 modules: fuser (12 classes) and performance tester (13 classes + 2 interfaces)
• dedicated class for scored rankings: Ranking
• common interface for all fusion methods: Fuser
• common interface for all metrics: Metric

Usage:
java fusiontester.Main [type] [r1path] [r1ms] [r2path] [r2ms] ...
java performancetester.Main [type] [rankingpath] [activespath]
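
The slides name the key types but not their signatures; a hedged guess at their shape might look like the following (the actual fuser and performance tester APIs are not shown in the source).

import java.util.*;

// All signatures below are assumptions for illustration, not the real API.
class Ranking {
    final List<String> compounds;        // compound IDs, best first
    final Map<String, Double> scores;    // score of each compound
    Ranking(List<String> compounds, Map<String, Double> scores) {
        this.compounds = compounds;
        this.scores = scores;
    }
}

interface Fuser {
    Ranking fuse(List<Ranking> rankings);                  // combine several rankings into one
}

interface Metric {
    double evaluate(Ranking ranking, Set<String> actives); // e.g. an AUC value
}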
Future plans
• better handling of incomplete data
• testing the effects of noise
• considering the statistical significance of the sources
• ...
(TDK)
References
1. Bolgár Bence Márton. Kernel fúziós módszerek alkalmazása a genomikai kísérlettervezésben és adatelemzésben [Application of kernel fusion methods in genomic experiment design and data analysis]. 2012.
2. Fredrik Svensson, Anders Karlén, and Christian Sköld. Virtual Screening Data Fusion Using Both Structure- and Ligand-Based Methods. J. Chem. Inf. Model. 2012, 52, 225-232.
3. S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, and Pierre Baldi. A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Bioinformatics 2010, 26(10), 1348-1356.
4. Jean-François Truchon and Christopher I. Bayly. Evaluating Virtual Screening Methods: Good and Bad Metrics for the "Early Recognition" Problem. J. Chem. Inf. Model. 2007, 47, 488-508.