A Statistical Analysis of the Precision-Recall Graph

A Statistical Analysis of the Precision-Recall Graph
Ralf Herbrich
Microsoft Research
UK
Joint work with Hugo Zaragoza and Simon Hill
1
Overview

The Precision-Recall Graph
A Stability Analysis
Main Result
Discussion and Applications
Conclusions
2
Features of Ranking Learning

We cannot take differences of ranks.
We cannot ignore the order of ranks.
Point-wise loss functions do not capture the ranking performance!
ROC or precision-recall curves do capture the ranking performance.
We need generalisation error bounds for ROC and precision-recall curves!
3
Precision and Recall

Given:
A sample $z = ((x_1, y_1), \dots, (x_m, y_m)) \in (\mathcal{X} \times \{0,1\})^m$ with $k$ positive $y_i$, together with a function $f: \mathcal{X} \to \mathbb{R}$.

Ranking the sample:
Re-order the sample so that $f(x_{(1)}) \geq \cdots \geq f(x_{(m)})$.
Record the indices $i_1, \dots, i_k$ of the positive $y_{(j)}$.

Precision $p_j$ and recall $r_j$:
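
The formulas themselves were an equation image and are not in the transcript; the standard definitions, assumed here to match the notation above (the $j$-th positive example sits at rank $i_j$, out of $k$ positives in total), are

\[ p_j \;=\; \frac{j}{i_j}, \qquad r_j \;=\; \frac{j}{k}, \qquad j = 1, \dots, k. \]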
4
The Precision-Recall Graph
After reordering:
[Figure: the precision-recall graph after reordering by $f(x_{(i)})$; x-axis: Recall, y-axis: Precision, both running from 0 to 1.]
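
Not part of the talk, but as a concrete illustration of how such a graph is computed: a minimal Python sketch (the function name and the toy data are made up for this example) that sorts the sample by $f(x)$, walks down the ranking, and emits the $(r_j, p_j)$ pairs defined on the previous slide.

```python
def precision_recall_points(scores, labels):
    """Return the (recall, precision) pairs of the ranking induced by `scores`.

    scores: real-valued outputs f(x_i), one per example
    labels: 0/1 relevance labels y_i (same length), with k = sum(labels) positives
    """
    # Re-order the sample so that f(x_(1)) >= ... >= f(x_(m)).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked_labels = [labels[i] for i in order]

    k = sum(ranked_labels)
    points = []
    positives_seen = 0
    for rank, y in enumerate(ranked_labels, start=1):
        if y == 1:
            positives_seen += 1                      # the j-th positive, at rank i_j
            points.append((positives_seen / k,       # recall    r_j = j / k
                           positives_seen / rank))   # precision p_j = j / i_j
    return points


# Toy data, for illustration only:
print(precision_recall_points([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0]))
# -> [(0.5, 1.0), (1.0, 0.6666666666666666)]
```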
5
Graph Summarisations
[Figure: precision-recall graph with the break-even point marked; x-axis: Recall, y-axis: Precision, both running from 0 to 1.]
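
Only the "Break-Even point" label survives from this slide. Because the following slides analyse a scalar summary $A(f,z)$, and the discussion slide refers to it as average precision, the standard definitions are recorded here for reference (they are assumptions of this transcript, not the slide's own equations):

\[ A(f,z) \;=\; \frac{1}{k}\sum_{j=1}^{k} p_j \;=\; \frac{1}{k}\sum_{j=1}^{k}\frac{j}{i_j}, \]

and the break-even point is the value at which the graph crosses the diagonal, i.e. the $p_j$ with $p_j = r_j$.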
6
A Stability Analysis: Questions
1. How much does $A(f,z)$ change if we can alter one example $(x_i, y_i)$?
2. How much does $A(f,\cdot)$ change if we can alter $z$?

We will assume that the number of positive examples, $k$, has to remain constant.
We can only alter $x_i$, i.e. rotate one $y_{(i)}$.
7
Stability Analysis

Case 1: $y_i = 0$

Case 2: $y_i = 1$
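
The per-case bounds were equation images and are not in the transcript. What the two cases control is the bounded-difference (stability) constant that McDiarmid's inequality needs on the next slide; the following is only a qualitative sketch, not the slide's exact constants:

\[ \sup_{z,\,\tilde{x}_i}\, \bigl| A(f,z) - A(f,z') \bigr| \;\le\; c, \qquad z' := z \text{ with } x_i \text{ replaced by } \tilde{x}_i . \]

Case 1 ($y_i = 0$): moving a single negative example shifts each positive's rank index $i_j$ by at most one, so every term $j/i_j$ in $A(f,z)$ changes only slightly. Case 2 ($y_i = 1$): the altered positive example can move to an arbitrary rank, so one of the $k$ terms can change completely while the remaining terms shift as in Case 1.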
8
Main Result
Theorem: For all probability measures, for all $\alpha > 1/m$, for all $f: \mathcal{X} \to \mathbb{R}$, with probability at least $1-\delta$ over the IID draw of a training and a test sample both of size $m$: if both the training sample $z$ and the test sample contain at least $\lceil \alpha m \rceil$ positive examples, then
9
Proof
1. McDiarmid's inequality: For any function $g: \mathcal{Z}^n \to \mathbb{R}$ with stability $c$, for all probability measures $P$, with probability at least $1-\delta$ over the IID draw of $Z$:

2. Set $n = 2m$ and call the two $m$-halves $Z_1$ and $Z_2$. Define $g_i(Z) := A(f, Z_i)$. Then, by IID:
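
Both displayed formulas were images. McDiarmid's inequality in the bounded-differences form used in step 1 is standard: if changing any single one of the $n$ arguments changes $g$ by at most $c$, then with probability at least $1-\delta$ over $Z \sim P^n$,

\[ \bigl| g(Z) - \mathbb{E}\, g(Z) \bigr| \;\le\; c\,\sqrt{\tfrac{n}{2}\,\ln\tfrac{2}{\delta}} . \]

For step 2, applying this with $n = 2m$ to $g_1(Z) - g_2(Z) = A(f,Z_1) - A(f,Z_2)$, whose expectation is zero because the two halves are drawn IID from the same distribution, gives $|A(f,Z_1) - A(f,Z_2)| \le c\sqrt{m \ln\tfrac{2}{\delta}}$ with probability at least $1-\delta$. If, as the case analysis suggests (an assumption here, since the slide's constants are not in the transcript), $c$ is of order $1/k \le 1/(\alpha m)$, the deviation is of order $\sqrt{\ln(2/\delta)/(\alpha^{2} m)}$, which matches the "effective sample size $\alpha^{2} m$" remark on the discussion slide.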
10
Discussion

This is the first bound showing that, asymptotically ($m \to \infty$), training and test set performance (in terms of average precision) converge!
The effective sample size is only the number of positive examples; in fact, only $\alpha^{2} m$.
The proof can be generalised to arbitrary test sample sizes.
The constants can be improved.
11
Applications

Cardinality bounds
Compression Bounds
Union bound:
(TREC 2002)

No VC bounds!
No Margin bounds!
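
The bound expressions for these items are not in the transcript. As a brief note on how the "Cardinality bounds" item would typically follow from the main result (a standard argument, not the slide's own derivation): for a finite function class $F$, apply the theorem to each $f \in F$ at confidence $\delta/|F|$ and take a union bound, so that with probability at least $1-\delta$ the guarantee holds simultaneously for all $f \in F$, at the price of replacing

\[ \ln\frac{2}{\delta} \quad\text{by}\quad \ln\frac{2\,|F|}{\delta} . \]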
12
Conclusions

Ranking learning requires considering non-point-wise loss functions.
In order to study the complexity of algorithms we need large deviation inequalities for ranking performance measures.
McDiarmid's inequality is a powerful tool.
Future work is focused on ROC curves.
13