A Statistical Analysis of the Precision-Recall Graph

Ralf Herbrich, Hugo Zaragoza, Simon Hill
Microsoft Research and Cambridge University, UK
Overview
• 2-class ranking
• Average-Precision
• From points to curves
• Generalisation bound
• Discussion
Hugo Zaragoza. AMS-IMS-SIAM Joint Summer Research Conference on Machine Learning, Statistics, and Discovery. June 2003.
“Search” cost-functions

• Maximise the number of relevant documents found in the top 10.
• Maximise the number of relevant documents at the top (e.g. weight inversely proportional to rank).
• Minimise the number of documents the user sees before being satisfied.
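The three cost functions above can be sketched directly; the helper names below are mine, not from the slides, and a ranking is represented as a list of 0/1 relevance labels in score order.

```python
def relevant_in_top(labels, n=10):
    """Number of relevant documents in the top n of the ranking."""
    return sum(labels[:n])

def rank_weighted_gain(labels):
    """Relevance weighted inversely proportional to rank (1-based)."""
    return sum(y / rank for rank, y in enumerate(labels, start=1))

def docs_until_satisfied(labels):
    """Documents seen until the first relevant one appears."""
    for rank, y in enumerate(labels, start=1):
        if y == 1:
            return rank
    return len(labels)  # user is never satisfied

ranking = [0, 1, 0, 0, 1, 1]
print(relevant_in_top(ranking, 10))   # 3
print(docs_until_satisfied(ranking))  # 2
```

Note how the three measures reward different aspects of the same ranking: the first ignores order inside the top 10, the second rewards putting positives early, and the third only cares about the first hit.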
Motivation

• Why should a standard classification algorithm work for document categorisation?
• Why should any algorithm obtain good generalisation average-precision?
• How can we devise algorithms that optimise rank-dependent loss functions?
2-class ranking problem
• Data (X, Y), with labels y ∈ {0, 1}.
• Mapping (scoring function): f : X → R.
• Relevancy: P(y = 1 | x), modelled through the score as P(y = 1 | f(x)).
Collection samples

• A collection is a sample
    z = ((x1, y1), …, (xm, ym)) ∈ (X × {0, 1})^m
  where:
    – y = 1 if the document x is relevant to a particular topic,
    – z is drawn from the (unknown) distribution π_XY.
• Let k denote the number of positive examples.
Ranking the collection

• We are given a scoring function f : X → R.
• This function imposes an order on the collection:
    (x(1), …, x(m)) such that f(x(1)) > … > f(x(m)).
• Hits (i1, …, ik) are the indices of the positive y(j).
[Figure: a ranked list of scores f(x(i)) with relevance labels y(i); the hit indices ij mark the ranks at which the positive documents occur.]
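A minimal sketch of this reordering step (the function name is mine): sort the collection by decreasing score and read off the hit indices of the positive examples.

```python
def rank_collection(scores, labels):
    """Sort (score, label) pairs by decreasing score; return the
    reordered labels and the 1-based ranks of the positives ('hits')."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked_labels = [labels[i] for i in order]
    hits = [r for r, y in enumerate(ranked_labels, start=1) if y == 1]
    return ranked_labels, hits

scores = [0.2, 0.9, 0.5, 0.1]
labels = [0, 1, 1, 0]
ranked, hits = rank_collection(scores, labels)
print(ranked)  # [1, 1, 0, 0]
print(hits)    # [1, 2]
```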
Classification setting

If we threshold the function f, we obtain a
classification:
Thresholding at t = f(x(i)):

Recall at rank i:
    r_i = (1/k) Σ_{j=1}^{i} y(j)  ≈  P( f(x) ≥ t | y = 1 )

Precision at rank i:
    p_i = (1/i) Σ_{j=1}^{i} y(j)  ≈  P( y = 1 | f(x) ≥ t )
Precision vs. PGC

Rewriting precision in terms of the fraction k/m of positives and the two class-conditional probabilities:

    p = ( 1 + ((m − k)/k) · P( f(x) ≥ t | y = 0 ) / P( f(x) ≥ t | y = 1 ) )^(−1)
The Precision-Recall Graph
After reordering: [Figure: the ranked scores f(x(i)) and the resulting precision-recall graph.]
Graph Summarisations
[Figure: a precision-recall curve with the break-even point marked (where precision equals recall); precision on the y-axis and recall on the x-axis, both ranging from 0 to 1.]
Precision-Recall Example
Overfitting?
[Figure: scatter plot of average precision on the test set (y-axis, 0 to 1) against average precision on the training set (x-axis, 0.96 to 1): training performance is near-perfect while test performance varies widely.]
Overview
• 2-class ranking
• Average-Precision
• From points to curves
• Generalisation bound
• Discussion
From point to curve bounds
• There exist SVM margin bounds [Joachims 2000] for precision and recall.
• They only apply to a single (unknown a priori) point of the curve!
[Figure: a precision-recall curve with a single point highlighted.]
Max-Min precision-recall
Max-Min precision-recall (2)
Features of Ranking Learning

• We cannot take differences of ranks.
• We cannot ignore the order of ranks.
• Point-wise loss functions do not capture the ranking performance!
• ROC or precision-recall curves do capture the ranking performance.
• We need generalisation error bounds for ROC and precision-recall curves.
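The claim that point-wise losses miss ranking quality can be made concrete with a small example of my own: two rankings of the same four documents that incur the identical 0/1 loss when cut at rank 2 (one false positive and one false negative each), yet have different average precision.

```python
def average_precision(ranked_labels):
    """Mean precision over the positions of the hits."""
    k = sum(ranked_labels)
    hits, total = 0, 0.0
    for i, y in enumerate(ranked_labels, start=1):
        if y == 1:
            hits += 1
            total += hits / i
    return total / k

ranking_a = [1, 0, 0, 1]  # positives at ranks 1 and 4
ranking_b = [0, 1, 1, 0]  # positives at ranks 2 and 3
print(average_precision(ranking_a))  # 0.75
print(average_precision(ranking_b))  # approximately 0.5833
```

Same point-wise classification error, different curves: only a rank-aware measure separates the two.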
Generalisation and Avg.Prec.

• How far can the observed average precision A(f, z) be from the expected average precision A(f)?

    A(f) = E_{z ∼ π_XY^m} [ A(f, z) ]

    P_{z ∼ π_XY^m} ( | A(f, z) − A(f) | > ε )  ≤  ?

• How far apart can training and test average precision be?

    P_{z, z′ ∼ π_XY^m} ( | A(f, z) − A(f, z′) | > ε )  ≤  ?
Approach
1. McDiarmid’s inequality: for any function g : Z^n → R with stability c, for all probability measures P, with probability at least 1 − δ over the IID draw of Z:
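The displayed inequality did not survive transcription; in the uniform-stability form used here (changing any single argument Z_i changes g by at most c), McDiarmid's bounded-differences inequality reads:

```latex
% McDiarmid's inequality with uniform stability c in each of n arguments:
\[
  \Pr\bigl( \lvert g(Z) - \mathbb{E}[g(Z)] \rvert > \varepsilon \bigr)
  \;\le\; 2\exp\!\left( -\frac{2\varepsilon^{2}}{n c^{2}} \right),
\]
% equivalently, with probability at least 1 - \delta over the IID draw of Z,
\[
  \lvert g(Z) - \mathbb{E}[g(Z)] \rvert
  \;\le\; c \sqrt{\tfrac{n}{2}\,\ln\tfrac{2}{\delta}}.
\]
```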
Approach (cont.)
2. Set n = 2m and call the two m-halves Z1 and Z2. Define g_i(Z) := A(f, Z_i). Then, by the IID assumption, E[g_1(Z)] = E[g_2(Z)] = A(f), so McDiarmid applied to g_1 − g_2 bounds the difference between training and test average precision.
Bounding A(f,z) - A(f,zi)
1. How much does A(f, z) change if we alter one sample (xi, yi)?
• We need to fix the number of positive examples in order to answer this question!
• E.g. if k = 1, the change can range from 0 to 1.
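The k = 1 remark can be verified directly (my numbers): with a single positive example, average precision is the reciprocal rank of that example, so moving one sample swings A(f, z) across almost its entire range.

```python
def average_precision(ranked_labels):
    """Mean precision over the positions of the hits."""
    k = sum(ranked_labels)
    hits, total = 0, 0.0
    for i, y in enumerate(ranked_labels, start=1):
        if y == 1:
            hits += 1
            total += hits / i
    return total / k

m = 10
best = [1] + [0] * (m - 1)   # the single positive ranked first
worst = [0] * (m - 1) + [1]  # the single positive ranked last
print(average_precision(best))   # 1.0
print(average_precision(worst))  # 0.1
```

This is why the stability constant, and hence the bound, must depend on the number of positive examples k rather than on m alone.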
Stability Analysis

• Case 1: yi = 0
• Case 2: yi = 1
Main Result
Theorem: For all probability measures, for all f : X → R, with probability at least 1 − δ over the IID draw of a training sample z and a test sample z′, both of size m, if both samples contain at least αm positive examples, for all α ∈ (0, 1), then:

[Bound expression not recoverable from the transcript; per the following slides, it decays with the number of positive examples αm.]
Positive results

• This is the first bound showing that, asymptotically, training and test performance (in terms of average precision) converge!
• The effective sample size is only the number of positive examples.
• The proof can be generalised to arbitrary test sample sizes.
• The constants can be improved.
Open questions

• How can we let k change, so as to investigate
    P_{z, z′} ( | A(f, z) − A(f, z′) | > ε ) ?
• What algorithms could be used to directly maximise A(f, z)?
Conclusions

• Many problems require ranking objects to some degree.
• Learning to rank requires considering non-point-wise loss functions.
• To study the complexity of ranking algorithms we need large-deviation inequalities for ranking performance measures.