A Statistical Analysis of the Precision-Recall Graph
Ralf Herbrich, Hugo Zaragoza, Simon Hill
Microsoft Research, Cambridge University, UK
Overview
2-class ranking
Average-Precision
From points to curves
Generalisation bound
Discussion
“Search” cost-functions
Maximise the number of relevant documents found in the top 10.
Maximise the number of relevant documents at the top (e.g. weight inversely proportional to rank).
Minimise the number of documents seen by the user until they are satisfied.
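A minimal Python sketch of these three cost functions (the helper names and the toy labels vector are illustrative; "satisfied" is read here as "has seen the first relevant document"):

```python
# Illustrative sketch of the three cost functions above. labels[i] is 1 if the
# document ranked at position i+1 is relevant, 0 otherwise.

def relevant_in_top(labels, n=10):
    """Number of relevant documents found in the top n."""
    return sum(labels[:n])

def rank_weighted_gain(labels):
    """Relevance weighted inversely proportionally to rank (1-based)."""
    return sum(y / rank for rank, y in enumerate(labels, start=1))

def docs_seen_until_satisfied(labels):
    """Rank of the first relevant document, i.e. documents seen until satisfied."""
    for rank, y in enumerate(labels, start=1):
        if y == 1:
            return rank
    return None  # no relevant document in the ranking

labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(relevant_in_top(labels), rank_weighted_gain(labels), docs_seen_until_satisfied(labels))
```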
Motivation
Why should classification algorithms work for document categorisation?
Why should any algorithm obtain good generalisation average-precision?
How to devise algorithms that optimise rank-dependent loss functions?
2-class ranking problem
Random variables X, Y; the relevant class is {y = 1}.
Mapping (scoring function): f : X → R.
Relevancy: P(y = 1 | x), and in terms of the score, P(y = 1 | f(x)).
Collection samples
A collection is a sample:
z = ((x_1, y_1), …, (x_m, y_m)) ∈ (X × {0, 1})^m
where:
y_i = 1 if the document x_i is relevant to a particular topic,
z is drawn from the (unknown) distribution π_XY,
and k denotes the number of positive examples.
Ranking the collection
We are given a scoring function f : X → R.
This function imposes an order on the collection:
(x_(1), …, x_(m)) such that f(x_(1)) > … > f(x_(m)).
Hits (i1,…, ik) are the indices of the positive y(j).
[Figure: the ranked scores f(x_(i)) with the labels y_(i); the positions at which y_(i) = 1 give the hit indices i_1, i_2, …, i_k.]
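As an illustration (toy scores and labels, not data from the talk), the ordering and the hit indices can be computed as follows:

```python
import numpy as np

# Order a collection by a scoring function f and read off the hit indices
# (i_1, ..., i_k), i.e. the ranks at which positive examples appear.

scores = np.array([0.2, 0.9, 0.5, 0.1, 0.7, 0.3])   # f(x_1), ..., f(x_m)
y      = np.array([0,   1,   0,   0,   1,   1  ])   # relevance labels

order = np.argsort(-scores)          # document indices sorted by decreasing score
y_ranked = y[order]                  # y_(1), ..., y_(m) in ranking order
hits = np.flatnonzero(y_ranked) + 1  # 1-based ranks i_1 < ... < i_k of the positives

print(order + 1, y_ranked, hits)
```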
Classification setting
If we threshold the function f at a value t, we obtain a classification.
[Figure: the ranked scores f(x_(i)) split into retrieved / not retrieved by the threshold t.]
Recall at cut-off i:  r_i = (1/k) · Σ_{j=1..i} y_(j)  ≈  P(f(x) ≥ t | y = 1)
Precision at cut-off i:  p_i = (1/i) · Σ_{j=1..i} y_(j)  ≈  P(y = 1 | f(x) ≥ t)
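A small Python sketch of these two quantities, computed at every cut-off of the ranked list (y_ranked is an assumed toy vector, continuing the previous sketch):

```python
import numpy as np

# Empirical precision and recall at every cut-off i of the ranked list.
y_ranked = np.array([1, 1, 0, 1, 0, 0])  # relevance labels in ranking order

k = y_ranked.sum()                       # number of positive examples
i = np.arange(1, len(y_ranked) + 1)      # cut-off positions 1..m
cum_hits = np.cumsum(y_ranked)           # sum_{j<=i} y_(j)

recall    = cum_hits / k                 # r_i = (1/k) * sum_{j<=i} y_(j)
precision = cum_hits / i                 # p_i = (1/i) * sum_{j<=i} y_(j)
print(recall, precision)
```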
Precision vs. PGC
PGC (probability of a good classification), weighted by the class priors k/m and (m − k)/m:
PGC = (k/m) · P(f(x) ≥ t | y = 1) + ((m − k)/m) · P(f(x) < t | y = 0)
Precision, in contrast, depends on the ratio of the two class-conditional probabilities:
p = ( 1 + ((m − k)/k) · P(f(x) ≥ t | y = 0) / P(f(x) ≥ t | y = 1) )^(−1)
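Reading PGC as the prior-weighted probability of a good classification (the assumption behind the reconstruction above), a toy calculation shows why it can look excellent while precision stays poor when relevant documents are rare:

```python
# Toy numbers (illustrative only): a large collection with few relevant documents,
# a detector with a good true-positive rate and a small false-positive rate.

m, k = 10_000, 50                 # collection size and number of relevant docs
tp_rate = 0.8                     # P(f(x) >= t | y = 1)
fp_rate = 0.01                    # P(f(x) >= t | y = 0)

tp = tp_rate * k                  # expected true positives
fp = fp_rate * (m - k)            # expected false positives
accuracy  = (tp + (1 - fp_rate) * (m - k)) / m   # prior-weighted "good classification"
precision = tp / (tp + fp)

print(f"accuracy = {accuracy:.3f}, precision = {precision:.3f}")
# accuracy ~ 0.989 while precision ~ 0.287
```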
The Precision-Recall Graph
After reordering the collection by f, precision is plotted against recall at each position in the ranking.
[Figure: construction of the precision-recall graph from the ranked scores f(x_(i)).]
Graph Summarisations
[Figure: a precision-recall graph (recall on the x-axis from 0 to 1, precision on the y-axis from 0 to 1) with the break-even point marked, i.e. the point at which precision equals recall.]
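A short sketch of two common summaries of this graph, average precision (the mean of the precision values at the hit positions) and an approximate break-even point; y_ranked is again an assumed toy input:

```python
import numpy as np

def average_precision(y_ranked):
    """A(f, z): mean precision over the hit positions i_1, ..., i_k."""
    y = np.asarray(y_ranked)
    i = np.arange(1, len(y) + 1)
    precision_at_i = np.cumsum(y) / i
    return precision_at_i[y == 1].mean()

def break_even_point(y_ranked):
    """Precision at the cut-off where precision and recall are closest."""
    y = np.asarray(y_ranked)
    i = np.arange(1, len(y) + 1)
    cum = np.cumsum(y)
    precision, recall = cum / i, cum / y.sum()
    # precision equals recall exactly when the cut-off i equals k (number of positives)
    return precision[np.argmin(np.abs(precision - recall))]

y_ranked = [1, 1, 0, 1, 0, 0]
print(average_precision(y_ranked), break_even_point(y_ranked))
```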
Precision-Recall Example
Overfitting?
[Figure: scatter plot of Average Precision on the test set (y-axis) against Average Precision on the training set (x-axis); training-set values are concentrated near 1 while test-set values vary widely.]
Overview
2-class ranking
Average-Precision
From points to curves
Generalisation bound
Discussion
From point to curve bounds
There exist SVM margin bounds [Joachims 2000] for precision and recall.
They only apply to a single (a priori unknown) point of the curve!
[Figure: a precision-recall curve with a single bounded point highlighted; axes: recall vs. precision.]
Max-Min precision-recall
Max-Min precision-recall (2)
Features of Ranking Learning
We cannot take differences of ranks.
We cannot ignore the order of ranks.
Point-wise loss functions do not capture ranking performance!
ROC and precision-recall curves do capture ranking performance.
We need generalisation error bounds for ROC and precision-recall curves.
Generalisation and Avg.Prec.
How far can the observed average precision A(f, z) be from the expected average precision A(f)?
A(f) := E_{z ~ π_XY^m} [ A(f, z) ]
P_{z ~ π_XY^m} ( |A(f, z) − A(f)| ≥ ε )  ≤  ?
How far apart can training and test average precision be?
P_{z, z' ~ π_XY^m} ( |A(f, z) − A(f, z')| ≥ ε )  ≤  ?
Approach
1. McDiarmid's inequality: for any function g : Z^n → R with stability c, for all probability measures P, with probability at least 1 − δ over the IID draw of Z.
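The bound referred to is the standard bounded-differences form of McDiarmid's inequality; a sketch, stated with a single uniform stability constant c:

```latex
% McDiarmid's bounded-differences inequality with a uniform constant c:
\[
  \Pr\bigl(\lvert g(Z) - \mathbb{E}[g(Z)] \rvert \ge \varepsilon\bigr)
  \;\le\; 2\exp\!\left(-\frac{2\varepsilon^{2}}{n c^{2}}\right),
\]
% equivalently, with probability at least 1 - delta over the IID draw of Z,
\[
  \lvert g(Z) - \mathbb{E}[g(Z)] \rvert \;\le\; c\sqrt{\tfrac{n}{2}\ln\tfrac{2}{\delta}}.
\]
```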
Approach (cont.)
2. Set n = 2m and call the two m-sized halves Z_1 and Z_2. Define g_i(Z) := A(f, Z_i). Then, by the IID assumption, E[g_1(Z) − g_2(Z)] = 0, so McDiarmid's inequality can be applied to the difference g_1(Z) − g_2(Z).
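A sketch of the resulting two-sample step, assuming for illustration one uniform stability constant c for all 2m coordinates (the following slides derive the actual stability of A(f, ·)):

```latex
% Two-sample concentration step (sketch; uniform stability constant c assumed
% for each of the 2m coordinates, and E[g_1(Z) - g_2(Z)] = 0 by the IID assumption):
\[
  \Pr_{Z \sim \pi_{XY}^{2m}}\!\bigl(\lvert A(f, Z_1) - A(f, Z_2)\rvert \ge \varepsilon\bigr)
  \;\le\; 2\exp\!\left(-\frac{2\varepsilon^{2}}{2m\,c^{2}}\right)
  \;=\; 2\exp\!\left(-\frac{\varepsilon^{2}}{m\,c^{2}}\right).
\]
```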
Bounding A(f,z) - A(f,zi)
1. How much does A(f, z) change if we alter one sample (x_i, y_i)?
We need to fix the number of positive examples k in order to answer this question!
E.g. if k = 1, moving the single positive example within the ranking can change A(f, z) from (almost) 0 to 1; see the check below.
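A tiny self-contained check of the k = 1 remark (illustrative only): with a single positive example, A(f, z) equals 1/rank of that example, so it sweeps from roughly 0 to 1 as the example moves from the bottom to the top of the ranking.

```python
# With k = 1, average precision is 1/rank of the single positive example.

def average_precision(y_ranked):
    hits, total, k = 0, 0.0, sum(y_ranked)
    for i, y in enumerate(y_ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / i
    return total / k

m = 1000
bottom = [0] * (m - 1) + [1]     # single positive ranked last
top    = [1] + [0] * (m - 1)     # single positive ranked first
print(average_precision(bottom), average_precision(top))   # 0.001 vs 1.0
```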
Stability Analysis
Case 1: y_i = 0 (the altered example is negative).
Case 2: y_i = 1 (the altered example is positive).
Main Result
Theorem: For all probability measures, for all f : X → R and all α ∈ (0, 1), with probability at least 1 − δ over the IID draw of a training sample z and a test sample z', both of size m, if both z and z' contain at least αm positive examples, then |A(f, z) − A(f, z')| is bounded by a quantity that vanishes as the number of positive examples αm grows.
Positive results
First bound which shows that training and test set performance (in terms of average precision) converge asymptotically!
The effective sample size is only the number of positive examples.
The proof can be generalised to arbitrary test sample sizes.
The constants can be improved.
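An illustrative simulation (not from the talk; the score distributions and the 1:10 class ratio are arbitrary choices) showing the train/test average-precision gap shrinking as the number of positive examples grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def average_precision(scores, y):
    """Mean precision at the hit positions of the ranking induced by the scores."""
    order = np.argsort(-scores)
    y_ranked = y[order]
    i = np.arange(1, len(y) + 1)
    prec = np.cumsum(y_ranked) / i
    return prec[y_ranked == 1].mean()

def sample_ap(n_pos, n_neg):
    # positives score slightly higher on average than negatives
    scores = np.concatenate([rng.normal(1.0, 1.0, n_pos), rng.normal(0.0, 1.0, n_neg)])
    y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])
    return average_precision(scores, y)

for n_pos in (5, 50, 500):
    # draw independent "training" and "test" samples and compare their A(f, z)
    gaps = [abs(sample_ap(n_pos, 10 * n_pos) - sample_ap(n_pos, 10 * n_pos))
            for _ in range(200)]
    print(f"k = {n_pos:4d}: mean train/test AP gap ~ {np.mean(gaps):.3f}")
```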
Open questions
How can we let k change, so as to investigate P_{z, z'} ( |A(f, z) − A(f, z')| ≥ ε )?
What algorithms could be used to directly maximise A(f, z)?
Conclusions
Many problems require ranking objects to some degree.
Learning to rank requires non-point-wise loss functions.
In order to study the complexity of algorithms, we need large deviation inequalities for ranking performance measures.