Computer Vision
Lecture 8
Performance Evaluation
Oleh Tretiak © 2005
This Lecture
• Estimates of random variables and confidence
intervals
– Probability estimation
– Parameter estimation
• Receiver operating characteristic
– Theory
– Experiment
• Training and testing
– Bias due to re-testing
– Leave-one-out
Experiment
• We develop a vision system that detects
whether it is safe to cross the street.
• We test it 100 times and it works every
time. What can we say about the
probability of success?
Repeated Tosses of a Fair Coin
• We toss a coin 10 times
– How many heads?
– Numerical experiment
• The function RAND() in Excel produces random numbers uniformly distributed between 0 and 1
• The expression IF(RAND()>0.5,1,0) produces 0 and 1, each 50% of the time
• Number of 1’s in 10 trials: 7, 6, 6, 6, 7
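The same numerical experiment can be reproduced outside Excel; below is a minimal Python sketch (the function name count_heads is ours, and the printed counts vary from run to run):

    import random

    def count_heads(n_tosses=10):
        # Python equivalent of summing IF(RAND() > 0.5, 1, 0) over 10 cells
        return sum(1 if random.random() > 0.5 else 0 for _ in range(n_tosses))

    # Five repetitions of the 10-toss experiment, as on the slide
    print([count_heads() for _ in range(5)])   # counts near 5, e.g. [7, 6, 6, 6, 7]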
Theory
• The number of 1’s in n trials is given by
P(k, n, p) = C(k, n) p^k (1 - p)^(n-k),
where p is the probability of a 1 and C(k, n) = n!/(k!(n-k)!) is the binomial coefficient.
• P(5, 10, 0.5) = 0.246
• P(6, 10, 0.5) = 0.205
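These values can be checked directly from the formula; a short Python sketch (using the standard-library function math.comb for the binomial coefficient):

    from math import comb

    def binom_prob(k, n, p):
        # P(k, n, p) = C(k, n) * p^k * (1 - p)^(n - k)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(round(binom_prob(5, 10, 0.5), 3))   # 0.246
    print(round(binom_prob(6, 10, 0.5), 3))   # 0.205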
P(0, 100, p)
[Plot: P(0, 100, p) as a function of p, on a logarithmic vertical scale, for p between 0 and 0.25]
• We can say with 99% confidence that the probability of failure is between 0 and 0.05, i.e. the probability of success is at least 0.95
• With 99.9% confidence, the probability of failure is between 0 and 0.07
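A sketch of the computation behind the plot: when no failures are observed, P(0, 100, p) = (1 - p)^100, and the quoted endpoints can be checked numerically (the function name is ours):

    def prob_zero_failures(p, n=100):
        # Probability of observing no failures in n trials when the failure rate is p
        return (1 - p) ** n

    print(prob_zero_failures(0.05))   # ~0.006 < 0.01   -> p <= 0.05 with 99% confidence
    print(prob_zero_failures(0.07))   # ~0.0007 < 0.001 -> p <= 0.07 with 99.9% confidence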
Parameter Estimation
• We perform an experiment with a binary outcome n times and obtain k successes. What can we say about the probability of success?
• The expected number of successes is np. The standard deviation of the number of successes is [np(1 - p)]^0.5
• The usual estimate of the probability is p̂ = k/n
• What range of values of p could have produced this result?
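A minimal sketch of the point estimate; since the true p is unknown, the standard deviation is evaluated at the estimate p̂ (an approximation, and the example numbers are arbitrary):

    from math import sqrt

    def estimate_success_prob(k, n):
        # Point estimate p_hat = k/n and the standard deviation of the success count,
        # [n * p * (1 - p)]^0.5, evaluated at p = p_hat
        p_hat = k / n
        sd_count = sqrt(n * p_hat * (1 - p_hat))
        return p_hat, sd_count

    print(estimate_success_prob(45, 100))   # (0.45, ~4.97)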
Cumulative Probability Plot
[Plot: cumulative probability P(k < x; p = 0.45, 100 trials) as a function of x, for x between 30 and 60]
• P(k ≤ 35) = 0.027 and P(k ≤ 54) = 0.971, so the interval [0.35, 0.54] for the observed proportion has a probability of about 0.944.
• We call this the 95% confidence interval for estimating the probability of success.
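These cumulative values can be checked from the binomial formula; a short sketch in plain Python (no external libraries; the function name is ours):

    from math import comb

    def binom_cdf(x, n, p):
        # P(k <= x) for a binomial(n, p) random variable
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

    n, p = 100, 0.45
    lo, hi = binom_cdf(35, n, p), binom_cdf(54, n, p)
    print(round(lo, 3), round(hi, 3), round(hi - lo, 3))   # approx. 0.027 0.971 0.944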
Experimental Design
• We need to perform an experiment to test the
reliability of a collection of parts. The test is
destructive: we want to test as few parts as possible.
• The test consists of examining n parts, and rejecting
the collection if k fail. The test is designed using the
binomial formula.
• The design is complicated because
– We need to specify the type of test to use, what constitutes a success or failure, and how many bad parts we are willing to accept
– We need to specify how to select the parts to be tested
– There are two types of errors: we need to decide acceptable
levels for both
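For illustration only (the sample size, acceptance number, and defect rates below are made-up values, not from the slides), the binomial formula gives the two error probabilities of such a test:

    from math import comb

    def prob_accept(n, c, p):
        # Probability that a lot with defect rate p passes a test that examines
        # n parts and accepts the lot when at most c of them fail
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

    n, c = 50, 1                         # hypothetical design: test 50 parts, allow 1 failure
    print(1 - prob_accept(n, c, 0.01))   # risk of rejecting a good lot (1% defective)
    print(prob_accept(n, c, 0.10))       # risk of accepting a bad lot (10% defective)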
Two-Category Test
• We perform a test on a patient and obtain a measurement x. There are two possibilities: the patient is healthy or sick. The density functions of x for the two possibilities are shown below. We choose a threshold t and decide that the patient is sick if the value x is higher than t. The probability that a healthy patient's measurement exceeds t is called P(False Positive); the probability that a sick patient's measurement exceeds t is called P(True Positive).
[Figure: probability density functions of x for the healthy and sick populations, with the decision threshold t marked on the measurement axis]
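As a sketch, suppose the two densities are Gaussian with unit variance and means 0 (healthy) and 2 (sick); these numbers are assumptions for illustration, not values from the figure. The two probabilities for a threshold t are then:

    from math import erf, sqrt

    def norm_sf(x, mean=0.0, std=1.0):
        # P(X > x) for a Gaussian random variable
        return 0.5 * (1 - erf((x - mean) / (std * sqrt(2))))

    def fp_tp(t, mean_healthy=0.0, mean_sick=2.0, std=1.0):
        p_fp = norm_sf(t, mean_healthy, std)   # healthy patient classified as sick
        p_tp = norm_sf(t, mean_sick, std)      # sick patient correctly classified as sick
        return p_fp, p_tp

    print(fp_tp(1.0))   # about (0.159, 0.841) for a threshold halfway between the means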
ROC Plot
• As the threshold is varied, both P(FP) and P(TP) change. The curve traced out by plotting P(TP) against P(FP) is called the Receiver Operating Characteristic (ROC).
[Figure: the density plot with its threshold, and the resulting ROC curve, P(True Positive) versus P(False Positive)]
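A sketch of sweeping the threshold to trace the ROC, under the same illustrative Gaussian assumption as above (unit-variance densities with means 0 and 2):

    from math import erf, sqrt

    def norm_sf(x, mean=0.0, std=1.0):
        # P(X > x) for a Gaussian random variable
        return 0.5 * (1 - erf((x - mean) / (std * sqrt(2))))

    # Sweep the threshold from -4 to 8 and record (P(FP), P(TP)) pairs
    roc = [(norm_sf(t, 0.0), norm_sf(t, 2.0)) for t in [0.1 * i for i in range(-40, 81)]]
    print(roc[0], roc[-1])   # near (1, 1) at t = -4 and near (0, 0) at t = 8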
Evaluating the ROC
• The ROC lies in the unit
square. Ideally, the curve
should go up from (0, 0) to (0,
1), and then horizontally to
(1,1).
• Rather than evaluating one
combination of the two
probabilities, it is desirable to
measure the whole curve.
• A single measure of quality is
given by Az, the area under the
operating curve.
[Figure: two ROC curves, P(True Positive) versus P(False Positive)]
– In the figure, the red curve has Az = 1 and the blue curve has Az = 0.89
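Given a sampled ROC curve, Az can be approximated with the trapezoidal rule; a minimal sketch (the three-point curve at the end is only a toy example):

    def area_under_curve(points):
        # points: list of (p_fp, p_tp) pairs sorted by increasing p_fp
        area = 0.0
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            area += (x1 - x0) * (y0 + y1) / 2.0   # trapezoidal rule
        return area

    # A chance-level classifier (the diagonal of the unit square) has Az = 0.5
    print(area_under_curve([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]))   # 0.5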
Experimental Results
• In any experiment with actual data, the measured performance, such as the error probability or Az, should be treated as a random variable.
• The evaluation results should consist not only of the performance estimate but also of a confidence interval.
Performance of Classifiers
• In the last lecture, we considered the
performance of classifiers, such as neural
networks. These elements contain a number
of parameters whose values must be chosen
to obtain the correct operation.
– For example, if the input to the classifier is one-dimensional, and the classifier uses a threshold to discriminate between two categories, we need to find the threshold from the training data.
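A minimal sketch of fitting that single parameter: search over candidate thresholds and keep the one with the fewest errors on the training data (the function name and the data at the bottom are made up for illustration):

    def best_threshold(xs, labels):
        # Try a threshold between every pair of adjacent sorted values; classify as
        # category 1 when x > t, and keep the t with the fewest training errors.
        candidates = sorted(xs)
        best_t, best_err = None, len(xs) + 1
        for a, b in zip(candidates, candidates[1:]):
            t = (a + b) / 2.0
            err = sum((x > t) != (y == 1) for x, y in zip(xs, labels))
            if err < best_err:
                best_t, best_err = t, err
        return best_t, best_err

    xs = [0.1, 0.4, 0.9, 1.8, 2.2, 2.9]   # hypothetical 1-D training data
    labels = [0, 0, 0, 1, 1, 1]
    print(best_threshold(xs, labels))     # (1.35, 0)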
Training and Evaluation
• The quality of performance depends on the
accuracy with which we estimate the
parameters. Suppose we are measuring the
error rate. We will represent the error rate
obtainable with the best values of the
classifier parameters by Pmin
– To measure this probability we need to have a
very large collection of samples, so the confidence
interval of the estimate is very small
– If we do not use the right values of parameters,
then the probability will be greater than Pmin
Training Error
• We find the classifier parameters by examining some
training samples and adjusting the classifier to
correctly identify these data.
– With a finite sample, the values we get will be a random
variable. Since these will differ from the best values, the
performance with these parameters will be worse than with
the best parameters. To avoid this we need a large training
set.
– If we test the classifier with the same sample as that used for
training, then the performance may seem to be better than
the true performance. This is an optimistic bias, and can be
very large.
Independent Testing
• To avoid the bias of testing on the same
data as training, it is necessary to divide
the available data into two groups: A
training set and a test set. One set is
used for training and the other for
testing. If the total number of available
data is N, then one can use N1 data
items for training and N2 items for
testing, with N = N1 + N2.
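A sketch of such a split using a random permutation, so that the training and test sets are disjoint (the 70/30 proportion is an arbitrary choice for illustration):

    import random

    def split_data(data, train_fraction=0.7, seed=0):
        # Shuffle the indices once, then cut into a training set (N1 items)
        # and a test set (N2 = N - N1 items)
        rng = random.Random(seed)
        indices = list(range(len(data)))
        rng.shuffle(indices)
        n1 = int(train_fraction * len(data))
        train = [data[i] for i in indices[:n1]]
        test = [data[i] for i in indices[n1:]]
        return train, test

    train_set, test_set = split_data(list(range(10)))
    print(len(train_set), len(test_set))   # 7 3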
Testing Dilemma
• N = N1 + N2
• To obtain accurate parameter values, N1
should be large
• To obtain accurate performance values,
N2 should be large
Leave-One-Out
• One way of ameliorating this problem is to
use the leave-one-out design
– We exclude one data item, and train the system
on N-1 items
– Test the system on the remaining item
– Repeat this for each item in the set
• This produces training sets with N1 =
N-1 elements and each test is independent of
the training set.
– The N tests, however, are not mutually independent, so confidence interval estimation is complicated.
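A sketch of the leave-one-out loop; train and classify stand in for whatever training and decision procedures are being evaluated (they are placeholders, not functions defined in the lecture):

    def leave_one_out_error(data, labels, train, classify):
        # Train on N-1 items, test on the held-out item, and average over all N choices
        errors = 0
        for i in range(len(data)):
            train_x = data[:i] + data[i + 1:]
            train_y = labels[:i] + labels[i + 1:]
            model = train(train_x, train_y)
            errors += classify(model, data[i]) != labels[i]
        return errors / len(data)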
Summary
• Complex classifiers can be designed to correctly classify large data sets. However, they may not perform well on data they have not encountered.
– This is called the generalization problem
• To obtain a valid evaluation of performance, one must
use independent training and test data.
• The complexity of a classifier depends not only on
the problem (data distribution) but also on the size of
the training set.
– A classifier with very many parameters may perform poorly
when trained with a small set because the parameters are
not estimated accurately.