
Evaluating Hypotheses
Natural Language Processing Laboratory
장정호
Overview
• Evaluating the accuracy of hypotheses is
fundamental to ML.
- to decide whether to use a hypothesis
- an integral component of many learning systems
• Difficulties arise from a limited set of data
- Bias in the estimate
- Variance in the estimate
1. Contents
• Methods for evaluating learned hypotheses
• Methods for comparing the accuracy of two
hypotheses
• Methods for comparing the accuracy of two
learning algorithms when only a limited set of
data is available
2. Estimating Hypothesis Accuracy
• Two Interests
1. Given a hypothesis h and a data sample,
what is the best estimate of the accuracy of h
over unseen data?
2. What is the probable error in this accuracy estimate?
2. Evaluating… (Cont’d)
• Two Definitions of Error
1. Sample Error
with respect to target function f and data sample S,
error_S(h) = (1/n) Σ_{x∈S} δ(f(x), h(x))
where δ(f(x), h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise
2. True Error
with respect to target function f and distribution D,
error_D(h) = Pr_{x∈D}[ f(x) ≠ h(x) ]
How good an estimate of errorD(h) is provided by errorS(h)?
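The sample error above is directly computable; a minimal sketch, assuming a toy target concept and hypothesis over integers (both hypothetical):

```python
def sample_error(f, h, S):
    """Fraction of examples in S on which hypothesis h disagrees with target f."""
    return sum(1 for x in S if f(x) != h(x)) / len(S)

# Hypothetical target and hypothesis:
f = lambda x: x % 2 == 0             # target concept: "x is even"
h = lambda x: x % 2 == 0 or x == 7   # hypothesis wrong only on x = 7

S = list(range(10))                  # sample of n = 10 examples
print(sample_error(f, h, S))         # 1 disagreement out of 10 -> 0.1
```

The true error error_D(h) cannot be computed this way, since D is unknown; the sample error is our estimate of it.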
2. Evaluating… (Cont’d)
• Problems Causing Estimating Error
1. Bias : if S is training set, errorS(h) is optimistically
biased
estimation bias = E[errorS(h)] - errorD(h)
For unbiased estimate, h and S must be chosen
independently
2. Variance : Even with unbiased S, errorS(h) may vary
from errorD(h)
2. Evaluating… (Cont’d)
• Estimators
Experiment :
1. Choose sample S of size n according to distribution D
2. Measure errorS(h)
errorS(h) is a random variable
errorS(h) is an unbiased estimator for errorD(h)
Given the observed error_S(h), what can we conclude about
error_D(h)?
2. Evaluating… (Cont’d)
• Confidence Interval
if 1. S contains n examples, drawn independently of h and each other
2. n >= 30
then
with approximately N% probability, error_D(h) lies in the interval
error_S(h) ± z_N √( error_S(h)(1 − error_S(h)) / n )
2. Evaluating… (Cont’d)
• Normal Distribution Approximates
Binomial Distribution
error_S(h) follows a Binomial distribution, with
μ_{error_S(h)} = error_D(h)
σ_{error_S(h)} = √( error_D(h)(1 − error_D(h)) / n )
Approximate this by a Normal distribution with
μ_{error_S(h)} = error_D(h)
σ_{error_S(h)} ≈ √( error_S(h)(1 − error_S(h)) / n )
2. Evaluating… (Cont’d)
• More Correct Confidence Interval
if 1. S contains n examples, drawn independently of h and each other
2. n >= 30
then
with approximately 95% probability, error_S(h) lies in the interval
error_D(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )
equivalently, error_D(h) lies in the interval
error_S(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )
which is approximately
error_S(h) ± 1.96 √( error_S(h)(1 − error_S(h)) / n )
2. Evaluating… (Cont’d)
• Two-sided and One-sided bounds
1. Two-sided
What is the probability that errorD(h) is between L and U?
2. One-sided
What is the probability that errorD(h) is at most U?
A two-sided 100(1−α)% confidence interval implies a one-sided
100(1−α/2)% confidence bound.
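The relation between the two bounds follows directly from the Normal CDF; a quick stdlib check (z = 1.64 is an illustrative cutoff):

```python
import math

def normal_cdf(z):
    """Standard Normal CDF, via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 1.64
two_sided = 2 * normal_cdf(z) - 1   # mass inside [-z, +z]
one_sided = normal_cdf(z)           # mass inside (-inf, +z]
print(f"{two_sided:.2f}, {one_sided:.2f}")
```

A 90% two-sided interval at z = 1.64 thus corresponds to a 95% one-sided bound, as used in the worked example later.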
3. General Confidence Interval
• Consider a set of independent, identically distributed
random variables Y1 … Yn, all governed by an arbitrary
probability distribution with mean μ and variance σ².
Define the sample mean
Ȳ = (1/n) Σ_{i=1}^{n} Y_i
• Central Limit Theorem
As n → ∞, the distribution governing Ȳ approaches a
Normal distribution, with mean μ and variance σ²/n.
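The theorem is easy to check empirically; a small simulation with Uniform(0, 1) variables (μ = 0.5, σ² = 1/12), where n and the trial count are arbitrary choices:

```python
import random
import statistics

# Sample means of n i.i.d. Uniform(0, 1) draws should concentrate
# around mu = 0.5 with variance close to sigma^2 / n = (1/12) / n.
random.seed(0)
n, trials = 50, 2000
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]

print(round(statistics.fmean(means), 2))         # near mu = 0.5
print(round(statistics.variance(means) * n, 2))  # near sigma^2 = 1/12
```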
3. General Confidence Interval
(Cont’d)
1. Pick parameter p to estimate
errorD(h)
2. Choose an estimator
errorS(h)
3. Determine the probability distribution that governs the estimator
error_S(h) is governed by a Binomial distribution,
approximated by a Normal distribution when n >= 30
4. Find interval (L, U) such that N% of probability mass falls
in the interval
4. Difference in Error of
Two Hypotheses
• Assumptions
- two hypotheses h1, h2.
- h1 is tested on sample S1 containing n1 random examples;
h2 is tested on sample S2 containing n2 random examples.
• Objective
- estimate the difference between the two true errors,
d = error_D(h1) − error_D(h2)
4. Difference in Error of
Two Hypotheses (Cont'd)
• Procedure
1. Choose an estimator for d
d̂ = error_S1(h1) − error_S2(h2)
2. Determine the probability distribution that governs the estimator
σ_d̂ ≈ √( error_S1(h1)(1 − error_S1(h1)) / n1 + error_S2(h2)(1 − error_S2(h2)) / n2 )
3. Find the interval (L, U) such that N% of the probability mass falls in the
interval
d̂ ± z_N √( error_S1(h1)(1 − error_S1(h1)) / n1 + error_S2(h2)(1 − error_S2(h2)) / n2 )
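The interval in step 3 can be computed directly; a minimal sketch (the sample sizes and error rates are illustrative):

```python
import math

def difference_interval(e1, n1, e2, n2, z=1.96):
    """Approximate N% confidence interval for d = error_D(h1) - error_D(h2),
    estimated by d_hat = error_S1(h1) - error_S2(h2)."""
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return (d_hat - z * sigma, d_hat + z * sigma)

# e.g. h1 at 30% error and h2 at 20% error, 100 test examples each:
lo, hi = difference_interval(0.30, 100, 0.20, 100)
print(f"[{lo:.3f}, {hi:.3f}]")
```

Note that the 95% interval here includes 0, so at this confidence level the data do not establish that h1 is truly worse than h2.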
4. Difference in Error of
Two Hypotheses (Cont'd)
• Hypothesis Test
Ex)
S1 and S2 each contain 100 examples
error_S1(h1) = 0.30, error_S2(h2) = 0.20
What is the probability that
error_D(h1) > error_D(h2)?
4. Difference in Error of
Two Hypotheses (Cont'd)
• Solution
1. The problem is equivalent to finding the probability that d̂ did not
overestimate d by more than 0.10, i.e. Pr(d̂ < d + 0.10)
2. From the former expression,
σ_d̂ ≈ 0.061, so 0.10 ≈ 1.64 σ_d̂
3. The table of the Normal distribution shows that the associated confidence
level for a two-sided interval is 90%, so for a one-sided interval
it is 95%
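The numbers in the solution can be reproduced directly from the worked example:

```python
import math

# n1 = n2 = 100, error_S1(h1) = 0.30, error_S2(h2) = 0.20, so d_hat = 0.10.
sigma = math.sqrt(0.30 * 0.70 / 100 + 0.20 * 0.80 / 100)
print(round(sigma, 3))          # sigma_d_hat ~= 0.061
print(round(0.10 / sigma, 2))   # 0.10 is about 1.64 standard deviations
```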
5. Comparing Two Learning
Algorithms
• What we'd like to estimate:
E_{S⊂D}[ error_D(L_A(S)) − error_D(L_B(S)) ]
where L(S) is the hypothesis output by learner L using training set S
• But, given limited data D0, what is a good estimator?
- Could partition D0 into training set S0 and test set T0, and measure
error_T0(L_A(S0)) − error_T0(L_B(S0))
- Even better, repeat this many times and average the results
5. Comparing Two Learning
Algorithms(Cont’d)
1. Partition data D0 into k disjoint test sets T1, T2, …, Tk of
equal size, where this size is at least 30.
2. For 1 <= i <= k, do
use Ti as the test set, and the remaining data as training set Si
Si = D0 − Ti
hA = L_A(Si), hB = L_B(Si)
δ_i ← error_Ti(hA) − error_Ti(hB)
3. Return the value δ̄, where
δ̄ = (1/k) Σ_{i=1}^{k} δ_i
5. Comparing Two Learning
Algorithms(Cont’d)
4. Now, use a paired t test on δ̄ to obtain a confidence interval
The result is an N% confidence interval estimate for d:
δ̄ ± t_{N,k−1} · s_δ̄
where
s_δ̄ = √( (1/(k(k−1))) Σ_{i=1}^{k} (δ_i − δ̄)² )