Empirical Methods in Computer Science


Statistical Methods in Computer Science
Hypothesis Testing II: Single-Factor Experiments
Ido Dagan

Single-Factor Experiments

- A generalization of treatment experiments
- Determine the effect of the independent variable's values (nominal)
- Effect: on the dependent variable

    treatment1:  Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
    treatment2:  Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2
    control:     Ex1 & Ex2 & .... & Exn ==> Dep3

  (Ind1, Ind2 are values of the independent variable; Dep1, Dep2, Dep3 are values of the dependent variable.)

- Compare the performance of algorithm A to B to C ....
- Control condition: optional (e.g., to establish a baseline)

Single-Factor Experiments: Definitions

- The independent variable is called the factor
- Its values (being tested) are called levels
- Our goal: determine whether the levels have an effect
  - Null hypothesis: there is no effect
  - Alternative hypothesis: at least one level causes an effect
- Tool: one-way ANOVA
  - A simple special case of the general Analysis of Variance

The case for single-factor ANOVA (one-way ANOVA)

- We have k samples (k levels of the factor)
- Each has its own sample mean and sample standard deviation for the dependent variable
- We want to determine whether at least one of them is different

    treatment1:  Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
    ...
    treatmentk:  Indk & Ex1 & Ex2 & .... & Exn ==> Depk
    control:     Ex1 & Ex2 & .... & Exn ==> Dep(control)

  (The values of the independent variable are the levels of the factor; the Dep values are values of the dependent variable.)

- Cannot use the tests we learned so far: why?

The case for single-factor ANOVA (one-way ANOVA), continued

- We have k samples (k levels of the factor)
- Each has its own sample mean and sample standard deviation
- We want to determine whether at least one of them is different

    Level (sample)   S. mean (Mi)   S. stdev (Si)   N
    1                4.24           0.91            29
    2                3.75           1.38            120
    3                2.85           1.38            59
    4                2.63           1.41            59

    H0: M1 = M2 = M3 = M4
    H1: there exist i, j such that Mi ≠ Mj

- Why not use a t-test to compare every pair Mi, Mj?

Multiple paired comparisons

- Let αc be the probability of an error in a single comparison
  - alpha = the probability of incorrectly rejecting the null hypothesis
  - 1 - αc: the probability of making no error in a single comparison
  - (1 - αc)^m: the probability of making no error in m comparisons (the whole experiment)
  - αe = 1 - (1 - αc)^m: the probability of making at least one error in the experiment
    (under the assumption of independent comparisons)
- αe quickly becomes large as m increases

Example

- Suppose we want to contrast 15 levels of the factor (15 groups, k = 15)
- Total number of pairwise comparisons: m = 15 × (15 - 1) / 2 = 105
- Suppose αc = 0.05
- Then αe = 1 - (1 - αc)^m = 1 - (1 - 0.05)^105 = 0.9954
- We are very likely to make a type I error!

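These numbers are easy to reproduce; a minimal Python sketch using only the values from this example:

```python
# Experimentwise error rate for m independent pairwise comparisons.
k = 15                    # number of levels (groups)
alpha_c = 0.05            # per-comparison error probability
m = k * (k - 1) // 2      # number of pairwise comparisons

print(m)                                  # 105
print(round(1 - (1 - alpha_c) ** m, 4))   # 0.9954
```
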
Possible solutions?

- Reduce αc until the overall αe level is 0.05 (or as needed)
  - Risk: the required per-comparison αc may become too small to be attainable
- Ignore the experiment null hypothesis and focus on the individual comparisons
  - Carry out the m comparisons
  - Expected # of errors in m comparisons: m × αc
  - e.g., m = 105, αc = 0.05, expected # of errors = 5.25. But which ones are they?

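To see how strict the first option is, we can solve αe = 1 - (1 - αc)^m for the αc that keeps αe at 0.05 (a small sketch, using m = 105 from the example above):

```python
# Per-comparison alpha needed for an experimentwise error rate of 0.05,
# assuming m independent comparisons.
m = 105
alpha_e_target = 0.05
alpha_c = 1 - (1 - alpha_e_target) ** (1 / m)

print(alpha_c)   # about 0.00049 -- a very strict per-comparison threshold
```
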
One-way ANOVA

- A method for testing the experiment null hypothesis
  - H0: the means of all levels are equal to each other
- Key idea:
  - Estimate a variance B under the assumption that H0 is true
  - Estimate the "real" variance W (regardless of H0)
  - Use an F-test to test the hypothesis that B = W
- Assumes the variance of all groups is the same

Some preliminaries

- Let xi,j be the jth element in sample i
- Let Mi be the sample mean of sample i
- Let Vi be the sample variance of sample i

For example:

          Class 1   Class 2   Class 3
          14.9      11.1      5.7
          15.2      9.5       6.6
          17.9      10.9      6.7
          15.6      11.7      6.8
          10.7      11.8      6.9
    Mi    14.86     11        6.54
    Vi    6.8       0.85      0.23

  (e.g., x1,2 = 15.2 is the 2nd element of Class 1, and x3,4 = 6.8 is the 4th element of Class 3.)

Two more definitions:

- Let M be the grand sample mean (all elements, all samples)
- Let V be the grand sample variance

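A small sketch computing these quantities for the three-class example (plain Python; the sample variance uses n - 1 in the denominator):

```python
# Per-group and grand statistics for the three-class example.
classes = [
    [14.9, 15.2, 17.9, 15.6, 10.7],   # Class 1
    [11.1, 9.5, 10.9, 11.7, 11.8],    # Class 2
    [5.7, 6.6, 6.7, 6.8, 6.9],        # Class 3
]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

for i, c in enumerate(classes, 1):
    print(f"Class {i}: Mi = {mean(c):.2f}, Vi = {sample_var(c):.2f}")

all_values = [x for c in classes for x in c]
print(f"Grand: M = {mean(all_values):.2f}, V = {sample_var(all_values):.2f}")
# Expected: Mi = 14.86, 11.00, 6.54; M = 10.80, V = 14.64
```
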
The variance contributing to a value

- Every element xi,j can be re-written as
      xi,j = M + ei,j
  where ei,j is some error component
- We can focus on the error component
      ei,j = xi,j - M
  which we will rewrite as
      ei,j = (xi,j - Mi) + (Mi - M)

Within-group and between-group

- The re-written form of the error component has two parts:
      ei,j = (xi,j - Mi) + (Mi - M)
  - Within-group component: variance w.r.t. the group mean
  - Between-group component: variance w.r.t. the grand mean
- For example, in the table above:
      x1,1 = 14.9,  M1 = 14.86,  M = 10.8
      e1,1 = (14.9 - 14.86) + (14.86 - 10.8) = 0.04 + 4.06 = 4.1
- Note the within-group and between-group components:
  most of the error (variance) is due to the between-group part!
- Can we use this in a more general fashion?

No within-group variance

          Class 1   Class 2   Class 3
          15        11        6
          15        11        6
          15        11        6
          15        11        6
          15        11        6
    Mi    15        11        6
    Vi    0         0         0

    Grand: M = 10.67,  V = 14.52

No variance within any group, in any element.

No between-group variance

          Class 1   Class 2   Class 3
          17        11        22
          26        13        14
          9         18        12
          11        18        8
          12        15        19
    Mi    15        15        15
    Vi    46.5      9.5       31

    Grand: M = 15,  V = 24.86

No variance between the groups, in any group.

Comparing within-group and between-group components

- The error component of a single element is
      ei,j = (xi,j - M) = (xi,j - Mi) + (Mi - M)
- Let us relate this to the sample and grand sums of squares
- It can be shown that

      Σ_i Σ_j (xi,j - M)^2  =  Σ_i Σ_j (xi,j - Mi)^2  +  Σ_i Σ_j (Mi - M)^2

- Let us rewrite this as
      SStotal = SSwithin + SSbetween

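As a numerical sanity check, the identity holds on the three-class example used above (a minimal sketch; the data are the table values):

```python
# Verify SS_total = SS_within + SS_between on the three-class example.
classes = [
    [14.9, 15.2, 17.9, 15.6, 10.7],   # Class 1
    [11.1, 9.5, 10.9, 11.7, 11.8],    # Class 2
    [5.7, 6.6, 6.7, 6.8, 6.9],        # Class 3
]

group_means = [sum(c) / len(c) for c in classes]
grand_mean = sum(x for c in classes for x in c) / sum(len(c) for c in classes)

ss_total = sum((x - grand_mean) ** 2 for c in classes for x in c)
ss_within = sum((x - m) ** 2 for c, m in zip(classes, group_means) for x in c)
ss_between = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(classes, group_means))

print(f"{ss_total:.1f} = {ss_within:.1f} + {ss_between:.1f}")
# 204.9 = 31.5 + 173.4 (the slides round SS_between to 173.3)
```
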
From Sums of Squares (SS) to variances

- We know that
      SStotal = SSwithin + SSbetween
- ... and we convert the sums of squares to Mean Squares (variance estimates) by dividing
  each by its degrees of freedom. Here I is the number of levels (samples), Ni is the size
  of sample i, and N is the total number of elements:

      MSwithin  = SSwithin / dfwithin
                = [ Σ_i Σ_j (xi,j - Mi)^2 ] / [ Σ_{i=1..I} (Ni - 1) ]
                = [ Σ_i Σ_j (xi,j - Mi)^2 ] / (N - I)

      MSbetween = SSbetween / dfbetween
                = [ Σ_i Σ_j (Mi - M)^2 ] / (I - 1)
                = [ Σ_i Ni (Mi - M)^2 ] / (I - 1)

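Continuing with the same data, the mean squares follow directly from the sums of squares and their degrees of freedom (a small, self-contained sketch):

```python
# Mean squares for the three-class example: MS = SS / df.
classes = [
    [14.9, 15.2, 17.9, 15.6, 10.7],   # Class 1
    [11.1, 9.5, 10.9, 11.7, 11.8],    # Class 2
    [5.7, 6.6, 6.7, 6.8, 6.9],        # Class 3
]

I = len(classes)                                 # number of levels: 3
N = sum(len(c) for c in classes)                 # total number of elements: 15
group_means = [sum(c) / len(c) for c in classes]
grand_mean = sum(x for c in classes for x in c) / N

ss_within = sum((x - m) ** 2 for c, m in zip(classes, group_means) for x in c)
ss_between = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(classes, group_means))

ms_within = ss_within / (N - I)     # df_within  = 15 - 3 = 12
ms_between = ss_between / (I - 1)   # df_between = 3 - 1 = 2

print(round(ms_within, 2), round(ms_between, 1))   # 2.63 86.7
```
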
Determining the final alpha level

- MSwithin is an estimate of the (inherent) population variance
  - It does not depend on the null hypothesis (M1 = M2 = ... = MI)
  - Intuition: it is an "average" of the variances in the individual groups
- MSbetween estimates the population variance plus the treatment effect
  - It does depend on the null hypothesis
  - Intuition: it is similar to an estimate of the variance of the sample means, where each
    component is multiplied by Ni
    - Recall: the variance of a sample mean is the population variance divided by N, so
      multiplying by Ni recovers the population variance
- If the null hypothesis is true, the two values estimate the same inherent variance,
  and should be equal up to sampling variation
- So now we have two variance estimates for testing; use an F-test:
  - F = MSbetween / MSwithin
  - Compare to the F-distribution with (dfbetween, dfwithin) degrees of freedom
  - Determine the alpha level (significance)

Example

          Class 1   Class 2   Class 3
          14.9      11.1      5.7
          15.2      9.5       6.6
          17.9      10.9      6.7
          15.6      11.7      6.8
          10.7      11.8      6.9
    Mi    14.86     11        6.54
    Vi    6.8       0.85      0.23

    Grand: M = 10.8,  V = 14.64

    SSbetween = 5·(14.86 - 10.8)^2 + 5·(11 - 10.8)^2 + 5·(6.54 - 10.8)^2 = 173.3

    MSbetween = SSbetween / dfbetween = 173.3 / (3 - 1) = 173.3 / 2 = 86.7

Example (continued)

Using the same data:

    SSwithin = (14.9 - 14.86)^2 + ... + (10.7 - 14.86)^2
             + (11.1 - 11)^2 + .... + (11.8 - 11)^2
             + ... + (6.9 - 6.54)^2 = 31.5

    MSwithin = SSwithin / dfwithin = 31.5 / (15 - 3) = 31.5 / 12 = 2.6

Example (continued)

    F = MSbetween / MSwithin = 86.7 / 2.6 ≈ 32.97
    (32.97 is the value obtained with unrounded intermediate values)

    Check the F distribution with (2, 12) degrees of freedom: significant!

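The significance can be read off the F distribution, for instance with SciPy, assuming it is available (f.sf is the survival function, P(F ≥ observed)):

```python
# p-value for the observed F statistic under F(df_between=2, df_within=12).
from scipy.stats import f

F_obs = 32.97
p_value = f.sf(F_obs, 2, 12)   # P(F >= 32.97) under the null hypothesis

print(p_value)                 # roughly 1e-05, far below alpha = 0.05
```
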
Reading the results from statistics software

- You can use statistics software to run a one-way ANOVA
- It will output something like this:

    Source    df   SS      MS     F       p
    between   2    173.3   86.7   32.97   p < 0.001
    within    12   31.5    2.6
    total     14   204.9

- You should have no problem reading this now.

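For instance, SciPy's f_oneway reproduces the F and p columns of such a table (a minimal sketch, assuming scipy is installed; the SS and MS columns come from the hand computation above):

```python
# One-way ANOVA on the three-class example using SciPy.
from scipy.stats import f_oneway

class1 = [14.9, 15.2, 17.9, 15.6, 10.7]
class2 = [11.1, 9.5, 10.9, 11.7, 11.8]
class3 = [5.7, 6.6, 6.7, 6.8, 6.9]

result = f_oneway(class1, class2, class3)
print(result.statistic, result.pvalue)   # F ≈ 32.97, p < 0.001
```
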
Summary

- Treatment and single-factor experiments
  - Independent variable: categorical
  - Dependent variable: "numerical" (ratio/interval)
- Multiple comparisons: a problem for experiment-level hypotheses
- Run one-way ANOVA instead
  - Assumes the samples are normal and have equal variances
- If significant, run additional tests for the details (see the sketch after this list):
  - Tukey's procedure (T method)
  - LSD
  - Scheffé
  - ...

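As one example of such a follow-up, Tukey's HSD procedure is available in statsmodels (a minimal sketch, assuming statsmodels is installed; the data are the three-class example used throughout):

```python
# Post-hoc pairwise comparisons (Tukey's HSD) after a significant one-way ANOVA.
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = [14.9, 15.2, 17.9, 15.6, 10.7,   # Class 1
          11.1, 9.5, 10.9, 11.7, 11.8,    # Class 2
          5.7, 6.6, 6.7, 6.8, 6.9]        # Class 3
groups = ["class1"] * 5 + ["class2"] * 5 + ["class3"] * 5

result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)   # table of pairwise mean differences with adjusted p-values
```
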
Empirical Methods in Computer Science
© 2006-now Gal Kaminka