
Comparing Two Alternatives

Use confidence intervals for:
- Before-and-after comparisons
- Noncorresponding measurements

Comparing More Than Two Alternatives

ANOVA (Analysis of Variance)
- Partitions the total variation in a set of measurements into:
  - Variation due to real differences in the alternatives
  - Variation due to errors

Comparing Two Alternatives

1. Before-and-after: Did a change to the system have a statistically significant impact on performance?
2. Noncorresponding measurements: Is there a statistically significant difference between two different systems?

Before-and-After Comparison

Assumptions:
- Before-and-after measurements are not independent
- Variances in the two sets of measurements may not be equal
→ Measurements are related
- Form the obvious corresponding pairs
- Find the mean of the differences

Before-and-After Comparison

$b_i$ = before measurement
$a_i$ = after measurement
$d_i = b_i - a_i$
$\bar{d}$ = mean value of $d_i$
$s_d$ = standard deviation of $d_i$

$(c_1, c_2) = \bar{d} \pm t_{1-\alpha/2;\, n-1} \frac{s_d}{\sqrt{n}}$

Before-and-After Comparison

Measurement (i) | Before (b_i) | After (a_i) | Difference (d_i = b_i - a_i)
1               | 85           | 86          | -1
2               | 83           | 88          | -5
3               | 94           | 90          |  4
4               | 90           | 95          | -5
5               | 88           | 91          | -3
6               | 87           | 83          |  4

Before-and-After Comparison

Mean of differences: $\bar{d} = -1$
Standard deviation: $s_d = 4.15$

- From the mean of the differences, it appears that the change reduced performance.
- However, the standard deviation is large.

95% Confidence Interval for Mean of Differences

$t_{1-\alpha/2;\, n-1} = t_{0.975;\, 5} = 2.571$

  n | a = 0.90 | a = 0.95 | a = 0.975
  … |    …     |    …     |    …
  5 |  1.476   |  2.015   |  2.571
  6 |  1.440   |  1.943   |  2.447
  … |    …     |    …     |    …
  ∞ |  1.282   |  1.645   |  1.960

95% Confidence Interval for Mean of Differences

$(c_1, c_2) = \bar{d} \pm t_{1-\alpha/2;\, n-1} \frac{s_d}{\sqrt{n}}$

$t_{1-\alpha/2;\, n-1} = t_{0.975;\, 5} = 2.571$

$(c_1, c_2) = -1 \pm 2.571 \left( \frac{4.15}{\sqrt{6}} \right) = [-5.36, 3.36]$

95% Confidence Interval for Mean of Differences

$(c_1, c_2) = [-5.36, 3.36]$
- Interval includes 0
→ With 95% confidence, there is no statistically significant difference between the two systems.

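As a concrete check on the arithmetic above, here is a minimal Python sketch of the paired-difference interval. The use of numpy and scipy.stats is an assumption for illustration; the slides themselves use a printed t table, which gives the same quantile.

```python
import numpy as np
from scipy import stats

before = np.array([85, 83, 94, 90, 88, 87])
after  = np.array([86, 88, 90, 95, 91, 83])

d = before - after                  # paired differences, d_i = b_i - a_i
n = len(d)
d_bar = d.mean()                    # mean of differences: -1.0
s_d = d.std(ddof=1)                 # sample standard deviation: ~4.15

t = stats.t.ppf(0.975, df=n - 1)    # t_{0.975; 5} = 2.571
half = t * s_d / np.sqrt(n)
print(f"95% CI: [{d_bar - half:.2f}, {d_bar + half:.2f}]")  # [-5.36, 3.36]
```
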
Noncorresponding Measurements

- No direct correspondence between pairs of measurements
- Unpaired observations
- $n_1$ measurements of system 1
- $n_2$ measurements of system 2

Confidence Interval for Difference of Means

1. Compute the means
2. Compute the difference of the means
3. Compute the standard deviation of the difference of the means
4. Find the confidence interval for this difference
5. No statistically significant difference between the systems if the interval includes 0

Confidence Interval for Difference of Means

Difference of means:
$\bar{x} = \bar{x}_1 - \bar{x}_2$

Combined standard deviation:
$s_x = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Why Add Standard Deviations?

[Figure: two distributions with means $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $s_1$ and $s_2$; the spread of the difference $\bar{x}_1 - \bar{x}_2$ reflects both.]

Number of Degrees of Freedom

Not simply $n_{df} = n_1 + n_2 - 2$:

$n_{df} = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$

Example – Noncorresponding Measurements

$n_1 = 12$ measurements
$n_2 = 7$ measurements
$\bar{x}_1 = 1243$ s
$\bar{x}_2 = 1085$ s
$s_1 = 38.5$
$s_2 = 54.0$

Example (cont.)

$\bar{x} = \bar{x}_1 - \bar{x}_2 = 1243 - 1085 = 158$

$s_x = \sqrt{\frac{38.5^2}{12} + \frac{54^2}{7}} = 23.24$

$n_{df} = \frac{\left( \frac{38.5^2}{12} + \frac{54^2}{7} \right)^2}{\frac{(38.5^2 / 12)^2}{12 - 1} + \frac{(54^2 / 7)^2}{7 - 1}} = 9.62 \approx 10$

Example (cont.): 90% CI

$(c_1, c_2) = \bar{x} \pm t_{1-\alpha/2;\, n_{df}} \, s_x$

$t_{1-\alpha/2;\, n_{df}} = t_{0.95;\, 10} = 1.813$

$(c_1, c_2) = 158 \pm 1.813(23.24) = [116, 200]$

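The same computation from the summary statistics, as a minimal sketch (scipy.stats is again an assumption; the degrees of freedom are rounded to the nearest integer as on the slide):

```python
import numpy as np
from scipy import stats

n1, x1, s1 = 12, 1243.0, 38.5                # system 1 summary statistics
n2, x2, s2 = 7, 1085.0, 54.0                 # system 2 summary statistics

x_diff = x1 - x2                             # difference of means: 158
v1, v2 = s1**2 / n1, s2**2 / n2
s_x = np.sqrt(v1 + v2)                       # combined std deviation: ~23.24

# degrees of freedom (not simply n1 + n2 - 2), rounded as on the slide
ndf = round((v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))  # 9.62 -> 10

t = stats.t.ppf(0.95, df=ndf)                # t_{0.95; 10} = 1.813
print(f"90% CI: [{x_diff - t * s_x:.0f}, {x_diff + t * s_x:.0f}]")  # [116, 200]
```

The interval excludes 0, so the difference between the two systems is statistically significant at 90% confidence.
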
A Special Case

If $n_1 \lesssim 30$ or $n_2 \lesssim 30$
- and errors are normally distributed
- and $s_1 = s_2$ (standard deviations are equal)

OR

If $n_1 = n_2$
- and errors are normally distributed
- even if $s_1$ is not equal to $s_2$

Then the special case applies…

A Special Case

$(c_1, c_2) = \bar{x} \pm t_{1-\alpha/2;\, n_{df}} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$n_{df} = n_1 + n_2 - 2$

$s_p = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$

- Typically produces a tighter confidence interval
- Sometimes useful after obtaining additional measurements to tease out small differences

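A minimal sketch of this pooled-variance interval under the stated assumptions; the two measurement sets here are hypothetical, purely for illustration:

```python
import numpy as np
from scipy import stats

x1 = np.array([10.1, 10.4, 9.9, 10.2, 10.6])   # hypothetical measurements, system 1
x2 = np.array([ 9.5,  9.8, 9.6, 10.0,  9.4])   # hypothetical measurements, system 2
n1, n2 = len(x1), len(x2)

x_diff = x1.mean() - x2.mean()
s1, s2 = x1.std(ddof=1), x2.std(ddof=1)

ndf = n1 + n2 - 2                              # simple df in the special case
s_p = np.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / ndf)   # pooled std deviation

t = stats.t.ppf(0.95, df=ndf)                  # 90% interval
half = t * s_p * np.sqrt(1 / n1 + 1 / n2)
print(f"90% CI: ({x_diff - half:.3f}, {x_diff + half:.3f})")
```
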
Comparing Proportions

$m_1$ = number of events of interest in system 1
$n_1$ = total number of events in system 1
$m_2$ = number of events of interest in system 2
$n_2$ = total number of events in system 2

Comparing Proportions

Since it is a binomial distribution:

Mean: $p_i = \frac{m_i}{n_i}$

Variance: $\frac{p_i (1 - p_i)}{n_i}$

Comparing Proportions

$p = p_1 - p_2$

$s_p = \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$

$(c_1, c_2) = p \pm z_{1-\alpha/2} \, s_p$

OS Example (cont)

Initial operating system:
- $n_1$ = 1,300,203 interrupts (3.5 hours)
- $m_1$ = 142,892 interrupts occurred in OS code
- $p_1$ = 0.1099, or 11% of time executing in OS

Upgraded OS:
- $n_2$ = 999,382
- $m_2$ = 84,876
- $p_2$ = 0.0849, or 8.5% of time executing in OS

Statistically significant improvement?

OS Example (cont)

- $p = p_1 - p_2 = 0.0250$
- $s_p = 0.0003911$
- 90% confidence interval: (0.0242, 0.0257)
- Statistically significant difference?

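A minimal sketch of the proportions computation above; scipy.stats is assumed only for the z quantile, which is 1.645 for a 90% interval.

```python
import numpy as np
from scipy import stats

m1, n1 = 142_892, 1_300_203     # initial OS: interrupts in OS code / total interrupts
m2, n2 = 84_876, 999_382        # upgraded OS

p1, p2 = m1 / n1, m2 / n2       # sample proportions: ~0.1099, ~0.0849
p = p1 - p2                     # difference: ~0.0250

# standard deviation of the difference of two binomial proportions
s_p = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # ~0.000391

z = stats.norm.ppf(0.95)        # z_{0.95} = 1.645
print(f"90% CI: ({p - z * s_p:.4f}, {p + z * s_p:.4f})")
# ~(0.0243, 0.0256): matches the slide's interval up to rounding; excludes 0
```
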
Important Points

- Use confidence intervals to determine if there are statistically significant differences
  - Before-and-after comparisons: find the interval for the mean of the differences
  - Noncorresponding measurements: find the interval for the difference of the means
  - Proportions
- If the interval includes zero
→ No statistically significant difference

Comparing More Than Two Alternatives

Naïve approach:
- Compare confidence intervals

One-Factor Analysis of Variance (ANOVA)

- Very general technique
  - Look at the total variation in a set of measurements
  - Divide it into meaningful components
- Also called:
  - One-way classification
  - One-factor experimental design
- Introduce the basic concept with one-factor ANOVA
- Generalize later with design of experiments

One-Factor Analysis of Variance (ANOVA)

Separates the total variation observed in a set of measurements into:
1. Variation within one system
   - Due to random measurement errors
2. Variation between systems
   - Due to real differences + random error

Is variation (2) statistically greater than variation (1)?

ANOVA

- Make $n$ measurements of $k$ alternatives
- $y_{ij}$ = $i$th measurement on the $j$th alternative
- Assumes errors are:
  - Independent
  - Gaussian (normal)

Measurements for All Alternatives

             |           Alternatives
Measurements |  1  |  2  | … |  j  | … |  k
1            | y11 | y12 | … | y1j | … | y1k
2            | y21 | y22 | … | y2j | … | y2k
…            |  …  |  …  | … |  …  | … |  …
i            | yi1 | yi2 | … | yij | … | yik
…            |  …  |  …  | … |  …  | … |  …
n            | yn1 | yn2 | … | ynj | … | ynk
Col mean     | y.1 | y.2 | … | y.j | … | y.k
Effect       | α1  | α2  | … | αj  | … | αk

Column Means

- Column means are the average values of all measurements within a single alternative
- Average performance of one alternative

$\bar{y}_{.j} = \frac{\sum_{i=1}^{n} y_{ij}}{n}$

Column Means

[Measurement table as above, highlighting the row of column means.]

Deviation From Column Mean

$y_{ij} = \bar{y}_{.j} + e_{ij}$

$e_{ij}$ = deviation of $y_{ij}$ from the column mean = error in the measurements

Error = Deviation From Column Mean

[Measurement table as above, highlighting each measurement's deviation from its column mean.]

Overall Mean

Average of all measurements made of all alternatives:

$\bar{y}_{..} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n} y_{ij}}{kn}$

Overall Mean

[Measurement table as above; the overall mean averages every entry.]

Deviation From Overall Mean

$\bar{y}_{.j} = \bar{y}_{..} + \alpha_j$

$\alpha_j$ = deviation of the column mean from the overall mean = effect of alternative $j$

Effect = Deviation From Overall Mean

[Measurement table as above, highlighting the effects row.]

Effects and Errors

- Effect is the distance from the overall mean
  - Horizontally across alternatives
- Error is the distance from the column mean
  - Vertically within one alternative
  - Error across alternatives, too
- Individual measurements are then:

$y_{ij} = \bar{y}_{..} + \alpha_j + e_{ij}$

Sum of Squares of Differences: SSE

$y_{ij} = \bar{y}_{.j} + e_{ij}$
$e_{ij} = y_{ij} - \bar{y}_{.j}$

$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} e_{ij}^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{.j} \right)^2$

Sum of Squares of Differences: SSA

$\bar{y}_{.j} = \bar{y}_{..} + \alpha_j$
$\alpha_j = \bar{y}_{.j} - \bar{y}_{..}$

$SSA = n \sum_{j=1}^{k} \alpha_j^2 = n \sum_{j=1}^{k} \left( \bar{y}_{.j} - \bar{y}_{..} \right)^2$

Sum of Squares of Differences: SST

$y_{ij} = \bar{y}_{..} + \alpha_j + e_{ij}$
$t_{ij} = \alpha_j + e_{ij} = y_{ij} - \bar{y}_{..}$

$SST = \sum_{j=1}^{k} \sum_{i=1}^{n} t_{ij}^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{..} \right)^2$

Sum of Squares of Differences

$SSA = n \sum_{j=1}^{k} \left( \bar{y}_{.j} - \bar{y}_{..} \right)^2$

$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{.j} \right)^2$

$SST = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{..} \right)^2$

Sum of Squares of Differences

- SST = differences between each measurement and the overall mean
- SSA = variation due to the effects of the alternatives
- SSE = variation due to errors in the measurements

$SST = SSA + SSE$

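To make the decomposition concrete, here is a minimal numpy sketch on hypothetical data (n = 3 measurements of k = 2 alternatives), verifying the identity SST = SSA + SSE:

```python
import numpy as np

# hypothetical y[i][j]: row i = measurement, column j = alternative
y = np.array([[10.2,  9.1],
              [10.5,  9.4],
              [10.1,  9.0]])
n, k = y.shape

col_means = y.mean(axis=0)       # column means, one per alternative
overall = y.mean()               # overall mean of all kn measurements

SSA = n * np.sum((col_means - overall) ** 2)   # variation between alternatives
SSE = np.sum((y - col_means) ** 2)             # variation within alternatives
SST = np.sum((y - overall) ** 2)               # total variation

assert np.isclose(SST, SSA + SSE)              # the identity SST = SSA + SSE
```
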
ANOVA – Fundamental Idea

Separates the variation in the measured values into:
1. Variation due to the effects of the alternatives
   - SSA – variation across columns
2. Variation due to errors
   - SSE – variation within a single column

If differences among the alternatives are due to real differences:
- SSA should be statistically > SSE

Comparing SSE and SSA

Simple approach:
- SSA / SST = fraction of total variation explained by differences among alternatives
- SSE / SST = fraction of total variation due to experimental error

But is it statistically significant?

Statistically Comparing SSE and SSA

Variance = mean square value = total variation / degrees of freedom:

$s_x^2 = \frac{SSx}{df}$

Degrees of Freedom

- df(SSA) = k − 1, since there are k alternatives
- df(SSE) = k(n − 1), since there are k alternatives, each with (n − 1) df
- df(SST) = df(SSA) + df(SSE) = kn − 1

Degrees of Freedom for Effects

[Measurement table as above, highlighting the effects row: k effects give k − 1 degrees of freedom.]

Degrees of Freedom for Errors

[Measurement table as above, highlighting the n measurements within each column: each of the k columns contributes n − 1 degrees of freedom.]

Variances from Sum of Squares (Mean Square Value)

$s_a^2 = \frac{SSA}{k - 1}$

$s_e^2 = \frac{SSE}{k(n - 1)}$

Comparing Variances

Use the F-test to compare the ratio of variances:

$F = \frac{s_a^2}{s_e^2}$

$F_{[1-\alpha;\, df(num),\, df(denom)]}$ = tabulated critical values

F-test

If $F_{computed} > F_{table}$
→ We have $(1 - \alpha) \times 100\%$ confidence that the variation due to actual differences in the alternatives, SSA, is statistically greater than the variation due to errors, SSE.

ANOVA Summary

Variation    | Sum of squares | Deg. freedom | Mean square             | Computed F  | Tabulated F
Alternatives | SSA            | k − 1        | s_a² = SSA / (k − 1)    | s_a² / s_e² | F[1−α; (k−1), k(n−1)]
Error        | SSE            | k(n − 1)     | s_e² = SSE / [k(n − 1)] |             |
Total        | SST            | kn − 1       |                         |             |

ANOVA Example

Measurements | Alt 1   | Alt 2   | Alt 3
1            | 0.0972  | 0.1382  | 0.7966
2            | 0.0971  | 0.1432  | 0.5300
3            | 0.0969  | 0.1382  | 0.5152
4            | 0.1954  | 0.1730  | 0.6675
5            | 0.0974  | 0.1383  | 0.5298
Column mean  | 0.1168  | 0.1462  | 0.6078
Effects      | −0.1735 | −0.1441 | 0.3175

Overall mean = 0.2903

ANOVA Example

Variation    | Sum of squares | Deg. freedom  | Mean square   | Computed F             | Tabulated F
Alternatives | SSA = 0.7585   | k − 1 = 2     | s_a² = 0.3793 | 0.3793 / 0.0057 = 66.4 | F[0.95; 2, 12] = 3.89
Error        | SSE = 0.0685   | k(n − 1) = 12 | s_e² = 0.0057 |                        |
Total        | SST = 0.8270   | kn − 1 = 14   |               |                        |

Conclusions from example

- SSA/SST = 0.7585/0.8270 = 0.917
→ 91.7% of the total variation in the measurements is due to differences among the alternatives
- SSE/SST = 0.0685/0.8270 = 0.083
→ 8.3% of the total variation is due to noise in the measurements
- Computed F statistic > tabulated F statistic
→ 95% confidence that the differences among the alternatives are statistically significant.

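A minimal sketch reproducing this ANOVA end to end; scipy.stats is assumed only for the tabulated critical value:

```python
import numpy as np
from scipy import stats

y = np.array([[0.0972, 0.1382, 0.7966],
              [0.0971, 0.1432, 0.5300],
              [0.0969, 0.1382, 0.5152],
              [0.1954, 0.1730, 0.6675],
              [0.0974, 0.1383, 0.5298]])
n, k = y.shape                                   # n = 5 measurements, k = 3 alternatives

col_means, overall = y.mean(axis=0), y.mean()
SSA = n * np.sum((col_means - overall) ** 2)     # ~0.7585
SSE = np.sum((y - col_means) ** 2)               # ~0.0685

s_a2 = SSA / (k - 1)                             # ~0.3793
s_e2 = SSE / (k * (n - 1))                       # ~0.0057

F = s_a2 / s_e2                                  # ~66.4
F_crit = stats.f.ppf(0.95, k - 1, k * (n - 1))   # F[0.95; 2, 12] = 3.89
print(F > F_crit)                                # True -> significant at 95%
```
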
Contrasts

- ANOVA tells us that there is a statistically significant difference among the alternatives
- But it does not tell us where the difference is
- Use the method of contrasts to compare subsets of the alternatives
  - A vs. B
  - {A, B} vs. {C}
  - Etc.

Contrasts

Contrast = linear combination of the effects of the alternatives:

$c = \sum_{j=1}^{k} w_j \alpha_j$

$\sum_{j=1}^{k} w_j = 0$

Contrasts

E.g., compare the effect of system 1 to the effect of system 2:

$w_1 = 1, \quad w_2 = -1, \quad w_3 = 0$

$c = (1)\alpha_1 + (-1)\alpha_2 + (0)\alpha_3 = \alpha_1 - \alpha_2$

Construct confidence interval for contrasts

- Need:
  - An estimate of the variance
  - The appropriate value from the t table
- Compute the confidence interval as before
- If the interval includes 0
  - Then no statistically significant difference exists between the alternatives included in the contrast

Variance of random variables

Recall that, for independent random variables $X_1$ and $X_2$:

$\mathrm{Var}[X_1 + X_2] = \mathrm{Var}[X_1] + \mathrm{Var}[X_2]$

$\mathrm{Var}[a X_1] = a^2 \, \mathrm{Var}[X_1]$

Variance of a contrast c

$\mathrm{Var}[c] = \mathrm{Var}\left[ \sum_{j=1}^{k} w_j \alpha_j \right] = \sum_{j=1}^{k} \mathrm{Var}[w_j \alpha_j] = \sum_{j=1}^{k} w_j^2 \, \mathrm{Var}[\alpha_j]$

$s_c^2 = \frac{\sum_{j=1}^{k} w_j^2 s_e^2}{kn}, \qquad s_e^2 = \frac{SSE}{k(n-1)}, \qquad df(s_c^2) = k(n-1)$

- Assumes the variation due to errors is equally distributed among the $kn$ total measurements

Confidence interval for contrasts

$(c_1, c_2) = c \pm t_{1-\alpha/2;\, k(n-1)} \, s_c$

$s_c = \sqrt{\frac{\sum_{j=1}^{k} w_j^2 s_e^2}{kn}}, \qquad s_e^2 = \frac{SSE}{k(n-1)}$

Example

90% confidence interval for the contrast [Sys1 − Sys2]:

$\alpha_1 = -0.1735, \quad \alpha_2 = -0.1441, \quad \alpha_3 = 0.3175$

$c_{[1-2]} = -0.1735 - (-0.1441) = -0.0294$

$s_c = s_e \sqrt{\frac{1^2 + (-1)^2 + 0^2}{3(5)}} = 0.0275$

$90\%: (c_1, c_2) = (-0.0784, 0.0196)$

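A minimal sketch of this contrast interval, continuing from the ANOVA example (s_e² = 0.0057, k = 3, n = 5; scipy.stats assumed for the t quantile):

```python
import numpy as np
from scipy import stats

k, n = 3, 5
s_e2 = 0.0057                                  # error mean square from the ANOVA table
effects = np.array([-0.1735, -0.1441, 0.3175])
w = np.array([1, -1, 0])                       # contrast weights: Sys1 vs. Sys2

c = w @ effects                                # contrast value: -0.0294
s_c = np.sqrt(np.sum(w**2) * s_e2 / (k * n))   # ~0.0275

t = stats.t.ppf(0.95, df=k * (n - 1))          # t_{0.95; 12} = 1.782
print(f"90% CI: ({c - t * s_c:.4f}, {c + t * s_c:.4f})")  # ~(-0.0784, 0.0196)
```

The interval includes 0, so systems 1 and 2 are not statistically different at this confidence level, even though the overall ANOVA found a significant difference somewhere among the three alternatives.
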
Important Points

- Use one-factor ANOVA to separate the total variation into:
  - Variation within one system
    - Due to random errors
  - Variation between systems
    - Due to real differences (+ random error)
- Is the variation due to real differences statistically greater than the variation due to errors?

Important Points

- Use contrasts to compare the effects of subsets of the alternatives