Transcript Document
Comparing Two Alternatives
Use confidence intervals for
Before-and-after comparisons
Noncorresponding measurements
Copyright 2004 David J. Lilja
Comparing more than two
alternatives
ANOVA (Analysis of Variance)
Partitions total variation in a set of
measurements into
Variation due to real differences in alternatives
Variation due to errors
Comparing Two Alternatives
1. Before-and-after: Did a change to the system have a statistically significant impact on performance?
2. Non-corresponding measurements: Is there a statistically significant difference between two different systems?
Before-and-After Comparison
Assumptions:
Before-and-after measurements are not independent
Variances in the two sets of measurements may not be equal
→ Measurements are related
Form obvious corresponding pairs
Find mean of differences
Before-and-After Comparison
$b_i$ = before measurement
$a_i$ = after measurement
$d_i = b_i - a_i$
$\bar{d}$ = mean value of $d_i$
$s_d$ = standard deviation of $d_i$

$(c_1, c_2) = \bar{d} \mp t_{1-\alpha/2;\,n-1} \frac{s_d}{\sqrt{n}}$
Before-and-After Comparison
Measurement (i)   Before (b_i)   After (a_i)   Difference (d_i = b_i - a_i)
1                 85             86            -1
2                 83             88            -5
3                 94             90             4
4                 90             95            -5
5                 88             91            -3
6                 87             83             4
Before-and-After Comparison
Mean of differences: $\bar{d} = -1$
Standard deviation: $s_d = 4.15$
From the mean of differences, it appears that the change reduced performance.
However, the standard deviation is large.
95% Confidence Interval for Mean
of Differences
$t_{1-\alpha/2;\,n-1} = t_{0.975;5} = 2.571$

Excerpt of the t table (columns are the quantile $a$):

n     0.90    0.95    0.975
…     …       …       …
5     1.476   2.015   2.571
6     1.440   1.943   2.447
…     …       …       …
∞     1.282   1.645   1.960
95% Confidence Interval for Mean
of Differences
$(c_1, c_2) = \bar{d} \mp t_{1-\alpha/2;\,n-1} \frac{s_d}{\sqrt{n}}$
$t_{1-\alpha/2;\,n-1} = t_{0.975;5} = 2.571$
$(c_1, c_2) = -1 \mp 2.571 \times \frac{4.15}{\sqrt{6}}$
$(c_1, c_2) = [-5.36, 3.36]$
95% Confidence Interval for Mean
of Differences
$(c_1, c_2) = [-5.36, 3.36]$
Interval includes 0
→ With 95% confidence, there is no statistically
significant difference between the two
systems.
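The before-and-after interval above can be reproduced in a few lines of Python. This is a sketch using only the standard library; the critical value $t_{0.975;5} = 2.571$ is read from the t table rather than computed.

```python
import math
import statistics

# Paired before-and-after measurements from the example.
before = [85, 83, 94, 90, 88, 87]
after  = [86, 88, 90, 95, 91, 83]

# Differences d_i = b_i - a_i, their mean, and sample standard deviation.
d = [b - a for b, a in zip(before, after)]
d_mean = statistics.mean(d)        # -1
s_d = statistics.stdev(d)          # ~4.15 (n-1 denominator)
n = len(d)

t_crit = 2.571                     # t_{0.975; n-1=5}, from the t table
half = t_crit * s_d / math.sqrt(n)
c1, c2 = d_mean - half, d_mean + half
print(f"[{c1:.2f}, {c2:.2f}]")     # ~[-5.35, 3.35]; interval includes 0
```

The result matches the slide's $[-5.36, 3.36]$ up to rounding of $s_d$, and includes zero, so no statistically significant difference is detected.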
Noncorresponding
Measurements
No direct correspondence
between pairs of
measurements
Unpaired observations
n1 measurements of system 1
n2 measurements of system 2
Confidence Interval for Difference
of Means
1. Compute means
2. Compute difference of means
3. Compute standard deviation of difference of means
4. Find confidence interval for this difference
5. No statistically significant difference between systems if interval includes 0
Confidence Interval for Difference
of Means
Difference of means: $\bar{x} = \bar{x}_1 - \bar{x}_2$

Combined standard deviation: $s_x = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Why Add Standard
Deviations?
[Figure: two sampling distributions with means $\bar{x}_1$, $\bar{x}_2$ and spreads $s_1$, $s_2$; the difference $\bar{x}_1 - \bar{x}_2$ inherits variability from both.]
Number of Degrees of
Freedom
Not simply $n_{df} = n_1 + n_2 - 2$:

$n_{df} = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Example – Noncorresponding
Measurements
$n_1 = 12$ measurements
$n_2 = 7$ measurements
$\bar{x}_1 = 1243$ s
$\bar{x}_2 = 1085$ s
$s_1 = 38.5$
$s_2 = 54.0$
Example (cont.)
$\bar{x} = \bar{x}_1 - \bar{x}_2 = 1243 - 1085 = 158$

$s_x = \sqrt{\frac{38.5^2}{12} + \frac{54^2}{7}} = 23.24$

$n_{df} = \frac{\left( \frac{38.5^2}{12} + \frac{54^2}{7} \right)^2}{\frac{(38.5^2/12)^2}{12 - 1} + \frac{(54^2/7)^2}{7 - 1}} = 9.62 \approx 10$
Example (cont.): 90% CI
$(c_1, c_2) = \bar{x} \mp t_{1-\alpha/2;\,n_{df}} \, s_x$
$t_{1-\alpha/2;\,n_{df}} = t_{0.95;10} = 1.813$
$(c_1, c_2) = 158 \mp 1.813 \times 23.24$
$(c_1, c_2) = [116, 200]$
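The same computation fits in a short Python sketch (standard library only; $t_{0.95;10} = 1.813$ is read from the t table):

```python
import math

# Summary statistics from the example.
n1, x1, s1 = 12, 1243.0, 38.5
n2, x2, s2 = 7, 1085.0, 54.0

diff = x1 - x2                       # 158
v1, v2 = s1**2 / n1, s2**2 / n2      # per-system variance of the mean
s_x = math.sqrt(v1 + v2)             # ~23.24

# Effective degrees of freedom from the formula above (~9.62, round to 10).
ndf = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_crit = 1.813                       # t_{0.95; 10}, from the t table
c1, c2 = diff - t_crit * s_x, diff + t_crit * s_x
print(f"ndf ~ {ndf:.2f}, 90% CI = [{c1:.0f}, {c2:.0f}]")   # [116, 200]
```

The interval excludes zero, so the two systems differ at the 90% confidence level.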
A Special Case
If $n_1 \lesssim 30$ or $n_2 \lesssim 30$
And errors are normally distributed
And $s_1 = s_2$ (standard deviations are equal)
OR
If $n_1 = n_2$
And errors are normally distributed
Even if $s_1 \neq s_2$
Then the special case applies…
A Special Case
$(c_1, c_2) = \bar{x} \mp t_{1-\alpha/2;\,n_{df}} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$n_{df} = n_1 + n_2 - 2$

$s_p = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$
Typically produces tighter confidence interval
Sometimes useful after obtaining additional
measurements to tease out small differences
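The pooled special case can be sketched as a small helper function. The sample values in the usage line are hypothetical (not from the slides); $t_{0.975;8} = 2.306$ comes from a t table with $n_1 + n_2 - 2 = 8$ degrees of freedom.

```python
import math

def pooled_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for x1 - x2 using the pooled standard deviation s_p,
    with ndf = n1 + n2 - 2 degrees of freedom."""
    sp = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    half = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - half, (x1 - x2) + half

# Hypothetical equal-variance samples; t_{0.975;8} = 2.306 for a 95% interval.
c1, c2 = pooled_ci(10.0, 2.0, 5, 8.0, 2.0, 5, t_crit=2.306)
print(f"[{c1:.2f}, {c2:.2f}]")   # ~[-0.92, 4.92]; includes 0 here
```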
Comparing Proportions
$m_1$ = # events of interest in system 1
$n_1$ = total # events in system 1
$m_2$ = # events of interest in system 2
$n_2$ = total # events in system 2
Comparing Proportions
Since it is a binomial distribution
Mean: $p_i = \frac{m_i}{n_i}$

Variance: $\frac{p_i (1 - p_i)}{n_i}$
Comparing Proportions
$p = p_1 - p_2$

$s_p = \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$

$(c_1, c_2) = p \mp z_{1-\alpha/2} \, s_p$
OS Example (cont)
Initial operating system:
$n_1$ = 1,300,203 interrupts (3.5 hours)
$m_1$ = 142,892 interrupts occurred in OS code
$p_1$ = 0.1099, or 11% of time executing in OS
Upgraded OS:
$n_2$ = 999,382
$m_2$ = 84,876
$p_2$ = 0.0849, or 8.5% of time executing in OS
Statistically significant improvement?
OS Example (cont)
p = p1 – p2 = 0.0250
sp = 0.0003911
90% confidence interval
(0.0242, 0.0257)
Statistically significant difference?
→ Yes: the interval does not include 0.
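A sketch of the proportion comparison in Python (standard library only). Note the slide's interval (0.0242, 0.0257) corresponds to $z \approx 1.96$, i.e. a 95% interval; with $z_{0.95} = 1.645$ the 90% interval is slightly tighter, and zero is excluded either way.

```python
import math

n1, m1 = 1_300_203, 142_892    # initial OS: total interrupts, interrupts in OS code
n2, m2 = 999_382, 84_876       # upgraded OS

p1, p2 = m1 / n1, m2 / n2      # ~0.1099, ~0.0849
p = p1 - p2                    # ~0.0250
s_p = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # ~0.000391

z = 1.645                      # z_{0.95} for a 90% interval
c1, c2 = p - z * s_p, p + z * s_p
print(f"({c1:.4f}, {c2:.4f})")   # ~(0.0243, 0.0256); 0 is excluded
```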
Important Points
Use confidence intervals to determine if there are statistically significant differences
Before-and-after comparisons → find interval for mean of differences
Noncorresponding measurements → find interval for difference of means
Proportions
If interval includes zero
→ No statistically significant difference
Comparing More Than Two
Alternatives
Naïve approach
Compare confidence intervals
One-Factor Analysis of
Variance (ANOVA)
Very general technique
Look at total variation in a set of measurements
Divide into meaningful components
Also called:
One-way classification
One-factor experimental design
Introduce basic concept with one-factor ANOVA
Generalize later with design of experiments
One-Factor Analysis of
Variance (ANOVA)
Separates total variation observed in a set of measurements into:
1. Variation within one system
Due to random measurement errors
2. Variation between systems
Due to real differences + random error
Is variation (2) statistically > variation (1)?
ANOVA
Make n measurements of k alternatives
yij = ith measurement on jth alternative
Assumes errors are:
Independent
Gaussian (normal)
Measurements for All Alternatives
               Alternatives
Measurements   1      2      …    j      …    k
1              y11    y12    …    y1j    …    y1k
2              y21    y22    …    y2j    …    y2k
…              …      …      …    …      …    …
i              yi1    yi2    …    yij    …    yik
…              …      …      …    …      …    …
n              yn1    yn2    …    ynj    …    ynk
Col mean       y.1    y.2    …    y.j    …    y.k
Effect         α1     α2     …    αj     …    αk
Column Means
Column means are average values of all
measurements within a single alternative
Average performance of one alternative
$\bar{y}_{.j} = \frac{\sum_{i=1}^{n} y_{ij}}{n}$
Deviation From Column Mean
$y_{ij} - \bar{y}_{.j} = e_{ij}$

$e_{ij}$ = deviation of $y_{ij}$ from column mean = error in measurements
Overall Mean
Average of all measurements made of all
alternatives
$\bar{y}_{..} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n} y_{ij}}{kn}$
Deviation From Overall Mean
$\bar{y}_{.j} - \bar{y}_{..} = \alpha_j$

$\alpha_j$ = deviation of column mean from overall mean = effect of alternative $j$
Effects and Errors
Effect is distance from overall mean (horizontally, across alternatives)
Error is distance from column mean (vertically, within one alternative)
There is error across alternatives, too
Individual measurements are then:
$y_{ij} = \bar{y}_{..} + \alpha_j + e_{ij}$
Sum of Squares of
Differences: SSE
$y_{ij} = \bar{y}_{.j} + e_{ij}$
$e_{ij} = y_{ij} - \bar{y}_{.j}$

$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} e_{ij}^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{.j} \right)^2$
Sum of Squares of
Differences: SSA
$\bar{y}_{.j} = \bar{y}_{..} + \alpha_j$
$\alpha_j = \bar{y}_{.j} - \bar{y}_{..}$

$SSA = n \sum_{j=1}^{k} \alpha_j^2 = n \sum_{j=1}^{k} \left( \bar{y}_{.j} - \bar{y}_{..} \right)^2$
Sum of Squares of
Differences: SST
$y_{ij} = \bar{y}_{..} + \alpha_j + e_{ij}$
$t_{ij} = \alpha_j + e_{ij} = y_{ij} - \bar{y}_{..}$

$SST = \sum_{j=1}^{k} \sum_{i=1}^{n} t_{ij}^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{..} \right)^2$
Sum of Squares of Differences
$SSA = n \sum_{j=1}^{k} \left( \bar{y}_{.j} - \bar{y}_{..} \right)^2$

$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{.j} \right)^2$

$SST = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{..} \right)^2$
Sum of Squares of Differences
SST = differences between each measurement and overall mean
SSA = variation due to effects of alternatives
SSE = variation due to errors in measurements

$SST = SSA + SSE$
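The identity $SST = SSA + SSE$ can be checked numerically; a minimal sketch with made-up data ($k = 2$ alternatives, $n = 3$ measurements each):

```python
# Hypothetical measurements: one inner list per alternative (column).
data = [[3.0, 5.0, 4.0],    # column j = 1
        [7.0, 6.0, 8.0]]    # column j = 2
k, n = len(data), len(data[0])

col = [sum(c) / n for c in data]            # column means
overall = sum(map(sum, data)) / (k * n)     # overall mean

ssa = n * sum((m - overall) ** 2 for m in col)
sse = sum((y - col[j]) ** 2 for j in range(k) for y in data[j])
sst = sum((y - overall) ** 2 for row in data for y in row)
print(abs(sst - (ssa + sse)) < 1e-12)       # True
```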
ANOVA – Fundamental Idea
Separates variation in measured values into:
1. Variation due to effects of alternatives
• SSA – variation across columns
2. Variation due to errors
• SSE – variation within a single column
If differences among alternatives are due to real differences,
• SSA should be statistically > SSE
Comparing SSE and SSA
Simple approach
SSA / SST = fraction of total variation explained
by differences among alternatives
SSE / SST = fraction of total variation due to
experimental error
But is it statistically significant?
Statistically Comparing SSE
and SSA
Variance = mean square value = $\frac{\text{total variation}}{\text{degrees of freedom}}$:

$s_x^2 = \frac{SS_x}{df}$
Degrees of Freedom
df(SSA) = k – 1, since k alternatives
df(SSE) = k(n – 1), since k alternatives, each
with (n – 1) df
df(SST) = df(SSA) + df(SSE) = kn - 1
Variances from Sum of
Squares (Mean Square Value)
$s_a^2 = \frac{SSA}{k - 1}$

$s_e^2 = \frac{SSE}{k(n - 1)}$
Comparing Variances
Use F-test to compare ratio of variances
$F = \frac{s_a^2}{s_e^2}$

$F_{[1-\alpha;\,df(num),\,df(denom)]}$ = tabulated critical values
F-test
If F_computed > F_table
→ We have (1 – α) * 100% confidence that
variation due to actual differences in
alternatives, SSA, is statistically greater than
variation due to errors, SSE.
ANOVA Summary
Variation       Sum of squares   Deg freedom   Mean square             Computed F   Tabulated F
Alternatives    SSA              k − 1         s_a² = SSA/(k − 1)      s_a²/s_e²    F[1−α; (k−1), k(n−1)]
Error           SSE              k(n − 1)      s_e² = SSE/[k(n − 1)]
Total           SST              kn − 1
ANOVA Example
               Alternatives
Measurements   1         2         3
1              0.0972    0.1382    0.7966
2              0.0971    0.1432    0.5300
3              0.0969    0.1382    0.5152
4              0.1954    0.1730    0.6675
5              0.0974    0.1383    0.5298
Column mean    0.1168    0.1462    0.6078
Effects        −0.1735   −0.1441   0.3175

Overall mean = 0.2903
ANOVA Example
Variation       Sum of squares   Deg freedom     Mean square     Computed F               Tabulated F
Alternatives    SSA = 0.7585     k − 1 = 2       s_a² = 0.3793   0.3793/0.0057 = 66.4     F[0.95; 2, 12] = 3.89
Error           SSE = 0.0685     k(n − 1) = 12   s_e² = 0.0057
Total           SST = 0.8270     kn − 1 = 14
Conclusions from example
SSA/SST = 0.7585/0.8270 = 0.917
→ 91.7% of total variation in measurements is due to
differences among alternatives
SSE/SST = 0.0685/0.8270 = 0.083
→ 8.3% of total variation in measurements is due to noise in
measurements
Computed F statistic > tabulated F statistic
→ 95% confidence that differences among alternatives are
statistically significant.
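The whole one-factor ANOVA fits in a short Python sketch (standard library only; $F_{[0.95;2,12]} = 3.89$ is read from an F table):

```python
# Measurements from the example: k = 3 alternatives, n = 5 measurements each.
data = {
    1: [0.0972, 0.0971, 0.0969, 0.1954, 0.0974],
    2: [0.1382, 0.1432, 0.1382, 0.1730, 0.1383],
    3: [0.7966, 0.5300, 0.5152, 0.6675, 0.5298],
}
k = len(data)
n = len(next(iter(data.values())))

col_means = {j: sum(ys) / n for j, ys in data.items()}
overall = sum(sum(ys) for ys in data.values()) / (k * n)     # ~0.2903

ssa = n * sum((m - overall) ** 2 for m in col_means.values())             # ~0.7585
sse = sum((y - col_means[j]) ** 2 for j, ys in data.items() for y in ys)  # ~0.0685

sa2 = ssa / (k - 1)           # ~0.3793
se2 = sse / (k * (n - 1))     # ~0.0057
F = sa2 / se2                 # ~66.4
print(F > 3.89)               # True: significant at 95% confidence
```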
Contrasts
ANOVA tells us that there is a statistically
significant difference among alternatives
But it does not tell us where the difference is
Use method of contrasts to compare subsets
of alternatives
A vs B
{A, B} vs {C}
Etc.
Contrasts
Contrast = linear combination of effects of alternatives

$c = \sum_{j=1}^{k} w_j \alpha_j, \qquad \sum_{j=1}^{k} w_j = 0$
Contrasts
E.g., compare the effect of system 1 to the effect of system 2:

$w_1 = 1, \quad w_2 = -1, \quad w_3 = 0$
$c = (1)\alpha_1 + (-1)\alpha_2 + (0)\alpha_3 = \alpha_1 - \alpha_2$
Construct confidence interval
for contrasts
Need
Estimate of variance
Appropriate value from t table
Compute confidence interval as before
If interval includes 0
Then no statistically significant difference exists
between the alternatives included in the contrast
Variance of random variables
Recall that, for independent random variables X1
and X2
$Var[X_1 + X_2] = Var[X_1] + Var[X_2]$
$Var[aX_1] = a^2 \, Var[X_1]$
Variance of a contrast c
$Var[c] = Var\left[ \sum_{j=1}^{k} (w_j \alpha_j) \right] = \sum_{j=1}^{k} Var[w_j \alpha_j] = \sum_{j=1}^{k} w_j^2 \, Var[\alpha_j]$

$s_c^2 = \frac{\sum_{j=1}^{k} (w_j^2 s_e^2)}{kn}, \qquad s_e^2 = \frac{SSE}{k(n-1)}, \qquad df(s_c^2) = k(n-1)$

Assumes variation due to errors is equally distributed among the kn total measurements
Confidence interval for
contrasts
$(c_1, c_2) = c \mp t_{1-\alpha/2;\,k(n-1)} \, s_c$

$s_c = \sqrt{\frac{\sum_{j=1}^{k} (w_j^2 s_e^2)}{kn}}, \qquad s_e^2 = \frac{SSE}{k(n-1)}$
66
Example
90% confidence interval for contrast of [Sys1- Sys2]
$\alpha_1 = -0.1735, \quad \alpha_2 = -0.1441, \quad \alpha_3 = 0.3175$

$c_{[1-2]} = \alpha_1 - \alpha_2 = -0.1735 - (-0.1441) = -0.0294$

$s_c = s_e \sqrt{\frac{1^2 + (-1)^2 + 0^2}{3(5)}} = 0.0275$

90%: $(c_1, c_2) = (-0.0784, 0.0196)$
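A sketch reproducing the contrast interval in Python ($t_{0.95;12} = 1.782$ is read from the t table; small differences from the slide's numbers come from rounding $s_c$):

```python
import math

effects = [-0.1735, -0.1441, 0.3175]   # alpha_j from the ANOVA example
w = [1, -1, 0]                         # contrast weights: Sys1 - Sys2
k, n = 3, 5
se2 = 0.0685 / (k * (n - 1))           # s_e^2 = SSE / (k(n-1)) ~ 0.0057

c = sum(wj * aj for wj, aj in zip(w, effects))            # -0.0294
s_c = math.sqrt(sum(wj**2 for wj in w) * se2 / (k * n))   # ~0.0276
t_crit = 1.782                         # t_{0.95; k(n-1)=12}, from the t table
c1, c2 = c - t_crit * s_c, c + t_crit * s_c
print(f"({c1:.4f}, {c2:.4f})")         # ~(-0.0786, 0.0198); includes 0
```

The interval includes zero, so systems 1 and 2 are not statistically different even though the full ANOVA found a significant difference somewhere among the three alternatives.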
Important Points
Use one-factor ANOVA to separate total variation into:
– Variation within one system (due to random errors)
– Variation between systems (due to real differences + random error)
Is the variation due to real differences statistically greater than the variation due to errors?
Important Points
Use contrasts to compare effects of subsets
of alternatives