Applied Statistics - Duke Computer Science


Comparison Methodology
• Meaning of a sample
• Confidence intervals
• Making decisions and comparing alternatives
• Special considerations in confidence intervals
• Sample sizes
© 1998, Geoff Kuenning
Estimating
Confidence Intervals
• Two formulas for confidence intervals
– Over 30 samples from any distribution:
z-distribution
– Small sample from normally distributed
population: t-distribution
• Common error: using t-distribution for
non-normal population
– Central Limit Theorem often saves us
The z Distribution
• Interval on either side of mean:
   $\bar{x} \pm z_{1-\alpha/2}\, \dfrac{s}{\sqrt{n}}$
• Significance level α is small for large confidence levels
• Tables of z are tricky: be careful!
The t Distribution
• Formula is almost the same:
   $\bar{x} \pm t_{1-\alpha/2;\,n-1}\, \dfrac{s}{\sqrt{n}}$
• Usable only for normally distributed
populations!
• But works with small samples
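As a rough illustration of the two interval formulas above, here is a minimal Python sketch (NumPy and SciPy are assumed to be available; the function name and defaults are illustrative, not part of the original slides):

import numpy as np
from scipy import stats

def mean_confidence_interval(sample, confidence=0.90, use_t=True):
    """Confidence interval for a population mean from one sample."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    xbar = sample.mean()
    s = sample.std(ddof=1)                  # sample standard deviation
    if use_t:                               # small sample, normal population
        q = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    else:                                   # over 30 samples, any distribution
        q = stats.norm.ppf(1 - (1 - confidence) / 2)
    half_width = q * s / np.sqrt(n)
    return xbar - half_width, xbar + half_width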
Making Decisions
• Why do we use confidence intervals?
– Summarizes error in sample mean
– Gives way to decide if measurement is
meaningful
– Allows comparisons in face of error
• But remember: at 90% confidence, 10% of the computed intervals will not include the population mean
Testing for Zero Mean
• Is population mean significantly nonzero?
• If confidence interval includes 0, answer is no
• Can test for any value (mean of sums is sum
of means)
• Example: our height samples are consistent
with average height of 170 cm
– Also consistent with 160 and 180!
Comparing Alternatives
• Often need to find better system
– Choose fastest computer to buy
– Prove our algorithm runs faster
• Different methods for paired/unpaired
observations
– Paired if ith test on each system was same
– Unpaired otherwise
Comparing Paired
Observations
• Treat problem as 1 sample of n pairs
• For each test calculate performance
difference
• Calculate confidence interval for differences
• If interval includes zero, systems aren’t
different
– If not, sign indicates which is better
Example: Comparing Paired Observations
• Do home baseball teams outscore visitors?
• Sample from 9-4-96:
  Home:     4  5  0  11  6  6   3  12  9  5  6  3  1  6
  Visitors: 2  7  7   6  0  7  10   6  2  2  4  2  2  0
[Bar chart comparing home and visitor scores for each game]
Example: Comparing Paired Observations
[Bar chart of the per-game differences, home minus visitors]
• H-V: 2  -2  -7  5  6  -1  -7  6  7  3  2  1  -1  6
• Mean 1.4, 90% interval (-0.75, 3.6)
– Can’t reject the hypothesis that difference is 0.
– 70% interval is (0.10, 2.76)
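The paired analysis above can be reproduced with a short Python sketch (NumPy/SciPy assumed; results match the slide up to rounding):

import numpy as np
from scipy import stats

home     = [4, 5, 0, 11, 6, 6, 3, 12, 9, 5, 6, 3, 1, 6]
visitors = [2, 7, 7, 6, 0, 7, 10, 6, 2, 2, 4, 2, 2, 0]

diff = np.subtract(home, visitors)        # per-game difference, H - V
n = len(diff)
t = stats.t.ppf(0.95, df=n - 1)           # two-sided 90% interval
half = t * diff.std(ddof=1) / np.sqrt(n)
print(diff.mean(), (diff.mean() - half, diff.mean() + half))
# about 1.4 and (-0.75, 3.6): the interval includes zero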
Comparing Unpaired
Observations
• A sample of size $n_a$ and $n_b$ for each alternative A and B
• Start with confidence intervals
  – If no overlap:
    • Systems are different, and the higher mean is better (for HB, i.e., higher-is-better, metrics)
  – If overlap and each CI contains the other mean:
    • Systems are not different at this level
    • If it's a close call, could lower the confidence level
  – If overlap and one mean isn't in the other CI:
    • Must do a t-test
[Sketches: confidence intervals for A and B plotted along the mean axis, one for each of the three cases]
The t-test (1)
1. Compute sample means $\bar{x}_a$ and $\bar{x}_b$
2. Compute sample standard deviations $s_a$ and $s_b$
3. Compute the mean difference $\bar{x}_a - \bar{x}_b$
4. Compute the standard deviation of the difference:
   $s = \sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}$
The t-test (2)
5. Compute effective degrees of freedom:
   $\nu = \dfrac{\left( s_a^2/n_a + s_b^2/n_b \right)^2}{\dfrac{1}{n_a - 1}\left( \dfrac{s_a^2}{n_a} \right)^2 + \dfrac{1}{n_b - 1}\left( \dfrac{s_b^2}{n_b} \right)^2}$
6. Compute the confidence interval:
   $(\bar{x}_a - \bar{x}_b) \pm t_{1-\alpha/2;\,\nu}\; s$
7. If interval includes zero, no difference
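A hedged sketch of steps 1-7 in Python (NumPy/SciPy assumed; the function name is illustrative):

import numpy as np
from scipy import stats

def unpaired_interval(a, b, confidence=0.90):
    """Confidence interval for the difference of means of unpaired samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    va, vb = a.var(ddof=1), b.var(ddof=1)
    s = np.sqrt(va / na + vb / nb)                      # std. dev. of difference
    nu = (va / na + vb / nb) ** 2 / (                   # effective DOF (step 5)
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    t = stats.t.ppf(1 - (1 - confidence) / 2, df=nu)
    diff = a.mean() - b.mean()
    return diff - t * s, diff + t * s                   # contains 0 => no difference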
Comparing Proportions
• If k of n trials give a certain result, then
confidence interval is
   $\dfrac{k}{n} \pm z_{1-\alpha/2}\, \dfrac{\sqrt{k - k^2/n}}{n}$
• If interval includes 0.5, can’t say which
outcome is statistically meaningful
• Must have k>10 to get valid results
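A minimal sketch of this proportion interval (SciPy assumed; the function name is illustrative):

import numpy as np
from scipy import stats

def proportion_interval(k, n, confidence=0.90):
    """Confidence interval for a proportion when k of n trials succeed."""
    p = k / n
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    half = z * np.sqrt(k - k**2 / n) / n     # equals z * sqrt(p*(1-p)/n)
    return p - half, p + half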
Special Considerations
• Selecting a confidence level
• Hypothesis testing
• One-sided confidence intervals
Selecting a
Confidence Level
• Depends on cost of being wrong
• 90%, 95% are common values for scientific
papers
• Generally, use highest value that lets you
make a firm statement
– But it’s better to be consistent throughout a
given paper
Hypothesis Testing
• The null hypothesis (H0) is common in
statistics
– Confusing due to double negative
– Gives less information than confidence
interval
– Often harder to compute
• Should understand that rejecting null
hypothesis implies result is meaningful
One-Sided
Confidence Intervals
• Two-sided intervals test for mean being
outside a certain range (see “error bands” in
previous graphs)
• One-sided tests useful if only interested in
one limit
• Use $z_{1-\alpha}$ or $t_{1-\alpha;n}$ instead of $z_{1-\alpha/2}$ or $t_{1-\alpha/2;n}$ in the formulas
Sample Sizes
• Bigger sample sizes give narrower intervals
  – Smaller values of t as n (and thus $\nu$) increases
  – $\sqrt{n}$ in the formulas
• But sample collection is often expensive
– What is the minimum we can get away
with?
• Start with a small number of preliminary
measurements to estimate variance.
Choosing a Sample Size
• To get a given percentage error ±r%:
   $n = \left( \dfrac{100\, z\, s}{r\, \bar{x}} \right)^2$
• Here, z represents either z or t as appropriate
• For a proportion p = k/n:
   $n = \dfrac{z^2\, p(1-p)}{r^2}$
Example of
Choosing Sample Size
• Five runs of a compilation took 22.5, 19.8,
21.1, 26.7, 20.2 seconds
• How many runs to get ±5% confidence
interval at 90% confidence level?
• $\bar{x}$ = 22.1, s = 2.8, $t_{0.95;4}$ = 2.132
   $n = \left( \dfrac{100 \times 2.132 \times 2.8}{5 \times 22.1} \right)^2 = (5.4)^2 = 29.2$
Linear Regression Models
• What is a (good) model?
• Estimating model parameters
• Allocating variation
• Confidence intervals for regressions
• Verifying assumptions visually
What Is a (Good) Model?
• For correlated data, model predicts response
given an input
• Model should be equation that fits data
• Standard definition of “fits” is least-squares
– Minimize squared error
– While keeping mean error zero
– Minimizes variance of errors
Least-Squared Error
• If $\hat{y} = b_0 + b_1 x$, then the error in the estimate for $x_i$ is
   $e_i = y_i - \hat{y}_i$
• Minimize the Sum of Squared Errors (SSE)
   $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2$
• Subject to the constraint
   $\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0$
Estimating Model Parameters
• Best regression parameters are
   $b_1 = \dfrac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum x^2 - n\,\bar{x}^2}$
   $b_0 = \bar{y} - b_1 \bar{x}$
  where
   $\bar{x} = \dfrac{1}{n}\sum x_i$,  $\bar{y} = \dfrac{1}{n}\sum y_i$,  $\sum xy = \sum x_i y_i$,  $\sum x^2 = \sum x_i^2$
• Note error in book!
Parameter Estimation
Example
• Execution time of a script for various loop
counts:
  Loops:  3     5     7     9     10
  Time:   1.19  1.73  2.53  2.89  3.26
• $\bar{x}$ = 6.8, $\bar{y}$ = 2.32, $\sum xy$ = 88.54, $\sum x^2$ = 264
   $b_1 = \dfrac{88.54 - 5(6.8)(2.32)}{264 - 5(6.8)^2} = 0.29$
• $b_0 = 2.32 - (0.29)(6.8) = 0.35$
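A small sketch of the parameter formulas applied to this data (NumPy assumed; exact values differ slightly from the slide because the slide rounds b1 before computing b0):

import numpy as np

x = np.array([3, 5, 7, 9, 10], dtype=float)
y = np.array([1.19, 1.73, 2.53, 2.89, 3.26])

n = len(x)
b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)   # about 0.32 and 0.29 (the slide rounds b1 to 0.29 first, giving b0 = 0.35)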
Graph of Parameter
Estimation Example
[Scatter plot of the example data with the fitted line $\hat{y} = 0.35 + 0.29x$]
Variants of Linear Regression
• Some non-linear relationships can be handled by transformations
  – For $y = a e^{bx}$: take the logarithm of y, do regression on $\log(y) = b_0 + b_1 x$, and let $b = b_1$, $a = e^{b_0}$
  – For $y = a + b \log(x)$: take the log of x before fitting parameters, and let $b = b_1$, $a = b_0$
  – For $y = a x^b$: take the log of both x and y, and let $b = b_1$, $a = e^{b_0}$
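As one possible sketch of the first transformation (NumPy assumed; the helper name is illustrative):

import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b*x) by ordinary regression on log(y)."""
    x = np.asarray(x, dtype=float)
    logy = np.log(y)
    n = len(x)
    b1 = (np.sum(x * logy) - n * x.mean() * logy.mean()) / \
         (np.sum(x**2) - n * x.mean()**2)
    b0 = logy.mean() - b1 * x.mean()
    return np.exp(b0), b1          # a, b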
Allocating Variation
• If no regression, the best guess of y is $\bar{y}$
• Observed values of y differ from $\bar{y}$, giving rise to errors (variance)
• Regression gives better guess, but there are
still errors
• We can evaluate quality of regression by
allocating sources of errors
The Total Sum of Squares
• Without regression, the squared error is
   $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \left( y_i^2 - 2 y_i \bar{y} + \bar{y}^2 \right)$
   $\quad = \left( \sum_{i=1}^{n} y_i^2 \right) - 2\bar{y} \sum_{i=1}^{n} y_i + n\bar{y}^2$
   $\quad = \left( \sum_{i=1}^{n} y_i^2 \right) - 2\bar{y}\,(n\bar{y}) + n\bar{y}^2$
   $\quad = \left( \sum_{i=1}^{n} y_i^2 \right) - n\bar{y}^2 = SSY - SS0$
The Sum of Squares
from Regression
• Recall that the regression error is
   $SSE = \sum e_i^2 = \sum \left( y_i - \hat{y}_i \right)^2$
• Error without regression is SST
• So regression explains SSR = SST - SSE
• Regression quality is measured by the coefficient of determination:
   $R^2 = \dfrac{SSR}{SST} = \dfrac{SST - SSE}{SST}$
Evaluating Coefficient
of Determination
• Compute $SST = \left( \sum y^2 \right) - n\bar{y}^2$
• Compute $SSE = \sum y^2 - b_0 \sum y - b_1 \sum xy$
• Compute $R^2 = \dfrac{SST - SSE}{SST}$
Example of Coefficient
of Determination
• For the previous regression example:
  Loops:  3     5     7     9     10
  Time:   1.19  1.73  2.53  2.89  3.26
  – $\sum y$ = 11.60, $\sum y^2$ = 29.79, $\sum xy$ = 88.54, $n\bar{y}^2 = 5(2.32)^2 = 26.9$
  – SSE = 29.79 - (0.35)(11.60) - (0.29)(88.54) = 0.05
  – SST = 29.79 - 26.9 = 2.89
  – SSR = 2.89 - 0.05 = 2.84
  – $R^2$ = (2.89 - 0.05)/2.89 = 0.98
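A sketch of these three computations for the example data (NumPy assumed; b0 and b1 are the rounded values from the earlier fit):

import numpy as np

x = np.array([3, 5, 7, 9, 10], dtype=float)
y = np.array([1.19, 1.73, 2.53, 2.89, 3.26])
b0, b1 = 0.35, 0.29                              # parameters from the earlier fit

sst = np.sum(y**2) - len(y) * y.mean()**2
sse = np.sum(y**2) - b0 * np.sum(y) - b1 * np.sum(x * y)
print(sst, sse, (sst - sse) / sst)               # about 2.88, 0.05, 0.98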
Standard Deviation
of Errors
• Variance of errors is SSE divided by degrees
of freedom
  – DOF is n-2 because we've calculated 2 regression parameters from the data
  – So variance (mean squared error, MSE) is SSE/(n-2)
• Standard deviation of errors is the square root:
   $s_e = \sqrt{\dfrac{SSE}{n-2}}$
Checking Degrees
of Freedom
• Degrees of freedom always equate:
  – SS0 has 1 (computed from $\bar{y}$)
  – SST has n-1 (computed from data and $\bar{y}$, which uses up 1)
  – SSE has n-2 (needs 2 regression parameters)
  – So
     $SST = SSY - SS0 = SSR + SSE$
     $n - 1 = n - 1 = 1 + (n - 2)$
Example of
Standard Deviation of Errors
• For our regression example, SSE was 0.05,
so MSE is 0.05/3 = 0.017 and se = 0.13
• Note high quality of our regression:
– R2 = 0.98
– se = 0.13
– Why such a nice straight-line fit?
Confidence Intervals
for Regressions
• Regression is done from a single population
sample (size n)
– Different sample might give different results
  – True model is $y = \beta_0 + \beta_1 x$
  – Parameters $b_0$ and $b_1$ are really means taken from a population sample
Calculating Intervals
for Regression Parameters
• Standard deviations of parameters:
   $s_{b_0} = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum x^2 - n\bar{x}^2}}$
   $s_{b_1} = \dfrac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$
• Confidence intervals are $b_i \pm t\, s_{b_i}$
• where t has n-2 degrees of freedom
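A small sketch of these two formulas (NumPy assumed; the function name is illustrative):

import numpy as np

def param_stddevs(x, se):
    """Standard deviations of b0 and b1 per the formulas above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sxx = np.sum(x**2) - n * x.mean()**2
    sb0 = se * np.sqrt(1.0 / n + x.mean()**2 / sxx)
    sb1 = se / np.sqrt(sxx)
    return sb0, sb1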
Example of Regression
Confidence Intervals
• Recall $s_e$ = 0.13, n = 5, $\sum x^2$ = 264, $\bar{x}$ = 6.8
• So
   $s_{b_0} = 0.13 \sqrt{\dfrac{1}{5} + \dfrac{(6.8)^2}{264 - 5(6.8)^2}} = 0.16$
   $s_{b_1} = \dfrac{0.13}{\sqrt{264 - 5(6.8)^2}} = 0.004$
• Using a 90% confidence level, $t_{0.95;3}$ = 2.353
Regression Confidence
Example, cont’d
• Thus, the $b_0$ interval is
   $0.35 \pm 2.353(0.16) = (-0.03, 0.73)$
  – Not significant at 90%
• And the $b_1$ interval is
   $0.29 \pm 2.353(0.004) = (0.28, 0.30)$
  – Significant at 90% (and would survive even a 99.9% test)
Confidence Intervals
for Nonlinear Regressions
• For nonlinear fits using exponential
transformations:
– Confidence intervals apply to transformed
parameters
– Not valid to perform inverse transformation
on intervals
Confidence Intervals
for Predictions
• Previous confidence intervals are for
parameters
– How certain can we be that the parameters
are correct?
• Purpose of regression is prediction
– How accurate are the predictions?
– Regression gives mean of predicted
response, based on sample we took
Predicting m Samples
• Standard deviation for mean of future sample of m
observations at xp is
   $s_{\hat{y}_{mp}} = s_e \sqrt{\dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}}$
• Note the deviation drops as $m \rightarrow \infty$
• Variance is minimal at $x_p = \bar{x}$
• Use t-quantiles with n-2 DOF for the interval
Example of Confidence
of Predictions
• Using previous equation, what is predicted
time for a single run of 8 loops?
• Time = 0.35 + 0.29(8) = 2.67
• Standard deviation of errors se = 0.13
sSyy
N
1 pp
8  6.82
1
 0.13 1  
 0.14
5 264  5(6.8)
• 90% interval is then
2.67  2.353(0.14)  (2.34,3.00)
!
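A sketch of this prediction interval in Python (NumPy/SciPy assumed; values carried over from the earlier slides):

import numpy as np
from scipy import stats

x = np.array([3, 5, 7, 9, 10], dtype=float)
b0, b1, se, m = 0.35, 0.29, 0.13, 1         # fit and error values from earlier slides
xp = 8.0
n = len(x)
sxx = np.sum(x**2) - n * x.mean()**2
pred = b0 + b1 * xp                         # 2.67
s_pred = se * np.sqrt(1/m + 1/n + (xp - x.mean())**2 / sxx)
t = stats.t.ppf(0.95, df=n - 2)             # 2.353 for a 90% interval
print(pred - t * s_pred, pred + t * s_pred) # close to the slide's (2.34, 3.00)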
Verifying Assumptions
Visually
• Regressions are based on assumptions:
– Linear relationship between response y
and predictor x
• Or nonlinear relationship used in fitting
– Predictor x nonstochastic and error-free
– Model errors statistically independent
• With distribution N(0,c) for constant c
• If assumptions violated, model misleading or
invalid
Testing Linearity
• Scatter plot x vs. y to see basic curve type
[Example scatter plots: Linear, Piecewise Linear, Outlier, Nonlinear (Power)]
Testing Independence
of Errors
• Scatter-plot residuals $e_i$ versus predicted values $\hat{y}_i$
• Should be no visible trend
• Example from our curve fit:
[Scatter plot of residuals versus predicted values for the example fit]
More on Testing Independence
• May be useful to plot error residuals versus
experiment number
– In previous example, this gives same plot
except for x scaling
• No foolproof tests
Testing for Normal Errors
• Prepare quantile-quantile plot
• Example for our regression:
[Quantile-quantile plot of the example regression's residuals against normal quantiles]
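One way to tabulate the quantile-quantile comparison without a plotting package (NumPy/SciPy assumed; the residuals are recomputed from the earlier fit):

import numpy as np
from scipy import stats

x = np.array([3, 5, 7, 9, 10], dtype=float)
y = np.array([1.19, 1.73, 2.53, 2.89, 3.26])
residuals = y - (0.35 + 0.29 * x)            # errors from the fitted line

# Pair each sorted residual with the matching normal quantile; a roughly
# straight-line relationship suggests normally distributed errors.
(theoretical, ordered), _ = stats.probplot(residuals, dist="norm")
for q, r in zip(theoretical, ordered):
    print(f"{q:6.2f}  {r:6.2f}")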
Testing for Constant
Standard Deviation
• Tongue-twister: homoscedasticity
• Return to independence plot
• Look for trend in spread
• Example:
[Residual scatter plot again, checking for a trend in the spread]
Linear Regression
Can Be Misleading
• Regression throws away some information
about the data
– To allow more compact summarization
• Sometimes vital characteristics are thrown
away
– Often, looking at data plots can tell you
whether you will have a problem
Example of Misleading
Regression
      I                II               III              IV
  x      y         x      y         x      y         x      y
 10   8.04        10   9.14        10    7.46        8   6.58
  8   6.95         8   8.14         8    6.77        8   5.76
 13   7.58        13   8.74        13   12.74        8   7.71
  9   8.81         9   8.77         9    7.11        8   8.84
 11   8.33        11   9.26        11    7.81        8   8.47
 14   9.96        14   8.10        14    8.84        8   7.04
  6   7.24         6   6.13         6    6.08        8   5.25
  4   4.26         4   3.10         4    5.39       19  12.50
 12  10.84        12   9.13        12    8.15        8   5.56
  7   4.82         7   7.26         7    6.42        8   7.91
  5   5.68         5   4.74         5    5.73        8   6.89
What Does Regression Tell Us
About These Data Sets?
• Exactly the same thing for each!
• N = 11
• Mean of y = 7.5
• Y = 3 + .5 X
• Standard error of regression is 0.118
• All the sums of squares are the same
• Correlation coefficient = .82
• R2 = .67
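A sketch that fits each of the four data sets and prints the summary statistics (NumPy assumed; the variable names are illustrative):

import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4   = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
for i, yvals in enumerate(ys):
    x = np.array(x4 if i == 3 else x123, dtype=float)
    y = np.array(yvals)
    n = len(x)
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    b0 = y.mean() - b1 * x.mean()
    r = np.corrcoef(x, y)[0, 1]
    print(f"set {i+1}: y = {b0:.2f} + {b1:.2f}x, mean y = {y.mean():.2f}, r = {r:.2f}")
# Each set prints essentially y = 3.00 + 0.50x, mean y = 7.50, r = 0.82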
Now Look at the Data Plots
[Four scatter plots: data sets I, II, III, and IV]
For Discussion Today
Project Proposal
1. Statement of hypothesis
2. Workload decisions
3. Metrics to be used
4. Method