Confidence interval
CPSC 531: Output Data Analysis
Instructor: Anirban Mahanti
Office: ICT 745
Email: [email protected]
Class Location: TRB 101
Lectures: TR 15:30 – 16:45 hours
Slides primarily adapted from:
“The Art of Computer Systems Performance
Analysis” by Raj Jain, Wiley 1991.
[Chapters 12, 13, and 25]
CPSC 531: Data Analysis
1
Outline
Measures of Central Tendency
Mean, Median, Mode
How to Summarize Variability?
Comparing Systems Using Sample Data
Comparing Two Alternatives
Transient Removal
Measures of Central Tendency (1)
Sample mean – sum of all observations divided
by the total number of observations
Always exists and is unique
Mean gives equal weight to all observations
Mean is strongly affected by outliers
Sample median – list observations in an
increasing order; the observation in the middle
of the list is the median;
Even # of observations – mean of middle two values
Always exists and is unique
Resistant to outliers (compared to mean)
Measures of Central Tendency (2)
Sample mode – plot histogram from the observations; find the
bucket with peak frequency; the midpoint of this bucket is
the mode
Mode may not exist (e.g., all samples have equal weight)
More than one mode may exist (e.g., bimodal)
If only one mode, then the distribution is unimodal
[Figure: three plots of PDF f(x) versus x, with the mode(s) marked on each]
Measures of Central Tendency (3)
Is data categorical?
Yes:
use mode
e.g. most used resource in a system
Is total of interest?
Yes: use mean
e.g. total response time for Web requests
Is distribution skewed?
Yes: use median
• Median less influenced by outlier than mean.
No: use mean. Why?
Common Misuses of Means (1)
Usefulness of mean depends on the number of
observations and the variance
E.g. two response time samples: 10 ms and 1000 ms.
Mean is 505 ms! Correct index but useless.
Using mean without regard to skewness
           System A    System B
           10          5
           9           5
           11          5
           10          4
           10          31
Mean:      10          10
Mode:      10          5
Min,Max:   [9,11]      [4,31]
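The comparison above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `summarize` is invented for illustration, and the median is added to show how it follows the bulk of the skewed System B data.

```python
import statistics

# Observations copied from the System A / System B comparison above.
system_a = [10, 9, 11, 10, 10]
system_b = [5, 5, 5, 4, 31]


def summarize(sample):
    """Return mean, median, mode and (min, max) of a sample."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        "min_max": (min(sample), max(sample)),
    }


summary_a = summarize(system_a)
summary_b = summarize(system_b)
print("System A:", summary_a)
print("System B:", summary_b)
```

Both systems have the same mean, but for System B the mean is pulled up by the single observation of 31, while the median and mode stay at 5.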
Common Misuses of Means (2)
Mean of a Product by Multiplying means
Mean of product equals product of means if the
two random variables are independent.
If x and y are correlated, then in general E(xy) ≠ E(x)E(y)
Avg. users in system 23; avg. processes/user 2.
Avg. # of processes in system? Is it 46?
No! Number of processes spawned by users
depends on the load.
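A small numeric sketch of this pitfall, using hypothetical paired observations invented for illustration (they are not the slide's 23 users and 2 processes/user). Here the number of processes per user grows with the number of users, so the two quantities are correlated.

```python
import statistics

# Hypothetical paired observations (assumption, for illustration only):
# each position is one measurement interval.
users = [10, 20, 30]
procs_per_user = [1, 2, 3]

mean_users = statistics.mean(users)
mean_ppu = statistics.mean(procs_per_user)
product_of_means = mean_users * mean_ppu

# Mean of the per-interval products, i.e. the actual average process count.
products = [u * p for u, p in zip(users, procs_per_user)]
mean_of_product = statistics.mean(products)

print("E(x)E(y) =", product_of_means)
print("E(xy)    =", mean_of_product)
```

With these numbers the product of the means underestimates the mean of the product, because high user counts coincide with high per-user process counts.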
Outline
Measures of Central Tendency
How to Summarize Variability?
Comparing Systems Using Sample Data
Comparing Two Alternatives
Transient Removal
Summarizing Variability
Summarizing by a single number rarely enough.
Given two systems with same mean, we generally
prefer one with less variability
[Figure: two histograms of frequency versus response time, both with mean = 2 s. One system: 80% of responses at 1.5 s and 20% at 4 s. The other: 60% at ~0.001 s and 40% at ~5 s.]
Indices of dispersion
• Range, variance, 10- and 90-percentiles, semi-interquartile
range, and mean absolute deviation
Range
Easy to calculate; range = max – min
In many scenarios, not very useful:
Min may be zero
Max may be an “outlier”
With more samples, max may keep increasing and
min may keep decreasing → no “stable” point
Range is useful if system performance is
bounded
Variance and Standard Deviation
Given sample of n observations {x1, x2, …, xn} the
sample variance is calculated as:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Sample variance: $s^2$ (square of the unit of observation)
Sample standard deviation: s (in unit of observation)
Note the (n-1) in variance computation
(n-1) of the n differences are independent
Given (n-1) differences, the nth difference can be computed
Number of independent terms is the degrees of freedom (df)
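The sample variance with its (n-1) divisor can be sketched as follows. The function name `sample_variance` is illustrative; the data reuses the System B observations from the earlier slide, and the result is cross-checked against the standard library's `statistics.variance`, which also divides by n-1.

```python
import math
import statistics


def sample_variance(xs):
    """Sample variance with the (n - 1) divisor."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [5, 5, 5, 4, 31]  # System B observations from the earlier slide
mean = sum(data) / len(data)

# The n differences always sum to zero, so only n - 1 of them are independent.
differences = [x - mean for x in data]

s2 = sample_variance(data)
s = math.sqrt(s2)
print("sum of differences:", sum(differences))
print("variance:", s2, "standard deviation:", s)
```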
Standard Deviation (SD)
Standard deviation and mean have same units
Preferred!
E.g. a) Mean = 2 s, SD = 2 s; high variability?
E.g. b) Mean = 2 s, SD = 0.2 s; low variability?
Another widely used measure – C.O.V
C.O.V = Ratio of standard deviation to mean
C.O.V does not have any units
C.O.V shows magnitude of variability
C.O.V in (a) is 1 and in (b) is .1
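As a quick check of cases (a) and (b), a minimal sketch; the function name `coefficient_of_variation` is chosen here for illustration.

```python
def coefficient_of_variation(mean, sd):
    """Ratio of standard deviation to mean (unitless)."""
    if mean == 0:
        raise ValueError("C.O.V. is undefined for a zero mean")
    return sd / mean


cov_a = coefficient_of_variation(mean=2.0, sd=2.0)  # case (a)
cov_b = coefficient_of_variation(mean=2.0, sd=0.2)  # case (b)
print(cov_a, cov_b)
```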
Percentiles, Quantiles, Quartiles
Lower and upper bounds expressed in percents
or as fractions
90-percentile → 0.9-quantile
$\alpha$-quantile: sort and take the $[(n-1)\alpha + 1]$th observation
• [] means round to nearest integer
Quartiles divide the data into four parts at 25%, 50%, and
75% → quartiles (Q1, Q2, Q3)
25% of the observations ≤ Q1 (the first quartile)
Second quartile Q2 is also the median
The range (Q3 – Q1) is the interquartile range
(Q3 – Q1)/2 is the semi-interquartile range (SIQR)
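The quantile rule above can be sketched in Python. The sample values are hypothetical response times invented for illustration, and rounding exact halves upward is an assumption, since the slide only says "round to nearest integer".

```python
import math


def quantile(sample, alpha):
    """alpha-quantile: the [(n-1)*alpha + 1]th observation of the sorted sample."""
    if not sample or not 0 <= alpha <= 1:
        raise ValueError("need a non-empty sample and 0 <= alpha <= 1")
    ordered = sorted(sample)
    position = (len(ordered) - 1) * alpha + 1  # 1-based position
    index = math.floor(position + 0.5)         # nearest integer, halves round up (assumption)
    return ordered[index - 1]


# Hypothetical response times in seconds (illustrative data only).
times = [2.1, 3.4, 1.9, 5.0, 2.8, 4.2, 3.1, 2.5, 3.9]

q1 = quantile(times, 0.25)
q2 = quantile(times, 0.50)  # the median
q3 = quantile(times, 0.75)
siqr = (q3 - q1) / 2
p90 = quantile(times, 0.90)
print(q1, q2, q3, siqr, p90)
```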
Mean Absolute Deviation
Mean absolute deviation is calculated as:
$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$
Influence of Outliers
Range: considerably
Sample variance: considerably, but less than
range
Mean absolute deviation: less than variance
Doesn’t square (aka magnify) the outliers
SIQR range: very resistant
Use SIQR for index of dispersion whenever
median is used as index of central tendency
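To see these effects on concrete numbers, the sketch below computes the four indices for a hypothetical sample with and without one outlier. The data and helper names are invented for illustration, and how strongly each index reacts depends on the data chosen; the point shown here is only that SIQR does not move while the other three grow.

```python
import math
import statistics


def quantile(sample, alpha):
    """alpha-quantile via the [(n-1)*alpha + 1]th sorted observation."""
    ordered = sorted(sample)
    position = (len(ordered) - 1) * alpha + 1
    return ordered[math.floor(position + 0.5) - 1]


def dispersion_indices(sample):
    """Range, sample variance, mean absolute deviation and SIQR."""
    mean = statistics.fmean(sample)
    return {
        "range": max(sample) - min(sample),
        "variance": statistics.variance(sample),
        "mean_abs_dev": sum(abs(x - mean) for x in sample) / len(sample),
        "siqr": (quantile(sample, 0.75) - quantile(sample, 0.25)) / 2,
    }


# Hypothetical observations; the second list replaces the largest value
# with an outlier ten times as large.
clean = [1.9, 2.1, 2.5, 2.8, 3.1, 3.4, 3.9, 4.2, 5.0]
with_outlier = clean[:-1] + [50.0]

before = dispersion_indices(clean)
after = dispersion_indices(with_outlier)
for name in before:
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```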
Outline
Measures of Central Tendency
How to Summarize Variability?
Comparing Systems Using Sample Data
Sample vs. Population
Confidence Interval for Mean
Comparing Two Alternatives
Transient Removal
Comparing Systems Using Sample Data
The words “sample” and “example” have a
common root – “essample” (French)
One sample does not prove a theory - a sample
is just an example
The point is: a definite statement cannot be
made about the characteristics of all systems.
However, probabilistic statements about the
range of most systems can be made
Confidence interval concept as a building block
Sample versus Population
Generate 1 million random numbers
with mean $\mu$ and SD $\sigma$ and put them in an urn
Draw sample of n observations
{x1, x2, …, xn} has mean $\bar{x}$, standard deviation s
$\bar{x}$ is likely different than $\mu$!
The population mean is unknown or impossible
to obtain in many real-world scenarios
Therefore, obtain estimate of $\mu$ from $\bar{x}$
Confidence Interval for the Mean
Define bounds c1 and c2 such that:
Prob{c1 < $\mu$ < c2} = 1 - $\alpha$
(c1, c2) is the confidence interval
$\alpha$ is the significance level
100(1 - $\alpha$) is the confidence level
Typically a small $\alpha$ is desired:
confidence level 90%, 95% or 99%
One approach (for a 90% interval): take k samples, find the
sample means, sort, and take the [1+0.05(k-1)]th as
c1 and the [1+0.95(k-1)]th as c2
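The "k samples" approach can be sketched as a small simulation. Everything numeric here is an assumption for illustration: the urn is a synthetic normal population, and k = 101, n = 32 and the random seed are arbitrary choices.

```python
import random
import statistics


def empirical_ci(sample_means, lower=0.05, upper=0.95):
    """Pick the [1+lower(k-1)]th and [1+upper(k-1)]th sorted sample means."""
    ordered = sorted(sample_means)
    k = len(ordered)

    def pick(fraction):
        position = 1 + fraction * (k - 1)        # 1-based position
        return ordered[int(position + 0.5) - 1]  # round to nearest integer

    return pick(lower), pick(upper)


rng = random.Random(531)  # arbitrary seed, for reproducibility only
population = [rng.gauss(10.0, 2.0) for _ in range(100_000)]  # hypothetical urn

k, n = 101, 32
means = [statistics.fmean(rng.sample(population, n)) for _ in range(k)]
c1, c2 = empirical_ci(means)
print(f"c1 = {c1:.3f}, c2 = {c2:.3f}")
```

This needs k separate samples, which is the cost the next slide avoids.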
Central Limit Theorem
We do not need many samples. Confidence
intervals can be determined from one sample
because $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$
SD of the sample mean, $\sigma/\sqrt{n}$, is called the
standard error
Using the CLT, a 100(1 - $\alpha$)% confidence
interval for a population mean is
$(\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n})$
$z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of a unit normal
variate (and is obtained from a table!)
s is the sample SD
Confidence Interval Example
CPU times obtained by repeating experiment
32 times. The sorted set consists of
{1.9,2.7,2.8,2.8,2.8,2.9,3.1,3.1,3.2,3.2,3.3,3.4,3.6,3.7,3.8,3.9,3.9
,4.1,4.1,4.2,4.2,4.4,4.5,4.5,4.8,4.9,5.1,5.1,5.3,5.6,5.9}
Mean = 3.9, standard deviation (s) = 0.95, n = 32
For 90% confidence interval $z_{1-\alpha/2}$ = 1.645, and
we get 3.90 ± (1.645)(0.95)/sqrt(32) =
(3.62, 4.17)
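The same calculation can be sketched with Python's standard library, which supplies the normal quantile so no table lookup is needed. The function name `z_confidence_interval` is illustrative; the inputs are the rounded summary statistics quoted on the slide.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.90):
    """CLT-based confidence interval for the population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-quantile of N(0, 1)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = z_confidence_interval(mean=3.9, sd=0.95, n=32, confidence=0.90)
print(f"({low:.3f}, {high:.3f})")
```

Because the inputs are rounded, the upper bound can differ from the slide's value in the last printed digit.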
Meaning of Confidence Interval
What does this mean? With 90% confidence,
we can say the population mean is within the above
bounds; that is, the chance of error is 10%.
E.g., take 100 samples and construct CI's. In about 10
cases (on average), the interval will not contain the population mean
[Figure: an interval from $\bar{x} - c$ to $\bar{x} + c$, centered on $\bar{x}$; 90% chance that this interval contains $\mu$]
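This interpretation can be checked empirically. The sketch below draws many samples from a hypothetical normal population with known mean, builds a 90% interval from each, and counts how often the interval contains the true mean. The population parameters, trial count and seed are assumptions chosen for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage(trials, n, mu, sigma, confidence, rng):
    """Fraction of z-based intervals that contain the true mean mu."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        c = z * statistics.stdev(sample) / math.sqrt(n)  # half-width
        if xbar - c < mu < xbar + c:
            hits += 1
    return hits / trials


rng = random.Random(2024)  # arbitrary seed
fraction = coverage(trials=2000, n=32, mu=10.0, sigma=2.0, confidence=0.90, rng=rng)
print(f"fraction of intervals containing the mean: {fraction:.3f}")
```

The fraction should come out near 0.9; it is a long-run average, so any particular batch of 100 intervals may miss more or fewer than 10 times.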
Length of Confidence Interval
Let $z_{1-\alpha/2}\, s/\sqrt{n} = c$
Then, $z_{1-\alpha/2} = c\sqrt{n}/s$
Larger s implies wider confidence interval
Larger n implies shorter confidence interval
• → with more observations, we are better able to predict
population mean
• → square-root n relationship implies increasing
observations by a factor of 4 only cuts confidence interval
by a factor of 2.
Confidence Interval computation, as described
here works for n ≥ 30.
What if n not large?
For smaller samples, can construct confidence
intervals only if observations come from
normally distributed population
$(\bar{x} - t_{[1-\alpha/2;\,n-1]}\, s/\sqrt{n},\ \bar{x} + t_{[1-\alpha/2;\,n-1]}\, s/\sqrt{n})$
$t_{[1-\alpha/2;\,n-1]}$ is the $(1-\alpha/2)$-quantile of a t-variate with
(n-1) degrees of freedom
Testing for a Zero Mean
Check if measured value is significantly
different than zero
Determine confidence interval
Then check if zero is inside interval.
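Procedure applicable to any other value a
[Figure: confidence intervals for the mean drawn against 0; interval containing 0: mean is zero; interval excluding 0: mean is nonzero]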
Outline
Measures of Central Tendency
How to Summarize Variability?
Comparing Systems Using Sample Data
Comparing Two Alternatives
Transient Removal
Comparing Two Alternatives
Often interested in comparing systems
“naïve” VOD vs. “batching” VOD (assignment 3)
“SJF” vs. “FIFO” request scheduling (assignment 1)
Statistical techniques for such comparison:
Paired Observations
Unpaired Observations (we will omit this!)
Approximate Visual Test
Did you use any of these in your assignments?
Paired Observations (1)
n experiments with one-to-one correspondence between
test on system A and test on system B
No correspondence => unpaired
This test uses the zero mean idea…
Treat the two samples as one sample of n pairs
For each pair, compute difference
Construct confidence interval for difference
CI includes zero => systems not significantly
different
Paired Observations (2)
Six similar workloads used on two systems.
{(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (1.4, 2.5), (0.6, 3.6),
(7.3, 1.7)} Is one system better?
The performance differences are
{-13.7, 13.1, -2.8, -1.1, -3.0, 5.6}
Sample mean = -0.32, sample SD = 9.03
CI = -0.32 ± t·sqrt(81.62/6) = -0.32 ± t(3.69)
0.95-quantile of t with 5 DF's is 2.015
90% confidence interval = (-7.75, 7.11)
Systems not significantly different, as zero is inside the CI
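The paired-observation calculation can be reproduced as follows. This is a minimal sketch: the t quantile 2.015 is taken from the slide rather than computed, since Python's standard library has no t distribution, and the variable names are illustrative.

```python
import math
import statistics

# (system A, system B) measurements for the six workloads on the slide.
pairs = [(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (1.4, 2.5), (0.6, 3.6), (7.3, 1.7)]

# 0.95-quantile of t with 5 degrees of freedom, as quoted on the slide.
T_095_5DF = 2.015

diffs = [a - b for a, b in pairs]
mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)  # sample SD, (n - 1) divisor
half_width = T_095_5DF * sd_diff / math.sqrt(len(diffs))
ci = (mean_diff - half_width, mean_diff + half_width)

# Zero-mean test: the systems differ significantly only if 0 is outside the CI.
significantly_different = not (ci[0] < 0 < ci[1])
print(f"mean = {mean_diff:.2f}, SD = {sd_diff:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("significantly different:", significantly_different)
```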
Approximate Visual Test
Compute confidence interval for means
If CI's don't overlap, one system is better than
the other
[Figure: three cases, each showing two means with their CI's]
CI's do not overlap =>
alternatives different
CI's overlap and mean
of one is in the CI of
the other =>
not significantly different
CI's overlap but mean
of one is not in the
CI of the other =>
need more testing
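The three cases of the visual test can be written as a small decision function. This is a sketch under the assumption that each CI is given as a (low, high) pair; the function name and the example numbers are hypothetical.

```python
def visual_test(mean_a, ci_a, mean_b, ci_b):
    """Classify two alternatives from their means and confidence intervals."""
    (low_a, high_a), (low_b, high_b) = ci_a, ci_b
    overlap = low_a <= high_b and low_b <= high_a
    if not overlap:
        return "different"
    # Overlapping CIs: check whether either mean lies inside the other CI.
    if low_b <= mean_a <= high_b or low_a <= mean_b <= high_a:
        return "not significantly different"
    return "need more testing"


# Hypothetical intervals, one per case.
case_1 = visual_test(10.0, (8.0, 12.0), 20.0, (18.0, 22.0))
case_2 = visual_test(10.0, (8.0, 12.0), 11.0, (9.0, 13.0))
case_3 = visual_test(10.0, (8.0, 12.0), 13.5, (11.5, 15.5))
print(case_1, "|", case_2, "|", case_3)
```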
Determining Sample Size
Goal: find the smallest sample size n that gives the desired
confidence in the results
Method:
small set of preliminary measurements
estimate variance from the measurements
use estimate to determine sample size for accuracy
r% accuracy => ±r% at 100(1 - $\alpha$)% confidence
$\bar{x} \pm z \frac{s}{\sqrt{n}} = \bar{x}\left(1 \pm \frac{r}{100}\right)$
$n = \left(\frac{100 z s}{r \bar{x}}\right)^2$
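The sample-size formula can be sketched as below. The preliminary estimates (mean 20, SD 5) and the 5% accuracy target at 95% confidence are hypothetical numbers chosen for illustration; the result is rounded up because n must be a whole number of observations.

```python
import math
from statistics import NormalDist


def required_sample_size(mean, sd, r_percent, confidence=0.95):
    """Smallest n giving +/- r% accuracy at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((100 * z * sd / (r_percent * mean)) ** 2)


# Hypothetical preliminary measurements: mean 20, standard deviation 5.
n_required = required_sample_size(mean=20.0, sd=5.0, r_percent=5.0, confidence=0.95)
print("observations needed:", n_required)
```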
Outline
Measures of Central Tendency
How to Summarize Variability?
Comparing Systems Using Sample Data
Comparing Two Alternatives
Transient Removal
Transient Removal
In many simulations, we are interested in
steady state performance
Remove initial transient state
However, defining exactly what constitutes
end of transient state is difficult!
Several heuristics developed:
Long runs
Proper initialization
Truncation
Initial data deletion
Moving average of replications
Batch means
Long Runs
Use very long runs
Impact of transient state becomes negligible
Wasteful use of resources
How long is “long enough”?
Raj Jain text recommends that this method
not be used in isolation
Batch Means
Run simulation for long duration
Divide observations (N) into m batches, each of size n
Compute variance of batch means using the procedure shown,
for n = 2, 3, 4, 5 …
Plot variance vs. batch size
1) Compute batch mean
$\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}, \quad i = 1, 2, \ldots, m$
2) Compute overall mean
$\bar{\bar{x}} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_i$
3) Compute variance of batch means
$\mathrm{Var}(\bar{x}) = \frac{1}{m-1} \sum_{i=1}^{m} (\bar{x}_i - \bar{\bar{x}})^2$
[Figure: variance of batch means versus batch size n, annotated "Ignore" and "Transient interval"]
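The three steps can be sketched in Python. The observation series is synthetic (an exponentially decaying start-up transient plus noise, invented for illustration), and discarding leftover observations that do not fill a complete batch is an assumption; the slide does not say how to handle them.

```python
import math
import random
import statistics


def variance_of_batch_means(observations, batch_size):
    """Steps 1-3: batch means, overall mean, variance of the batch means."""
    m = len(observations) // batch_size  # incomplete last batch is dropped (assumption)
    if m < 2:
        raise ValueError("need at least two complete batches")
    batch_means = [
        statistics.fmean(observations[i * batch_size:(i + 1) * batch_size])
        for i in range(m)
    ]
    # statistics.variance uses the overall mean and the (m - 1) divisor.
    return statistics.variance(batch_means)


# Synthetic simulation output: decaying transient plus noise (hypothetical).
rng = random.Random(531)  # arbitrary seed
observations = [5.0 * math.exp(-t / 20.0) + rng.gauss(1.0, 0.2) for t in range(400)]

# Variance of batch means for batch sizes n = 2, 3, ..., 50, ready to plot.
variances = {n: variance_of_batch_means(observations, n) for n in range(2, 51)}
for n in (2, 5, 10, 20, 50):
    print(n, round(variances[n], 4))
```

Plotting `variances` against the batch size gives the curve described above.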