Transcript Chapter 14

Confidence intervals:
The basics
BPS chapter 14
© 2006 W.H. Freeman and Company
Objectives (BPS chapter 14)
Confidence intervals: the basics

Estimating with confidence

Confidence intervals for the mean 

How confidence intervals behave

Choosing the sample size
Estimating with confidence
Although the sample mean xBar is a unique number for any particular
sample, if you pick a different sample, you will probably get a different
sample mean.
In fact, you could get many different values for the sample mean, and
virtually none of them would actually equal the true population mean, µ.
But the sampling distribution is narrower than the population
distribution, by a factor of √n.
[Figure: population distribution of x (individual subjects) and the narrower sampling distribution of xBar (sample means of n subjects)]
Thus, the estimates xBar gained from our samples are highly likely to be
close to the population parameter µ.

How can we quantify this?
If the population is normally distributed N(µ,σ),

then the sampling distribution is N(µ,σ/√n).
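The claim that the sampling distribution is N(µ,σ/√n) can be checked with a small simulation. The sketch below is only an illustration: the values µ = 50, σ = 10, n = 25, the number of repeated samples, and the helper name simulate_sample_means are all assumptions made for the example, not taken from the slides.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from N(mu, sigma)
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]

# Illustrative values (assumptions, not from the slides)
mu, sigma, n = 50, 10, 25
means = simulate_sample_means(mu, sigma, n, num_samples=2000)

observed_sd = statistics.stdev(means)   # spread of the simulated sample means
theoretical_sd = sigma / n ** 0.5       # sigma / sqrt(n) = 2.0

print("mean of sample means:", round(statistics.fmean(means), 2))
print("observed sd:", round(observed_sd, 2), "theoretical sd:", theoretical_sd)
```

If the claim holds, the sample means should center near µ and their spread should be close to σ/√n, which is much narrower than the population spread σ.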
Example: CA SATM scores

Suppose we want to estimate the mean SATM score, µ, for all
California high school seniors.

We take an SRS of 500 California seniors and give them the SATM.

Our sample mean SATM score turns out to be xBar = 461.



Suppose we somehow know (past experience?) that σ = 100 is a
reasonable estimate of the standard deviation of the SATM score
among all CA seniors.
We’d like to know µ, the true (population) mean SATM score
among all CA seniors.
How well does the sample mean xBar = 461 estimate the
(unknown) parameter µ?
Start by reminding yourself of the basics
Example: CA SATM scores
Population:
All CA high-school seniors
Sample:
The n = 500 CA high-school seniors that we tested
Population variable:
X = SATM score
Sample mean:
xBar = mean of the 500 sample SATM scores (xBar = 461)
Distribution of sample mean:
$\bar{x} \sim N\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{500}} \approx 4.5\right)$
Example: CA SATM scores
Question: how likely is it that xBar is within z* = 2 standard
deviations of the (unknown) mean µ?
Answer: by the empirical (68-95-99.7) rule, the likelihood is about 95%!
Note: how much does 2 standard deviations amount to in this
case?
$2\sigma_{\bar{x}} = 2 \cdot \frac{\sigma}{\sqrt{n}} \approx 2 \times 4.5 = 9$
This is called the margin of error m.
So: we are 95% confident that the true (but unknown) mean µ
differs from xBar = 461 by no more than 9 points.
Lingo: The 95% confidence interval for µ is (452, 470)!
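The arithmetic behind this interval can be sketched in a few lines. The function name margin_of_error is made up for the illustration; the numbers (xBar = 461, σ = 100, n = 500, z* = 2) come from the example. Note that the slide rounds σ/√n to 4.5 before doubling, so the unrounded margin comes out slightly under 9.

```python
import math

def margin_of_error(z_star, sigma, n):
    """Margin of error m = z* * sigma / sqrt(n) for a mean with known sigma."""
    return z_star * sigma / math.sqrt(n)

# Values from the CA SATM example
x_bar, sigma, n = 461, 100, 500

m = margin_of_error(2, sigma, n)      # z* = 2 from the 68-95-99.7 rule
interval = (x_bar - m, x_bar + m)     # xBar plus or minus m

print("margin of error:", round(m, 2))
print("interval:", tuple(round(end) for end in interval))
```

Rounded to whole SATM points, the endpoints agree with the interval quoted on the slide.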
Example: CA SATM scores

We have just computed a
95% confidence interval for µ.

The endpoints of the interval (452, 470) depended on
three values:

The point estimate xBar for μ.

The level of confidence C, which determined how many standard
deviations of xBar to use.

The standard deviation of the point estimate,
which also depends on the sample size, n:
$\sigma_{\bar{x}} = \frac{\sigma_X}{\sqrt{n}} = \frac{100}{\sqrt{500}} \approx 4.5$
The interval also depended on the fact that we knew that
the sampling distribution of xBar is normal (or at least
approximately normal, since n is large)!
Example: CA SATM scores
What Does It Mean?:
Suppose we took another sample of 500 CA high-school seniors,
gave them the SATM test, and computed the sample average xBar.
In all likelihood we’d get a different sample mean. Say we found
that xBar = 468.
If we re-did the confidence interval calculation, we’d still get the
same margin of error m = 9.
The new 95% confidence interval would be (459, 477).
Every sample would result in a different confidence interval.
But we know that 95% of these confidence intervals
would contain the true (but unknown) mean µ.
We don’t know that our particular interval contains µ.
But we’re 95% confident that it does!
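One way to see what "95% of these confidence intervals" means is to simulate many samples and count how often the interval captures µ. This is a rough sketch under stated assumptions: the "true" mean µ = 455 is invented for the simulation (in practice it is unknown), the function name coverage_rate is hypothetical, and each sample mean is drawn directly from its sampling distribution N(µ, σ/√n) rather than from 500 individual scores.

```python
import math
import random

def coverage_rate(mu, sigma, n, z_star, num_samples, seed=1):
    """Fraction of simulated intervals xBar +/- m that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    se = sigma / math.sqrt(n)          # standard deviation of xBar
    m = z_star * se                    # same margin of error for every sample
    hits = 0
    for _ in range(num_samples):
        x_bar = rng.gauss(mu, se)      # one simulated sample mean
        if x_bar - m <= mu <= x_bar + m:
            hits += 1
    return hits / num_samples

# mu = 455 is an assumed value used only to run the simulation
rate = coverage_rate(mu=455, sigma=100, n=500, z_star=2, num_samples=10_000)
print("fraction of intervals containing mu:", rate)
```

Each simulated interval has a different center but the same width, and the fraction that contain µ should land close to 95%.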


We are 95% confident that the
true population mean is
contained in the confidence
interval. WHY? Because
the sampling distribution of xBar is
approximately normal (Central Limit
Theorem), so there is about a 95% chance
that the estimate of the
population mean (i.e. the sample
mean) will be within 2 standard
deviations (of the sampling
distribution!) of the true
population mean.
Let’s try playing with the
“Confidence Interval” applet on
the stats portal….
Page 347
Implications
We don’t need to take lots of
random samples to “rebuild” the
sampling distribution and find µ
at its center.
All we need is one SRS of
size n; we then rely on the
properties of the sampling
distribution of the sample
mean to infer the
population mean µ.
[Figure: one sample of size n drawn from the population]
Confidence interval (page 346)
A level C confidence interval for a parameter (here µ) has 2 parts:

An interval calculated from the data, usually of the form
xBar ± margin of error

A confidence level C, which gives the probability that the interval will capture the
true parameter value in repeated samples, or the success rate for the method.
Changing the Confidence Level
Suppose we wanted an 80% confidence interval. (C = 0.80)
Then xBar must fall within the margin of error m of µ 80% of the time
(i.e. for 80% of all samples).
Go to the standard normal:
[Figure: sampling distribution of xBar with the central 80% between -m and +m; standard normal density with the central 80% between -z* and z*, leaving 10% in each tail]
So z* = invNorm(0.90) = 1.282!
Changing the Confidence Level
Suppose we wanted a level C confidence interval.
Then xBar must fall within the margin of error m of µ a proportion C of the time
(i.e. for a proportion C of all samples).
Go to the standard normal:
[Figure: sampling distribution of xBar with central area C between -m and +m; standard normal density with central area C between -z* and z*, leaving area (1-C)/2 in each tail]
So in general z* = invNorm((1+C)/2)!
The book calls z* a critical value
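For readers without a TI calculator, the invNorm step can be sketched in Python. This assumes Python 3.8 or later, where the standard library's statistics.NormalDist provides inv_cdf; the function name critical_value is made up for the illustration.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Return z* such that the central area under the standard
    normal curve between -z* and z* equals the confidence level."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Same percentile as the calculator command invNorm((1 + C) / 2)
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"C = {c:.2f}   z* = {critical_value(c):.3f}")
```

The confidence level is passed as a proportion (0.80, not 80), matching the C = 0.80 convention used above.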
From z* to the margin of error m
z* is the number of standard deviations of xBar needed to obtain
the confidence level C.
So the margin of error is z* times the standard deviation of xBar:
$m = z^* \sigma_{\bar{x}} = z^* \frac{\sigma}{\sqrt{n}}$
where σ = σ_X is the standard deviation of the population variable X
and n is the sample size.
Let’s find z* for various C-levels. (see page 349)
[Figure: standard normal density curve]
C level    Value of z*
99%        2.576
95%        1.960
90%        1.645
85%        1.440
Percentile = (1-C)/2 + C = (1+C)/2
TI8x command: z* = invNorm((C+1)/2)
Example: CA SATM scores (continued)
What if we needed a 98% confidence interval for µ, the true mean SATM score
for all CA high-school seniors?
Confidence Level
C = 0.98
Margin of Error
z* = invNorm(1.98/2) = invNorm(0.99) = 2.326 (rounded)
$m = z^* \sigma_{\bar{x}} = 2.326 \times \frac{100}{\sqrt{500}} \approx 10.404$ (computed with the unrounded z*)
98% Confidence Interval
$(\bar{x} - m,\ \bar{x} + m) = (461 - 10.4,\ 461 + 10.4) = (450.6,\ 471.4)$
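The whole calculation (critical value, margin of error, interval) can be put together as one short sketch. The function name z_confidence_interval is an assumption for the illustration, and the approach only applies when σ is known, as in this example.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Level-C confidence interval for a mean when sigma is known."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value
    m = z_star * sigma / math.sqrt(n)                    # margin of error
    return x_bar - m, x_bar + m

# Values from the CA SATM example, with C = 0.98
low, high = z_confidence_interval(x_bar=461, sigma=100, n=500, confidence=0.98)
print(round(low, 1), round(high, 1))
```

Changing only the confidence argument shows how the interval widens or narrows around the same xBar.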
Link between confidence level and margin of error
The confidence level C determines the value of z* (in Table C).
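The margin of error also depends on z*:
$m = z^* \frac{\sigma}{\sqrt{n}}$
Higher confidence C implies a larger
margin of error m.
A lower confidence level C produces a
smaller margin of error m.
[Figure: normal curve with central area C between -z* and z*; margin of error m on either side of xBar]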
Impact of sample size
The spread in the sampling distribution of the mean is a function of the
number of individuals per sample.
The larger the sample size, the smaller
the standard deviation (spread) of the
sample mean distribution.
But the spread only decreases in proportion to 1/√n:
quadrupling the sample size only halves it.
[Figure: standard error σ ⁄ √n plotted against sample size n]
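A quick numerical check of how slowly the spread shrinks: the sketch below reuses σ = 100 from the SATM example and z* = 1.960 for 95% confidence. The particular sample sizes and the function name margin_of_error are chosen just for illustration.

```python
import math

def margin_of_error(z_star, sigma, n):
    """Margin of error m = z* * sigma / sqrt(n)."""
    return z_star * sigma / math.sqrt(n)

# Each sample size is four times the previous one
for n in (125, 500, 2000, 8000):
    m = margin_of_error(1.960, 100, n)
    print(f"n = {n:5d}   m = {m:.2f}")
```

Each fourfold increase in n cuts the margin of error in half, so large gains in precision get expensive quickly.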


Goals for Estimating Population Parameters:

High Confidence

Low Margin of Error
How to Reduce the Margin of Error m = z* σ/√n

Change C-Level?
A lower C-level results in a smaller value of z*.

Change Sample Size?
A larger n will reduce m, since you will divide by a larger value.

Change Population Standard Deviation?
A smaller σ will reduce m.
This is usually not possible to do.
Suppose we want a certain margin of error for our confidence interval for a
population mean. What sample size will be needed to get the desired
result? (Assume we know the population standard deviation.)
$m = z^* \frac{\sigma}{\sqrt{n}}$
Let’s solve this equation for n!
Page 355
Example: CA SATM scores (continued)
How large a sample size would we need in order to reduce the margin of error
for the 98% confidence interval to plus or minus 4 SATM points?
What we know:
m = 4,
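z* = 2.326 (for C = 0.98)
σ = 100
Find n:
$n = \left(\frac{z^* \sigma}{m}\right)^2 = \left(\frac{2.326 \times 100}{4}\right)^2 \approx 3382.43$ (computed with the unrounded z*)
So we need a sample of at least 3383 seniors (always round n up).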
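The sample-size calculation can be sketched the same way. The function name required_sample_size is made up for the illustration; rounding up with math.ceil reflects the fact that any smaller whole n would leave the margin of error above the target.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, sigma, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin (sigma known)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)  # always round up

# Values from the example: m = 4, sigma = 100, C = 0.98
n = required_sample_size(margin=4, sigma=100, confidence=0.98)
print(n)
```

Trying a smaller target margin (say 2 points instead of 4) shows how quickly the required n grows, since n scales with 1/m².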
Sample size and experimental design
You may need a certain margin of error (e.g., drug trial, manufacturing
specs). In many cases, the population variability (σ) is fixed, but we can
choose the number of measurements (n).
So plan ahead what sample size to use to achieve that margin of error.
$m = z^* \frac{\sigma}{\sqrt{n}} \quad\Longleftrightarrow\quad n = \left(\frac{z^* \sigma}{m}\right)^2$

Remember, though, that sample size is not always stretchable at will. There are
typically costs and constraints associated with large samples. The best
approach is to use the smallest sample size that can give you useful results.