Confidence Intervals - Sara McLaughlin Mitchell


Confidence Intervals
W&W, Chapter 8
Confidence Intervals
Although on average, M (the sample mean)
is on target (or unbiased), the specific
sample mean that we happen to observe
is almost certain to be a bit high or a bit
low. Accordingly, if we want to be
reasonably confident that our inference is
correct, we cannot claim that μ is
precisely equal to M. Instead we must
construct a confidence interval of the
form:
μ = M +/- sampling error
Interpretation
The confidence interval gives us a range
within which we are confident that the
true μ falls.
Recall from Chapter 1 that we estimated a
confidence interval for a proportion,
which was:
π = P +/- 1.96 √[P(1-P)/n]
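As a rough sketch, the proportion interval (P plus or minus 1.96 times the square root of P(1-P)/n) can be computed in a few lines of Python. The function name `proportion_ci` and the sample values (P = 0.60, n = 1000) are hypothetical and chosen only for illustration; they do not come from the text.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Return (low, high) for P +/- z * sqrt(P(1-P)/n)."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical sample: 60% of 1,000 respondents.
low, high = proportion_ci(0.60, 1000)
print(f"{low:.3f} to {high:.3f}")
```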
The Critical Z score
The value 1.96 comes from a confidence
level of 95%: in the long run, 95 out of
every 100 samples we collect will produce
an interval that contains the true
population parameter. Recall from the
standard normal distribution that about
95% of all values fall within 2 standard
deviations of the mean. The exact Z-score
for 95% is +/- 1.96.
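Instead of reading the standard normal table, the critical value can be looked up with Python's standard library. This is a minimal sketch; the helper name `critical_z` is an assumption made for illustration.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value: the Z that leaves (1 - confidence)/2 beyond it."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

z95 = critical_z(0.95)
within_2sd = NormalDist().cdf(2) - NormalDist().cdf(-2)

print(round(z95, 2))         # 1.96
print(round(within_2sd, 4))  # 0.9545, the "about 95% within 2 SD" rule of thumb
```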
Confidence Interval for the
Mean
μ = M +/- Z_{α/2}(S.E.)
where Z_{α/2} is the critical value based on
α = 1 - confidence level (as a
proportion)
S.E. of the sampling distribution =
σ/√N
Confidence Interval for the
Mean
For 95%, α = 1 - .95 = .05; α/2 = .025;
thus we go to the standard normal table
and find the value that has a probability
of .025 beyond it. This is 1.96!
Thus a 95% confidence interval for the
mean is calculated as:
μ = M +/- 1.96(σ/√N)
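The long-run meaning of "95% confident" can be checked with a small simulation. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 10) and samples of size 25; none of these numbers come from the text, and `z_interval` is an illustrative helper name.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """M +/- Z * sigma / sqrt(N), with the population SD (sigma) known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical population and sampling plan.
random.seed(0)
mu, sigma, n, trials = 50.0, 10.0, 25, 2000

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    low, high = z_interval(fmean(sample), sigma, n)
    hits += low <= mu <= high   # count intervals that contain the true mean

coverage = hits / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land near 0.95: any single interval either contains the true mean or it does not, but the procedure captures it in roughly 95 of every 100 samples.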
Confidence Interval for the
Mean
Note: As N increases, the width of the confidence
interval shrinks, i.e., our estimate of μ is more
precise with a larger sample.
What if we wanted to be 99% confident that our confidence
interval contains the true μ? In other words, in the long
run, 99 out of 100 samples would produce a confidence
interval that contains μ.
α = 1 - .99 = .01; α/2 = .005
Z_{α/2} = Z_{.005} = 2.58
μ = M +/- 2.58(σ/√N)
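The same table lookup for both confidence levels can be sketched with the standard library. The dictionary name `critical` is only an illustrative choice.

```python
from statistics import NormalDist

critical = {}
for confidence in (0.95, 0.99):
    alpha = 1 - confidence
    # Value with alpha/2 of the standard normal area beyond it.
    critical[confidence] = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{confidence:.0%}: alpha/2 = {alpha / 2:.3f}, Z = {critical[confidence]:.2f}")
```

Because 2.58 is larger than 1.96, the 99% interval is wider than the 95% interval for the same sample: more confidence is bought with less precision.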
Unknown σ
So far we have assumed that the
population standard deviation, σ, was
known. Most of the time, this is not the
case, so we will use the sample standard
deviation, s, as a proxy for σ.
Confidence interval when σ is unknown:
μ = M +/- Z_{α/2}(s/√N)
Example
Example: We want to determine the mean
number of years that FSU professors hold
their jobs. We take a random sample of
100 professors and calculate the sample
mean, which is 14.7 and the sample
standard deviation, s = 2.5. What is the
95% confidence interval for the true
average number of years FSU professors
have held their jobs?
Example
μ = M +/- Z_{α/2}(s/√N)
= 14.7 +/- (1.96)(2.5/√100)
= 14.7 +/- .49
Thus we are 95% confident that the
mean number of years FSU
professors have kept their jobs is
between 14.2 and 15.2 years.
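The arithmetic of the FSU example can be reproduced directly. This is a minimal sketch using only the numbers given in the example; the variable names are illustrative.

```python
import math

m, s, n = 14.7, 2.5, 100   # sample mean, sample SD, sample size from the example
z = 1.96                   # critical value for 95% confidence

margin = z * s / math.sqrt(n)
low, high = m - margin, m + margin
print(f"{m} +/- {margin:.2f} -> {low:.1f} to {high:.1f}")
# 14.7 +/- 0.49 -> 14.2 to 15.2
```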
Effect of Increasing N
If we had collected these data from a
sample of 500 professors, our
interval estimate would shrink to
14.7 +/- .22 or 14.5 to 14.9 years,
which demonstrates how the
confidence interval is narrowed by
larger samples.
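A short sketch of how the margin of error changes with N, holding the sample standard deviation (2.5) and the 95% critical value (1.96) from the example fixed. The dictionary name `margins` is illustrative.

```python
import math

s, z = 2.5, 1.96
margins = {n: z * s / math.sqrt(n) for n in (100, 500)}

for n, margin in margins.items():
    print(f"N = {n}: 14.7 +/- {margin:.2f}")
```

The margin shrinks with the square root of N, so a sample five times as large cuts the margin by a factor of about 2.24, not 5.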
When Z is appropriate
We have been using Z-scores from the
standard normal distribution to construct
our critical values. The use of Z-scores,
however, is only appropriate if σ is known
or if we have a reasonably sized sample
(N ≥ 100) when σ is unknown. In cases of
small samples when σ is not known, we
must use the Student t distribution.
Student t Distribution
A man named William Gosset, who wrote
under the name of Student, developed
the t distribution to deal with the nature
of small samples. He was working for the
Guinness brewing company in Ireland in
the early 1900s (article published in
1908). He wanted to construct a reliable
test for examining the quality of a small
sample of beer and from that, be able to
conclude that the whole batch of beer
was ok.
Characteristics of the
Student t Distribution
Like the standard normal, it is
symmetric with a mean of zero
It has larger (fatter) tails than the
standard normal
The larger tails imply that the
proportion of area beyond a specific
value of t is greater than the
proportion of area beyond the
corresponding Z
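To see the heavier tails numerically, the sketch below compares the area beyond 1.96 under the standard normal with the area beyond 1.96 under a t distribution. The choice of 8 degrees of freedom is an assumption made for illustration, and the t tail area is approximated with Simpson's rule on the t density rather than taken from a table; `t_pdf` and `t_upper_tail` are hypothetical helper names.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area beyond t (for t >= 0): 0.5 minus the Simpson's-rule area from 0 to t."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

z_tail = 1 - NormalDist().cdf(1.96)
t_tail = t_upper_tail(1.96, df=8)
print(f"area beyond 1.96: Z = {z_tail:.3f}, t with 8 df = {t_tail:.3f}")
```

The t tail area comes out larger than the normal tail area of .025, which is why a t critical value must sit farther from zero than 1.96 to leave the same area beyond it.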

Characteristics of the
Student t Distribution
The t critical points are presented relative
to the degrees of freedom, which is n minus the
number of things not free to vary. For the
standard deviation, s,
s = √[ Σ(Xi - M)² / (N - 1) ]

Thus the degrees of freedom here is n - 1
(later we will see that df = n - k, where k
is the number of parameters we are
estimating).
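A small sketch of why one degree of freedom is lost when computing s. The five observations are hypothetical. Once M is computed from the data, the deviations must sum to zero, so only n - 1 of them are free to vary; `statistics.stdev` uses the same n - 1 divisor.

```python
import math
from statistics import fmean, stdev

x = [28.0, 31.5, 26.0, 33.0, 29.0]   # hypothetical observations
m = fmean(x)

deviations = [xi - m for xi in x]
df = len(x) - 1                      # one constraint: the deviations sum to zero

s = math.sqrt(sum(d ** 2 for d in deviations) / df)
print(df, round(s, 3), round(stdev(x), 3))
```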
Example
The maker of a certain car model claims
that the car averages 31 miles per gallon.
A random sample of nine cars is selected
and each car is driven on a tank of gas.
The sample mean mpg is 29.5 and the
sample standard deviation (s) is 3. Can
we be 95% confident that the car actually
gets 31 mpg from this sample?
Example
We must use t because 1) σ is unknown
and 2) n is small (less than 100).
μ = M +/- t_{α/2}(s/√n)
α/2 = .05/2 = .025
df = n - 1 = 8 (go to t table)
t_{.025} with 8 df = 2.31
μ = 29.5 +/- 2.31(3/√9)
= 29.5 +/- 2.31 = 27.2 to 31.8
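The car example can be reproduced without a t table. This is a sketch under stated assumptions: the t critical value is found by bisection on a numerically integrated t density (Simpson's rule), and the helper names `t_pdf`, `t_upper_tail`, and `t_critical` are hypothetical. In practice one would read the value from a table or use a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area beyond t (for t >= 0), using Simpson's rule from 0 to t."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Find the t value with tail_area beyond it, by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid   # too much area beyond mid, so move right
        else:
            hi = mid
    return (lo + hi) / 2

m, s, n = 29.5, 3.0, 9               # values from the example
t_crit = t_critical(0.025, df=n - 1)
margin = t_crit * s / math.sqrt(n)
low, high = m - margin, m + margin
print(f"t = {t_crit:.2f}; {m} +/- {margin:.2f} -> {low:.1f} to {high:.1f}")
print("claimed 31 mpg inside interval:", low <= 31 <= high)
```

With SciPy installed, `scipy.stats.t.ppf(0.975, 8)` should give the same critical value.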
Interpretation
Yes: 31 mpg falls inside the 95%
confidence interval (27.2 to 31.8), so the
sample is consistent with the claim that
the car gets 31 mpg.