Transcript Chapter 21
Chapter 21
What Is a Confidence Interval?
Recall from previous chapters:
• Parameter
– fixed, unknown number that describes the population
• Statistic
– known value calculated from a sample
– a statistic is used to estimate a parameter
• Sampling Variability
– different samples from the same population may yield
different values of the sample statistic
– estimates from larger samples tend to be closer to the true
values in the population
Recall from previous chapters:
• Example:
– The amount by which the proportion obtained from the
sample (p̂) will differ from the true population
proportion (p) rarely exceeds the margin of error.
• Sampling Distribution
– tells what values a statistic takes and how often it
takes those values in repeated sampling.
• Example:
– sample proportions (p̂'s) from repeated sampling
would have an approximately normal distribution with a certain mean
and standard deviation.
Overview
This chapter presents the beginning of inferential
statistics.
• The two major applications of inferential statistics
involve the use of sample data to (1) estimate the value
of a population parameter, and (2) test some claim (or
hypothesis) about a population.
• We introduce methods for estimating the values of two
important population parameters: proportions and
means.
• We also present methods for determining the sample sizes
necessary to estimate those parameters.
Definition
A point estimate is a single value
(or point) used to approximate a
population parameter.
Definition
The sample proportion p̂ is the best
point estimate of the population
proportion p.
Sampling Distribution of Sample
Proportions
If numerous simple random samples of size n
are taken from the same population, the sample
proportions (p̂) from the various samples will
have an approximately normal distribution. The
mean of the sample proportions will be p (the
true population proportion). The standard
deviation will be:
$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p.$$
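A quick simulation can make the rule concrete. The sketch below is illustrative only: the population proportion p = 0.3, the sample size n = 400, the number of repetitions and the random seed are hypothetical choices, and each sample is built from independent yes/no responses, which approximates simple random sampling from a large population. It compares the mean and standard deviation of the simulated p̂ values with p and √(pq/n).

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Return the sample proportion from each of `reps` simulated samples of size n.

    Each observation is an independent "yes" with probability p, which
    approximates simple random sampling from a large population (assumption).
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.3, 400                          # hypothetical population proportion and sample size
p_hats = simulate_p_hats(p, n, reps=4000)

mean_p_hat = statistics.fmean(p_hats)
sd_p_hat = statistics.stdev(p_hats)
theory_sd = math.sqrt(p * (1 - p) / n)   # sqrt(pq/n) with q = 1 - p

print(f"mean of the p-hats: {mean_p_hat:.4f}  (p = {p})")
print(f"s.d. of the p-hats: {sd_p_hat:.4f}  (sqrt(pq/n) = {theory_sd:.4f})")
```

Both printed pairs should agree closely; a histogram of `p_hats` would look approximately bell-shaped.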
Rule Conditions
• For the sampling distribution of the sample proportions
to be valid, we must have
– random samples
– a “large” sample size
Definition
A confidence interval (or interval
estimate) is a range (or an interval)
of values used to estimate the true
value of a population parameter. A
confidence interval is sometimes
abbreviated as CI.
Confidence Interval for a
Population Proportion
• An interval of values, computed from
sample data, that is almost sure to cover
the true population proportion.
• “We are ‘highly confident’ that the true population
proportion is contained in the calculated interval.”
• Statistically (for a 95% C.I.): in repeated
samples, 95% of the calculated confidence
intervals should contain the true proportion.
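The repeated-sampling interpretation can be checked by simulation. In this sketch the true proportion p = 0.4, the sample size n = 500, the number of repetitions and the seed are hypothetical, and the multiplier 2 is the Empirical Rule value for 95%. Each simulated sample produces one interval, and the function reports the fraction of intervals that contain p.

```python
import math
import random

def coverage_rate(p, n, reps, z_star=2.0, seed=1):
    """Fraction of simulated intervals p-hat ± z*·sqrt(p-hat(1 - p-hat)/n) that contain p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One simulated sample of n independent yes/no responses.
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z_star * se <= p <= p_hat + z_star * se:
            hits += 1
    return hits / reps

rate = coverage_rate(p=0.4, n=500, reps=2000)
print(f"{rate:.1%} of the intervals contained the true proportion")
```

The printed rate should be close to 95%. Any single interval either contains p or it does not; the 95% describes the method over many samples.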
Formula for a 95% Confidence
Interval for the Population
Proportion (Empirical Rule)
• sample proportion plus or minus two
standard deviations of the sample proportion:
$$\hat{p} \pm 2\sqrt{\frac{p(1-p)}{n}}$$
• since we do not know the population
proportion p (needed to calculate the
standard deviation) we will use the
sample proportion p̂ in its place.
Formula for a 95% Confidence
Interval for the Population
Proportion (Empirical Rule)
$$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
The square-root term is the standard error (estimated standard deviation of p̂).
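Margin of Error
The margin of error is the plus-or-minus part of the C.I.:
$$2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le 2\sqrt{\frac{0.5(1-0.5)}{n}} = \frac{1}{\sqrt{n}}$$
The bound holds because p̂(1 − p̂) is largest when p̂ = 0.5, so 1/√n is the most conservative 95% margin of error for a sample of size n.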
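The conservative bound is easy to verify numerically. This sketch uses a hypothetical sample size n = 1000 and a few hypothetical values of p̂; it computes the Empirical Rule margin 2·√(p̂(1 − p̂)/n) for each and compares it with 1/√n.

```python
import math

def margin_of_error(p_hat, n, multiplier=2.0):
    """Plus-or-minus part of the 95% (Empirical Rule) interval for a proportion."""
    return multiplier * math.sqrt(p_hat * (1 - p_hat) / n)

def conservative_margin(n):
    """Upper bound 1/sqrt(n), reached when p-hat = 0.5."""
    return 1 / math.sqrt(n)

n = 1000  # hypothetical sample size
for p_hat in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p-hat = {p_hat}: margin = {margin_of_error(p_hat, n):.4f}")
print(f"bound 1/sqrt(n) = {conservative_margin(n):.4f}")
```

Every margin is at most the bound, and the two coincide at p̂ = 0.5.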
Formula for a C-level (%)
Confidence Interval for the
Population Proportion
$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where z* is the critical value of the standard
normal distribution for confidence level C
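The general formula can be written as a small function. This is a sketch under stated assumptions: it computes z* exactly with Python's `statistics.NormalDist` (Python 3.8+) rather than reading the rounded value from a table, and the data p̂ = 0.4, n = 600 are hypothetical.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* with central area `confidence` under the standard normal curve."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

def proportion_ci(p_hat, n, confidence):
    """C-level interval p-hat ± z*·sqrt(p-hat(1 - p-hat)/n)."""
    margin = z_star(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(p_hat=0.4, n=600, confidence=0.90)  # hypothetical data
print(f"90% CI: ({low:.3f}, {high:.3f})")
```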
Table 21.1: Common Values of z*
Confidence Level C | Critical Value z*
50% | 0.67
60% | 0.84
68% | 1
70% | 1.04
80% | 1.28
90% | 1.64
95% | 1.96 (or 2)
99% | 2.58
99.7% | 3
99.9% | 3.29
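The values in Table 21.1 can be reproduced from the standard normal distribution: z* leaves (1 − C)/2 in each tail, so it is the (1 + C)/2 quantile. This sketch assumes Python 3.8+ for `statistics.NormalDist`. The table's 68% and 99.7% entries (1 and 3) are the rounded Empirical Rule values; the exact quantiles are about 0.99 and 2.97.

```python
from statistics import NormalDist

levels = [0.50, 0.60, 0.68, 0.70, 0.80, 0.90, 0.95, 0.99, 0.997, 0.999]

# z* is the (1 + C)/2 quantile of the standard normal distribution.
z_values = {c: NormalDist().inv_cdf((1 + c) / 2) for c in levels}

for c, z in z_values.items():
    print(f"C = {c:.1%}: z* = {z:.3f}")
```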
Example: 829 adult Nanaimo residents were
surveyed, and 51% of them were opposed to the
use of photo radars for issuing traffic tickets.
Using the survey results:
a) Find the margin of error E that corresponds to a
95% confidence level.
b) Find the 95% confidence interval estimate of the
population proportion p.
c) Based on the results, can we safely conclude
that the majority of adult Nanaimo residents
oppose the use of photo radars?
a) Find the margin of error E that
corresponds to a 95% confidence level.
We have p̂ = 0.51, q̂ = 1 – 0.51 = 0.49, z* = 1.96, and n = 829, so
$$E = 1.96\sqrt{\frac{(0.51)(0.49)}{829}} = 0.03403$$
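The arithmetic in part a) can be checked directly; the values below are the ones stated in the example.

```python
import math

p_hat, n, z_star = 0.51, 829, 1.96   # values from the Nanaimo survey example
q_hat = 1 - p_hat

# Margin of error E = z*·sqrt(p-hat·q-hat / n)
E = z_star * math.sqrt(p_hat * q_hat / n)
print(f"E = {E:.5f}")   # prints E = 0.03403
```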
b) Find the 95% confidence interval for the
population proportion p.
We substitute our values from part a) to obtain:
0.51 – 0.03403 < p < 0.51 + 0.03403,
0.476 < p < 0.544
c) Based on the results, can we safely conclude that the
majority of adult Nanaimo residents oppose use of
the photo radars?
Based on the survey results, we are 95% confident that the limits
of 47.6% and 54.4% contain the true percentage of adult Nanaimo
residents opposed to the photo radar. The percentage of adult
Nanaimo residents who oppose the use of photo radars could plausibly
be any value between 47.6% and 54.4%. However, a majority
requires a percentage greater than 50%, so we cannot safely
conclude that the majority is opposed (because part of the
confidence interval lies below 50%).
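Parts b) and c) can be combined in one short sketch: compute the interval from the example's values, then check whether every value in it exceeds 50%. The decision rule (majority supported only if the lower limit is above 0.5) restates the reasoning above.

```python
import math

p_hat, n, z_star = 0.51, 829, 1.96   # values from the Nanaimo survey example
E = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - E, p_hat + E
print(f"95% CI: ({lower:.3f}, {upper:.3f})")   # prints 95% CI: (0.476, 0.544)

# A majority is supported only if every plausible value exceeds 50%.
majority_supported = lower > 0.5
print("Majority opposed?", "yes" if majority_supported else "cannot conclude")
```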
Example: Page 444 #21.20
A random sample of 1500 adults finds that 60% favour
balancing the federal budget over cutting taxes. Use
this poll result and Table 21.1 to give 70%, 80%, 90%,
and 99% confidence intervals for the proportion of all
adults who feel this way. What do your results show
about the effect of changing the confidence level?
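One way to check answers to this exercise is to apply the C-level formula with the z* values from Table 21.1. This sketch uses only the numbers given in the exercise (p̂ = 0.60, n = 1500); the standard error is the same for every level, so only z* changes.

```python
import math

p_hat, n = 0.60, 1500
se = math.sqrt(p_hat * (1 - p_hat) / n)             # standard error of p-hat
table_z = {0.70: 1.04, 0.80: 1.28, 0.90: 1.64, 0.99: 2.58}  # from Table 21.1

margins = {}
for level, z in table_z.items():
    margins[level] = z * se
    print(f"{level:.0%} CI: ({p_hat - margins[level]:.3f}, {p_hat + margins[level]:.3f}), "
          f"margin = {margins[level]:.4f}")
```

Because the margin is z* times a fixed standard error, a higher confidence level always gives a wider interval.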
Sample Size
Suppose we want to collect sample data
with the objective of estimating some
population proportion. The question is
how many sample items must be obtained?
The required sample size depends on the
confidence level and the desired margin of
error.
Determining Sample Size
$$E = z^{*}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
(solve for n by algebra)
$$n = \frac{(z^{*})^{2}\,\hat{p}\hat{q}}{E^{2}}$$
Always round up.
Sample Size for Estimating
Proportion p
When an estimate p̂ is known:
$$n = \frac{(z^{*})^{2}\,\hat{p}\hat{q}}{E^{2}}$$
When no estimate p̂ is known:
$$n = \frac{(z^{*})^{2}\,(0.25)}{E^{2}}$$
Use p̂ = 0.5, the “worst case” value.
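The two sample-size formulas fit in one function. In this sketch the target margin E = 0.03 and the trial values of p̂ are hypothetical; the loop illustrates why p̂ = 0.5 is the "worst case": p̂q̂ is largest there, so it never calls for fewer observations than any other guess.

```python
import math

def sample_size_proportion(E, z_star=1.96, p_hat=None):
    """Smallest n with z*·sqrt(p-hat·q-hat/n) <= E, always rounded up.

    With no prior estimate, use the worst case p-hat = 0.5 (p-hat·q-hat = 0.25).
    """
    if p_hat is None:
        p_hat = 0.5
    return math.ceil(z_star ** 2 * p_hat * (1 - p_hat) / E ** 2)

for p_hat in (0.1, 0.3, 0.5, 0.7, 0.9):   # hypothetical prior estimates
    print(f"p-hat = {p_hat}: n = {sample_size_proportion(E=0.03, p_hat=p_hat)}")
print("no prior estimate:", sample_size_proportion(E=0.03))
```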
Example: Suppose a sociologist wants to
determine the current percentage of Nanaimo
households using e-mail. How many households
must be surveyed in order to be 95% confident
that the sample percentage is in error by no more
than four percentage points?
a) Use this result from an earlier study: In 1997, 16.9%
of Nanaimo households used e-mail.
b) Assume that we have no prior information
suggesting a possible value of p̂.
a) Use this result from an earlier study: In 1997, 16.9%
of Nanaimo households used e-mail.
$$n = \frac{[z^{*}]^{2}\,\hat{p}\hat{q}}{E^{2}} = \frac{[1.96]^{2}(0.169)(0.831)}{0.04^{2}} = 337.194$$
Rounding up gives 338 households.
To be 95% confident that our
sample percentage is within four
percentage points of the true
percentage for all households, we
should randomly select and
survey 338 households.
b) Assume that we have no prior information suggesting a
possible value of p̂.
$$n = \frac{[z^{*}]^{2}\cdot 0.25}{E^{2}} = \frac{(1.96)^{2}(0.25)}{0.04^{2}} = 600.25$$
Rounding up gives 601 households.
With no prior information, we
need a larger sample to achieve
the same results with 95%
confidence and an error of no
more than four percentage points.
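Both parts of the e-mail example can be reproduced in a few lines, using only the numbers given in the example (z* = 1.96, E = 0.04, and the 1997 estimate 0.169). Note that the raw value is rounded up, not to the nearest whole number.

```python
import math

z_star, E = 1.96, 0.04   # 95% confidence, within four percentage points

# a) prior estimate from the 1997 study
n_prior_raw = z_star ** 2 * 0.169 * (1 - 0.169) / E ** 2
# b) no prior estimate: worst case p-hat = 0.5, so p-hat·q-hat = 0.25
n_worst_raw = z_star ** 2 * 0.25 / E ** 2

print(f"a) {n_prior_raw:.3f} -> {math.ceil(n_prior_raw)} households")   # a) 337.194 -> 338 households
print(f"b) {n_worst_raw:.2f} -> {math.ceil(n_worst_raw)} households")   # b) 600.25 -> 601 households
```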
Key Concepts (1st half of Ch. 21)
• Different samples (of the same size) will
generally give different results.
• We can specify what these results look
like in the aggregate.
• Rule for Sample Proportions
• Compute and interpret confidence
intervals for population proportions based
on sample proportions
Inference for Population Means
Sampling Distribution, Confidence Intervals
• The remainder of this chapter discusses
the situation when interest is in making
conclusions about population means
rather than population proportions
– includes the rule for the sampling distribution
of sample means (X̄'s)
– includes confidence intervals for a mean
Thought Question 1
(from Seeing Through Statistics, 2nd Edition, by Jessica M. Utts, p. 316)
Suppose the mean weight of all women at
a university is 135 pounds, with a standard
deviation of 10 pounds.
• Recalling the material from Chapter 13
about bell-shaped curves, in what range
would you expect 95% of the women’s
weights to fall? Answer: 115 to 155 pounds (135 ± 2 × 10).
Thought Question 1 (cont.)
• If you were to randomly sample 10 women
at the university, how close do you think
their average weight would be to 135
pounds?
• If you randomly sample 1000 women,
would you expect the average to be closer
to 135 pounds than it would be for the
sample of 10 women?
Thought Question 2
A study compared the serum HDL cholesterol
levels in people with low-fat diets to people with
diets high in fat intake. From the study, a 95%
confidence interval for the mean HDL cholesterol
for the low-fat group extends from 43.5 to 50.5...
a. Does this mean that 95% of all people with
low-fat diets will have HDL cholesterol levels
between 43.5 and 50.5? Explain.
Thought Question 2 (cont.)
… a 95% confidence interval for the mean HDL
cholesterol for the low-fat group extends from
43.5 to 50.5. A 95% confidence interval for the
mean HDL cholesterol for the high-fat group
extends from 54.5 to 61.5.
b. Based on these results, would you conclude
that people with low-fat diets have lower
HDL cholesterol levels, on average, than
people with high-fat diets?
Thought Question 3
The first confidence interval in Question 2
was based on results from 50 people. The
confidence interval spans a range of 7 units.
If the results had been based on a much
larger sample, would the confidence interval
for the mean cholesterol level have been
wider, more narrow, or about the same?
Explain.
The Central Limit Theorem (CLT)
If simple random samples of size n (n large) are
taken from the same population, the sample
means (X̄) from the various samples will have
an approximately normal distribution. The
mean of the sample means will be μ (the
population mean). The standard deviation will
be:
$$\frac{\sigma}{\sqrt{n}} \quad (\sigma \text{ is the population s.d.})$$
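The CLT is most striking when the population is not bell-shaped. This sketch assumes a hypothetical right-skewed population (exponential with mean μ = 10, whose s.d. σ is also 10), a hypothetical sample size n = 50 and a fixed seed. It draws many samples and compares the mean and s.d. of the sample means with μ and σ/√n.

```python
import math
import random
import statistics

rng = random.Random(2)
mu, sigma = 10.0, 10.0   # hypothetical exponential population: mean 10, s.d. 10 (right-skewed)
n, reps = 50, 4000       # hypothetical sample size and number of repeated samples

# Mean of each simulated sample of size n.
sample_means = [statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
                for _ in range(reps)]

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (mu = {mu})")
print(f"s.d. of sample means: {statistics.stdev(sample_means):.3f} "
      f"(sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

Even though individual values are skewed, the sample means cluster around μ with spread close to σ/√n.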
Conditions for the Rule for
Sample Means
• Random sample
• Population of measurements…
– follows a bell-shaped curve, or
– is not bell-shaped, but the sample size is “large”
– When σ is not known, we use the sample standard
deviation s to estimate σ. In this case, we use a t distribution to find the critical value if the underlying
distribution is normal.
Case Study: Weights
Sampling Distribution
(for n = 10)
μ = 135 lb (mean for the population and for X̄)
σ = 10 lb (s.d. for the population)
n = 10
$$\frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{10}} = 3.16 \quad (\text{s.d. for } \bar{X})$$
Case Study: Weights
Answer to Question
(for n = 10)
• Where should 95% of the sample mean
weights fall (from samples of size n=10)?
mean plus or minus two standard deviations
135 − 2(3.16) = 128.68
135 + 2(3.16) = 141.32
95% should fall between 128.68 lb & 141.32 lb
Case Study: Weights
Sampling Distribution
(for n = 25)
μ = 135 lb
σ = 10 lb
n = 25
$$\frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = 2$$
Case Study: Weights
Answer to Question
(for n = 25)
• Where should 95% of the sample mean
weights fall (from samples of size n=25)?
mean plus or minus two standard deviations
135 − 2(2) = 131
135 + 2(2) = 139
95% should fall between 131 lb & 139 lb
[Figure: Sampling Distribution of Mean (n=25). Simulated Data: Sample Size=25. Histogram of the simulated sample means; horizontal axis from 120 to 150.]
Case Study: Weights
Sampling Distribution
(for n = 100)
μ = 135 lb
σ = 10 lb
n = 100
$$\frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1$$
Case Study: Weights
Answer to Question
(for n = 100)
• Where should 95% of the sample mean
weights fall (from samples of size n=100)?
mean plus or minus two standard deviations
135 − 2(1) = 133
135 + 2(1) = 137
95% should fall between 133 lb & 137 lb
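The three case-study answers follow one pattern, μ ± 2σ/√n, so they can be computed together. The first part uses only the case-study values. The simulation check at the end additionally assumes normally distributed weights and a fixed seed (both assumptions, not part of the case study); it estimates how many sample means of size 25 actually land in the computed range.

```python
import math
import random

mu, sigma = 135, 10   # population mean and s.d. of weights (lb)

# Theoretical 95% ranges for the sample mean: mu ± 2·sigma/sqrt(n)
ranges = {}
for n in (10, 25, 100):
    sd_xbar = sigma / math.sqrt(n)
    ranges[n] = (mu - 2 * sd_xbar, mu + 2 * sd_xbar)
    print(f"n = {n:3d}: s.d. of x-bar = {sd_xbar:.2f}, "
          f"range = ({ranges[n][0]:.2f}, {ranges[n][1]:.2f})")

# Simulation check for n = 25, ASSUMING normally distributed weights.
rng = random.Random(3)
means = [sum(rng.gauss(mu, sigma) for _ in range(25)) / 25 for _ in range(2000)]
low, high = ranges[25]
inside = sum(low <= m <= high for m in means) / len(means)
print(f"simulated share of n = 25 means in range: {inside:.1%}")
```

Quadrupling the sample size (25 to 100) halves the s.d. of X̄, which is why the range shrinks from 8 lb wide to 4 lb wide.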
[Figure: Sampling Distribution of Mean (n=100). Simulated Data: Sample Size=100. Histogram of the simulated sample means; horizontal axis from 120 to 150.]
Case Study
Exercise and Pulse Rates
Hypothetical
Is the mean resting pulse rate of adult
subjects who regularly exercise different
from the mean resting pulse rate of those
who do not regularly exercise?
Find Confidence Intervals for the means
Case Study: Results
Exercise and Pulse Rates
A random sample of n1 = 29 exercisers yielded a sample
mean of X̄1 = 66 beats per minute (bpm) with a sample
standard deviation of s1 = 8.6 bpm. A random sample of
n2 = 31 nonexercisers yielded a sample mean of X̄2 = 75
bpm with a sample standard deviation of s2 = 9.0 bpm.
Group | n | Mean | Std. dev.
Exercisers | 29 | 66 | 8.6
Nonexercisers | 31 | 75 | 9.0
The Rule for Sample Means
We do not know the value of σ!
We assume that the pulse rates are normally
distributed.
We need to use the t distribution to find the
critical value.
For large samples, we can use the normal
distribution as an approximation.
Standard Error of the
(Sample) Mean
SEM = standard error of the mean
= (standard deviation from the sample) divided by (square root of the sample size)
$$= \frac{s}{\sqrt{n}}$$
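The standard error is a one-line computation. This sketch applies it to the summary statistics from the pulse-rate case study; the slides round both results to 1.6.

```python
import math

def sem(s, n):
    """Standard error of the mean: sample s.d. divided by sqrt(sample size)."""
    return s / math.sqrt(n)

print(f"exercisers:    {sem(8.6, 29):.2f}")   # prints 1.60
print(f"nonexercisers: {sem(9.0, 31):.2f}")   # prints 1.62
```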
Case Study: Results
Exercise and Pulse Rates
Group | n | Mean | Std. dev. | Std. err.
Exercisers | 29 | 66 | 8.6 | 1.6
Nonexercisers | 31 | 75 | 9.0 | 1.6
Typical deviation of an individual pulse rate
(for exercisers) is s = 8.6.
Typical deviation of a mean pulse rate
(for exercisers) is
$$\frac{s}{\sqrt{n}} = \frac{8.6}{\sqrt{29}} = 1.6$$
Case Study: Confidence
Intervals
Exercise and Pulse Rates
95% C.I. for the population mean:
sample mean ± z* (standard error)
$$\bar{X} \pm 2\,\frac{s}{\sqrt{n}}$$
Exercisers: 66 ± 2(1.6) = 66 ± 3.2 = (62.8, 69.2)
Non-exercisers: 75 ± 2(1.6) = 75 ± 3.2 = (71.8, 78.2)
Do you think the population means are different?
Yes, because the intervals do not overlap.
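The two intervals and the overlap check can be reproduced from the case-study summary statistics. As on the slide, this sketch uses the multiplier 2 for 95% confidence (the normal approximation for large samples); it uses the unrounded standard errors, which round to the same intervals.

```python
import math

groups = {"exercisers": (29, 66, 8.6), "nonexercisers": (31, 75, 9.0)}  # (n, mean, s)

intervals = {}
for name, (n, mean, s) in groups.items():
    se = s / math.sqrt(n)                              # standard error of the mean
    intervals[name] = (mean - 2 * se, mean + 2 * se)   # multiplier 2 for about 95%
    print(f"{name}: ({intervals[name][0]:.1f}, {intervals[name][1]:.1f})")

overlap = intervals["exercisers"][1] >= intervals["nonexercisers"][0]
print("intervals overlap?", overlap)
```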
Careful Interpretation of a
Confidence Interval
• “We are 95% confident that the mean resting pulse rate
for the population of all exercisers is between 62.8 and
69.2 bpm.” (We feel that plausible values for the population of
exercisers’ mean resting pulse rate are between 62.8 and 69.2.)
• ** This does not mean that 95% of all people who exercise
regularly will have resting pulse rates between 62.8 and
69.2 bpm. **
• Statistically: 95% of all samples of size 29 from the population of
exercisers should yield a sample mean within two standard
errors of the population mean; i.e., in repeated samples, 95% of
the C.I.s should contain the true population mean.
Case Study: Confidence
Intervals
Exercise and Pulse Rates
95%
C.I. for the difference in population
means (nonexercisers minus exercisers):
(difference in sample means) ± 2 (SE of the difference)
Difference in sample means: X̄N − X̄E = 75 − 66 = 9
SE of the difference = √(1.6² + 1.6²) ≈ 2.26
95% confidence interval: 9 ± 2(2.26) = 9 ± 4.52, or about (4.5, 13.5)
– the interval does not include zero, so the population means appear to be different
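The SE of a difference between two independent sample means is √(SE₁² + SE₂²). This sketch applies that standard formula to the case-study summaries, once with unrounded standard errors and once with the rounded 1.6 values from the table (which reproduces the 2.26 used above), and checks whether the interval contains zero.

```python
import math

n_e, mean_e, s_e = 29, 66, 8.6   # exercisers
n_n, mean_n, s_n = 31, 75, 9.0   # nonexercisers

diff = mean_n - mean_e
se_diff = math.sqrt(s_e ** 2 / n_e + s_n ** 2 / n_n)   # sqrt(SE_E^2 + SE_N^2), unrounded
se_diff_rounded = math.sqrt(1.6 ** 2 + 1.6 ** 2)       # using the rounded SEs from the table

low, high = diff - 2 * se_diff, diff + 2 * se_diff
print(f"difference = {diff}, SE = {se_diff:.2f} (rounded inputs: {se_diff_rounded:.2f})")
print(f"95% CI: ({low:.1f}, {high:.1f}); includes zero? {low <= 0 <= high}")
```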
Example: Page 445 # 21.26
The NAEP test (Example 7, page 439) was given to a sample
of 1077 women of ages 21 to 25 years. Their mean quantitative
score was 275 and the standard deviation was 58.
a) Give a 95% confidence interval for the mean score μ in the population of
all young women.
b) Give the 90% and 99% confidence intervals for μ.
c) What are the margins of error for 90%, 95%, and 99% confidence? How
does increasing the confidence level affect the margin of error of a
confidence interval?
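All three parts of the NAEP exercise use X̄ ± z*·s/√n. This sketch takes the numbers given in the exercise and the z* values from Table 21.1; with n = 1077 the normal approximation is reasonable, as noted earlier for large samples. It is a way to check hand calculations, not the textbook's answer key.

```python
import math

mean, s, n = 275, 58, 1077
se = s / math.sqrt(n)                             # standard error of the mean
table_z = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}    # critical values from Table 21.1

margins = {}
for level, z in table_z.items():
    margins[level] = z * se
    print(f"{level:.0%}: margin = {margins[level]:.2f}, "
          f"CI = ({mean - margins[level]:.1f}, {mean + margins[level]:.1f})")
```

The margin grows with z*, so raising the confidence level widens the interval.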
Sample Size for Estimating Mean μ
$$n = \left(\frac{z^{*}\sigma}{E}\right)^{2}$$
Where
z* = critical z score based on the desired confidence level
E = desired margin of error
σ = population standard deviation
Example: Assume that we want to estimate the
mean IQ score for the population of statistics
professors. How many statistics professors must
be randomly selected for IQ tests if we want 95%
confidence that the sample mean is within 2 IQ
points of the population mean? Assume that σ =
15, as is found in the general population.
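The sample-size formula for a mean can be checked with a small function. This sketch applies it to the IQ example (E = 2, σ = 15, z* = 1.96 for 95%) and rounds up, as with the proportion formula.

```python
import math

def sample_size_mean(E, sigma, z_star=1.96):
    """Smallest n with z*·sigma/sqrt(n) <= E, i.e. n = (z*·sigma/E)^2 rounded up."""
    return math.ceil((z_star * sigma / E) ** 2)

n = sample_size_mean(E=2, sigma=15)
print(n)   # (1.96 * 15 / 2)^2 = 216.09, rounded up: prints 217
```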
Key Concepts (2nd half of Ch. 21)
• Sampling distribution of sample means
• The Central Limit Theorem
• Compute confidence intervals for means
• Interpret confidence intervals for means