
Section 7.2
How Can We Construct a
Confidence Interval to Estimate a
Population Proportion?
Agresti/Franklin Statistics, 1 of 87
Finding the 95% Confidence Interval
for a Population Proportion
• We symbolize a population proportion by p
• The point estimate of the population proportion is the sample proportion
• We symbolize the sample proportion by p̂
Finding the 95% Confidence Interval
for a Population Proportion
• A 95% confidence interval uses a margin of error = 1.96(standard errors)
• [point estimate ± margin of error] = p̂ ± 1.96(standard errors)
Finding the 95% Confidence Interval
for a Population Proportion
• The exact standard error of a sample proportion equals:
  $\sqrt{p(1-p)/n}$
• This formula depends on the unknown population proportion, p
• In practice, we don’t know p, and we need to estimate the standard error
Finding the 95% Confidence Interval
for a Population Proportion
• In practice, we use an estimated standard error:
  $se = \sqrt{\hat{p}(1-\hat{p})/n}$
Finding the 95% Confidence Interval
for a Population Proportion
• A 95% confidence interval for a population proportion p is:
  $\hat{p} \pm 1.96(se)$, with $se = \sqrt{\hat{p}(1-\hat{p})/n}$
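A minimal Python sketch of the 95% interval formula, assuming the number of successes and the sample size are known. The function name `prop_ci_95` and the counts in the usage line (60 successes out of 100) are hypothetical, chosen only to show the arithmetic.

```python
import math

def prop_ci_95(successes: int, n: int) -> tuple[float, float]:
    """Large-sample 95% confidence interval for a population proportion p."""
    p_hat = successes / n                      # point estimate p̂
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    margin = 1.96 * se                         # margin of error
    return (p_hat - margin, p_hat + margin)

# Hypothetical data: 60 successes in 100 trials
lower, upper = prop_ci_95(60, 100)
print(f"({lower:.3f}, {upper:.3f})")
```

Here p̂ = 0.60 and se = √(0.6 · 0.4 / 100) ≈ 0.049, so the interval is roughly 0.60 ± 0.096.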
Example: Would You Pay Higher
Prices to Protect the Environment?
• In 2000, the GSS asked: “Are you willing to pay much higher prices in order to protect the environment?”
  • Of n = 1154 respondents, 518 were willing to do so
Example: Would You Pay Higher
Prices to Protect the Environment?
• Find and interpret a 95% confidence interval for the population proportion of adult Americans willing to do so at the time of the survey
Example: Would You Pay Higher
Prices to Protect the Environment?
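$\hat{p} = 518/1154 = 0.45$
$se = \sqrt{(0.45)(0.55)/1154} = 0.015$
$\hat{p} \pm 1.96(se) = 0.45 \pm 1.96(0.015) = 0.45 \pm 0.03$, or $(0.42, 0.48)$
We can be 95% confident that the population proportion of adult Americans willing to pay much higher prices to protect the environment was between 0.42 and 0.48.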
Sample Size Needed for Large-Sample
Confidence Interval for a Proportion
• For the 95% confidence interval for a proportion p to be valid, you should have at least 15 successes and 15 failures:
  $n\hat{p} \ge 15$ and $n(1-\hat{p}) \ge 15$
“95% Confidence”
• With probability 0.95, a sample proportion value occurs such that the confidence interval contains the population proportion, p
• With probability 0.05, the method produces a confidence interval that misses p
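One way to see these probabilities is to simulate many samples from a population whose p is known and count how often the interval contains it. This is a sketch under assumed values: p = 0.45 and n = 1154 are chosen only to resemble the GSS example, and the seed and number of repetitions are arbitrary.

```python
import math
import random

def covers(p: float, n: int, rng: random.Random) -> bool:
    """Draw one sample of size n; check whether its 95% interval contains p."""
    successes = sum(rng.random() < p for _ in range(n))
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 1.96 * se <= p <= p_hat + 1.96 * se

rng = random.Random(2024)                    # fixed seed for reproducibility
p_true, n, reps = 0.45, 1154, 1000           # assumed values for illustration
coverage = sum(covers(p_true, n, rng) for _ in range(reps)) / reps
print(f"fraction of intervals containing p: {coverage:.3f}")
```

The printed fraction should land near 0.95, with the remaining intervals missing p.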
How Can We Use Confidence
Levels Other than 95%?
• In practice, the confidence level 0.95 is the most common choice
• But some applications require greater confidence
• To increase the chance of a correct inference, we use a larger confidence level, such as 0.99
A 99% Confidence Interval for p
pˆ  2.58(se)
Different Confidence Levels
Different Confidence Levels
• In using confidence intervals, we must compromise between the desired margin of error and the desired confidence of a correct inference
  • As the desired confidence level increases, the margin of error gets larger
What is the Error Probability for
the Confidence Interval Method?
• The general formula for the confidence interval for a population proportion is:
  Sample proportion ± (z-score)(std. error)
  which in symbols is
  $\hat{p} \pm z(se)$
What is the Error Probability for
the Confidence Interval Method?
Summary: Confidence Interval
for a Population Proportion, p
• A confidence interval for a population proportion p is:
  $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$
Summary: Effects of Confidence
Level and Sample Size on Margin of
Error
• The margin of error for a confidence interval:
  • Increases as the confidence level increases
  • Decreases as the sample size increases
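Both effects follow from the formula z√(p̂(1 − p̂)/n): z grows with the confidence level, and the square root shrinks as n grows. A small Python sketch tabulating the margin of error; the value p̂ = 0.5 and the sample sizes are hypothetical:

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """z * sqrt(p_hat(1 - p_hat)/n), with z taken from the standard normal."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical p_hat = 0.5; columns are 90%, 95%, 99% confidence
for n in (100, 400, 1600):
    row = [margin_of_error(0.5, n, c) for c in (0.90, 0.95, 0.99)]
    print(n, [round(m, 3) for m in row])
```

Because n appears under a square root, quadrupling the sample size only halves the margin of error.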
What Does It Mean to Say that
We Have “95% Confidence”?
• If we used the 95% confidence interval method to estimate many population proportions, then in the long run about 95% of those intervals would give correct results, containing the population proportion
Section 7.3
How Can We Construct a
Confidence Interval To Estimate a
Population Mean?
How to Construct a Confidence
Interval for a Population Mean
• Point estimate ± margin of error
• The sample mean is the point estimate of the population mean
• The exact standard error of the sample mean is $\sigma/\sqrt{n}$
• In practice, we estimate σ by the sample standard deviation, s
How to Construct a Confidence
Interval for a Population Mean
• For large n…
• and also for small n from an underlying population that is normal…
• The confidence interval for the population mean is:
  $\bar{x} \pm z(\sigma/\sqrt{n})$
How to Construct a Confidence
Interval for a Population Mean
• In practice, we don’t know the population standard deviation
• Substituting the sample standard deviation s for σ to get $se = s/\sqrt{n}$ introduces extra error
• To account for this increased error, we replace the z-score by a slightly larger score, the t-score
How to Construct a Confidence
Interval for a Population Mean
• In practice, we estimate the standard error of the sample mean by $se = s/\sqrt{n}$
• Then, we multiply se by a t-score from the t-distribution to get the margin of error for a confidence interval for the population mean
Properties of the t-distribution
• The t-distribution is bell-shaped and symmetric about 0
• The probabilities depend on the degrees of freedom, df
• The t-distribution has thicker tails and is more spread out than the standard normal distribution
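The thicker tails mean a t-score must be larger than the matching z-score to capture the same central probability, and the gap closes as df grows. Python's standard library has no t-distribution, so the sketch below is an assumption-laden stand-in for a t table: it integrates the t density numerically (Simpson's rule) and solves for the t-score by bisection. The helper names `t_pdf`, `upper_tail` and `t_score` are hypothetical; `scipy.stats.t.ppf` is the usual library route if SciPy is available.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x) for x >= 0: 0.5 minus the area over [0, x], via Simpson's rule."""
    h = x / steps
    area = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

def t_score(alpha: float, df: int) -> float:
    """t-score with upper-tail probability alpha (t.025 uses alpha = 0.025)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid    # too much probability beyond mid, so the t-score is larger
        else:
            hi = mid
    return (lo + hi) / 2

for df in (6, 17, 1000):
    print(df, round(t_score(0.025, df), 3))
```

The results should match table values of about 2.447 (df = 6) and 2.110 (df = 17), while for df = 1000 the t-score is already close to the z-score 1.96.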
t-Distribution
Summary: 95% Confidence
Interval for a Population Mean
• A 95% confidence interval for the population mean µ is:
  $\bar{x} \pm t_{.025}(s/\sqrt{n})$, with $df = n - 1$
• To use this method, you need:
  • Data obtained by randomization
  • An approximately normal population distribution
Example: eBay Auctions of
Palm Handheld Computers
• Do you tend to get a higher, or a lower, price if you give bidders the “buy-it-now” option?
Example: eBay Auctions of
Palm Handheld Computers
• Consider some data from sales of the Palm M515 PDA (personal digital assistant)
• During the first week of May 2003, 25 of these handheld computers were auctioned off, 7 of which had the “buy-it-now” option
Example: eBay Auctions of
Palm Handheld Computers
• “Buy-it-now” option:
  235 225 225 240 250 250 210
• Bidding only:
  250 249 255 200 199 240 228
  255 232 246 210 178 246 240
  245 225 246 225
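The summary statistics on the next slide can be reproduced from these selling prices with Python's `statistics` module. A short sketch; the variable names are illustrative:

```python
from statistics import mean, median, stdev

buy_now = [235, 225, 225, 240, 250, 250, 210]
bidding = [250, 249, 255, 200, 199, 240, 228,
           255, 232, 246, 210, 178, 246, 240,
           245, 225, 246, 225]

for label, prices in (("yes", buy_now), ("no", bidding)):
    # stdev uses the n - 1 divisor, i.e. the sample standard deviation s
    print(label, len(prices), round(mean(prices), 2),
          round(stdev(prices), 2), median(prices))
```

Quartiles are left out here because different software uses different quartile conventions.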
Example: eBay Auctions of
Palm Handheld Computers
• Summary of selling prices for the two types of auctions:

  buy_now   N    Mean     StDev   Minimum   Q1       Median   Q3       Maximum
  no        18   231.61   21.94   178.00    221.25   240.00   246.75   255.00
  yes        7   233.57   14.64   210.00    225.00   235.00   250.00   250.00
Example: eBay Auctions of
Palm Handheld Computers
Example: eBay Auctions of
Palm Handheld Computers
• To construct a confidence interval using the t-distribution, we must assume a random sample from an approximately normal population of selling prices
Example: eBay Auctions of
Palm Handheld Computers
• Let µ denote the population mean for the “buy-it-now” option
• The estimate of µ is the sample mean: x̄ = $233.57
• The sample standard deviation is: s = $14.64
Example: eBay Auctions of
Palm Handheld Computers
• The 95% confidence interval for the “buy-it-now” option (n = 7, so df = 6) is:
  $\bar{x} \pm t_{.025}(s/\sqrt{n}) = 233.57 \pm 2.447(14.64/\sqrt{7})$
  which is 233.57 ± 13.54, or (220.03, 247.11)
Example: eBay Auctions of
Palm Handheld Computers
• The 95% confidence interval for the mean sales price for the bidding-only option is:
  (220.70, 242.52)
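Both intervals can be checked from the summary statistics. This sketch takes the t-scores from a t table rather than computing them (t.025 = 2.447 for df = 6 and 2.110 for df = 17); the helper name `t_interval` is hypothetical.

```python
import math

def t_interval(mean: float, sd: float, n: int, t_crit: float) -> tuple[float, float]:
    """x_bar ± t(s/sqrt(n)), with the t-score supplied from a t table."""
    margin = t_crit * sd / math.sqrt(n)
    return (mean - margin, mean + margin)

# t.025 values from a t table: df = 6 -> 2.447, df = 17 -> 2.110
buy_now = t_interval(233.57, 14.64, 7, 2.447)
bidding = t_interval(231.61, 21.94, 18, 2.110)
print([round(x, 2) for x in buy_now], [round(x, 2) for x in bidding])
```

Rounded to cents, this reproduces (220.03, 247.11) and (220.70, 242.52).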
Example: eBay Auctions of
Palm Handheld Computers
• Notice that the two intervals overlap a great deal:
  • “Buy-it-now”: (220.03, 247.11)
  • Bidding only: (220.70, 242.52)
• There is not enough information for us to conclude that one probability distribution clearly has a higher mean than the other
How Do We Find a t-Confidence
Interval for Other Confidence
Levels?

• The 95% confidence interval uses $t_{.025}$ since 95% of the probability falls between $-t_{.025}$ and $t_{.025}$
• For 99% confidence, the error probability is 0.01, with 0.005 in each tail, and the appropriate t-score is $t_{.005}$
If the Population is Not Normal,
is the Method “Robust”?
• A basic assumption of the confidence interval using the t-distribution is that the population distribution is normal
• Many variables have distributions that are far from normal
If the Population is Not Normal,
is the Method “Robust”?
• How problematic is it if we use the t-confidence interval even if the population distribution is not normal?
If the Population is Not Normal,
is the Method “Robust”?
• For large random samples, it’s not problematic
• The Central Limit Theorem applies: for large n, the sampling distribution of the sample mean is bell-shaped even when the population is not
If the Population is Not Normal,
is the Method “Robust”?
• What about a confidence interval using the t-distribution when n is small?
• Even if the population distribution is not normal, confidence intervals using t-scores usually work quite well
• We say the t-distribution method is robust in terms of the normality assumption
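A small simulation can illustrate this robustness for one non-normal case. The sketch below assumes a Uniform(0, 1) population (mean 0.5), samples of size n = 10, and t.025 = 2.262 for df = 9 from a t table; the seed and number of repetitions are arbitrary. It checks a single example and does not establish robustness in general.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(7)                 # fixed seed for reproducibility
n, reps, t_crit = 10, 4000, 2.262      # t.025 for df = 9, from a t table
true_mean = 0.5                        # mean of the assumed Uniform(0, 1) population

hits = 0
for _ in range(reps):
    sample = [rng.random() for _ in range(n)]
    margin = t_crit * stdev(sample) / math.sqrt(n)
    if abs(mean(sample) - true_mean) <= margin:   # interval contains the true mean
        hits += 1
coverage = hits / reps
print(f"coverage for a uniform population, n = 10: {coverage:.3f}")
```

The printed coverage should be close to the nominal 0.95 even though the population is not normal.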
Cases Where the t-Confidence
Interval Does Not Work

• With binary data
• With data that contain extreme outliers
The Standard Normal Distribution is
the t-Distribution with df = ∞
The 2002 GSS asked: “What do you think is the ideal number of children in a family?”

The 497 females who responded had a median of 2, mean of 3.02, and standard deviation of 1.81. What is the point estimate of the population mean?
a. 497
b. 2
c. 3.02
d. 1.81