STATISTICS 200
Lecture #17
Tuesday, October 18, 2016
Textbook: Sections 9.5, 10.3, 10.4
Objectives:
• Apply general confidence interval formula: Estimate
plus/minus (multiplier × standard error)
• Calculate new values of the multiplier for new confidence
levels other than 95%
• Interpret confidence level as a relative frequency
• Describe the sampling distribution of a difference of two
independent sample proportions.
• Apply formula for the S.E. of a difference
We have begun a strong focus on inference
• Proportions: one population proportion; two population proportions
• Means: one population mean; difference between means; mean difference
(The slide marks the topics covered this week.)
Motivation
Goal: Use statistical inference to answer the
question “What is the percentage of Creamery
customers who prefer chocolate ice cream over
vanilla?”
Strategy:
Get a random sample of
90 individuals and ask
them this question. Use
the answers to perform a
hypothesis test to answer
the question.
Data: Of 90 respondents in
our representative sample,
35 said they prefer
chocolate.
Let’s create a 90%
confidence interval for the
true percentage.
Our new confidence interval formula
Here, “estimate”
means p-hat.
(estimate – ME to estimate + ME)
ME = (multiplier)*(standard error)
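A minimal sketch of the formula above applied to the Creamery data (35 of 90 prefer chocolate, 90% confidence). The function names `multiplier` and `proportion_ci` are made up for illustration, and the sketch assumes the usual standard error for one proportion, the square root of [p-hat(1–p-hat)/n]. The multiplier is taken from the standard normal curve using Python's built-in `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def multiplier(confidence_level):
    """Return z* so that the middle `confidence_level` of a
    standard normal curve lies between -z* and +z*."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)


def proportion_ci(successes, n, confidence_level):
    """Estimate plus/minus (multiplier x standard error) for one proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    me = multiplier(confidence_level) * se
    return p_hat - me, p_hat + me


# Creamery example: 35 of 90 respondents prefer chocolate
low, high = proportion_ci(35, 90, 0.90)
print(f"90% multiplier: {multiplier(0.90):.3f}")
print(f"90% CI: ({low:.3f}, {high:.3f})")
```

With these numbers the 90% multiplier is about 1.645 and the interval is roughly (0.304, 0.473). Calling `multiplier(0.95)` gives about 1.96, which this lecture rounds to 2.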
Our new confidence interval formula
Putting it all together, we get
p-hat ± (multiplier) × square root of [p-hat(1–p-hat)/n]
What does it mean to be 90% confident?
A. There is a 90% probability that the one interval that
I calculated contains the true value for the
parameter.
B. If I get 100 such intervals, about 90 of them will
contain the true value for the parameter.
C. The sample estimate has a 90% chance of being
inside the calculated interval.
D. The p-value has a 90% chance of being inside the
interval.
Recall the example from Thursday
Suppose we have a sample of 200 students in
STAT 100 and find that 28 of them are left
handed. Find a 95% CI for the true
proportion.
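Our sample proportion is: p-hat = 28/200 = 0.14
Our ME is 2 × square root of [0.14(0.86)/200] = 0.049
Our 95% CI is (0.14–0.049 to 0.14+0.049) or (0.091, 0.189)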
On the following two slides, we'll pretend that
the true population proportion is 0.12.
[Figure: Normal curve of sample proportions based on sample
size 200; horizontal axis: sample percents, 0.08 to 0.18]
The green curve is the true distribution of p-hat.
Of course, ordinarily we don't know where it lies, but at
least we know its approximate standard deviation.
Thus, we can build a confidence interval around our 14%
estimate (in red).
If we take another sample, the red line will move
but the green curve will not!
[Figure: 30 confidence intervals based on sample size 200;
horizontal axis: sample percents, 0.06 to 0.18]
If we repeat the sampling over and over, 95% of our
confidence intervals will contain the true proportion of
0.12.
This is why we use the term "95% confidence interval".
Definition of "95% confidence interval for
the true population proportion":
An interval of values computed from a
sample that will cover the true but
unknown population proportion for 95% of
the possible samples.
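The relative-frequency reading of this definition can be checked with a small simulation. This is only a sketch: it pretends, as the earlier slides do, that the true proportion is 0.12 with samples of size 200, and it uses the lecture's multiplier of 2. The number of repetitions, the seed, and the function name `simulate_coverage` are arbitrary choices made for illustration.

```python
import random
from math import sqrt

TRUE_P = 0.12   # pretend population proportion, as on the earlier slides
N = 200         # sample size
REPS = 2000     # number of simulated samples (arbitrary choice)


def simulate_coverage(true_p, n, reps, seed=0):
    """Draw `reps` samples, build an approximate 95% CI from each,
    and return the fraction of intervals that contain `true_p`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Each of the n individuals is a "success" with probability true_p
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        me = 2 * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= true_p <= p_hat + me:
            hits += 1
    return hits / reps


coverage = simulate_coverage(TRUE_P, N, REPS)
print(f"Fraction of intervals covering {TRUE_P}: {coverage:.3f}")
```

The printed fraction should land near 0.95, though not exactly on it, because the normal curve is only an approximation and the simulation is itself random.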
To find a 95% CI:
• The center is at p-hat.
• The margin of error is 2 times the S.E., where…
• …the S.E. is the square root of [p-hat(1-p-hat)/n].
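The three steps above can be sketched in Python using the left-handed example (28 of 200 students). The function name `ci95_proportion` is hypothetical, and the multiplier of 2 follows this lecture's rounding rather than the more precise 1.96.

```python
from math import sqrt


def ci95_proportion(successes, n):
    """Approximate 95% CI for one proportion: p-hat plus/minus 2 x S.E."""
    p_hat = successes / n                  # center of the interval
    se = sqrt(p_hat * (1 - p_hat) / n)     # standard error of p-hat
    me = 2 * se                            # margin of error
    return p_hat, me, (p_hat - me, p_hat + me)


# Left-handed example: 28 of 200 students
p_hat, me, (low, high) = ci95_proportion(28, 200)
print(f"p-hat = {p_hat:.2f}, ME = {me:.3f}, CI = ({low:.3f}, {high:.3f})")
```

For these data the center is 0.14, the margin of error is about 0.049, and the interval is roughly (0.091, 0.189).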
Recall this example:
Are women more likely to have dogs?
            Female         Male           Total
Has Dog     89 (56.7%)     66 (50.8%)     155
No Dog      68 (43.3%)     64 (49.2%)     132
Total       157            130            287
Your class data
Let’s reframe this problem: Examine the difference
between two independent proportions, that is, pf–pm.
Is it zero? How about a 95% confidence interval?
Our new confidence interval formula
Here, “estimate”
means p-hat1 – p-hat2.
(estimate – ME to estimate + ME)
ME = (multiplier)*(standard error)
The sampling distribution of p-hat1 – p-hat2
As long as both p-hat1 and p-hat2 are
approximately normal…
...and the two samples are independent...
Then the sampling distribution is
approximately normal with mean p1–p2 and
standard deviation equal to the square root of
[p1(1–p1)/n1 + p2(1–p2)/n2].
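Standard error of the difference between two
independent statistics (p. 335 of book)
S.E.(difference) = square root of [(S.E.1)² + (S.E.2)²]
If you remember your geometry, it might help to
associate the S.E. of the difference with the
hypotenuse of a right triangle.
The good ol’ Pythagorean theorem says c² = a² + b²,
where the legs a and b play the roles of the two
individual S.E.s and the hypotenuse c is the S.E. of
the difference.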
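The right-triangle picture can be made concrete with `math.hypot`, which returns the length of a hypotenuse from its two legs. This is a sketch using the dog-ownership counts from the class data (89 of 157 females, 66 of 130 males); the function names `se_proportion` and `se_difference` are invented for illustration.

```python
from math import hypot, sqrt


def se_proportion(successes, n):
    """Standard error of a single sample proportion."""
    p_hat = successes / n
    return sqrt(p_hat * (1 - p_hat) / n)


def se_difference(se1, se2):
    """S.E. of a difference of two independent statistics:
    the hypotenuse of a right triangle with legs se1 and se2."""
    return hypot(se1, se2)


se_f = se_proportion(89, 157)   # females with dogs
se_m = se_proportion(66, 130)   # males with dogs
se_diff = se_difference(se_f, se_m)
print(f"S.E. female = {se_f:.4f}, S.E. male = {se_m:.4f}")
print(f"S.E. of the difference = {se_diff:.4f}")
```

As with any right triangle, the S.E. of the difference is longer than either leg but shorter than the two legs added together.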
Recall this example:
Are women more likely to have dogs?
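            Female         Male           Total
Has Dog     89 (56.7%)     66 (50.8%)     155
No Dog      68 (43.3%)     64 (49.2%)     132
Total       157            130            287
In this dataset, p-hat for females is 89/157 = 0.567 and
p-hat for males is 66/130 = 0.508, so the estimate of
pf–pm is 0.567 – 0.508 = 0.059.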
Our new confidence interval formula
(estimate – ME to estimate + ME)
ME = (multiplier)*(S.E.)
In this dataset, S.E. = square root of
[0.567(0.433)/157 + 0.508(0.492)/130] = 0.059.
Therefore, ME = 2 × 0.059 = 0.118.
Thus, the 95% CI is
(0.059–0.118 to 0.059+0.118) or (–0.059, 0.177).
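The whole calculation for the dog-ownership data can be reproduced in a few lines. This is a sketch under the lecture's conventions (multiplier of 2 for 95% confidence); the function name `difference_ci` is hypothetical.

```python
from math import sqrt


def difference_ci(x1, n1, x2, n2, multiplier=2):
    """CI for p1 - p2 from two independent samples:
    estimate plus/minus (multiplier x S.E. of the difference)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    estimate = p1_hat - p2_hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    me = multiplier * se
    return estimate - me, estimate + me


# 89 of 157 females and 66 of 130 males have dogs
low, high = difference_ci(89, 157, 66, 130)
contains_zero = low <= 0 <= high
print(f"95% CI for pf - pm: ({low:.3f}, {high:.3f})")
print(f"Interval contains zero: {contains_zero}")
```

The result matches the interval on the slide, (–0.059, 0.177), and the last line answers the earlier question "Is it zero?" by checking whether zero falls inside the interval.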
The 95% CI for pf–pm is (–0.059, 0.177).
Importantly, this CI contains zero. So zero (no
difference) is a reasonable value!
General guidelines for using CIs to make
decisions
• Any value not in the interval can be rejected as a likely
value of the parameter.
• Special case: For an interval for a difference, if zero is
not in the interval then we can conclude a difference
between the parameters exists.
• …and finally: If you have two different CIs (on the
same scale) that do not overlap, it is safe to assume
there’s a significant difference. But the reverse is not
true!
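The warning that "the reverse is not true" can be illustrated with a small example. The counts below (500 of 1000 and 450 of 1000) are hypothetical, invented purely for illustration and not taken from the lecture; the function names are also made up. The sketch uses the lecture's multiplier of 2.

```python
from math import sqrt


def proportion_ci(successes, n, multiplier=2):
    """CI for one proportion."""
    p_hat = successes / n
    me = multiplier * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


def difference_ci(x1, n1, x2, n2, multiplier=2):
    """CI for the difference of two independent proportions."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    me = multiplier * se
    return (p1_hat - p2_hat) - me, (p1_hat - p2_hat) + me


# Hypothetical counts, NOT from the lecture
ci_a = proportion_ci(500, 1000)
ci_b = proportion_ci(450, 1000)
ci_diff = difference_ci(500, 1000, 450, 1000)

intervals_overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
zero_in_diff_ci = ci_diff[0] <= 0 <= ci_diff[1]

print(f"CI A: ({ci_a[0]:.3f}, {ci_a[1]:.3f})")
print(f"CI B: ({ci_b[0]:.3f}, {ci_b[1]:.3f})")
print(f"CI for difference: ({ci_diff[0]:.3f}, {ci_diff[1]:.3f})")
print(f"Individual CIs overlap: {intervals_overlap}")
print(f"Zero inside difference CI: {zero_in_diff_ci}")
```

Here the two individual intervals overlap, yet the interval for the difference excludes zero. Overlapping CIs therefore do not rule out a significant difference, which is why the interval for the difference is the better tool for comparing two groups.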
If you understand today’s lecture…
9.49, 9.54, 10.50, 10.55, 10.57, 10.64, 10.67