Sampling Distribution Proportion
Sampling Distribution
of a Sample Proportion
Lecture 26
Sections 8.1 – 8.2
Fri, Oct 21, 2005
Parameters and Statistics
THE PURPOSE OF A STATISTIC IS TO
ESTIMATE A POPULATION PARAMETER.
A sample mean is used to estimate the population
mean.
A sample proportion is used to estimate the
population proportion.
Sample statistics, by their very nature, are
variable.
Population parameters are fixed.
Example
Example 8.1, p. 464.
The Census Bureau surveys 3000 employees and
asks them, “Have the job skills demanded by your
job increased over the past few years?”
57% replied, “Yes.”
That is a sample proportion.
What is the population proportion?
Some Questions
Do you think that the answer will always be the
same, namely 57%?
Would a sample proportion of 0 be possible?
Would a sample proportion of 1 be possible?
Do you think any of the resulting sample
proportions could be more than 0.75?
Do you know if this sample proportion of 0.57
is close to the true population proportion?
Why do you think conclusions were drawn from
this value?
Some Questions
We hope that the sample proportion is close to
the population proportion.
How close can we expect it to be?
Would it be worth it to collect a larger sample?
If the sample were larger, would we expect the
sample proportion to be closer to the population
proportion?
How much closer?
The Sampling Distribution of a
Statistic
Sampling Distribution of a Statistic – The
distribution of values of the statistic over all
possible samples of size n from that population.
The Sample Proportion
Let p be the population proportion.
Then p is a fixed value (for a given population).
Let p^ (“p-hat”) be the sample proportion.
Then p^ is a random variable; it takes on a new
value every time a sample is collected.
The sampling distribution of p^ is the
probability distribution of all the possible values
of p^.
Example
Suppose that this class is 3/4 freshmen.
Suppose that we take a sample of 2 students,
selected with replacement.
Find the sampling distribution of p^.
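Example
Tree diagram (F = freshman, N = non-freshman). Each selection is F with
probability 3/4 and N with probability 1/4.
First F (3/4), second F (3/4): P(FF) = 9/16
First F (3/4), second N (1/4): P(FN) = 3/16
First N (1/4), second F (3/4): P(NF) = 3/16
First N (1/4), second N (1/4): P(NN) = 1/16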
Example
Let X be the number of freshmen in the sample.
The probability distribution of X is
x      P(X = x)
0      1/16
1      6/16
2      9/16
Example
Let p^ be the proportion of freshmen in the
sample. (p^ = X/n.)
The sampling distribution of p^ is
x      P(p^ = x)
0      1/16
1/2    6/16
1      9/16
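The table above can be checked by listing every ordered sample of size 2. The following is a minimal sketch, not part of the original lecture; the function name `sampling_distribution_by_enumeration` and the labels "F" and "N" are illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

P_FRESHMAN = Fraction(3, 4)  # population proportion from the class example

def sampling_distribution_by_enumeration(n, p=P_FRESHMAN):
    """Return {p_hat: probability} by listing every ordered sample of size n.

    Hypothetical helper: sampling is with replacement, so each pick is
    independent and is "F" with probability p, "N" with probability 1 - p.
    """
    dist = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for sample in product("FN", repeat=n):
        prob = Fraction(1)
        for student in sample:
            prob *= p if student == "F" else 1 - p
        p_hat = Fraction(sample.count("F"), n)
        dist[p_hat] += prob  # FN and NF both land on p_hat = 1/2
    return dict(dist)

dist_n2 = sampling_distribution_by_enumeration(2)
for p_hat in sorted(dist_n2):
    print(p_hat, dist_n2[p_hat])
```

Exact fractions are used so the results match the table without rounding. Python prints fractions in lowest terms, so 6/16 appears as 3/8.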
Samples of Size n = 3
If we sample 3 people (with replacement) from
a population that is 3/4 freshmen, then the
proportion of freshmen in the sample has the
following distribution.
x      P(p^ = x)
0      1/64 = .02
1/3    9/64 = .14
2/3    27/64 = .42
1      27/64 = .42
Samples of Size n = 4
If we sample 4 people (with replacement) from
a population that is 3/4 freshmen, then the
proportion of freshmen in the sample has the
following distribution.
x      P(p^ = x)
0      1/256 = .004
1/4    12/256 = .05
2/4    54/256 = .21
3/4    108/256 = .42
1      81/256 = .32
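The tables for n = 3 and n = 4 follow a pattern: because sampling is with replacement, the number of freshmen in the sample is binomial. The sketch below is an illustration added here, assuming that binomial model; the name `sampling_distribution` is hypothetical.

```python
from fractions import Fraction
from math import comb

def sampling_distribution(n, p):
    """Return {p_hat: P(p_hat)} for samples of size n drawn with replacement.

    Assumes the count of successes X is binomial(n, p), so
    P(p_hat = k/n) = C(n, k) * p**k * (1 - p)**(n - k).
    """
    return {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }

p = Fraction(3, 4)  # the class is 3/4 freshmen
for n in (3, 4):
    print(f"n = {n}")
    for p_hat, prob in sorted(sampling_distribution(n, p).items()):
        print(f"  p^ = {str(p_hat):>3}  P = {str(prob):>7} = {float(prob):.3f}")
```

The same function works for any n, although the number of possible values of p^ (n + 1 of them) makes a table impractical for large samples.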
Sampling Distributions
Run the program
Central Limit Theorem for Proportions.exe.
Use n = 30 and p = 0.75; generate 100 samples.
100 Samples of Size n = 30
Mean of p^ = 0.75
Standard deviation of p^ = 0.079
Observations and Conclusions
Observation #1: The values of p^ are clustered
around p.
Conclusion #1: p^ is probably close to p.
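For readers without the classroom program, the experiment can be approximated in a few lines. This is a stand-in sketch, not the program used in the lecture; `simulate_p_hats` and the fixed seed are assumptions made for reproducibility.

```python
import random
import statistics

def simulate_p_hats(num_samples, n, p, seed=0):
    """Simulate num_samples sample proportions, each from a sample of size n.

    Hypothetical helper: each member of a sample is a "success" with
    probability p, independently of the others.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]

p_hats = simulate_p_hats(num_samples=100, n=30, p=0.75)
print("mean of the 100 values of p^:", round(statistics.mean(p_hats), 3))
print("standard deviation:          ", round(statistics.stdev(p_hats), 3))
print("smallest and largest p^:     ", min(p_hats), max(p_hats))
```

The exact numbers depend on the seed, but the simulated values of p^ should cluster around p = 0.75.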
Larger Sample Size
Now we will select 100 samples of size 120
instead of size 30.
Run the program
Central Limit Theorem for Proportions.exe.
Pay attention to the spread (standard deviation)
of the distribution.
100 Samples of Size n = 120
Mean of p^ = 0.75
Standard deviation of p^ = 0.0395
Observations and Conclusions
Observation #2: As the sample size increases,
the clustering is tighter.
Conclusion #2A: Larger samples give more
reliable estimates.
Conclusion #2B: For sample sizes that are large
enough, we can make very good estimates of
the value of p.
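Observation #2 can be reproduced by simulating both sample sizes side by side. The sketch below is an illustration only; `spread_of_p_hat` is a hypothetical helper and the simulation stands in for the classroom program.

```python
import random
import statistics

def spread_of_p_hat(num_samples, n, p, seed=0):
    """Standard deviation of simulated sample proportions (hypothetical helper)."""
    rng = random.Random(seed)
    p_hats = [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]
    return statistics.stdev(p_hats)

# 100 samples at each size, as in the lecture; p = 0.75 throughout.
spreads = {n: spread_of_p_hat(100, n, 0.75) for n in (30, 120)}
for n, sd in spreads.items():
    print(f"n = {n:3d}: standard deviation of p^ is about {sd:.4f}")
```

With only 100 samples the simulated spreads are themselves noisy, but the spread for n = 120 should come out at roughly half the spread for n = 30.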
Larger Sample Size
Now we will select 10000 samples of size 120
instead of only 100 samples.
Run the program
Central Limit Theorem for Proportions.exe.
Pay attention to the shape of the distribution.
10,000 Samples of Size n = 120
Mean of p^ = 0.75
Standard deviation of p^ = 0.0395
10,000 Samples of Size n = 126
More Observations and Conclusions
Observation #3: The distribution of p^ appears
to be approximately normal.
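One rough way to check Observation #3 without a histogram is to compare the simulated values with the familiar normal benchmarks: about 68% of a normal distribution lies within one standard deviation of the mean and about 95% within two. The sketch below is an added illustration; `fraction_within` is a hypothetical helper, and because p^ takes only discrete values the percentages will match only approximately.

```python
import random
import statistics

def fraction_within(values, k):
    """Fraction of values within k standard deviations of their mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(abs(x - m) <= k * s for x in values) / len(values)

# 10,000 simulated samples of size n = 120 with p = 0.75 (seed fixed for repeatability).
rng = random.Random(0)
p_hats = [sum(rng.random() < 0.75 for _ in range(120)) / 120 for _ in range(10_000)]

within_1 = fraction_within(p_hats, 1)
within_2 = fraction_within(p_hats, 2)
print(f"within 1 standard deviation:  {within_1:.3f} (normal: about 0.68)")
print(f"within 2 standard deviations: {within_2:.3f} (normal: about 0.95)")
```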
One More Conclusion
Conclusion #3: We can use the normal
distribution to calculate just how close to p we
can expect p^ to be.
However, we must know the values of the mean μ and
the standard deviation σ for the distribution of p^.
That is, we have to quantify the sampling
distribution of p^.
The Sampling Distribution of p^
It turns out that the sampling distribution of p^
is approximately normal with the following
parameters.
Mean of p^: $\mu_{\hat{p}} = p$
Variance of p^: $\sigma_{\hat{p}}^2 = \dfrac{p(1 - p)}{n}$
Standard deviation of p^: $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
This is the Central Limit Theorem for
Proportions, summarized on page 519.
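The mean and variance formulas can be verified exactly for small samples by computing them directly from the sampling distribution. This is a sketch added for illustration, assuming sampling with replacement (a binomial count); `exact_mean_and_variance` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def exact_mean_and_variance(n, p):
    """Mean and variance of p^ computed directly from its distribution.

    Assumes the number of successes is binomial(n, p), so p^ = k/n has
    probability C(n, k) * p**k * (1 - p)**(n - k).
    """
    dist = {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }
    mean = sum(x * prob for x, prob in dist.items())
    variance = sum((x - mean) ** 2 * prob for x, prob in dist.items())
    return mean, variance

p = Fraction(3, 4)
for n in (2, 3, 4, 30):
    mean, variance = exact_mean_and_variance(n, p)
    # Compare with the formulas: mean = p and variance = p(1 - p)/n.
    print(n, mean == p, variance == p * (1 - p) / n)
```

Exact fractions avoid rounding, so the comparison with p and p(1 - p)/n is an equality check rather than an approximation. Note that only the mean and variance are exact; the normal shape is approximate.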
The Sampling Distribution of p^
The approximation to the normal distribution is
excellent if
np ≥ 5 and n(1 – p) ≥ 5.
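The rule of thumb is easy to apply mechanically. A minimal sketch follows; the function name and the `threshold` parameter are illustrative, with the default of 5 taken from the condition above.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Check the rule of thumb: n*p and n*(1 - p) must both be at least threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Sample sizes used earlier in the lecture, with p = 0.75.
for n in (4, 30, 120):
    print(n, normal_approximation_ok(n, 0.75))
```

For n = 4 the check fails because np = 3, which is consistent with the small-sample tables looking far from bell-shaped.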
Example
Suppose 51% of the population plan to vote for
candidate X, i.e., p = 0.51.
What is the probability that an exit survey of
1000 people would show candidate X with less
than 45% support, i.e., p^ < .45?
Example
First, describe the sampling distribution of p^ if
the sample size is n = 1000 and p = 0.51.
Check: np = 510 ≥ 5 and n(1 – p) = 490 ≥ 5.
p^ is approximately normal.
$\mu_{\hat{p}} = 0.51$.
$\sigma_{\hat{p}} = \sqrt{(0.51)(0.49)/1000} = 0.01581$.
Example
The z-score of 0.45 is z = (0.45 – 0.51)/.01581
= -3.795.
P(p^ < 0.45) = P(Z < -3.795)
= 0.00007385 (not likely!)
That is why surveys work (within the margin of
error).
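The calculation in this example can be reproduced with the standard library alone. The sketch below is an added illustration; `normal_cdf` is a hypothetical helper that builds the standard normal cumulative probability from `math.erfc`.

```python
from math import erfc, sqrt

def normal_cdf(z):
    """P(Z < z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2))

p = 0.51      # population proportion supporting candidate X
n = 1000      # exit survey sample size
cutoff = 0.45

sd = sqrt(p * (1 - p) / n)   # standard deviation of p^
z = (cutoff - p) / sd        # z-score of the cutoff
prob = normal_cdf(z)         # normal approximation to P(p^ < 0.45)

print(f"standard deviation of p^: {sd:.5f}")
print(f"z-score of {cutoff}: {z:.3f}")
print(f"P(p^ < {cutoff}) is about {prob:.8f}")
```

Small differences from the slide values in the last digits can arise because the slide rounds the standard deviation and z-score before looking up the probability.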
Let’s Do It!
Let’s do it! 8.5, p. 521 – Probabilities about the
Proportion of People with Type B Blood.
Let’s do it! 8.6, p. 523 – Estimating the
Proportion of Patients with Side Effects.
Let’s do it! 8.7, p. 525 – Testing hypotheses
about Smoking Habits.