Sampling Distribution Proportion
Sampling Distribution
of a Sample Proportion
Lecture 26
Sections 8.1 – 8.2
Mon, Nov 1, 2004
Parameters and Statistics
The purpose of a statistic is to estimate a
population parameter.
A sample mean is used to estimate the population
mean.
A sample proportion is used to estimate the
population proportion.
Example
Example 8.1, p. 464.
The Census Bureau surveys 3000 employees and
asks them, “Have the job skills demanded by your
job increased over the past few years?”
57% replied, “Yes.”
That is a sample proportion.
What is the population proportion?
Some Questions
What if the survey were repeated?
Would the survey results again be 57%?
Would the sample proportion be close to 57%?
Might it be 99%?
Might it be 1%?
Some Questions
We hope that the sample proportion is close to
the population proportion.
How close can we expect it to be?
Would it be worth it to collect a larger sample?
If the sample were larger, would we expect the
sample proportion (probably) to be closer to the
population proportion?
How much closer?
The Sampling Distribution of a
Statistic
Sampling Distribution of a Statistic – The
distribution of the values of the statistic over all
possible samples of size n from a given population.
The Sample Proportion
Let p be the population proportion.
Then p is a fixed value (for a given population).
Let p^ (“p-hat”) be the sample proportion.
Then p^ is a random variable; it takes on a new
value every time a sample is collected.
The sampling distribution of p^ is the
probability distribution of all the possible values
of p^.
Example
Suppose that this class is 1/3 freshmen.
Suppose that we take a sample of 2 students,
selected with replacement.
Find the sampling distribution of p^.
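Example
Tree diagram for the two selections (F = freshman, N = not a freshman).
Each selection is F with probability 1/3 and N with probability 2/3.
    First   Second   Outcome   Probability
    F       F        FF        (1/3)(1/3) = 1/9
    F       N        FN        (1/3)(2/3) = 2/9
    N       F        NF        (2/3)(1/3) = 2/9
    N       N        NN        (2/3)(2/3) = 4/9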
Example
Let X be the number of freshmen in the sample.
The probability distribution of X is
    x    P(X = x)
    0    4/9
    1    4/9
    2    1/9
Example
Let p^ be the proportion of freshmen in the
sample.
The sampling distribution of p^ is
    x      P(p^ = x)
    0      4/9
    1/2    4/9
    1      1/9
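The same table can be obtained by listing every possible sample. The following is a minimal Python sketch, not part of the lecture; the function name sampling_distribution is made up for illustration, and exact fractions are used so the results match the table.

```python
from fractions import Fraction
from itertools import product

def sampling_distribution(p_freshman, n):
    """Exact sampling distribution of p^ for n draws with replacement."""
    p = Fraction(p_freshman)
    dist = {}
    # Each sample is a sequence of n labels: "F" (freshman) or "N" (not a freshman).
    for sample in product("FN", repeat=n):
        prob = Fraction(1)
        for student in sample:
            prob *= p if student == "F" else 1 - p
        p_hat = Fraction(sample.count("F"), n)
        # Samples with the same proportion (e.g. FN and NF) are added together.
        dist[p_hat] = dist.get(p_hat, Fraction(0)) + prob
    return dist

dist = sampling_distribution(Fraction(1, 3), 2)
for p_hat in sorted(dist):
    print(p_hat, dist[p_hat])
```

The FN and NF branches of the tree both give p^ = 1/2, which is why that value has probability 2/9 + 2/9 = 4/9.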
Simulating Sampling with the TI-83
Use the TI-83 to simulate sampling 2 people
(with replacement) from a population in which
1/3 are freshmen.
Use the function randBin(n, p).
n = sample size (n = 2).
p = proportion of freshmen (p = 1/3).
The function will report the number of
freshmen in the sample.
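For readers without a TI-83, the same simulation can be sketched in Python. This is an illustration under assumptions: rand_bin is a hypothetical helper (not a library function) that mimics randBin(n, p), and Python's random number generator will not reproduce the calculator's values for a given seed.

```python
import random

def rand_bin(n, p, rng=random):
    """Number of freshmen in a sample of n students drawn with replacement."""
    # Each draw is a freshman with probability p, independently of the others.
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(63)  # arbitrary seed; does NOT match TI-83 output
count = rand_bin(2, 1 / 3, rng)
print("freshmen in sample:", count, " sample proportion:", count / 2)
```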
Example
Now do it for a sample of size n = 30.
Use a seed of 63.
We find that randBin(30, 1/3) = 9.
This represents a sample proportion of 9 out of
30, or 9/30 = 0.30.
If we press ENTER several more times, we get
11, 9, 14, 6, and 16.
These represent sample proportions of 11/30,
9/30, 14/30, 6/30, and 16/30.
Example
The expression
randBin(n, p, k)
will compute randBin(n, p) k times and put the
results in a list.
With a seed of 94, randBin(30, 1/3, 100)
produces the list
{11, 14, 8, 10, 10, 5, 13, 9, 9, …}.
Example
If we divide each value by 30, we get the sample
proportions
{11/30, 14/30, 8/30, 10/30, 10/30, …}.
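The list version, randBin(n, p, k), and the division by n can be sketched the same way in Python. The helper rand_bin_list is a made-up name, the seed is arbitrary, and the numbers produced will differ from the TI-83 list above. The last few lines print a rough text histogram of the sample proportions in bins of width 0.1.

```python
import random
from collections import Counter

def rand_bin_list(n, p, k, seed=None):
    """k simulated samples of size n; each entry is the number of freshmen."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(k)]

n, k = 30, 100
counts = rand_bin_list(n, 1 / 3, k, seed=94)  # seed does not match the TI-83
proportions = [x / n for x in counts]

# Bin by integer arithmetic to avoid floating-point edge cases:
# bin b holds proportions in [b/10, (b+1)/10); p^ = 1 goes in the last bin.
bins = Counter(min(x * 10 // n, 9) for x in counts)
for b in range(10):
    print(f"{b / 10:.1f}-{(b + 1) / 10:.1f} {'*' * bins[b]}")
```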
The Histogram
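[Histogram of the 100 simulated values of p^ for samples of size n = 30. Horizontal axis: p^, from 0.1 to 0.6. Vertical axis: tick marks at 5, 10, 15.]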
Larger Sample Size
Now we will select samples of size 120 instead
of size 30.
Set the seed to 216.
randBin(120, 1/3, 100) produces
{44, 33, 43, 41, 38, 44, 46, 43, …}
The sample proportions are
{44/120, 33/120, 43/120, 41/120, 38/120, …}
The Histogram
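[Histogram of the 100 simulated values of p^ for samples of size n = 120. Horizontal axis: p^, from 0.1 to 0.6. Vertical axis: tick marks at 5, 10, 15, 20, 25.]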
Observations and Conclusions
Observation #1: The values of p^ are clustered
around p.
Conclusion #1: p^ is probably close to p.
Observations and Conclusions
Observation #2: As the sample size increases,
the clustering is tighter.
Conclusion #2a: Larger samples give more
reliable estimates.
Conclusion #2b: For large sample sizes, we can
make very good estimates of the value of p.
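Observation #2 can be checked numerically by comparing the spread of simulated sample proportions for n = 30 and n = 120. This is a sketch, not the lecture's procedure: it uses 1000 simulated samples per size (rather than 100) so the comparison is stable, and the helper name and seeds are arbitrary.

```python
import random
import statistics

def simulate_proportions(n, p, k, seed):
    """k simulated sample proportions, each from a sample of size n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(k)]

p = 1 / 3
small = simulate_proportions(30, p, 1000, seed=1)
large = simulate_proportions(120, p, 1000, seed=2)

# Standard deviation of the simulated p^ values measures how tightly they cluster.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)
print(f"n = 30:  mean {statistics.mean(small):.3f}, spread {spread_small:.3f}")
print(f"n = 120: mean {statistics.mean(large):.3f}, spread {spread_large:.3f}")
```

Both means should land near p = 1/3, while the spread for n = 120 should be noticeably smaller than for n = 30.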
More Observations and Conclusions
Observation #3: The distribution of p^ appears
to be approximately normal.
The Histogram
[Histogram of the simulated values of p^. Horizontal axis: p^, from 0.1 to 0.6. Vertical axis: tick marks at 5, 10, 15.]
One More Conclusion
Conclusion #3: We can use the normal
distribution to calculate just how close to p we
can expect p^ to be.
However, we must know the mean μ and the standard
deviation σ of the distribution of p^.
The Sampling Distribution of p^
It turns out that the sampling distribution of p^
is approximately normal with the following
parameters.
Mean of p^: $\mu_{\hat{p}} = p$
Variance of p^: $\sigma_{\hat{p}}^2 = \dfrac{p(1 - p)}{n}$
Standard deviation of p^: $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
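As a quick check, the mean p and variance p(1 – p)/n can be verified against the exact sampling distribution from the freshmen example (p = 1/3, n = 2). This is a small Python sketch added for illustration; the table values are copied from that example.

```python
from fractions import Fraction

# Sampling distribution of p^ from the n = 2 freshmen example: value -> probability.
dist = {
    Fraction(0): Fraction(4, 9),
    Fraction(1, 2): Fraction(4, 9),
    Fraction(1): Fraction(1, 9),
}
p, n = Fraction(1, 3), 2

# Mean and variance computed directly from the probability distribution.
mean = sum(value * prob for value, prob in dist.items())
variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())

print("mean     =", mean, "  formula p          =", p)
print("variance =", variance, "  formula p(1 - p)/n =", p * (1 - p) / n)
```

The mean and variance formulas hold exactly even for n = 2; only the normal shape is an approximation that needs a larger sample.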
The Sampling Distribution of p^
The approximation to the normal distribution is
excellent if
np ≥ 5 and n(1 – p) ≥ 5.
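The rule of thumb (both np and n(1 – p) at least 5) is easy to encode. A minimal sketch; the function name normal_approx_ok and the threshold parameter are illustrative choices, not from the text.

```python
def normal_approx_ok(n, p, threshold=5):
    """True if both np and n(1 - p) reach the rule-of-thumb threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Sample sizes used in this lecture, plus one case that fails the check.
for n, p in [(2, 1 / 3), (30, 1 / 3), (120, 1 / 3), (1000, 0.51), (30, 0.1)]:
    print(f"n = {n}, p = {p:.2f}: {normal_approx_ok(n, p)}")
```

The n = 2 freshmen example fails the check, which is consistent with its three-point distribution looking nothing like a normal curve.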
Example
Suppose 51% of the population plan to vote for
candidate X, i.e., p = 0.51.
What is the probability that an exit survey of
1000 people would show candidate X with less
than 45% support, i.e., p^ < 0.45?
Example
First, describe the sampling distribution of p^ if
the sample size is n = 1000.
p^ is approximately normal.
Check: np = 510 ≥ 5 and n(1 – p) = 490 ≥ 5.
Mean: $\mu_{\hat{p}} = 0.51$.
Standard deviation: $\sigma_{\hat{p}} = \sqrt{(0.51)(0.49)/1000} = 0.01581$.
Example
The z-score of 0.45 is z = (0.45 – 0.51)/.01581
= -3.795.
P(p^ < 0.45) = P(Z < -3.795)
= 0.00007385 (not likely!)
That is why surveys work (within the margin of
error) and that is why people are saying that the
exit polls failed yesterday.
We have computed the p-value of 0.45 under the
null hypothesis that p = 0.51!
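The exit-poll calculation can be reproduced in Python with the standard library's normal distribution. This is a sketch of the normal approximation used in the example, not an exact binomial calculation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n, cutoff = 0.51, 1000, 0.45

mu = p                         # mean of p^
sigma = sqrt(p * (1 - p) / n)  # standard deviation of p^
z = (cutoff - mu) / sigma      # z-score of the observed cutoff
prob = NormalDist().cdf(z)     # P(Z < z) under the standard normal

print(f"sigma = {sigma:.5f}, z = {z:.3f}, P(p^ < {cutoff}) = {prob:.8f}")
```

Small differences from the slide's figures in the last digits can arise because the slide rounds the standard deviation and z before looking up the probability.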
Let’s Do It!
Let’s do it! 8.5, p. 484 – Probabilities about the
Proportion of People with Type B Blood.
Let’s do it! 8.6, p. 485 – Estimating the
Proportion of Patients with Side Effects.
Let’s do it! 8.7, p. 487 – Testing hypotheses
about Smoking Habits.
See Example 8.5, p. 486 – Testing Hypotheses about
the Proportion of Cracked Bottles.