Chapter 18 – Part 1 Sampling Distribution Models for


Chapter 18 -- Part 1
Sampling Distribution Models for p̂
Sampling Distribution Models
[Diagram: a sample is drawn from the population; the sample statistic is used to make an inference about the unknown population parameter.]
Objectives
• Describe the sampling distribution of a sample proportion
• Understand that the variability of a statistic depends on the size of the sample
   • Statistics based on larger samples are less variable
Review

Chapter 12 – Sample Surveys

Parameter (Population Characteristics)
• μ (mean)
• p (proportion)

Statistic (Sample Characteristics)
• ȳ (sample mean)
• p̂ (sample proportion)
Review

Chapter 12


“Statistics will be different for each sample. These
differences obey certain laws of probability (but only
for random samples).”
Chapter 14

Taking a sample from a population is a random
phenomenon. That means:
• The outcome is unknown before the event occurs
• The long-term behavior is predictable
Example





Who? Stat 101 students in Sections G and H.
What? Number of siblings.
When? Today.
Where? In class.
Why? To find out what proportion of students
have exactly one sibling.
Example

Population


Stat 101 students in sections G and H.
Population Parameter

Proportion of all Stat 101 students in
sections G and H who have exactly one
sibling.
Example

Sample


4 randomly selected students.
Sample Statistic

The proportion of the 4 students who have
exactly one sibling.
Example

Sample 1: p̂ = ___

Sample 2: p̂ = ___

Sample 3: p̂ = ___
What Have We Learned
• Different samples produce different sample proportions.
• There is variation among sample proportions.
• Can we model this variation?

Example

Senators
Population Characteristics
• p = proportion of Democratic Senators

Take SRS of size n = 10

Calculate Sample Characteristics
• p̂ = sample proportion of Democratic Senators
Example

Sample    p̂
1         0.2
2         0.5
3         0.6
4         0.3
5         0.7
SRS characteristics
• Values of p̂ are random
   • They change from sample to sample
   • They differ from the population characteristic, p = 0.50
Imagine
• Repeat taking an SRS of size n = 10
• The collection of values for p̂ ARE DATA
• Summarize the data: make a histogram
   • Shape, Center and Spread
• The result is the sampling distribution for p̂
Sampling Distribution for p̂

Mean (Center)
μ(p̂) = p
• We would expect, on average, to get p.
• We say p̂ is unbiased for p.

Sampling Distribution for p̂

Standard deviation (Spread)
σ(p̂) = √(p(1 − p)/n) = √(pq/n), where q = 1 − p
• As the sample size n gets larger, σ(p̂) gets smaller
• Larger samples are more accurate: p̂ tends to fall closer to p
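The center and spread formulas can be checked with a quick simulation. The sketch below is only illustrative: it assumes independent draws (sampling with replacement), and the function name simulate_p_hats, the seed and the number of repetitions are choices made here, not part of the slides.

```python
import math
import random

def simulate_p_hats(p, n, reps, seed=0):
    """Return `reps` simulated sample proportions, each from n independent draws."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.50, 10  # values from the Senators example
p_hats = simulate_p_hats(p, n, reps=20000)

# Summarize the simulated p-hats: center and spread
sim_mean = sum(p_hats) / len(p_hats)
sim_sd = math.sqrt(sum((x - sim_mean) ** 2 for x in p_hats) / len(p_hats))
theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)

print(f"simulated mean of p-hat: {sim_mean:.3f}  (theory: {p:.3f})")
print(f"simulated SD of p-hat:   {sim_sd:.3f}  (theory: {theory_sd:.3f})")
```

Rerunning with a larger n should show the simulated spread shrinking along with √(pq/n).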

Example





50% of people on campus favor the current academic calendar.
1. Select n people.
2. Find the sample proportion of people favoring the current academic calendar.
3. Repeat the sampling.
4. What does the sampling distribution of the sample proportion look like?

n      μ(p̂)    σ(p̂)
2      0.50    0.35
5      0.50    0.22
10     0.50    0.16
25     0.50    0.10
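The standard deviations listed for this example follow directly from σ(p̂) = √(pq/n) with p = 0.50. A minimal sketch of that arithmetic (the helper name sd_p_hat is introduced here for illustration only):

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# p = 0.50 and the four sample sizes from the example
sds = {n: round(sd_p_hat(0.50, n), 2) for n in (2, 5, 10, 25)}
for n, sd in sds.items():
    print(f"n = {n:2d}: mean = 0.50, sd = {sd:.2f}")
```

Because n sits under a square root, cutting the spread in half takes four times as many people.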
Example





10% of all people are left handed.
1. Select n people.
2. Find the sample proportion of left handed people.
3. Repeat the sampling.
4. What does the sampling distribution of the sample proportion look like?

n      μ(p̂)    σ(p̂)
2      0.10    0.214
10     0.10    0.096
50     0.10    0.043
100    0.10    0.030
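With p as small as 0.10, the shape of the sampling distribution depends heavily on n. The sketch below computes the exact distribution of p̂, assuming independent draws so that the count of left handed people is binomial. The function name p_hat_distribution is illustrative. A large chance of getting p̂ = 0 signals a lopsided, right-skewed distribution.

```python
from math import comb

def p_hat_distribution(p, n):
    """Exact distribution of p-hat = X / n, where X ~ Binomial(n, p)."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

chance_of_zero = {}
for n in (2, 10, 50, 100):
    dist = p_hat_distribution(0.10, n)
    chance_of_zero[n] = dist[0.0]  # probability that nobody sampled is left handed
    print(f"n = {n:3d}: P(p-hat = 0) = {dist[0.0]:.4f}")
```

As n grows, the pile-up at zero disappears and the distribution becomes more symmetric around 0.10.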
Sampling Distribution for p̂

Shape
• Normal Distribution

Two assumptions must hold in order for us to be able to use the normal distribution:
• The sampled values must be independent of each other
• The sample size, n, must be large enough
Sampling Distribution for p̂

It is hard to check that these assumptions hold, so we will settle for checking the following conditions:
• 10% Condition – the sample size, n, is less than 10% of the population size
• Success/Failure Condition – np > 10 and n(1 − p) > 10

These conditions seem to contradict one another, but they don't! The first limits n relative to the population size; the second requires n to be large enough in absolute terms.
Sampling Distribution for p̂

Assuming the two conditions are true (must be checked for each problem), then the sampling distribution for p̂ is

N(p, √(pq/n))
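The two condition checks and the resulting normal model can be bundled into one small helper. This is a sketch, not a standard routine: the function name and the dictionary keys are made up here, and the population size of 100 for the Senators example is an assumption (it matches the statement that n = 10 is 10% of the population).

```python
import math

def normal_model_for_p_hat(p, n, population_size):
    """Check both conditions and return the candidate N(p, sqrt(pq/n)) model."""
    q = 1 - p
    return {
        "ten_percent": 10 * n < population_size,       # n is less than 10% of the population
        "success_failure": n * p > 10 and n * q > 10,  # np > 10 and nq > 10
        "mean": p,
        "sd": math.sqrt(p * q / n),
    }

# Senators example: p = 0.50, n = 10, population size assumed to be 100
senators = normal_model_for_p_hat(p=0.50, n=10, population_size=100)
print(senators)
```

The normal model should only be used when both checks come back True.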


Sampling Distribution for p̂

But the sampling distribution has a center (mean) of p (a population proportion), and oftentimes we don't know p.
• Let p̂ be the center:

N(p̂, √(p̂q̂/n)), where q̂ = 1 − p̂


Example


Senators
Check assumptions (p = 0.50)
1. np = 10(0.50) = 5 and nq = 10(0.50) = 5, both less than 10.
2. n = 10 is 10% of the population size.

Assumption 1 does not hold.
Sampling Distribution of p̂ ????
Example #1

Public health statistics indicate that 26.4%
of the U.S. adult population smoked
cigarettes in 2002. Use the 68-95-99.7
Rule to describe the sampling distribution
for the sample proportion of smokers
among 50 adults.
Example #1

Check assumptions:
1. np = (50)(0.264) = 13.2 > 10
   nq = (50)(0.736) = 36.8 > 10
2. n = 50, less than 10% of the population

Therefore, the sampling distribution for the proportion of smokers is
N(0.264, 0.062), since √((0.264)(0.736)/50) ≈ 0.062
Example #1
• About 68% of samples have a sample proportion between 20.2% and 32.6%
• About 95% of samples have a sample proportion between 14% and 38.8%
• About 99.7% of samples have a sample proportion between 7.8% and 45%
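These three ranges are the mean plus or minus 1, 2 and 3 standard deviations of the N(0.264, 0.062) model. A short sketch of the arithmetic (the variable names are illustrative):

```python
mean, sd = 0.264, 0.062  # model from Example #1

# 68-95-99.7 Rule: mean +/- 1, 2 and 3 standard deviations
intervals = {
    pct: (round(mean - k * sd, 3), round(mean + k * sd, 3))
    for pct, k in ((68, 1), (95, 2), (99.7, 3))
}
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of samples: {low:.1%} to {high:.1%}")
```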

Example #2
Information on a packet of seeds claims that the germination rate is 92%. What's the probability that more than 95% of the 160 seeds in the packet will germinate?

Check assumptions:
1. np = (160)(0.92) = 147.2 > 10
   nq = (160)(0.08) = 12.8 > 10
2. n = 160, less than 10% of all seeds?

Review - Standardizing

You can standardize using the formula
z = (p̂ − μ(p̂)) / σ(p̂) = (p̂ − p) / √(pq/n)
Review

Chapter 6 – The Normal Distribution
Y ~ N(70, 3)
• P(Y < 71) = P(Z < (71 − 70)/3) = P(Z < 0.33) = 0.6293
• P(Y > 68) = P(Z > (68 − 70)/3) = P(Z > −0.67) = 1 − P(Z < −0.67) = 0.7486
• P(68 < Y < 71) = P(Y < 71) − P(Y < 68) = 0.6293 − 0.2514 = 0.3779

Do you remember the 68-95-99.7 Rule?
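The table lookups in this review can be reproduced in code. The sketch below builds the standard normal CDF from math.erf and rounds each z-score to two decimals to mimic reading a Z table; the helper names phi and z_score are chosen here for illustration.

```python
import math

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_score(y, mu, sigma):
    """Standardize y, rounded to two decimals as when using a Z table."""
    return round((y - mu) / sigma, 2)

mu, sigma = 70, 3  # Y ~ N(70, 3)
p_below_71 = phi(z_score(71, mu, sigma))      # P(Z < 0.33)
p_below_68 = phi(z_score(68, mu, sigma))      # P(Z < -0.67)
p_above_68 = 1 - p_below_68                   # P(Y > 68)
p_between = p_below_71 - p_below_68           # P(68 < Y < 71)

print(f"P(Y < 71)      = {p_below_71:.4f}")
print(f"P(Y > 68)      = {p_above_68:.4f}")
print(f"P(68 < Y < 71) = {p_between:.4f}")
```

Without the rounding step the answers shift slightly in the third decimal place, which is the usual gap between table lookups and software.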
Example #2

Therefore, the sampling distribution for the proportion of seeds that will germinate is N(0.92, 0.02).

P(p̂ > 0.95) = P(Z > (0.95 − 0.92)/0.02)
            = P(Z > 1.50)
            = P(Z < −1.50)
            = 0.0668
Big Picture
[Diagram: a sample is drawn from the population; the sample statistic is used to make an inference about the unknown population parameter.]
Big Picture

Before, we would take one random sample and compute our sample statistic. Presently we are focusing on:

p̂ = (number of outcomes of interest) / (total sample size)

This is an estimate of the population parameter p.
But if we took a second random sample, its p̂ could differ from the p̂ from sample 1. The p̂ from sample 2 is also an estimate of the population parameter p.
If we took a third sample, its p̂ could differ from the first and second p̂'s.
Etc.
Big Picture


So there is variability in the sample statistic p̂.
If we randomized correctly we can consider p̂
as random (like rolling a die) so even though
the variability is unavoidable it is
understandable and predictable!!! (This is the
absolutely amazing part).
Big Picture

So for a sufficiently large sample size (n) we can model the variability in p̂ with a normal model:

p̂ ~ N(p, √(pq/n))

Big Picture


The hard part is trying to visualize what is going
on behind the scenes. The sampling distribution
of p̂ is what a histogram would look like if we
had every possible sample available to us.
(This is very abstract because we will never see
these other samples).
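For a tiny population, every possible sample really can be listed. The sketch below uses a made-up class of 10 students, 4 of whom have exactly one sibling (these numbers are hypothetical, not class data), and enumerates all samples of size 4 to build the full sampling distribution of p̂.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical population: 1 = exactly one sibling, 0 = otherwise
population = [1] * 4 + [0] * 6
n = 4
p = Fraction(sum(population), len(population))  # population proportion

# p-hat for every possible sample of size n (drawn without replacement)
p_hats = [Fraction(sum(sample), n) for sample in combinations(population, n)]
mean_p_hat = sum(p_hats) / len(p_hats)

for value, count in sorted(Counter(p_hats).items()):
    print(f"p-hat = {float(value):.2f}: {count} of {len(p_hats)} samples")
print(f"mean of all p-hats = {mean_p_hat}, population p = {p}")
```

The counts printed are the histogram we would see with every sample in hand, and the average of all the p̂ values equals p, which is what unbiased means.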
So let's just focus on two things:
Take Home Message

1. Check to see that
   A. the sample size, n, is less than 10% of the population size
   B. np > 10, n(1 − p) > 10

2. If these hold, then p̂ can be modeled with a normal distribution:

p̂ ~ N(p, √(pq/n))

Example #3

When a truckload of apples arrives at a
packing plant, a random sample of 150 apples
is selected and examined for bruises,
discoloration, and other defects. The whole
truckload will be rejected if more than 5% of
the sample is unsatisfactory (i.e. damaged).
Suppose that actually 8% of the apples in the
truck do not meet the desired standard. What
is the probability of accepting the truck
anyway?
Example #3

What is the sampling distribution?
1. np = (150)(0.08) = 12 > 10
   nq = (150)(0.92) = 138 > 10
2. n = 150 < 10% of all apples
So, the sampling distribution is N(0.08, 0.022).

What is the probability of accepting the truck anyway?
P(p̂ ≤ 0.05) = P(Z ≤ (0.05 − 0.08)/0.022)
            = P(Z ≤ −1.36)
            = 0.0869