Chapter 18 sampling distribution models

Chapter 18
Sampling distribution models
math2200
Sample proportion
• Kerry vs. Bush in 2004
– A Gallup Poll
• 49% for Kerry
• 1016 respondents
– A Rasmussen Poll
• 45.9% for Kerry
• 1000 respondents
– Why are the answers different?
Model
• Let Y be the number of people favoring
Kerry in a sample of size n=1000
• Y ~ Binomial(n,p)
– p: the proportion of people for Kerry in the
entire population
• When n is large, Y can be approximated
by a Normal model with mean np and
variance npq, where q = 1 - p.
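The quality of this approximation can be checked numerically. The sketch below compares an exact Binomial probability with its Normal counterpart for n = 1000. The value p = 0.49, the cutoff k = 475, and the helper names `binomial_pmf` and `binomial_cdf` are assumptions chosen for illustration; the +0.5 continuity correction is a common refinement that the slides do not mention.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binomial_cdf(k, n, p):
    """Exact P(Y <= k), summing the pmf from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n, p = 1000, 0.49   # n from the slide; p = 0.49 is an assumed value
q = 1 - p

# Normal model with mean np and variance npq (NormalDist takes the SD).
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))

k = 475                         # arbitrary cutoff, for illustration only
exact = binomial_cdf(k, n, p)
approx = normal.cdf(k + 0.5)    # +0.5: continuity correction (not in the slides)

print(f"exact  P(Y <= {k}) = {exact:.4f}")
print(f"normal P(Y <= {k}) = {approx:.4f}")
```

With n this large and p close to 0.5, the two printed probabilities should agree closely.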
Modeling sample proportion
• The sample proportion $\hat{p} = Y/n$
– Normal model with mean $p$ and variance $pq/n$:
$\hat{p} \sim N\left(p, \frac{pq}{n}\right)$
Kerry vs. Bush (cont.)
– Assume the true population proportion voting for
Kerry is 49%.
– The sample proportion $\hat{p} = Y/n$ has a Normal model
with mean 0.49 and standard deviation
$\sqrt{0.49 \times 0.51 / 1000} \approx 0.0158$ (n=1000).
– Then both 49% and 45.9% are plausible sample results:
45.9% lies about 2 standard deviations below the mean.
$z = (0.459 - 0.49)/0.0158 = -1.962$
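The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch using the slide's numbers; the variable names are illustrative. Note that the slide rounds the standard deviation to 0.0158 before dividing, so the unrounded z-score differs slightly in the third decimal place.

```python
import math

p = 0.49          # assumed true proportion for Kerry (from the slide)
n = 1000          # sample size used on the slide
observed = 0.459  # Rasmussen result

# Standard deviation of the sample proportion: sqrt(pq/n)
sd = math.sqrt(p * (1 - p) / n)

# z-score: how many standard deviations the observed value is from the mean
z = (observed - p) / sd

print(f"sd = {sd:.4f}")
print(f"z  = {z:.2f}")
```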
Sampling Distribution Model
• Consider the sample proportion as a random
variable instead of a number. The distribution
of the sample proportion is called the
sampling distribution model for the
proportion.
Central limit theorem (CLT)
• If the observations are drawn
– independently
– from the same population (equivalently,
distribution)
the sampling distribution of the sample
mean becomes normal as the sample size
increases.
• The population distribution could be
unknown.
CLT
• Suppose the population distribution has mean μ and
standard deviation σ.
• The sample mean has mean μ and standard
deviation $\sigma/\sqrt{n}$.
• Let Y1, …, Yn be n independent and identically
distributed random variables with
– E(Y1) = μ
– Var(Y1) = $\sigma^2$
• Then as n increases, the distribution of (Y1+…+Yn)/n
tends to a Normal model with mean μ and standard
deviation $\sigma/\sqrt{n}$.
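A small simulation can make the CLT statement concrete. The sketch below is an illustration, not part of the slides: it assumes an Exponential population with rate 1 (so μ = 1 and σ = 1), a sample size of 50, and 2000 repeated samples, then checks that the sample means center on μ with spread close to σ/√n. The seed and all variable names are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(2200)   # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0        # Exponential(rate=1) has mean 1 and SD 1
n = 50                      # size of each sample (assumed)
reps = 2000                 # number of repeated samples (assumed)

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = sigma / math.sqrt(n)   # CLT prediction: sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

The population here is strongly right-skewed, yet a histogram of `sample_means` would look roughly bell-shaped.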
Standard Error
• If we don’t know p or σ, the population
parameters, we use sample statistics to
estimate them.
• The estimated standard deviation of a
sampling distribution is called a standard error.
Standard Error (cont.)
• For a sample proportion, the standard
error is
$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$
• For the sample mean, the standard error is
$SE(\bar{y}) = \frac{s}{\sqrt{n}}$
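Both standard errors are straightforward to compute from a sample. The sketch below uses two hypothetical helper functions, `se_proportion` and `se_mean`; the Gallup figures come from the earlier slide, while the list of measurements is made-up data for illustration only.

```python
import math
import statistics

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)

def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    s = statistics.stdev(values)   # sample SD (divides by n - 1)
    return s / math.sqrt(len(values))

# Gallup poll from the earlier slide: 49% for Kerry, 1016 respondents.
gallup_se = se_proportion(0.49, 1016)

# Made-up measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
data_se = se_mean(data)

print(f"SE of Gallup proportion: {gallup_se:.4f}")
print(f"SE of sample mean:       {data_se:.4f}")
```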
The Process Going Into the
Sampling Distribution Model
What Can Go Wrong?
• Don’t confuse the sampling distribution
with the distribution of the sample.
– When you take a sample, you look at the
distribution of the values, usually with a
histogram, and you may calculate summary
statistics.
– The sampling distribution is an imaginary
collection of the values that a statistic might
have taken for all random samples—the one
you got and the ones you didn’t get.
What Can Go Wrong? (cont.)
• Beware of observations that are not
independent.
– The CLT depends crucially on the assumption
of independence.
– You can’t check this with your data—you have
to think about how the data were gathered.
• Watch out for small samples from skewed
populations.
– The more skewed the distribution, the larger
the sample size we need for the CLT to work.
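The effect of skewness on the required sample size can be seen in a simulation. In this sketch the population is assumed to be Exponential with rate 1, which is strongly right-skewed, and the sample sizes 2, 10, and 50 are arbitrary choices. It measures how skewed the distribution of sample means remains at each n; a value near 0 indicates a roughly symmetric, Normal-looking shape. The helper names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

def skew_of_sample_means(n, reps, rng):
    """Skewness of the means of `reps` samples of size n from Exponential(1)."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

rng = random.Random(18)   # fixed seed so the run is repeatable
results = {n: skew_of_sample_means(n, 4000, rng) for n in (2, 10, 50)}

for n, skew in results.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness shrinks as n grows but is still clearly positive for small samples, which is why a skewed population calls for a larger n before the Normal model is a good fit.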
Summary
• Sample proportions or sample means are
statistics
– They are random because samples vary
– Their distribution can be approximated by a Normal
model using the CLT
• Be aware of when the CLT can be used
– n is large
– If the population distribution is not symmetric, a
much larger n is needed
• The CLT is about the distribution of the sample
mean, not the sample itself