Chapter 18
Sampling Distribution Models
VOCABULARY
• Parameter – a number that describes the
population. This value is usually not known.
• Statistic – a number that can be
computed from the sample data without
making use of any unknown parameters
• Sampling distribution – the distribution
of values taken by the statistic in all
possible samples of the same size from
the same population
VOCABULARY CONTINUED
Population proportion –
describes the proportion for the
entire population (p)
Sample proportion – the
proportion calculated for the
sample taken (p̂)
SAMPLING DISTRIBUTION OF A
SAMPLE PROPORTION
Choose an SRS of size n from a large population with
population proportion p having some characteristic of
interest. Let p̂ be the proportion of the sample having
that characteristic. Then:
• The sampling distribution of p̂ is approximately Normal
• The mean of the sampling distribution is p
• The standard deviation of the sampling distribution is
  σ(p̂) = √(p(1 − p)/n)
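The three facts above can be checked with a small simulation. The sketch below is illustrative only: the function name `simulate_p_hats`, the values p = 0.35 and n = 200, and the number of repetitions are assumptions, not part of the slides. It draws many samples, records each sample proportion, and compares the mean and standard deviation of those proportions with p and √(p(1 − p)/n).

```python
import math
import random
import statistics


def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Each individual has the characteristic with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


p, n = 0.35, 200  # assumed values, chosen only for illustration
p_hats = simulate_p_hats(p, n, num_samples=5000)

simulated_mean = statistics.fmean(p_hats)
simulated_sd = statistics.pstdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print("mean of the sample proportions:", round(simulated_mean, 4))
print("SD of the sample proportions:  ", round(simulated_sd, 4))
print("sqrt(p(1 - p)/n):              ", round(theoretical_sd, 4))
```

The simulated mean should land close to p and the simulated standard deviation close to the formula value. A histogram of `p_hats` would show the roughly Normal shape.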
Assumptions and Conditions
• The Normal model gets better as the sample
size gets bigger.
• Most models are useful only when specific
assumptions are true.
• There are two assumptions in the case of the
model for the distribution of sample
proportions:
1. The sampled values must be independent of
each other.
2. The sample size, n, must be large enough.
Assumptions and Conditions (cont.)
• Assumptions are hard (often impossible) to
check. That’s why we assume them.
• Still, we need to check whether the
assumptions are reasonable by checking
conditions that provide information about the
assumptions.
• The corresponding conditions to check before
using the Normal to model the distribution of
sample proportions are the 10% Condition
and the Success/Failure Condition.
RULES
10% Condition: Use the recipe for the
standard deviation of p̂ only when the
population is at least 10 times as large as the
sample.
Success/Failure: We will use the Normal
approximation to the sampling distribution of
p̂ for values of n and p that satisfy
np ≥ 10 and n(1 − p) ≥ 10
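Both rules are simple arithmetic checks. The following is a minimal sketch; the function names `ten_percent_condition` and `success_failure_condition` are hypothetical helpers introduced here, and the population size of 1,700,000 is taken as a lower bound from the example's "over 1.7 million".

```python
def ten_percent_condition(n, population_size):
    """True when the population is at least 10 times as large as the sample."""
    return population_size >= 10 * n


def success_failure_condition(n, p):
    """True when both expected counts, np and n(1 - p), are at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


# Values from the college-application example: n = 1500, p = 0.35.
print(ten_percent_condition(1500, 1_700_000))
print(success_failure_condition(1500, 0.35))

# A small sample with a rare characteristic fails the second rule (np = 2).
print(success_failure_condition(20, 0.10))
```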
EXAMPLE
You ask an SRS of 1500 first year college
students whether they applied to any other
college. There are over 1.7 million first year
college students. 35% of all first year students
applied to other colleges. What is the probability
that your sample will give a result within 2
percentage points of this true value?
P(.33 ≤ p̂ ≤ .37)
EXAMPLE CONTINUED
Step 1: Calculate the mean and standard
deviation
Step 2: Standardize the scores
EXAMPLE CONTINUED
Step 1: Calculate the mean and standard deviation
μ(p̂) = p = .35
σ(p̂) = √(.35(1 − .35)/1500) = .0123
Step 2: Standardize the scores
z = (.33 − .35)/.0123 = −1.626
z = (.37 − .35)/.0123 = 1.626
EXAMPLE CONTINUED
Step 3: Find P(−1.626 ≤ z ≤ 1.626)
= P(z ≤ 1.626) − P(z ≤ −1.626)
= .9484 − .0516 (table values for z = ±1.63)
= .8968
So almost 90% of all samples will give a
result within 2 percentage points of the true
value of the population proportion.
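The three steps of the example can be reproduced without a Normal table. This is a sketch under one assumption: the helper `normal_cdf` is defined here from the standard library's `math.erf` and is not part of the slides.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


p, n = 0.35, 1500

# Step 1: mean and standard deviation of the sampling distribution of p-hat.
mean = p
sd = math.sqrt(p * (1 - p) / n)

# Step 2: standardize the two endpoints .33 and .37.
z_low = (0.33 - mean) / sd
z_high = (0.37 - mean) / sd

# Step 3: probability that p-hat falls between the endpoints.
prob = normal_cdf(z_high) - normal_cdf(z_low)

print("sd   =", round(sd, 4))
print("z    =", round(z_low, 3), "to", round(z_high, 3))
print("prob =", round(prob, 4))
```

Because the code does not round the standard deviation or the z-scores, its answer can differ from the table-based .8968 in the third decimal place. Both say that almost 90% of samples land within 2 percentage points of p.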
VOCABULARY
• Parameters – the mean and
standard deviation of a population
(μ and σ)
• Statistics – the mean and standard
deviation from the sample data
(x̄ and s)
MEAN AND STANDARD
DEVIATION OF A SAMPLE MEAN
Suppose that x̄ is the mean of an SRS of
size n drawn from a large population with
mean μ and standard deviation σ. Then
the mean of the sampling distribution of x̄ is μ
and its standard deviation is σ/√n.
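A short sketch of the formula σ/√n shows how the spread of the sample mean shrinks as n grows. The function name `sd_of_sample_mean` and the value σ = 10 are assumptions made for illustration.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)


sigma = 10  # assumed population standard deviation
for n in (25, 100, 400):
    print(n, sd_of_sample_mean(sigma, n))
# prints:
# 25 2.0
# 100 1.0
# 400 0.5
```

Quadrupling the sample size only halves the standard deviation, because n enters through its square root.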
The Fundamental Theorem of
Statistics
• The sampling distribution of any mean becomes
Normal as the sample size grows.
  - All we need is for the observations to be independent
  and collected with randomization.
  - We don’t even care about the shape of the population
  distribution!
• The Fundamental Theorem of Statistics is called the
Central Limit Theorem (CLT).
• The CLT works better (and faster) the closer the
population model is to a Normal itself. It also works
better for larger samples.
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
The Fundamental Theorem of Statistics
(cont.)
The Central Limit Theorem (CLT)
The mean of a random sample has a
sampling distribution whose shape can be
approximated by a Normal model. The larger
the sample, the better the approximation will
be.
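The CLT can be seen in a simulation that starts from a clearly non-Normal population. In this sketch the population is exponential with mean 1 and standard deviation 1; the population choice, the sample size of 50, and the function name `sample_means` are all assumptions. If the Normal model fits, about 68% of the sample means should fall within one standard deviation (σ/√n) of the population mean.

```python
import math
import random
import statistics


def sample_means(n, num_samples, seed=1):
    """Means of num_samples random samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


n = 50
mu, sigma = 1.0, 1.0  # mean and SD of the exponential(1) population
means = sample_means(n, num_samples=4000)

sd_of_mean = sigma / math.sqrt(n)
within_one_sd = sum(abs(m - mu) <= sd_of_mean for m in means) / len(means)

print("mean of the sample means:", round(statistics.fmean(means), 3))
print("SD of the sample means:  ", round(statistics.pstdev(means), 3))
print("sigma / sqrt(n):         ", round(sd_of_mean, 3))
print("fraction within 1 SD:    ", round(within_one_sd, 3))
```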
Assumptions and Conditions
• The CLT requires remarkably few assumptions, so
there are few conditions to check:
1. Random Sampling Condition: The data values must be
sampled randomly or the concept of a sampling
distribution makes no sense.
2. Independence Assumption: The sample values must
be mutually independent. (When the sample is drawn
without replacement, check the 10% condition…)
3. Large Enough Sample Condition: There is no
one-size-fits-all rule.
Standard Error (cont.)
• When we don’t know p or σ, we’re stuck,
right?
• Nope. We will use sample statistics to
estimate these population parameters.
• Whenever we estimate the standard deviation
of a sampling distribution, we call it a
standard error.
Standard Error (cont.)
• For a sample proportion, the standard error is
  SE(p̂) = √(p̂q̂/n), where q̂ = 1 − p̂
• For the sample mean, the standard error is
  SE(x̄) = s/√n
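Both standard errors use only sample information. The sketch below is illustrative: the function names `se_proportion` and `se_mean` are hypothetical, and the data values are made up for the example. The sample standard deviation s comes from the standard library's `statistics.stdev`.

```python
import math
import statistics


def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)


def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Made-up inputs for illustration only.
print(se_proportion(0.5, 100))
print(se_mean([4, 8, 6, 5, 7]))
```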
What Can Go Wrong?
• Don’t confuse the sampling distribution with
the distribution of the sample.
  - When you take a sample, you look at the
  distribution of the values, usually with a
  histogram, and you may calculate summary
  statistics.
  - The sampling distribution is an imaginary
  collection of the values that a statistic might
  have taken for all random samples: the one
  you got and the ones you didn’t get.
What Can Go Wrong? (cont.)
• Beware of observations that are not
independent.
  - The CLT depends crucially on the assumption
  of independence.
  - You can’t check this with your data; you have
  to think about how the data were gathered.
• Watch out for small samples from skewed
populations.
  - The more skewed the distribution, the larger
  the sample size we need for the CLT to work.
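The warning about skewed populations can be illustrated by measuring how lopsided the sampling distribution of the mean still is at different sample sizes. This is a sketch under assumptions: the population is exponential (strongly right-skewed), and `skewness` and `means_of_samples` are hypothetical helpers. A skewness near 0 indicates a symmetric, Normal-like shape.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])


def means_of_samples(n, num_samples, rng):
    """Means of num_samples samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


rng = random.Random(2)
skew_by_n = {n: skewness(means_of_samples(n, 4000, rng)) for n in (2, 10, 50)}

for n, sk in skew_by_n.items():
    print("n =", n, " skewness of sample means =", round(sk, 2))
```

The skewness of the sample means drops toward 0 as n grows, but for very small n the sampling distribution remains visibly skewed, so a Normal model would be a poor fit.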