Chapter 18
Sampling Distribution Models
VOCABULARY
Parameter – number that describes the
population. In practice this value is usually not known
Statistic – number that can be
computed from the sample data without
making use of any unknown parameters
Sampling distribution – the distribution
of values taken by the statistic in all
possible samples of the same size from
the same population
VOCABULARY CONTINUED
Population proportion –
describes the proportion for the
entire population (p)
Sample proportion – the
proportion calculated for the
sample taken (p̂)
SAMPLING DISTRIBUTION OF A
SAMPLE PROPORTION
Choose an SRS of size n from a large population with
population proportion p having some characteristic of
interest. Let p̂ be the proportion of the sample having
that characteristic. Then:
The sampling distribution of p̂ is approx. normal
The mean of the sampling distribution is p
The standard deviation of the sampling distribution is
$\sqrt{\frac{p(1-p)}{n}}$
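The three facts above can be checked with a small simulation. This is only a sketch: the values p = 0.35 and n = 200, the number of repetitions, and the function name simulate_sample_proportions are illustrative assumptions, and independent random draws stand in for an SRS from a very large population.

```python
import math
import random


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


# Assumed illustrative values, not taken from the slides.
p, n = 0.35, 200
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

# Mean and standard deviation of the simulated sample proportions.
sim_mean = sum(p_hats) / len(p_hats)
sim_sd = math.sqrt(sum((x - sim_mean) ** 2 for x in p_hats) / len(p_hats))

# Value predicted by the formula sqrt(p(1 - p) / n).
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"simulated mean {sim_mean:.4f}  vs  p = {p}")
print(f"simulated SD   {sim_sd:.4f}  vs  formula = {theory_sd:.4f}")
```

The simulated mean should land close to p and the simulated standard deviation close to the formula value; a histogram of p_hats would look roughly bell-shaped.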
Assumptions and Conditions
The Normal model gets better as the sample
size gets bigger.
Most models are useful only when specific
assumptions are true.
There are two assumptions in the case of the
model for the distribution of sample
proportions:
1. The sampled values must be independent of
each other.
2. The sample size, n, must be large enough.
Assumptions and Conditions (cont.)
Assumptions are hard—often impossible—to
check. That’s why we assume them.
Still, we need to check whether the
assumptions are reasonable by checking
conditions that provide information about the
assumptions.
The corresponding conditions to check before
using the Normal to model the distribution of
sample proportions are the 10% Condition
and the Success/Failure Condition.
RULES
10% Condition: Use the recipe for the
standard deviation of p̂ only when the
population is at least 10 times as large as the
sample.
Success/Failure: We will use the normal
approximation to the sampling distribution of
p̂ for values of n and p that satisfy
$n(1-p) \ge 10$ and $np \ge 10$
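Both rules are simple arithmetic checks, so they can be written as a short helper. This is a sketch; the function name check_proportion_conditions and the dictionary keys are hypothetical, chosen only for illustration.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% Condition and the Success/Failure Condition."""
    # 10% Condition: population at least 10 times as large as the sample.
    ten_percent_ok = population_size >= 10 * n
    # Success/Failure: expected successes and failures are both at least 10.
    success_failure_ok = n * p >= 10 and n * (1 - p) >= 10
    return {"ten_percent": ten_percent_ok, "success_failure": success_failure_ok}


# A sample of 1500 from 1.7 million with p = 0.35 passes both checks.
print(check_proportion_conditions(1500, 0.35, 1_700_000))

# A sample of 20 from a population of 100 with p = 0.1 fails both.
print(check_proportion_conditions(20, 0.1, 100))
```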
EXAMPLE
You ask an SRS of 1500 first year college
students whether they applied to any other
college. There are over 1.7 million first year
college students. 35% of all first year students
applied to other colleges. What is the probability
that your sample will give a result within 2
percentage points of this true value?
$P(0.33 \le \hat{p} \le 0.37)$
EXAMPLE CONTINUED
Step 1: Calculate the mean and standard
deviation
Step 2: Standardize the scores
EXAMPLE CONTINUED
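Step 1: Calculate the mean and standard deviation
mean: $p = 0.35$
standard deviation: $\sqrt{\frac{0.35(1-0.35)}{1500}} \approx 0.0123$
Step 2: Standardize the scores
$z = \frac{0.33 - 0.35}{0.0123} \approx -1.626$
$z = \frac{0.37 - 0.35}{0.0123} \approx 1.626$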
EXAMPLE CONTINUED
Step 3: Find $P(-1.626 \le z \le 1.626)$
$P(z \le 1.626) - P(z \le -1.626)$
$= 0.9484 - 0.0516$
$= 0.8968$
So almost 90% of all samples will give a
result within 2 percentage points of the true
population proportion.
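The same calculation can be reproduced without a Normal table. The sketch below uses NormalDist from Python's standard statistics module; the variable names are illustrative assumptions. Because it does not round the standard deviation or the z-scores, the result may differ from the table-based answer in the third decimal place.

```python
import math
from statistics import NormalDist

# Values from the example: true proportion and sample size.
p, n = 0.35, 1500

# Step 1: mean and standard deviation of the sampling distribution.
mean = p
sd = math.sqrt(p * (1 - p) / n)

# Steps 2 and 3: the Normal model standardizes internally, so the
# probability is the difference of two cumulative probabilities.
model = NormalDist(mu=mean, sigma=sd)
prob = model.cdf(0.37) - model.cdf(0.33)

print(f"standard deviation: {sd:.4f}")
print(f"P(0.33 <= p-hat <= 0.37): {prob:.4f}")
```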
VOCABULARY
Parameters – the mean and
standard deviation of a population,
μ and σ
Statistics – the mean and standard
deviation from the sample data, x̄
and s
MEAN AND STANDARD
DEVIATION OF A SAMPLE MEAN
Suppose that x̄ is the mean of an SRS of
size n drawn from a large population with
mean μ and standard deviation σ. Then
the mean of the sampling distribution of x̄ is
μ and its standard deviation is $\sigma/\sqrt{n}$.
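A quick simulation can illustrate this result. The sketch below assumes a made-up population, the faces of a fair die, sampled with replacement so that it behaves like a very large population; the sample size of 25 and the 4000 repetitions are also arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: faces of a fair die.
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.fmean(faces)      # population mean, 3.5
sigma = statistics.pstdev(faces)  # population standard deviation

n = 25  # size of each sample
sample_means = [
    statistics.fmean(rng.choices(faces, k=n))
    for _ in range(4000)
]

# Compare the simulated sampling distribution with the stated result.
sim_mean = statistics.fmean(sample_means)
sim_sd = statistics.pstdev(sample_means)
theory_sd = sigma / math.sqrt(n)

print(f"mean of sample means {sim_mean:.3f}  vs  mu = {mu}")
print(f"SD of sample means   {sim_sd:.3f}  vs  sigma/sqrt(n) = {theory_sd:.3f}")
```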
The Fundamental Theorem of
Statistics
The sampling distribution of any mean becomes
more nearly Normal as the sample size grows.
All we need is for the observations to be independent
and collected with randomization.
We don’t even care about the shape of the population
distribution!
The Fundamental Theorem of Statistics is called the
Central Limit Theorem (CLT).
The CLT works better (and faster) the closer the
population model is to a Normal itself. It also works
better for larger samples.
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
The Fundamental Theorem of Statistics
(cont.)
The Central Limit Theorem (CLT)
The mean of a random sample has a
sampling distribution whose shape can be
approximated by a Normal model. The larger
the sample, the better the approximation will
be.
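One way to see the CLT at work is to draw samples from a strongly skewed population and watch the skewness of the sample means shrink as n grows. The sketch below assumes an exponential population with rate 1 and uses a simple moment-based skewness measure; the helper names skewness and sample_mean_skewness are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: about 0 for a symmetric, Normal-like shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, mu=m)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_mean_skewness(n, num_samples=4000, seed=0):
    """Skewness of the means of num_samples samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(means)


# Assumed sample sizes, chosen only to show the trend.
skew_by_n = {n: sample_mean_skewness(n) for n in (2, 10, 50)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness should fall toward 0 as n increases, which is the sense in which the Normal approximation improves with larger samples. Starting from a less skewed population would give small values at much smaller n.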
Assumptions and Conditions
The CLT requires remarkably few assumptions, so
there are few conditions to check:
1. Random Sampling Condition: The data values must be
sampled randomly or the concept of a sampling
distribution makes no sense.
2. Independence Assumption: The sample values must
be mutually independent. (When the sample is drawn
without replacement, check the 10% condition…)
3. Large Enough Sample Condition: There is no one-size-fits-all rule.
Standard Error
When we don’t know p or σ, we’re stuck,
right?
Nope. We will use sample statistics to
estimate these population parameters.
Whenever we estimate the standard deviation
of a sampling distribution, we call it a
standard error.
Standard Error (cont.)
For a sample proportion, the standard error is
$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
For the sample mean, the standard error is
$SE(\bar{y}) = \frac{s}{\sqrt{n}}$
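Both standard errors can be computed directly from sample data. In this sketch the function names se_proportion and se_mean are hypothetical, and the counts and data values are invented for illustration.

```python
import math
import statistics


def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)


def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))


# Made-up data: 525 successes in 1500 trials, and eight measurements.
print(f"SE of p-hat: {se_proportion(525, 1500):.4f}")
print(f"SE of y-bar: {se_mean([2, 4, 4, 4, 5, 5, 7, 9]):.4f}")
```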
What Can Go Wrong?
Don’t confuse the sampling distribution with
the distribution of the sample.
When you take a sample, you look at the
distribution of the values, usually with a
histogram, and you may calculate summary
statistics.
The sampling distribution is an imaginary
collection of the values that a statistic might
have taken for all random samples—the one
you got and the ones you didn’t get.
What Can Go Wrong? (cont.)
Beware of observations that are not
independent.
The CLT depends crucially on the assumption
of independence.
You can’t check this with your data—you have
to think about how the data were gathered.
Watch out for small samples from skewed
populations.
The more skewed the distribution, the larger
the sample size we need for the CLT to work.