Sampling Distribution
Chapters 18 - 19
Sampling Distribution Models and Confidence Intervals
Example
• We want to find out the proportion of men in the U.S. population.
• We draw a sample and calculate the proportion
of men in the sample. Suppose we find that the
proportion p̂ = 60%.
• We conclude that 60% of the population are
men. How much should we trust this estimate?
• Suppose the actual percentage is p = 52%
Margin of Error
• Example:
– 60% of the U.S. population are men, with a margin of error of ±5%
– This yields an interval of estimate from 55% to 65%
The extent of the interval on either side of the
sample proportion or mean is called the margin of
error (ME).
In general, the intervals of estimate have the form
estimate ± ME.
Sampling Distribution
• Now imagine drawing many different samples and looking at the sample proportion from each one.
• The histogram we’d get if we could see all the
proportions from all possible samples is called
the sampling distribution of the proportions.
• What would the histogram of all the sample
proportions look like?
• It turns out that the histogram is unimodal, symmetric, and centered at p. It is approximately a Normal distribution.
Sampling Distribution Model
• The mean of the sampling distribution is at p.
• The standard deviation of the distribution is √(pq/n).
• So, the distribution of the sample proportions is modeled with the Normal model N(p, √(pq/n)).
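To make this concrete, here is a minimal simulation sketch in Python (the sample size n = 100 and the number of repeated samples are illustrative choices; the true proportion p = 0.52 comes from the example above). It draws many samples, computes each sample's proportion, and compares the simulated mean and SD with the model values p and √(pq/n).

```python
import numpy as np

# Illustrative choices: true proportion p = 0.52 (from the example above),
# sample size n = 100, and 10,000 repeated samples.
p, n, num_samples = 0.52, 100, 10_000
rng = np.random.default_rng(0)

# Number of "successes" (men) in each sample, then each sample's proportion.
successes = rng.binomial(n, p, size=num_samples)
p_hats = successes / n

# Compare the simulation with the model: mean = p, SD = sqrt(pq/n).
print("simulated mean of p-hat:", p_hats.mean())
print("model mean (p):         ", p)
print("simulated SD of p-hat:  ", p_hats.std())
print("model SD, sqrt(pq/n):   ", np.sqrt(p * (1 - p) / n))
```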
Assumptions
• Most models are useful only when specific assumptions are true.
• There are two assumptions in the case of the model for the distribution of sample proportions:
1. The Independence Assumption: The sampled
values must be independent of each other.
2. The Sample Size Assumption: The sample size,
n, must be large enough.
Assumptions and Conditions
• Assumptions are hard—often impossible—to
check.
• Still, we need to check whether the assumptions
are reasonable by checking conditions that
provide information about the assumptions.
• The corresponding conditions to check before
using the Normal to model the distribution of
sample proportions are the Randomization Condition, the 10% Condition, and the Success/Failure Condition.
Conditions
1. Randomization Condition: The sample should
be a simple random sample of the population.
2. 10% Condition: If sampling has not been made
with replacement, then the sample size, n,
must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has
to be big enough so that both np and nq are at
least 10.
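As a rough illustration, here is a small Python sketch (the helper name check_proportion_conditions and the example numbers are hypothetical) that checks the 10% Condition and the Success/Failure Condition for a given sample. The Randomization Condition cannot be checked by code; it depends on how the sample was actually drawn.

```python
def check_proportion_conditions(n, p_hat, population_size):
    """Rough check of the 10% and Success/Failure Conditions.
    (The Randomization Condition depends on how the sample was drawn
    and cannot be verified from the numbers alone.)"""
    ten_percent = n <= 0.10 * population_size      # 10% Condition
    np_count = n * p_hat                           # expected successes, np
    nq_count = n * (1 - p_hat)                     # expected failures, nq
    success_failure = np_count >= 10 and nq_count >= 10
    return {"10% Condition": ten_percent,
            "Success/Failure Condition": success_failure,
            "np": np_count, "nq": nq_count}

# Hypothetical example: 100 people sampled from a very large population, p-hat = 0.60.
print(check_proportion_conditions(n=100, p_hat=0.60, population_size=330_000_000))
```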
A Sampling Distribution Model
• A proportion is no longer just a computation
for a set of data.
– It is now a random quantity that has a distribution.
– This distribution is called the sampling distribution
model for proportions.
• Even though we depend on sampling
distribution models, we never actually get to
see them.
– We never actually take repeated samples from the
same population and make a histogram. We only
imagine or simulate them.
The Central Limit Theorem
for a Proportion
Provided that the sampled values are independent
and the sample size is large enough, the sampling
distribution of p̂ is modeled by a Normal model
with
– Mean: p
– Standard deviation: √(pq/n)
Means – The “Average” of One Die
• Let’s start with a simulation of 10,000 tosses
of a die. A histogram of the results is:
Means – Averaging More Dice
• Looking at the average of
two dice after a simulation
of 10,000 tosses:
• The average of three dice
after a simulation of 10,000
tosses looks like:
Means – Averaging Still More Dice
• The average of 5 dice after a
simulation of 10,000 tosses
looks like:
• The average of 20 dice after
a simulation of 10,000
tosses looks like:
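A minimal Python sketch of the dice simulation described on these slides (10,000 simulated tosses for each number of dice) shows the same pattern numerically: the averages stay centered near 3.5 while their spread shrinks as more dice are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tosses = 10_000

# For each number of dice, simulate 10,000 tosses and average the faces on each toss.
for num_dice in (1, 2, 3, 5, 20):
    rolls = rng.integers(1, 7, size=(num_tosses, num_dice))   # faces 1 through 6
    averages = rolls.mean(axis=1)
    print(f"{num_dice:2d} dice: mean = {averages.mean():.3f}, SD = {averages.std():.3f}")
```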
Means – What the Simulations Show
• As the sample size (number of dice) gets
larger, each sample average is more likely to
be closer to the population mean.
– So, we see the shape continuing to tighten around
3.5
• And, it probably does not shock you that the
sampling distribution of a mean becomes
Normal.
The Central Limit Theorem: The
Fundamental Theorem of Statistics
• The sampling distribution of any mean becomes
more nearly Normal as the sample size grows.
– All we need is for the observations to be
independent and collected with randomization.
– We don’t even care about the shape of the
population distribution!
• The Fundamental Theorem of Statistics is called
the Central Limit Theorem (CLT).
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling
distribution whose shape can be
approximated by a Normal model. The larger
the sample, the better the approximation will
be.
Assumptions and Conditions
The CLT requires essentially the same assumptions
we saw for modeling proportions:
Independence Assumption: The sampled values must be
independent of each other.
Sample Size Assumption: The sample size must be
sufficiently large.
We can’t check these directly, but we can think about whether
the Independence Assumption is plausible, and check
– Randomization Condition
– 10% Condition: n is less than 10% of the population.
– Large Enough Sample Condition: The CLT doesn’t tell us how
large a sample we need. For now, you need to think about
your sample size in the context of what you know about the
population.
CLT and Standard Error
• The CLT says that the sampling distribution of any
mean or proportion is approximately Normal.
– For proportions, the sampling distribution is centered
at the population proportion.
– For means, it’s centered at the population mean.
– For proportions: SD(p̂) = √(pq/n)
– For means: SD(ȳ) = σ/√n
Standard Error
• But we don't know p or σ. We're stuck, right?
• Since we don't know p or σ, we can't find the true standard deviation of the sampling distribution model, so we estimate it. We call this estimate (of the S.D. of a sampling distribution) a standard error (SE).
• For a sample proportion: SE(p̂) = √(p̂q̂/n)
• For the sample mean: SE(ȳ) = s/√n
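A small Python sketch of these two standard error formulas (the example numbers are illustrative, not from the slides):

```python
import numpy as np

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p-hat * q-hat / n)."""
    return np.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    sample = np.asarray(sample, dtype=float)
    return sample.std(ddof=1) / np.sqrt(len(sample))

# Illustrative numbers: 60% "successes" in a sample of 100, and a small data sample.
print(se_proportion(p_hat=0.60, n=100))
print(se_mean([3.1, 2.8, 3.6, 3.0, 2.9]))
```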
The Real World & the Model World
Be careful! Now we have two distributions to deal
with.
The first is the real world distribution of the sample,
which we might display with a histogram.
The second is the math world sampling distribution
of the statistic, which we model with a Normal
model based on the Central Limit Theorem.
Don’t confuse the two!
A Confidence Interval
• By the 68-95-99.7% Rule, we know
– about 68% of all samples will have p̂'s within 1 SE of p
– about 95% of all samples will have p̂'s within 2 SEs of p
– about 99.7% of all samples will have p̂'s within 3 SEs of p
A Confidence Interval
• Consider the 95% level:
There’s a 95% chance that p is no more than 2 SEs
away from p̂ .
So, if we reach out 2 SEs, we are 95% sure that p
will be in that interval. In other words, if we reach
out 2 SEs in either direction of p̂ , we can be 95%
confident that this interval contains the true
proportion.
• This is called a 95% confidence interval.
A 95% Confidence Interval
What Does “95% Confidence” Mean?
• Each confidence interval uses a
sample statistic to estimate a
population parameter.
• But, since samples vary, the
statistics we use, and thus the
confidence intervals we
construct, vary as well.
• Our confidence is in the process
of constructing the interval, not
in any one interval itself.
• “95% confidence” means there is a 95% chance that our interval will contain the true parameter.
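This "confidence is in the process" idea can be illustrated with a simulation sketch in Python (the true proportion p = 0.52 comes from the earlier example; the sample size and number of repetitions are assumed values): build a 95% interval from each of many samples and count how often the intervals capture p.

```python
import numpy as np

# Illustrative choices: p = 0.52 (from the earlier example), samples of size 1,000,
# repeated 10,000 times. Each sample gives an interval p-hat +/- 2 SE(p-hat).
p, n, num_samples = 0.52, 1_000, 10_000
rng = np.random.default_rng(0)

p_hats = rng.binomial(n, p, size=num_samples) / n
se = np.sqrt(p_hats * (1 - p_hats) / n)
captured = (p_hats - 2 * se <= p) & (p <= p_hats + 2 * se)

# The fraction of intervals that contain p should be close to 95%.
print("fraction of intervals containing p:", captured.mean())
```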
M.E.: Certainty vs. Precision
• We can claim, with 95% confidence, that the interval p̂ ± 2 SE(p̂) contains the true population proportion.
• The more confident we want to be, the larger our ME needs to be (makes the interval wider).
M.E.: Certainty vs. Precision
• To be more confident, we wind up being less precise.
• Because of this, every confidence interval is a balance
between certainty and precision.
• The tension between certainty and precision is always
there. Fortunately, in most cases we can be both
sufficiently certain and sufficiently precise to make useful
statements.
• The choice of confidence level is somewhat arbitrary, but
keep in mind this tension between certainty and
precision when selecting your confidence level.
• The most commonly chosen confidence levels are 90%,
95%, and 99% (but any percentage can be used).
Critical Values
• The ‘2’ in p̂ ± 2 SE(p̂) (our 95% confidence interval) came from the 68-95-99.7% Rule.
• Using a table or technology, we find that a more
exact value for our 95% confidence interval is 1.96
instead of 2. We call 1.96 the critical value and
denote it z*.
• For any confidence level, we can find the
corresponding critical value.
• Example: for a 90% confidence interval, the critical value is z* = 1.645.
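Using technology, the critical value z* for a confidence level C is the value that leaves (1 − C)/2 in each tail of the standard Normal model. A short Python sketch (using SciPy's norm.ppf) reproduces the values above:

```python
from scipy.stats import norm

# z* leaves (1 - C)/2 in each tail of the standard Normal model.
for confidence in (0.90, 0.95, 0.99):
    z_star = norm.ppf(1 - (1 - confidence) / 2)
    print(f"{confidence:.0%} confidence: z* = {z_star:.3f}")
# Prints roughly: 90% -> 1.645, 95% -> 1.960, 99% -> 2.576
```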
One-Proportion z-Interval
• When the conditions are met, we are ready to
find the confidence interval for the population
proportion, p.
• The confidence interval is p̂ ± z*·SE(p̂), where SE(p̂) = √(p̂q̂/n).
• The critical value, z*, depends on the particular
confidence level, C, that you specify.
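Putting the pieces together, here is a minimal Python sketch of a one-proportion z-interval (the function name and example numbers are illustrative; it assumes the conditions have already been checked):

```python
import numpy as np
from scipy.stats import norm

def one_proportion_z_interval(successes, n, confidence=0.95):
    """One-proportion z-interval: p-hat +/- z* * sqrt(p-hat * q-hat / n).
    Assumes the Randomization, 10%, and Success/Failure Conditions are met."""
    p_hat = successes / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    z_star = norm.ppf(1 - (1 - confidence) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

# Illustrative numbers: 60 "successes" in a sample of 100, at 95% confidence.
print(one_proportion_z_interval(successes=60, n=100, confidence=0.95))
```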
What Can Go Wrong?
Margin of Error Too Large to Be Useful:
• We can’t be exact, but how precise do we
need to be?
• One way to make the margin of error smaller
is to reduce your level of confidence. (That
may not be a useful solution.)
• You need to think about your margin of error
when you design your study.
– To get a narrower interval without giving up
confidence, you need to have less variability.
– You can do this with a larger sample…
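One way to plan the sample size is to solve ME = z*·√(p̂q̂/n) for n. A small Python sketch of that calculation (the helper name is hypothetical; p̂ = 0.5 is used as the conservative, widest-interval guess when no estimate is available):

```python
import numpy as np
from scipy.stats import norm

def sample_size_for_me(me, confidence=0.95, p_hat=0.5):
    """Smallest n with z* * sqrt(p_hat * q_hat / n) <= me.
    p_hat = 0.5 is the conservative guess that gives the largest n."""
    z_star = norm.ppf(1 - (1 - confidence) / 2)
    n = (z_star / me) ** 2 * p_hat * (1 - p_hat)
    return int(np.ceil(n))

# Example: a margin of error of 3 percentage points at 95% confidence.
print(sample_size_for_me(me=0.03))   # roughly 1,068 people
```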
Homework Assignment
Chapter 18:
• Problem # 11, 15, 23, 29, 31, 47.
Chapter 19:
• Problem # 7, 9, 11, 13, 17, 27, 35, 37.