Sampling Distribution

Chapters 18 - 19
Sampling Distribution Models
and
Confidence Intervals
Example
• We want to find out the proportion of men in
the U.S. population.
• We draw a sample and calculate the proportion
of men in the sample. Suppose we find that the
sample proportion is p̂ = 60%.
• We conclude that 60% of the population are
men. How much should we trust this estimate?
• Suppose the actual percentage is p = 52%.
Margin of Error
• Example:
– 60% of the U.S. population are men, with a margin of
error of ±5%.
– This yields an interval estimate from 55% to 65%.
• The extent of the interval on either side of the
sample proportion or mean is called the margin of
error (ME).
• In general, interval estimates have the form
estimate ± ME.
Sampling Distribution
• Now imagine drawing many samples of the same
size and looking at the sample proportion from each one.
• The histogram we’d get if we could see all the
proportions from all possible samples is called
the sampling distribution of the proportions.
• What would the histogram of all the sample
proportions look like?
• It turns out that the histogram is unimodal,
symmetric, and centered at p. It is well described
by a Normal model.
Sampling Distribution Model
• The mean of the sampling distribution is at p.
• The standard deviation of the distribution is
$\sqrt{\dfrac{pq}{n}}$, where q = 1 − p.
• So, the distribution of the sample proportions
is modeled with the Normal model
$N\!\left(p, \sqrt{\dfrac{pq}{n}}\right)$.
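The model above can be checked with a small simulation. This is only a sketch: the sample size n = 200 and the 5,000 repetitions are assumed values chosen for illustration, p = 0.52 is borrowed from the opening example, and the function name simulate_sample_proportions is hypothetical.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n; return each sample's proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each sampled individual is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p, n = 0.52, 200   # p from the opening example; n is an assumed sample size
q = 1 - p
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

sim_mean = statistics.mean(p_hats)    # should land close to p
sim_sd = statistics.pstdev(p_hats)    # should land close to sqrt(pq/n)
model_sd = math.sqrt(p * q / n)

print(f"simulated mean = {sim_mean:.4f}, model mean = {p:.4f}")
print(f"simulated SD   = {sim_sd:.4f}, model SD   = {model_sd:.4f}")
```

A histogram of p_hats would be the simulated version of the sampling distribution described on the previous slide.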
Assumptions
• Most models are useful only when specific
assumptions are true.
• There are two assumptions in the case of the
model for the distribution of sample
proportions:
1. The Independence Assumption: The sampled
values must be independent of each other.
2. The Sample Size Assumption: The sample size,
n, must be large enough.
Assumptions and Conditions
• Assumptions are hard—often impossible—to
check.
• Still, we need to check whether the assumptions
are reasonable by checking conditions that
provide information about the assumptions.
• The corresponding conditions to check before
using the Normal to model the distribution of
sample proportions are the Randomization
Condition, 10% Condition and the Success/Failure
Condition.
Conditions
1. Randomization Condition: The sample should
be a simple random sample of the population.
2. 10% Condition: If the sample is drawn without
replacement, then the sample size, n,
must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has
to be big enough so that both np and nq are at
least 10.
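The two numeric conditions can be written as a quick check. This is a minimal sketch, not a standard routine: the function name check_conditions and the example numbers are assumptions, and the Randomization Condition is left out because it depends on how the data were collected and cannot be verified by arithmetic. In practice p is unknown, so a hypothesized or estimated value would have to stand in for it.

```python
def check_conditions(n, p, population_size, with_replacement=False):
    """Check the 10% Condition and the Success/Failure Condition.

    The Randomization Condition is not checked here: it is a question
    about the sampling design, not about the numbers.
    """
    q = 1 - p
    return {
        # The 10% Condition only matters when sampling without replacement.
        "ten_percent": with_replacement or n <= 0.10 * population_size,
        # Expect at least 10 successes and at least 10 failures.
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# Assumed example values: a sample of 200 from a population of 1,000,000.
print(check_conditions(n=200, p=0.52, population_size=1_000_000))
# A sample of 15 is too small: nq = 15 * 0.48 = 7.2 is below 10.
print(check_conditions(n=15, p=0.52, population_size=1_000_000))
```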
A Sampling Distribution Model
• A proportion is no longer just a computation
for a set of data.
– It is now a random quantity that has a distribution.
– This distribution is called the sampling distribution
model for proportions.
• Even though we depend on sampling
distribution models, we never actually get to
see them.
– We never actually take repeated samples from the
same population and make a histogram. We only
imagine or simulate them.
The Central Limit Theorem
for a Proportion
Provided that the sampled values are independent
and the sample size is large enough, the sampling
distribution of p̂ is modeled by a Normal model
with
– Mean: p
– Standard deviation: $\sqrt{\dfrac{pq}{n}}$
Means – The “Average” of One Die
• Let’s start with a simulation of 10,000 tosses
of a die. A histogram of the results is:
Means – Averaging More Dice
• Looking at the average of
two dice after a simulation
of 10,000 tosses:
• The average of three dice
after a simulation of 10,000
tosses looks like:
Means – Averaging Still More Dice
• The average of 5 dice after a
simulation of 10,000 tosses
looks like:
• The average of 20 dice after
a simulation of 10,000
tosses looks like:
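The dice simulations on these slides can be reproduced with a short script. This is a sketch assuming fair six-sided dice and a fixed random seed; the function name simulate_dice_averages is hypothetical. Instead of drawing histograms, it prints the center and spread of the averages for each number of dice.

```python
import random
import statistics

def simulate_dice_averages(num_dice, num_tosses=10_000, seed=0):
    """Toss num_dice fair dice num_tosses times; return the average of each toss."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_tosses)
    ]

results = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    center = statistics.mean(averages)
    spread = statistics.pstdev(averages)
    results[num_dice] = (center, spread)
    print(f"{num_dice:>2} dice: mean of averages = {center:.3f}, SD = {spread:.3f}")
```

The means should all sit near 3.5, while the SD shrinks as more dice are averaged.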
Means – What the Simulations Show
• As the sample size (number of dice) gets
larger, each sample average is more likely to
be closer to the population mean.
– So, we see the shape continuing to tighten around
3.5
• And, it probably does not shock you that the
sampling distribution of a mean becomes
approximately Normal.
The Central Limit Theorem: The
Fundamental Theorem of Statistics
• The sampling distribution of any mean becomes
more nearly Normal as the sample size grows.
– All we need is for the observations to be
independent and collected with randomization.
– We don’t even care about the shape of the
population distribution!
• The Fundamental Theorem of Statistics is called
the Central Limit Theorem (CLT).
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling
distribution whose shape can be
approximated by a Normal model. The larger
the sample, the better the approximation will
be.
Assumptions and Conditions
The CLT requires essentially the same assumptions
we saw for modeling proportions:
• Independence Assumption: The sampled values must be
independent of each other.
• Sample Size Assumption: The sample size must be
sufficiently large.
We can’t check these directly, but we can think about whether
the Independence Assumption is plausible, and check
– Randomization Condition
– 10% Condition: n is less than 10% of the population.
– Large Enough Sample Condition: The CLT doesn’t tell us how
large a sample we need. For now, you need to think about
your sample size in the context of what you know about the
population.
CLT and Standard Error
• The CLT says that the sampling distribution of any
mean or proportion is approximately Normal.
– For proportions, the sampling distribution is centered
at the population proportion.
– For means, it’s centered at the population mean.
– For proportions, $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$
– For means, $SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$
Standard Error
• But we don’t know p or σ, so we’re stuck, right?
• Since we don’t know p or σ, we can’t find the
true standard deviation of the sampling
distribution model, so we need to estimate it.
We call this estimate of the S.D. of a sampling
distribution a standard error (SE).
• For a sample proportion, $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
• For the sample mean, $SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
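Both standard errors are short computations. The sketch below uses hypothetical function names (se_proportion, se_mean) and made-up inputs: p̂ = 0.60 comes from the opening example, but n = 100 and the small data list are assumptions for illustration only.

```python
import math
import statistics

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)

def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))

# p_hat = 0.60 from the opening example; n = 100 is an assumed sample size.
print(f"SE(p_hat) = {se_proportion(0.60, 100):.4f}")

# A made-up sample of eight measurements.
print(f"SE(y_bar) = {se_mean([2, 4, 4, 4, 5, 5, 7, 9]):.4f}")
```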
The Real World & the Model World
Be careful! Now we have two distributions to deal
with.
• The first is the real world distribution of the sample,
which we might display with a histogram.
• The second is the math world sampling distribution
of the statistic, which we model with a Normal
model based on the Central Limit Theorem.
Don’t confuse the two!
A Confidence Interval
• By the 68-95-99.7% Rule, we know:
– about 68% of all samples will have p̂’s within 1 SE
of p
– about 95% of all samples will have p̂’s within 2 SEs
of p
– about 99.7% of all samples will have p̂’s within 3
SEs of p
A Confidence Interval
• Consider the 95% level:
There’s a 95% chance that p is no more than 2 SEs
away from p̂ .
So, if we reach out 2 SEs, we are 95% sure that p
will be in that interval. In other words, if we reach
out 2 SEs in either direction of p̂ , we can be 95%
confident that this interval contains the true
proportion.
• This is called a 95% confidence interval.
A 95% Confidence Interval
What Does “95% Confidence” Mean?
• Each confidence interval uses a
sample statistic to estimate a
population parameter.
• But, since samples vary, the
statistics we use, and thus the
confidence intervals we
construct, vary as well.
• Our confidence is in the process
of constructing the interval, not
in any one interval itself.
• “95% confidence” means that
about 95% of the intervals
constructed this way, from
repeated samples, will contain
the true parameter.
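One way to see what the confidence level describes is to simulate many samples and count how often the interval p̂ ± 2 SE captures the true p. This is a sketch under assumed values: p = 0.52 is taken from the opening example, while n = 400, the 2,000 repetitions, and the function name coverage_rate are illustrative choices.

```python
import math
import random

def coverage_rate(p, n, num_samples=2000, num_se=2, seed=0):
    """Fraction of simulated intervals p_hat +/- num_se * SE that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        lower, upper = p_hat - num_se * se, p_hat + num_se * se
        if lower <= p <= upper:
            hits += 1
    return hits / num_samples

rate = coverage_rate(p=0.52, n=400)
print(f"Share of intervals containing p: {rate:.3f}")
```

Any single interval either contains p or it does not; the roughly 95% figure describes the long-run behavior of the procedure.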
ME: Certainty vs. Precision
• We can claim, with 95% confidence, that the
interval $\hat{p} \pm 2\,SE(\hat{p})$ contains the true population
proportion.
• The more confident we want to be, the larger
our ME needs to be (makes the interval wider).
ME: Certainty vs. Precision
• To be more confident, we wind up being less precise.
• Because of this, every confidence interval is a balance
between certainty and precision.
• The tension between certainty and precision is always
there. Fortunately, in most cases we can be both
sufficiently certain and sufficiently precise to make useful
statements.
• The choice of confidence level is somewhat arbitrary, but
keep in mind this tension between certainty and
precision when selecting your confidence level.
• The most commonly chosen confidence levels are 90%,
95%, and 99% (but any percentage can be used).
Critical Values
• The ‘2’ in $\hat{p} \pm 2\,SE(\hat{p})$ (our 95% confidence interval)
came from the 68-95-99.7% Rule.
• Using a table or technology, we find that a more
exact value for our 95% confidence interval is 1.96
instead of 2. We call 1.96 the critical value and
denote it z*.
• For any confidence level, we can find the
corresponding critical value.
• Example: For a 90% confidence interval,
the critical value is 1.645.
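The "technology" route to a critical value can be sketched with Python's standard library. The function name critical_value is hypothetical; the calculation assumes a two-sided interval, so a confidence level C leaves (1 − C)/2 in each tail of the standard Normal model.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* for a two-sided interval: the standard Normal quantile at (1 + C) / 2."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```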
One-Proportion z-Interval
• When the conditions are met, we are ready to
find the confidence interval for the population
proportion, p.

• The confidence interval is $\hat{p} \pm z^{*} \times SE(\hat{p})$,
where $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.
• The critical value, z*, depends on the particular
confidence level, C, that you specify.
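Putting the pieces together, a one-proportion z-interval might be computed as below. This is a minimal sketch that assumes the conditions have already been checked; the function name one_proportion_z_interval and the example counts (60 men in a sample of 100, matching p̂ = 60% from the opening example with an assumed n) are illustrative.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p_hat +/- z* * SE(p_hat)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # critical value
    se = math.sqrt(p_hat * q_hat / n)                     # standard error
    me = z_star * se                                      # margin of error
    return p_hat - me, p_hat + me

# Assumed data: 60 men in a sample of 100.
lower, upper = one_proportion_z_interval(successes=60, n=100, confidence=0.95)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

With these assumed counts the interval runs from roughly 50% to 70%, which is a far wider margin of error than the ±5% quoted earlier would need.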
What Can Go Wrong?
Margin of Error Too Large to Be Useful:
• We can’t be exact, but how precise do we
need to be?
• One way to make the margin of error smaller
is to reduce your level of confidence. (That
may not be a useful solution.)
• You need to think about your margin of error
when you design your study.
– To get a narrower interval without giving up
confidence, you need to have less variability.
– You can do this with a larger sample…
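The link between sample size and margin of error can be made concrete by solving ME = z* × √(p̂q̂/n) for n, which gives n = (z*)² p̂q̂ / ME². The sketch below is an illustration, not a procedure stated on these slides: the function name required_sample_size is hypothetical, and p̂ = 0.5 is an assumed guess that gives the largest (most cautious) sample size.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p_guess * q_guess / n) <= margin_of_error."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    q_guess = 1 - p_guess
    n = (z_star ** 2) * p_guess * q_guess / margin_of_error ** 2
    return math.ceil(n)   # round up so the margin of error is not exceeded

n_5pct = required_sample_size(0.05)     # ME of 5 percentage points
n_2_5pct = required_sample_size(0.025)  # ME of 2.5 percentage points
print(n_5pct, n_2_5pct)
```

Halving the margin of error takes about four times as many observations, because n grows with 1/ME².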
Homework Assignment
Chapter 18:
• Problem # 11, 15, 23, 29, 31, 47.
Chapter 19:
• Problem # 7, 9, 11, 13, 17, 27, 35, 37.