Sampling distributions
BPS chapter 10
© 2006 W. H. Freeman and Company
For any population with mean µ and standard deviation σ:
The mean, or center, of the sampling distribution of x̄ is equal to
the population mean µ.
The standard deviation of the sampling distribution is σ/√n, where n
is the sample size.
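The two facts above can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the population values (µ = 50, σ = 10), the sample size n = 16, and the helper name simulate_sample_means are all hypothetical, and a normal population is used only for convenience.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]

# Hypothetical population and sample size, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 16
means = simulate_sample_means(mu, sigma, n, num_samples=20_000)

print("mean of sample means:", round(statistics.fmean(means), 2))
print("sd of sample means:  ", round(statistics.stdev(means), 2))
print("sigma / sqrt(n):     ", sigma / math.sqrt(n))
```

The first printed value should land close to µ and the second close to σ/√n, which is 2.5 for these assumed values.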
[Figure: sampling distribution of x̄, centered at µ with standard deviation σ/√n]
The central limit theorem
Central Limit Theorem: When randomly sampling from any population
with mean µ and standard deviation σ, the sampling distribution of x̄
is approximately normal when n is large enough: N(µ,σ/√n).
[Figure panels: population with strongly skewed distribution; sampling distribution of x̄ for n = 2, n = 10, and n = 25 observations]
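One way to see the central limit theorem at work for the sample sizes n = 2, 10, and 25 is to measure how skewed the sample means are. The sketch below is an illustration under assumptions: an exponential population stands in for the "strongly skewed" population, and the helpers skewness and sample_means are hypothetical names, not from the slides.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed z-score (near 0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, num_samples, rng):
    """Means of num_samples samples of size n from an exponential population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

rng = random.Random(1)
# Assumed stand-in population: exponential with rate 1 (strongly right skewed).
skew_by_n = {n: skewness(sample_means(n, 10_000, rng)) for n in (2, 10, 25)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, which is the numerical counterpart of the histograms looking more and more like a normal curve.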
Further properties
The Central Limit Theorem is valid as long as we are sampling many
small random events, even if the events have different distributions (as
long as no one random event has an overwhelming influence).
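As a rough illustration of this property, the sketch below adds up 60 small independent effects drawn from three different distributions and checks how closely the totals follow the 68% and 95% benchmarks of a normal curve. The mix of distributions, the counts, and the name one_outcome are assumptions made for the example only.

```python
import random
import statistics

def one_outcome(rng):
    """Sum of 60 small independent effects drawn from three different distributions."""
    total = 0.0
    for _ in range(20):
        total += rng.uniform(-1.0, 1.0)               # symmetric, bounded effect
        total += rng.expovariate(2.0)                 # right-skewed effect
        total += 1.0 if rng.random() < 0.3 else 0.0   # on/off effect
    return total

rng = random.Random(2)
outcomes = [one_outcome(rng) for _ in range(20_000)]
center = statistics.fmean(outcomes)
spread = statistics.pstdev(outcomes)

# Fractions of outcomes within 1 and 2 standard deviations of the center.
within_1sd = sum(abs(v - center) <= spread for v in outcomes) / len(outcomes)
within_2sd = sum(abs(v - center) <= 2 * spread for v in outcomes) / len(outcomes)
print(f"within 1 sd: {within_1sd:.3f}   (normal curve: about 0.68)")
print(f"within 2 sd: {within_2sd:.3f}   (normal curve: about 0.95)")
```

No single effect dominates the total here, which is the condition the slide states for the theorem to apply.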
What should this mean to you?
It explains why so many variables are normally distributed.
Example: Height seems to be determined by a large number of
genetic and environmental factors, like nutrition.
So height is very much like our sample mean x̄.
The “individuals” are genes and environmental
factors. Your height is a mean.
Now we have a better idea of why the density
curve for height has this shape.
How large a sample size?
It depends on the population distribution. More observations are
required if the population distribution is far from normal.
A sample size of 25 is generally enough to obtain an approximately normal
sampling distribution despite strong skewness or even mild outliers.
A sample size of 40 will typically be good enough to overcome extreme
skewness and outliers.
Income distribution
Let’s consider the very large database of individual incomes from the Bureau of
Labor Statistics as our population. It is strongly right skewed.
We take 1000 SRSs of 100 incomes, calculate the sample mean for
each, and make a histogram of these 1000 means.
We also take 1000 SRSs of 25 incomes, calculate the sample mean for
each, and make a histogram of these 1000 means.
Which histogram corresponds to the samples of size 100? Of size 25?
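The comparison can be imitated with simulated data. The sketch below does not use the Bureau of Labor Statistics database: a lognormal population with made-up parameters stands in for a right-skewed income distribution, and the helper means_of_srs is a hypothetical name.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical right-skewed "income" population (lognormal), not real BLS data.
population = [rng.lognormvariate(10.5, 0.8) for _ in range(100_000)]

def means_of_srs(sample_size, num_samples=1000):
    """Sample means of num_samples simple random samples drawn without replacement."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

sd_25 = statistics.stdev(means_of_srs(25))
sd_100 = statistics.stdev(means_of_srs(100))
print(f"sd of means, n = 25:  {sd_25:,.0f}")
print(f"sd of means, n = 100: {sd_100:,.0f}")
print(f"ratio: {sd_25 / sd_100:.2f}   (theory: sqrt(100 / 25) = 2)")
```

Under these assumptions the means from samples of 100 are about half as spread out as the means from samples of 25, so the narrower, more symmetric histogram is the one for n = 100.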
Confidence intervals:
The basics
BPS chapter 13
© 2006 W.H. Freeman and Company
Uncertainty and confidence
Although the sample mean, x̄, is a unique number for any particular
sample, if you pick a different sample, you will probably get a different
sample mean.
In fact, you could get many different values for the sample mean, and
virtually none of them would actually equal the true population mean, µ.
But the sampling distribution is narrower than the population distribution,
by a factor of √n.
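Thus, the estimates x̄ gained from our samples are always relatively
close to the population parameter µ.
[Figure: distribution of sample means x̄ for samples of n subjects (standard deviation σ/√n) compared with the population distribution of individual subjects x (standard deviation σ), both centered at µ]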
If the population is normally distributed N(µ,σ),
so is the sampling distribution: N(µ,σ/√n).
Ninety-five percent of all sample means will be within
roughly 2 standard deviations (2σ/√n) of the population
parameter µ.
Because distances are symmetrical, this implies that
the population parameter µ must be within roughly 2
standard deviations (2σ/√n) of the sample average x̄
in 95% of all samples.
This reasoning is the essence of statistical inference.
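The 95% claim can be checked by repeated sampling. This is a minimal sketch under assumptions: the population is normal with hypothetical values µ = 100 and σ = 15, samples have size n = 9, and the function name coverage is invented for the example.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, num_samples, seed=4):
    """Fraction of samples whose interval x_bar ± 2*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    margin = 2 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(x_bar - mu) <= margin:   # same as: mu lies within margin of x_bar
            hits += 1
    return hits / num_samples

# Hypothetical population values, chosen only for illustration.
frac = coverage(mu=100.0, sigma=15.0, n=9, num_samples=20_000)
print(f"intervals containing mu: {frac:.3f}")
```

The condition inside the loop reads the same in both directions, which is the symmetry argument in code form: x̄ is within the margin of µ exactly when µ is within the margin of x̄.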
Red dot: mean value
of individual sample
The weight of single eggs of the brown variety is normally distributed N(65g,5g).
Think of a carton of 12 brown eggs as an SRS of size 12.
What is the distribution of the sample means x̄?
Normal (mean µ, standard deviation σ/√n) = N(65g,1.44g).
Find the middle 95% of the sample means distribution.
Roughly ± 2 standard deviations from the mean, or 65g ± 2.88g.
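The arithmetic for the brown eggs can be reproduced in a few lines. The variable names are illustrative, and the standard deviation is rounded to two decimals before doubling, following the rounding used in the example.

```python
import math

mu = 65.0      # population mean weight of a brown egg, in grams (from the example)
sigma = 5.0    # population standard deviation, in grams (from the example)
n = 12         # eggs per carton

sd_of_mean = round(sigma / math.sqrt(n), 2)   # sigma / sqrt(n), rounded to 2 decimals
margin = 2 * sd_of_mean                       # "roughly 2 standard deviations"
low, high = mu - margin, mu + margin
print(f"sd of sample mean: {sd_of_mean} g")
print(f"middle 95%: {low:.2f} g to {high:.2f} g")
```

So about 95% of cartons should have an average egg weight between roughly 62.1g and 67.9g.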
You buy a carton of 12 white eggs instead. The box weighs 770g.
The average egg weight from that SRS is thus x̄ = 64.2g.
Knowing that the standard deviation of egg weight is 5g, what
can you infer about the mean µ of the white egg population?
We can be 95% confident that the population mean µ is roughly within
± 2σ/√n of x̄, or 64.2g ± 2.88g.
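The same calculation for the white eggs, as a short sketch with illustrative variable names. It keeps full precision throughout, so the endpoints can differ in the second decimal from the rounded figures 64.2g and 2.88g used above.

```python
import math

carton_weight = 770.0   # grams, total for the 12 eggs (from the example)
n = 12
sigma = 5.0             # known standard deviation of egg weight, in grams

x_bar = carton_weight / n              # sample mean egg weight
margin = 2 * sigma / math.sqrt(n)      # roughly 2 standard deviations of x_bar
print(f"x_bar = {x_bar:.1f} g")
print(f"interval: {x_bar - margin:.2f} g to {x_bar + margin:.2f} g")
```

The interval is centered on the observed sample mean rather than on µ, because µ is the unknown quantity being estimated.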