sampling distribution


Sampling Distributions
Chapter 18
Sampling Distributions
• If we could take every possible sample of the same size (n) from a population and compute the statistic for each one, the distribution of those values would be the sampling distribution.
• Try to picture taking an unlimited number of samples of size n from a population.
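To make "every possible sample" concrete, the minimal Python sketch below enumerates all samples of size n drawn without replacement from a tiny population and tabulates the resulting sample proportions. The five-member population and the function name `exact_sampling_distribution` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical tiny population: 1 = success, 0 = failure (p = 3/5).
POPULATION = [1, 1, 1, 0, 0]

def exact_sampling_distribution(population, n):
    """Return {p_hat: probability} over every possible sample of size n,
    drawn without replacement, with each sample equally likely."""
    # combinations() works on positions, so members with the same value
    # are still treated as distinct individuals.
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {p_hat: Fraction(c, len(samples)) for p_hat, c in sorted(counts.items())}

if __name__ == "__main__":
    for p_hat, prob in exact_sampling_distribution(POPULATION, 2).items():
        print(f"p_hat = {p_hat}: probability {prob}")
```

For this population and n = 2, the ten possible samples give p̂ = 0 with probability 1/10, p̂ = 1/2 with probability 3/5, and p̂ = 1 with probability 3/10. The mean of the sampling distribution, 0(1/10) + (1/2)(3/5) + 1(3/10) = 3/5, equals p.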
Sampling Distributions
Why do we sample?
Averages are less variable and more
normal than individual observations
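A quick simulation can illustrate the claim that averages are less variable than individual observations. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1 (a hypothetical choice). It compares the spread of single draws with the spread of means of n = 10 draws. The function name is also hypothetical.

```python
import random
import statistics

def spread_of_individuals_vs_means(n=10, reps=5000, seed=1):
    """Compare the standard deviation of single observations with the
    standard deviation of averages of n observations."""
    rng = random.Random(seed)
    # Hypothetical population: exponential with mean 1 (right-skewed, SD 1).
    individuals = [rng.expovariate(1.0) for _ in range(reps)]
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(individuals), statistics.stdev(means)

print(spread_of_individuals_vs_means())
```

The second value should come out near 1/√10 ≈ 0.32, well below the first, which should be near 1.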
Sampling Distributions
• Categorical data produces
distributions that are based on
proportions.
Sampling Distributions
A parameter is a measure of the population.
This value is typically unknown. For
proportions, this is p (sometimes written π).
A statistic is a measure from a sample. We
often use a statistic to estimate an unknown
parameter. For proportions this is p̂.
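The parameter/statistic distinction can be sketched in Python with a hypothetical population of 0s and 1s whose true proportion is known. In practice it would not be known. The population size, the 30% success rate, and the helper name are assumptions for illustration.

```python
import random

def sample_proportion(values):
    """Fraction of successes (1s) in a list of 0/1 values."""
    return sum(values) / len(values)

# Hypothetical population of 10,000 members, 30% of whom are successes.
population = [1] * 3000 + [0] * 7000
p = sample_proportion(population)     # the parameter (usually unknown)

rng = random.Random(18)
sample = rng.sample(population, 100)  # a simple random sample, n = 100
p_hat = sample_proportion(sample)     # the statistic used to estimate p
print(f"p = {p}, p_hat = {p_hat}")
```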
Sampling Distributions
• Sampling variability: in repeated random
sampling, the value of the statistic will
vary. (We don’t expect the same value
every time we sample, do we?)
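Sampling variability is easy to see by simulation. The hypothetical helper below treats each draw as an independent success with probability p, as if the population were very large. It returns p̂ for each of several samples.

```python
import random

def repeated_p_hats(p, n, reps, seed=0):
    """Simulate `reps` random samples of size n from a population with
    success proportion p; return the sample proportion from each one."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

# Five samples of size 100 from a population with p = 0.3:
# the p_hat values will generally differ from sample to sample.
print(repeated_p_hats(p=0.3, n=100, reps=5))
```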
Sampling Distributions
• We expect variability when we sample.
Because of this, repeated samples produce a
set of p̂ values that fall around the center of
the distribution, p. Under the right conditions,
the distribution of these values is approximately
normal.
Sampling Distributions
• When we take repeated samples from
categorical data and record the proportion p̂
from each one, the distribution of those p̂
values is known as the sampling distribution
for a proportion.
Sampling Distributions
• When we describe a distribution, we want
to focus on Shape, Center, and Spread
(Remember CUSS & BS?)
Sampling Distributions of
Proportions
In order to use a normal model for sampling
distributions of proportions and the formulas that
follow, the following 3 conditions must be met.
Each condition has its own purpose. You should
know why you need each one.
Sampling Distributions of
Proportions
• Conditions:
1) Randomization. This helps ensure that
your data was fairly collected and not
biased in some way. You need to state
that your data comes from a simple random
sample (SRS), was fairly collected, was
randomly chosen, etc.
Sampling Distributions of
Proportions
• Conditions:
2) Independence. This protects our standard
deviation formula and keeps it accurate. We
must ensure (we usually assume) that the
sampled values are independent of one another.
If we are sampling without replacement, then we
must state that our sample is no more than 10%
of our population.
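When sampling without replacement, the 10% condition reduces to a one-line check. The function name is hypothetical.

```python
def ten_percent_condition(n, population_size):
    """Independence check when sampling without replacement:
    the sample should be no more than 10% of the population."""
    return n <= 0.10 * population_size

print(ten_percent_condition(100, 5000))  # 100 is 2% of 5000
print(ten_percent_condition(600, 5000))  # 600 is 12% of 5000
```

The first call returns True and the second returns False, since 10% of 5000 is 500.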
Sampling Distributions of
Proportions
• Conditions:
3) Large enough sample. To ensure that
the sample is large enough for the normal
approximation, we must expect at
least 10 successes and at least 10
failures:
$np \ge 10$ and $n(1 - p) \ge 10$
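The success/failure condition can be checked by computing the expected counts np and n(1 − p). This is a hypothetical helper, not a library function.

```python
def success_failure_condition(n, p):
    """Large-enough-sample check for proportions: expect at least
    10 successes (n * p) and at least 10 failures (n * (1 - p))."""
    return n * p >= 10 and n * (1 - p) >= 10

print(success_failure_condition(100, 0.3))  # expects 30 successes, 70 failures
print(success_failure_condition(50, 0.1))   # expects only 5 successes
```

The first call returns True. The second returns False.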
Sampling Distributions of
Proportions
Provided the conditions are met, the sampling
distribution of a proportion will be approximately
normal with mean p and standard deviation
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
Or in notation: $N\left(p, \sqrt{\dfrac{p(1-p)}{n}}\right)$
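Once the conditions are met, the normal model can be used to approximate probabilities for p̂. This sketch uses Python's standard-library `statistics.NormalDist` with a hypothetical p = 0.30 and n = 100, so np = 30 and n(1 − p) = 70. The helper name is also hypothetical.

```python
import math
from statistics import NormalDist

def p_hat_model(p, n):
    """Mean and standard deviation of the sampling distribution of p-hat."""
    return p, math.sqrt(p * (1 - p) / n)

mean, sd = p_hat_model(0.30, 100)
# Approximate P(p_hat > 0.35) using the normal model N(p, sd).
prob = 1 - NormalDist(mean, sd).cdf(0.35)
print(f"mean = {mean}, sd = {sd:.4f}, P(p_hat > 0.35) ~ {prob:.3f}")
```

Here the standard deviation is √(0.3 · 0.7 / 100) ≈ 0.0458. The value 0.35 is therefore about 1.09 standard deviations above p, and the approximate probability is about 0.14.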
Sampling Distributions for
Means
• When the data is quantitative,
your distribution is based on
repeated averages from the
samples.
Sampling Distributions for
Means
A parameter is a measure of the population.
This value is typically unknown. For
means, this is μ.
A statistic is a measure from a sample. We
often use a statistic to estimate an
unknown parameter. For means, this is x̄.
Sampling Distributions of Sample
Means
When we take every possible sample of a given
size from quantitative data and record the mean
x̄ of each one, the distribution of those means
is known as the sampling distribution for a mean.
Sampling Distributions of Sample
Means
• The shape of the sampling distribution
depends on the shape of the population
it is drawn from.
** If the population is normal, then the
distribution of the sample mean will be
normal (regardless of sample size).
Sampling Distributions of Sample
Means
• The shape of the sampling distribution
depends on the shape of the
population.
**For skewed or odd-shaped distributions, if
the sample size is large enough, the
sampling distribution will be approximately
normal. So…how large is large enough?
Sampling Distributions of
Sample Means
The Central Limit Theorem (CLT)
CLT addresses two things in a distribution,
shape and spread.
As the sample size increases:
• The shape of the sampling distribution
becomes more normal
• The variability of the sampling distribution
decreases
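Both CLT effects can be checked by simulation. This sketch draws sample means from a hypothetical right-skewed exponential population (mean 1, SD 1) for n = 1, 5, and 30. For each n, it reports the SD of the means and a simple moment-based skewness. The function names and parameter values are assumptions for illustration.

```python
import random
import statistics

def skewness(values):
    """Simple moment estimate of skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def clt_demo(sizes=(1, 5, 30), reps=20000, seed=7):
    """For each sample size n, simulate `reps` sample means from an
    exponential population (mean 1, SD 1) and return
    {n: (SD of the means, skewness of the means)}."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
                 for _ in range(reps)]
        results[n] = (statistics.stdev(means), skewness(means))
    return results

for n, (sd, skew) in clt_demo().items():
    print(f"n = {n:2d}: SD of means = {sd:.3f}, skewness = {skew:.2f}")
```

Both numbers should shrink as n grows. For this population, theory gives an SD of 1/√n and a skewness of 2/√n for the sample mean.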
Sampling Distributions of Sample
Means
• The Law of Large Numbers
Draw observations at random from any
population with a given mean μ. As n
increases, the sample mean x̄ gets closer
and closer to the true population mean, μ.
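The Law of Large Numbers can be seen by tracking the running mean of repeated rolls of a fair die, whose population mean is μ = 3.5. The die example and the function name are hypothetical.

```python
import random

def running_means(draws=10000, seed=3):
    """Roll a fair six-sided die `draws` times (population mean 3.5) and
    return the running sample mean after each roll."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, draws + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means

means = running_means()
for n in (10, 100, 1000, 10000):
    print(f"after {n:5d} rolls: x_bar = {means[n - 1]:.3f}")
```

The later running means should settle close to 3.5.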
CLT vs LLN!!
• CLT - focuses on shape and spread
• Law of Large Numbers – focuses on
center
Sampling Distributions of Sample
Means
• Conditions:
1) Randomization.
2) Independence.
(Same as the first two conditions for Proportions)
Sampling Distributions of Sample
Means
• Conditions:
3) Large Enough Sample. There is no “for
sure” way to tell if your sample is large
enough. Common practice is that if your
sample is at least 30 (n ≥ 30), it is OK
to assume the sampling distribution is
approximately normal.
(Remember, if the population is given as
normal, then any sample size is OK.)
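The rule of thumb on this slide can be written as a small check. It is a convention, not a guarantee, and the function name is hypothetical.

```python
def normal_model_ok_for_mean(n, population_is_normal=False):
    """Rule of thumb: treat the sampling distribution of x-bar as
    (approximately) normal if the population is normal (any n)
    or if the sample size is at least 30."""
    return population_is_normal or n >= 30

print(normal_model_ok_for_mean(40))                             # n >= 30
print(normal_model_ok_for_mean(12))                             # too small
print(normal_model_ok_for_mean(12, population_is_normal=True))  # normal population
```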
Sampling Distributions of Sample
Means
• When conditions are met, and the data is
quantitative, the sampling distribution is
approximately normal with a center at the
population mean, μ, and a standard deviation of
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
So….
$N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
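The model for x̄ can be used to approximate probabilities. This sketch assumes a hypothetical population with μ = 100 and σ = 15 and a sample of n = 36. It uses `statistics.NormalDist` to approximate P(x̄ > 105). The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def x_bar_model(mu, sigma, n):
    """Mean and standard deviation of the sampling distribution of x-bar."""
    return mu, sigma / math.sqrt(n)

mean, sd = x_bar_model(100, 15, 36)       # sd = 15 / 6 = 2.5
prob = 1 - NormalDist(mean, sd).cdf(105)  # approximate P(x_bar > 105)
print(f"N({mean}, {sd}); P(x_bar > 105) ~ {prob:.4f}")
```

Since σ/√n = 15/6 = 2.5, the value 105 is 2 standard deviations above μ, and the approximate probability is about 0.023.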
Sampling Distributions
• We said at the beginning that in most real-life
cases we will not know the population
parameters (μ, σ, p or π), so we will have
to use the sample statistics as estimates
of those. Our terminology changes just a
little…
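Many introductory texts follow a common convention here. They plug the statistics into the standard deviation formulas, using p̂ in place of p and the sample standard deviation s in place of σ, and call the result a standard error. The sketch below assumes that convention and uses hypothetical data and function names.

```python
import math
import statistics

def se_proportion(p_hat, n):
    """Standard error of p-hat: the SD formula with p-hat in place of p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(sample):
    """Standard error of x-bar: the SD formula with s in place of sigma."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

print(se_proportion(0.4, 150))           # p-hat = 0.40 from a hypothetical n = 150
print(se_mean([12, 15, 9, 14, 10, 12]))  # hypothetical quantitative sample
```

For example, se_proportion(0.4, 150) is √(0.24/150) = 0.04.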
Adjusting Sample Size
• Questions about sample size often come up. If
we want to reduce variability, one thing we can
do is increase the sample size. Sometimes we must
decide the largest standard deviation we can
accept, then determine what sample size will get
us there. To do this, we set the standard deviation
formula equal to that target and solve for n.
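Solving the two standard deviation formulas for n gives n = p(1 − p)/SD² for a proportion and n = (σ/SD)² for a mean. The result is then rounded up to the next whole number. A sketch with hypothetical function names:

```python
import math

# Small tolerance so floating-point error in a case that should come out
# exactly to a whole number does not round up by an extra unit.
_TOL = 1e-9

def n_for_proportion_sd(p, target_sd):
    """Smallest n with sqrt(p * (1 - p) / n) <= target_sd."""
    return math.ceil(p * (1 - p) / target_sd ** 2 - _TOL)

def n_for_mean_sd(sigma, target_sd):
    """Smallest n with sigma / sqrt(n) <= target_sd."""
    return math.ceil((sigma / target_sd) ** 2 - _TOL)

print(n_for_proportion_sd(0.5, 0.03))  # 0.25 / 0.0009 is about 277.8
print(n_for_mean_sd(15, 2))            # (15 / 2) ** 2 = 56.25
```

For p = 0.5 and a target SD of 0.03, the formula gives about 277.8, so n = 278. For σ = 15 and a target of 2, it gives 56.25, so n = 57.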
Adjusting Sample Size
• Shortcut!!!
Since the standard deviation is proportional to
1/√n, taking a sample 4 times as large
cuts the standard deviation in half (√4 = 2).
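A quick numeric check of the shortcut, using the σ/√n formula with a hypothetical σ = 10:

```python
import math

def sd_of_mean(sigma, n):
    """Standard deviation of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Quadrupling n should halve the standard deviation for any starting n.
for n in (25, 100, 1000):
    ratio = sd_of_mean(10, 4 * n) / sd_of_mean(10, n)
    print(n, ratio)
```

Each ratio is 1/√4 = 0.5, up to floating-point rounding. The same reasoning applies to √(p(1 − p)/n) for proportions.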