Transcript biostat 5

Sampling Distributions
A sampling distribution is created by, as the name suggests, repeatedly drawing samples of the same size from a population and calculating some statistic for each sample, such as the sample mean [X-bar], the sample proportion [p-hat], the difference in means, the difference in proportions, and numerous other statistics.
We use these sampling distributions to help us “estimate” population parameters such as the population mean, and to test hypotheses, such as the claim that the average fill volume of Coke cans is truly 12 fl. oz. [H0: μ = 12].
Example
• A fair die is thrown an infinite number of times,
• with the random variable X = # of spots on any throw.
• The probability distribution of X is:

X     P(X)
1     1/6
2     1/6
3     1/6
4     1/6
5     1/6
6     1/6

• …and the mean and variance can be calculated to be: μ = 3.5 and σ² ≈ 2.92
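As a quick check of μ = 3.5 and σ² ≈ 2.92, here is a minimal Python sketch that applies the definitions μ = Σ x·P(x) and σ² = Σ (x − μ)²·P(x) to the table above, using exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

# Probability distribution of X = number of spots on one throw of a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = sum over x of x * P(x).
mu = sum(x * p for x, p in pmf.items())

# Variance: sigma^2 = sum over x of (x - mu)^2 * P(x).
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(mu, float(mu))                    # 7/2 3.5
print(sigma2, round(float(sigma2), 2))  # 35/12 2.92
```

The exact variance is 35/12, which rounds to the 2.92 quoted above.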
Sampling Distribution of Two Dice
A sampling distribution is created by looking at all samples of size n = 2 (i.e., two dice) and their means…
While there are 36 possible samples of size 2, there are only 11 values for X-bar, and some (e.g., X-bar = 3.5) occur more frequently than others.
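A minimal Python sketch (names are illustrative) that enumerates all 36 ordered outcomes of two dice, computes each sample mean, and counts how many distinct means occur and how often X-bar = 3.5 appears:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered outcomes of throwing two fair dice: 6 * 6 = 36 samples of size n = 2.
samples = list(product(range(1, 7), repeat=2))

# Mean of each sample; Fraction keeps values such as 3.5 exact.
means = [Fraction(a + b, 2) for a, b in samples]
counts = Counter(means)

print(len(samples))            # 36 samples
print(len(counts))             # 11 distinct values of X-bar
print(counts[Fraction(7, 2)])  # 6 samples have X-bar = 3.5
```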
Sampling Distribution of Two Dice…
• The sampling distribution of X-bar is shown below:

X-bar   P(X-bar)
1.0     1/36
1.5     2/36
2.0     3/36
2.5     4/36
3.0     5/36
3.5     6/36
4.0     5/36
4.5     4/36
5.0     3/36
5.5     2/36
6.0     1/36

[Figure: bar chart of P(X-bar) for X-bar = 1.0, 1.5, …, 6.0]
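The table can be reproduced by a short Python sketch that groups the 36 equally likely samples by their mean and expresses each probability over 36 (exact fractions; names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count the 36 equally likely samples of two dice by their mean X-bar.
counts = Counter(Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2))

# P(X-bar) as exact fractions, ordered from X-bar = 1.0 to 6.0.
dist = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}

for xbar, p in dist.items():
    # Show each probability over the common denominator 36, as in the table.
    print(f"{float(xbar):.1f}  {p * 36}/36")

print(sum(dist.values()))  # probabilities sum to 1
```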
Compare distribution of X and sampling distribution of X-bar
[Figure: the distribution of X (X = 1, 2, …, 6) shown next to the sampling distribution of X-bar (X-bar = 1.0, 1.5, …, 6.0)]
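The relationship between population parameters and parameters of the sampling distribution of the sample mean is:

$$\mu_{\bar{X}} = \mu, \qquad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

For the two dice (n = 2): $\mu_{\bar{X}} = 3.5$ and $\sigma^2_{\bar{X}} = 2.92/2 \approx 1.46$.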
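To check these relationships on the two-dice example, the sketch below (a minimal Python illustration, not part of the original lecture) computes μ and σ² for one die and the mean and variance of X-bar over all 36 samples, then confirms that the variance of X-bar equals σ²/n:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 2  # sample size: two dice

# Population (one fair die): mean and variance, using E[X^2] - mu^2.
mu = Fraction(sum(faces), 6)
sigma2 = Fraction(sum(x * x for x in faces), 6) - mu ** 2

# Sampling distribution of X-bar: all 36 equally likely samples of size 2.
means = [Fraction(a + b, n) for a, b in product(faces, repeat=n)]
mu_xbar = sum(means) / len(means)
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)

print(mu, float(sigma2))         # population mean 7/2, variance about 2.92
print(mu_xbar, float(var_xbar))  # X-bar mean 7/2, variance about 1.46
print(var_xbar == sigma2 / n)    # True
```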
Central Limit Theorem
• The sampling distribution of the mean of a random sample drawn from any population with a finite variance is approximately normal for a sufficiently large sample size.
• The larger the sample size, the more closely the
sampling distribution of X-bar will resemble a normal
distribution.
• If the population is normal, then X-bar is normally
distributed for all values of n.
• If the population is non-normal, then X-bar is
approximately normal only for larger values of n.
Sampling Distribution of Sample Mean
• If X is normal, X-bar is normal. If X is nonnormal, X-bar is approximately normal for sufficiently large sample sizes.
• Note: the definition of “sufficiently large” depends on the extent of nonnormality of X (e.g., heavily skewed or multimodal populations need larger samples).
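To illustrate the note on skewness, here is a sketch under an assumed population: a hypothetical, heavily right-skewed two-point population (0 with probability 0.9, 1 with probability 0.1), which is not from the lecture. It computes the exact distribution of X-bar by convolution and reports its skewness, which is 0 for a normal distribution. The skewness shrinks in proportion to 1/√n, so a skewed population needs a larger n before X-bar looks normal. The population and helper names are assumptions for illustration.

```python
def convolve(a, b):
    """Distribution of the sum of two independent discrete variables (value -> probability)."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out


def mean_distribution(pop, n):
    """Exact distribution of the sample mean of n independent draws from pop."""
    total = pop
    for _ in range(n - 1):
        total = convolve(total, pop)
    return {s / n: p for s, p in total.items()}


def skewness(dist):
    """Third standardized moment; 0 for any symmetric distribution, including the normal."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    third = sum((x - mu) ** 3 * p for x, p in dist.items())
    return third / var ** 1.5


# Hypothetical, heavily right-skewed population (assumption, not lecture data).
population = {0: 0.9, 1: 0.1}

for n in (1, 4, 16, 64):
    print(n, round(skewness(mean_distribution(population, n)), 3))
```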
Example
• A quality engineer has observed that the amount of soda in each “32-ounce” bottle of Coke is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of 0.3 ounce. The “32-ounce” is what is printed on the label of the bottle.
• If a customer buys one bottle, what is the
probability that the bottle will contain more than
32 ounces (the label)? This was covered in the
chapter on normal distributions.
Example
• We want to find P(X > 32), where X is normally distributed with mean 32.2 and standard deviation 0.3.
• Standardizing: Z = (32 – 32.2)/0.3 ≈ –0.67, so P(X > 32) = P(Z > –0.67) ≈ 0.75.
• “The probability that a single bottle contains more than 32 fl. oz. is approximately 0.75.”
• This is good because it means that about 75% of the bottles actually contain more than the label states.
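A minimal Python sketch of this calculation using the standard library's `statistics.NormalDist` (Python 3.8+). It uses the unrounded z ≈ −0.667, which gives about 0.7475; a table lookup at z = −0.67 gives 0.7486. Both round to 0.75.

```python
from statistics import NormalDist

# Fill volume X of one "32-ounce" bottle: normal, mean 32.2 oz, SD 0.3 oz.
fill = NormalDist(mu=32.2, sigma=0.3)

# Standardize the label value: z = (32 - mu) / sigma.
z = (32 - fill.mean) / fill.stdev

# P(X > 32) = 1 - P(X <= 32).
p_more_than_label = 1 - fill.cdf(32)

print(round(z, 2), round(p_more_than_label, 4))
```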
Example
• If you go to the store and buy a carton of four bottles, you know that each individual bottle should contain somewhere around 32.2 fl. oz., and some may actually contain less than 32.2 fl. oz. (some may even contain less than the label amount of 32). You now wish to find the probability that the mean volume of Coke in a 4-pack will be greater than 32 ounces. In other words, how likely is it that your 4 bottles average more than 32 fl. oz.?
• This requires that we know the sampling
distribution of the sample mean based on a
sample size of 4.
Example
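• $\mu_{\bar{X}} = 32.2$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.3/\sqrt{4} = 0.15$
• $Z = (\bar{X} - 32.2)/0.15 = (32 - 32.2)/0.15 = -1.33$
• $P(\bar{X} > 32) = P(Z > -1.33) \approx 0.91$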
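The same calculation as a Python sketch with `statistics.NormalDist`; the only change from the single-bottle case is that the standard deviation is replaced by the standard error σ/√n = 0.15. With the unrounded z = −1.333 the probability is about 0.909 (a table lookup at −1.33 gives 0.9082).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4  # population mean, population SD, bottles per carton

# Standard error of the mean: sigma / sqrt(n) = 0.3 / 2 = 0.15.
se = sigma / sqrt(n)

# Sampling distribution of X-bar: normal because the population is normal.
xbar_dist = NormalDist(mu=mu, sigma=se)

z = (32 - mu) / se                          # about -1.33
p_mean_above_label = 1 - xbar_dist.cdf(32)  # P(X-bar > 32)

print(round(z, 2), round(p_mean_above_label, 4))
```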
Problem
• The dean of the School of Business claims that the average salary of the school’s graduates one year after graduation is $800 per week ($\mu_X$) with a standard deviation of $100 ($\sigma_X$). Note: these are population parameters. A second-year student would like to check whether the claim about the mean is correct. He surveys 25 people who graduated one year ago and determines their weekly salaries. He finds the sample mean to be $750. Is this consistent with the dean’s claim?

$\mu_{\bar{X}} = \mu_X = 800$
$\sigma_{\bar{X}} = \sigma_X/\sqrt{n} = 100/\sqrt{25} = 20$
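One way to quantify the question, sketched in Python under the assumption (not stated in the problem) that X-bar is approximately normal for n = 25: compute z and the probability of a sample mean of $750 or less if the dean's claim were true. A small probability (about 0.006 here) suggests the sample is hard to reconcile with the claim.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 800, 100  # dean's claimed population mean and SD ($ per week)
n, xbar = 25, 750     # survey size and observed sample mean

se = sigma / sqrt(n)  # standard error: 100 / 5 = 20
z = (xbar - mu) / se  # (750 - 800) / 20 = -2.5

# Assumes X-bar is approximately normal (CLT); P(X-bar <= 750) if the claim holds.
p_as_low = NormalDist(mu=mu, sigma=se).cdf(xbar)

print(se, z, round(p_as_low, 4))
```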
Sample Proportions
• The estimator of a population proportion of
successes is the sample proportion. That is,
we count the number of successes in a sample
and compute:
• $\hat{p} = X/n$ (“p-hat”).
• X is the number of successes, n is the sample
size.
Sampling Distribution of Sample Proportion
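• We can determine the mean, variance, and standard deviation of $\hat{p}$:

$$E(\hat{p}) = p, \qquad V(\hat{p}) = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

• (The standard deviation of $\hat{p}$ is called the standard error of the proportion.)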
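As an exact check of these formulas, the sketch below sums over the binomial distribution of X to get the mean and variance of p-hat = X/n, using n = 100 and p = 0.1 (the values from the infection example later in this section). The helper `phat_moments` is a hypothetical name for illustration.

```python
from math import comb, sqrt


def phat_moments(n, p):
    """Exact mean and variance of p-hat = X / n, where X ~ Binomial(n, p)."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum((x / n) * px for x, px in enumerate(pmf))
    var = sum((x / n - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, var


n, p = 100, 0.1
mean, var = phat_moments(n, p)

# Compare with E(p-hat) = p and V(p-hat) = p(1 - p) / n.
print(mean, var, p * (1 - p) / n)
print(sqrt(var))  # standard error of the proportion, about 0.03
```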
Sampling Distribution of Sample Proportion
• The normal approximation to the binomial works best when the number of trials, n (the sample size), is large and the probability of success, p, is close to 0.5, but it works well enough for other values of p as long as two conditions are met:
– 1) np ≥ 5
– 2) n(1 – p) ≥ 5
• If these conditions are met, we can use the normal distribution to solve problems about proportions, which means we will eventually use the Z-score.
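The two conditions translate directly into a small predicate; `normal_approx_ok` is a hypothetical helper name, and the threshold of 5 follows the rule of thumb above.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb: use the normal approximation only if np >= 5 and n(1 - p) >= 5."""
    return n * p >= threshold and n * (1 - p) >= threshold


print(normal_approx_ok(100, 0.1))  # np = 10, n(1 - p) = 90: True
print(normal_approx_ok(20, 0.1))   # np = 2: False
```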
Example
• Assume the probability of an infection during an operation is 0.1 (p) and you observe the number of infections during the next 100 (n) operations.
• Are the conditions satisfied to assume normality?
• What is the sampling distribution of the sample proportion p-hat?
• What is the probability that you get more than 20 infections in the next 100 operations?
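A Python sketch working through the three questions with `statistics.NormalDist`. It also computes the exact binomial probability for comparison, which is an addition and not part of the original example. Because this binomial (p = 0.1) is right-skewed, the normal approximation (about 0.0004) understates the exact right-tail probability (about 0.0008) here.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.1  # operations observed, probability of infection per operation

# 1) Conditions: np and n(1 - p) must both be at least 5.
print(n * p >= 5 and n * (1 - p) >= 5)  # True (np = 10, n(1 - p) = 90)

# 2) Sampling distribution of p-hat: approximately normal, mean p, SE sqrt(p(1 - p)/n).
se = sqrt(p * (1 - p) / n)  # about 0.03
phat_dist = NormalDist(mu=p, sigma=se)

# 3) "More than 20 infections" means p-hat > 20/100 = 0.20.
z = (0.20 - p) / se  # about 3.33
p_normal = 1 - phat_dist.cdf(0.20)

# For comparison (not in the original example): exact binomial P(X >= 21).
p_exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(21, n + 1))

print(round(z, 2), round(p_normal, 5), round(p_exact, 5))
```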
Other Common Sampling Distributions
• Sampling distribution of the difference
between two sample means.
• Sampling distribution of the difference
between two sample proportions.
Homework – Chapter Advice
• Don’t worry about
– “finite population issues”
– Sections 5.4, 5.6
• HW: 5.3.1, 5.3.3, 5.3.5, 5.5.1, 5.5.5
• Review questions and exercises HW:
– 1, 4, 7