Transcript Correlation

Reminder: What is a sampling
distribution?
•The sampling distribution of a statistic is
the distribution of all possible values of the
statistic when all possible samples of a fixed
size n are taken from the population. It is a
theoretical idea — we do not actually build
it.
•The sampling distribution of a statistic is
the probability distribution of that statistic.
Sampling distribution of x bar
•We take many random samples of a given size n from a
population with mean m and standard deviation s.
•Some sample means will be above the population mean m
and some will be below, making up the sampling
distribution.
[Figure: histogram of some sample averages, illustrating the sampling distribution of x bar]
For any population with mean m and standard deviation s:
• The mean, or center, of the sampling distribution of x bar
is equal to the population mean m: m_xbar = m
• The standard deviation of the sampling distribution of x
bar is s_xbar = s/√n, where n is the sample size.
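The two facts above can be checked with a small simulation. This is only a sketch: the population is assumed to be normal with the hypothetical values m = 50 and s = 10, the sample size n = 25 and the number of samples (5000) are arbitrary choices, and the helper name simulate_sample_means is made up for this illustration.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical population parameters, chosen only for illustration.
m, s, n = 50.0, 10.0, 25

sample_means = simulate_sample_means(m, s, n, num_samples=5000)
center = statistics.fmean(sample_means)   # should be close to m
spread = statistics.stdev(sample_means)   # should be close to s / sqrt(n)

print(f"mean of sample means: {center:.2f} (theory: m = {m})")
print(f"sd of sample means:   {spread:.2f} (theory: s/sqrt(n) = {s / sqrt(n):.2f})")
```

The simulated center and spread will not match the theoretical values exactly, because 5000 samples are only a finite approximation to "all possible samples".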
Sampling distribution of x bar

[Figure: sampling distribution of x bar, centered at m with standard deviation s/√n]
For normally distributed populations
When a random variable is normally distributed, the
sampling distribution of x bar for all possible samples of
size n is also normally distributed.
If the population is N(m, s),
then the sample mean has a
N(m, s/√n) distribution.
[Figure: population distribution and sampling distribution]
• The distribution of X-bar tends to be normal. Even if
the population is not normal, if the size (n) of the
SRS is large enough and it is taken from
any population with mean = m and standard
deviation = s, then:
– X-bar is approximately N(m, s/√n).
• This fact is called the Central Limit Theorem.
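A rough way to see the Central Limit Theorem at work is to simulate sample means from a clearly non-normal population. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a stand-in chosen for illustration, not a population from these slides) and uses sample skewness as a simple measure of how far the shape is from symmetric. The helper names and the 10000 repetitions are arbitrary.

```python
import random
import statistics
from math import sqrt


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values, mu)
    return statistics.fmean([((v - mu) / sd) ** 3 for v in values])


def sample_means(n, num_samples, rng):
    """Means of num_samples samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


rng = random.Random(1)
results = {}
for n in (2, 10, 25):
    means = sample_means(n, 10000, rng)
    # Store (standard deviation, skewness) of the simulated sample means.
    results[n] = (statistics.pstdev(means), skewness(means))
    print(f"n={n:2d}  sd={results[n][0]:.3f} (s/sqrt(n)={1 / sqrt(n):.3f})  "
          f"skewness={results[n][1]:.2f}")
```

As n grows, the skewness of the sample means should shrink toward 0 and their standard deviation should track s/√n, which is what the theorem leads us to expect.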
[Figure: population distribution and the distributions of X-bar for n = 2, n = 10, and n = 25]
Application
•Hypokalemia is diagnosed when blood potassium levels are low,
below 3.5 mEq/dl. Let’s assume that we know a patient whose
measured potassium levels vary daily according to a normal distribution
N(m = 3.8, s = 0.2)
•If only one measurement is made, what is the probability that this
patient will be misdiagnosed hypokalemic?
z = (3.5 − 3.8) / 0.2 = −1.5, P(z < −1.5) = 0.0668 ≈ 7%
If instead measurements are taken on 4 separate days and
they are averaged, what is the probability of such a
misdiagnosis?
z = (3.5 − 3.8) / (0.2/√4) = −3, P(z < −3) = 0.0013 ≈ 0.1%
Note: Be sure to standardize (z) using the standard deviation of the variable
being standardized (X in first case, X-bar in second case)!!
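The two probabilities in this example can be reproduced with Python's standard library. This is a minimal sketch using the values given on the slide (m = 3.8, s = 0.2, cutoff 3.5, n = 4); the variable names are chosen here for illustration only.

```python
from math import sqrt
from statistics import NormalDist

m, s = 3.8, 0.2      # patient's mean and standard deviation (from the slide)
threshold = 3.5      # diagnosis cutoff (from the slide)
n = 4                # number of days averaged

# One measurement: standardize with s, the standard deviation of X.
z_single = (threshold - m) / s
p_single = NormalDist(mu=m, sigma=s).cdf(threshold)

# Average of n measurements: standardize with s / sqrt(n), the standard
# deviation of X-bar.
z_mean = (threshold - m) / (s / sqrt(n))
p_mean = NormalDist(mu=m, sigma=s / sqrt(n)).cdf(threshold)

print(f"single measurement: z = {z_single:.1f}, P = {p_single:.4f}")  # z = -1.5, P = 0.0668
print(f"mean of {n} days:     z = {z_mean:.1f}, P = {p_mean:.4f}")    # z = -3.0, P = 0.0013
```

Averaging four measurements halves the standard deviation, so the same cutoff sits twice as many standard deviations below the mean.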
Income distribution
Let’s consider the very large database of individual incomes from the
Bureau of Labor Statistics as our population. It is strongly right skewed.
– We take 1000 SRSs of 100 incomes, calculate the sample mean
for each, and make a histogram of these 1000 means.
– We also take 1000 SRSs of 25 incomes, calculate the sample
mean for each, and make a histogram of these 1000 means.
Which histogram
corresponds to the
samples of size
100? 25?
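One way to reason about the question is to compare the spread of the two sets of sample means. The sketch below does not use the Bureau of Labor Statistics data; it assumes a lognormal population as a hypothetical right-skewed stand-in for incomes, with arbitrary parameters (10.5 and 0.75 on the log scale), and treats independent random draws as an approximation to SRSs from a very large population.

```python
import random
import statistics


def means_of_samples(n, num_samples, rng):
    """Means of num_samples samples of size n from a right-skewed population."""
    # Hypothetical stand-in for incomes: lognormal with log-scale mean 10.5
    # and log-scale standard deviation 0.75.
    return [
        statistics.fmean([rng.lognormvariate(10.5, 0.75) for _ in range(n)])
        for _ in range(num_samples)
    ]


rng = random.Random(2)
means_100 = means_of_samples(100, 1000, rng)
means_25 = means_of_samples(25, 1000, rng)

spread_100 = statistics.stdev(means_100)
spread_25 = statistics.stdev(means_25)
ratio = spread_25 / spread_100  # theory: sqrt(100 / 25) = 2

print(f"sd of means, n=100: {spread_100:.0f}")
print(f"sd of means, n=25:  {spread_25:.0f}")
print(f"ratio: {ratio:.2f}")
```

Because the standard deviation of X-bar is s/√n, the means from samples of size 25 should be roughly twice as spread out as those from samples of size 100, which is the feature to look for in the histograms.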
How large a sample size is required to
achieve normality of X-bar?
•It depends on the population distribution. More
observations are required if the population distribution is far
from normal.
– A sample size of 25 is generally enough to obtain an
approximately normal sampling distribution for X-bar despite
strong skewness or even mild outliers.
– A sample size of 40 will typically be good enough to
overcome extreme skewness and outliers and make
X-bar look normal.
In many cases, n = 25 isn’t a huge sample. Thus,
even for strange population distributions we can
assume an approximately normal sampling distribution
of the sample mean and work with it to solve problems.
• HW: Read section 5.2 thru p. 242; don’t
worry too much about how the book
derives the formulas... instead make sure
you know the Central Limit Theorem and
what’s found in the boxes on p. 337, 338,
and 339 and in the Summary on page 346.
• Do problems # 5.36-5.42, 5.44, 5.47, 5.48,
5.51, 5.53, 5.55, 5.66, 5.70, 5.73