Transcript 5-4-5-5

Suppose we are interested in the digits in people’s phone numbers.
There is some population mean (μ) and standard deviation (σ).
Now suppose we take a sample of 4 digits.
For that sample, we can find the sample mean (x̄) and standard
deviation (s).
Now, suppose we look at lots of samples, all of size 4 (4 digits per
sample). In that case, we have a set of sample means that we’re
looking at. How these sample means behave is called the
sampling distribution of the mean – that’s the probability
distribution of sample means.
If we took the mean of those sample means, what
would you expect it to be?
While the sample mean varies from sample to sample
(this is called sampling variability), the sample
means target the population mean. In other words, the
sample means are centered on the population mean:
some fall above it and some below, but the mean of all
possible sample means is the population mean.
(see table in book, pg 251)
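A quick simulation can make this concrete. The sketch below is only an illustration, not the table from the book: it assumes every digit 0-9 is equally likely, and the number of samples and the random seed are arbitrary choices.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 4       # 4 digits per sample, as in the phone number example
NUM_SAMPLES = 20_000  # arbitrary; more samples give a steadier result

# Assumption: each digit 0-9 is equally likely, so the population mean is 4.5.
population_mean = mean(range(10))

# Draw many samples of 4 digits and record each sample's mean.
sample_means = [
    mean(random.randrange(10) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]
mean_of_sample_means = mean(sample_means)

print("population mean:", population_mean)
print("smallest and largest sample mean:", min(sample_means), max(sample_means))
print("mean of the sample means:", round(mean_of_sample_means, 3))
```

Individual sample means are scattered widely, but their average should land very close to the population mean.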
What other statistics (characteristics of a
sample) target the population parameters?
These are called unbiased estimators
Mean
Variance
Proportions
What statistics do not target the population
parameters?
These are called biased estimators
Median
Range
Standard Deviation*
*For large samples, the bias in the sample standard deviation is
fairly small, so s is often used to estimate σ
anyway.
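One way to see which statistics target their parameters is to list every possible sample from a tiny population and average each statistic over all of them. The sketch below assumes a made-up population of 1, 2, 5 and samples of size 2 drawn with replacement; both choices are for illustration only. Proportions are left out to keep it short.

```python
from itertools import product
from statistics import mean, median, pstdev, pvariance, stdev, variance

population = [1, 2, 5]  # hypothetical population, chosen only for illustration
n = 2                   # sample size

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

def value_range(values):
    return max(values) - min(values)

def average_over_samples(statistic):
    """Mean of a statistic taken over every possible sample."""
    return mean(float(statistic(sample)) for sample in samples)

# name: (average of the sample statistic, population parameter)
results = {
    "mean": (average_over_samples(mean), mean(population)),
    "variance": (average_over_samples(variance), pvariance(population)),
    "median": (average_over_samples(median), median(population)),
    "range": (average_over_samples(value_range), value_range(population)),
    "standard deviation": (average_over_samples(stdev), pstdev(population)),
}

for name, (average, parameter) in results.items():
    print(f"{name:20s} average of statistic = {average:.3f}   parameter = {parameter:.3f}")
```

The sample variance here divides by n - 1, while the population variance divides by N. The first two rows should agree, and the last three should not.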
Thinking back to our experiment of sampling 4
digits from phone numbers,
How is the population data distributed?
Is each possible value of a sample mean equally
likely? (Is a mean of 0 just as likely as a mean of
4?)
How spread out are those sample means?
If we increased the sample size (maybe from 4 to 8),
would you expect the sample means to be more or
less spread out?
The Big Point
Amazing result:
The sampling distribution of the mean is
approximately normal, even though the
original data had a uniform (not normal)
distribution.
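For samples of 4 digits, the sampling distribution can be worked out exactly by listing all 10,000 possible samples. The sketch below assumes each digit 0-9 is equally likely; the text histogram is only a rough picture of the shape.

```python
from collections import Counter
from itertools import product

SAMPLE_SIZE = 4

# Assumption: each digit 0-9 is equally likely (a uniform population).
# Count how many of the 10**4 possible samples give each sum of digits.
sum_counts = Counter(
    sum(digits) for digits in product(range(10), repeat=SAMPLE_SIZE)
)
total = 10 ** SAMPLE_SIZE

def prob_of_mean(m):
    """Exact probability that a sample of 4 digits has mean m."""
    return sum_counts[round(m * SAMPLE_SIZE)] / total

print("P(mean = 0)   =", prob_of_mean(0))
print("P(mean = 4)   =", prob_of_mean(4))
print("P(mean = 4.5) =", prob_of_mean(4.5))

# Crude text histogram: one '#' for every 10 samples with that mean.
for s in sorted(sum_counts):
    print(f"{s / SAMPLE_SIZE:5.2f} {'#' * (sum_counts[s] // 10)}")
```

A mean of 0 requires all four digits to be 0, so it is far less likely than a mean near the middle. The histogram is mound-shaped even though each individual digit is uniform.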
The Central Limit Theorem
Assuming:
1. Our data (random variable x) has a
distribution with mean μ and standard
deviation σ. The distribution does not
have to be normal
2. Simple random samples, all of the same
size n, are selected from the population
The Central Limit Theorem
Then:
1. The distribution of the sample means x̄
will approach a normal distribution. The
distribution will become more normal as
sample size increases.
2. The mean of all the sample means is the
population mean (μ)
3. The standard deviation of all sample
means is σ/√n
This tells us that the sample means have an
approximately normal distribution, the mean of that
distribution is μ, and the standard deviation
of that distribution is σ/√n.
Notation
The mean of the sample means:
μ_x̄ = μ
The standard deviation of the sample means:
σ_x̄ = σ/√n
Notice this shows that the sample means are less spread
out than the original data, and that the larger the sample size, the
less spread out the sample means will be.
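The σ/√n rule can be checked by simulation. The sketch below again assumes equally likely digits 0-9; the sample sizes, number of samples, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import mean, pstdev

random.seed(2)  # fixed seed so the run is repeatable

digits = range(10)      # assumption: each digit 0-9 is equally likely
sigma = pstdev(digits)  # population standard deviation of the digits

def sd_of_sample_means(n, num_samples=20_000):
    """Simulate many samples of size n; return the sd of their means."""
    means = [
        mean(random.choice(digits) for _ in range(n))
        for _ in range(num_samples)
    ]
    return pstdev(means)

simulated = {n: sd_of_sample_means(n) for n in (4, 8, 30)}

print("population sigma:", round(sigma, 3))
for n, sd in simulated.items():
    print(f"n = {n:2d}  simulated sd = {sd:.3f}  sigma/sqrt(n) = {sigma / math.sqrt(n):.3f}")
```

The simulated standard deviations should track σ/√n closely and shrink as n grows.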
Practical Rules
If the original data is not normally distributed,
you need a sample size of at least 30 to have
the normal distribution be a good
approximation to the distribution of the
sample means.
If the original data is normally distributed, the
distribution of the sample means will be
normal for any sample size
Using the Central Limit Theorem
When you’re working with one value from a
normally distributed population, use what
we already learned:
z = (x − μ) / σ
When you’re working with a sample mean, be
sure to use the mean and standard deviation
of the sampling distribution:
z = (x̄ − μ_x̄) / σ_x̄ = (x̄ − μ) / (σ/√n)
Example
Replacement times for CD players are normally distributed
with a mean of 7.1 years and a standard deviation of 1.4
years.
What is the probability that a single CD player will need
replacing in under 6 years?
10 CD players are chosen at random. What is the probability
that the mean replacement time for the 10 players is under
6 years?
Before we calculate: Which probability do you expect to be
smaller? Why?
Example
For a single CD player:
P(x < 6)
z = (6 − 7.1) / 1.4 ≈ −0.79
P(z < −0.79) = 0.2148
So, there is a 21.48% chance that the single CD
player will die in under 6 years.
Example
For the sample of 10 players:
P(x̄ < 6)
z = (6 − 7.1) / (1.4/√10) ≈ −2.48
P(z < −2.48) = 0.0066
So, there is a 0.66% chance that the mean
replacement time for a sample of 10 CD players
will be less than 6 years.
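Both calculations can be checked with a few lines of Python. This is a sketch that uses the standard library's NormalDist in place of the z table; the variable names are illustrative. Because z is not rounded to two decimals before the lookup, the probabilities can differ slightly from the table values.

```python
import math
from statistics import NormalDist

mu = 7.1     # population mean replacement time (years)
sigma = 1.4  # population standard deviation (years)
n = 10       # sample size for the second question

standard_normal = NormalDist()  # mean 0, standard deviation 1

# One CD player: use the population standard deviation.
z_single = (6 - mu) / sigma
p_single = standard_normal.cdf(z_single)

# Mean of 10 CD players: use the sd of the sampling distribution.
sigma_xbar = sigma / math.sqrt(n)
z_mean = (6 - mu) / sigma_xbar
p_mean = standard_normal.cdf(z_mean)

print(f"single player: z = {z_single:.2f}, P(x < 6) = {p_single:.4f}")
print(f"mean of {n}:    z = {z_mean:.2f}, P(x-bar < 6) = {p_mean:.4f}")
```

The only change between the two cases is the standard deviation in the denominator, which is why the sample mean is much less likely to fall below 6 years.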
What conclusions can we draw?
Suppose that the mean of 7.1 and standard
deviation of 1.4 were for all brands of CD
players. Suppose you bought 10 CheapoBrand CD players, and the mean
replacement time for the 10 players was
under 6 years. What would that suggest?
Homework
5.4: 1, 3
5.5: 1, 3, 5, 7, 9, 11, 13, 19