The Central Limit Theorem

The Beauty of the Long Run
Do not worry, this is not a presentation on the
benefits of marathons! The “long run” in the
title refers not to distance but to time,
repetitions or, more elegantly,
iterations.
Let’s review the situation:
There is a given population with unknown
population mean μ, usually unknown population
standard deviation σ, and maybe some unknown
parameter p you would also like to know. The
population is huge, or maybe your testing is
destructive, or prohibitively expensive, so you
sample, that is …
… you pick at random n elements of the
population, get n numbers
x1, x2, x3, …, xn
apply some arithmetic to the numbers, and get
a new number ĝ
(g for guess, “hat” for tradition!)
Note the following two facts:
1. The formula you use on the n numbers you
got can be as wild as you please (it’s a free
country!)
2. Probabilities stay the same in each pick, and
the probabilities of separate picks multiply
(the picks are independent)
In usual practice the formula you use on the
n sample values is a very common one: their
average
(x1 + x2 + … + xn)/n
The resulting number is called the
sample mean
and, instead of denoting it with a hat
(too easy!),
we denote it as
x̄
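As a rough illustration of the recipe above (pick n elements at random, then average them), here is a minimal Python sketch. The population, its size, the seed and the helper name `sample_mean` are all made up for the example; picks are made with replacement so that the probabilities really do stay the same in each pick.

```python
import random

def sample_mean(population, n, rng):
    """Pick n elements at random (with replacement) and return their average."""
    sample = rng.choices(population, k=n)   # each pick has the same probabilities
    return sum(sample) / n

# Hypothetical population: 100,000 values from a skewed (exponential) distribution.
rng = random.Random(0)                      # fixed seed so the run is repeatable
population = [rng.expovariate(1.0) for _ in range(100_000)]

x_bar_1 = sample_mean(population, 30, rng)
x_bar_2 = sample_mean(population, 30, rng)
print(x_bar_1, x_bar_2)   # two different samples give two different sample means
```

Running the function twice gives two different values, which is the sense in which the sample mean "varies at random".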
We have already learned that the sample
mean is a RV (it varies at random!) and
E(x̄) = μ
Additionally it is also true that
Var(x̄) = σ²/n
And therefore
SD(x̄) = σ/√n
Note that regardless of how wild the
population of interest may be, the statistic
x̄ (its distribution is called the sampling
distribution of the mean)
has a certain fixed expected value, μ,
and standard deviation, σ/√n.
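The two fixed quantities can be checked by simulation. The sketch below is only an illustration under assumed values: it uses an exponential population with rate 1 (so μ = 1 and σ = 1), a sample size of 25 and 20,000 repetitions, none of which come from the slides. It draws many samples, records each sample mean, and compares the average and standard deviation of those means with μ and σ/√n.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable

# Hypothetical "wild" population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 25                   # sample size
repetitions = 20_000     # how many samples we draw

# One sample mean per repetition.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

print("average of the sample means:", statistics.fmean(sample_means))  # should be near mu
print("SD of the sample means:     ", statistics.stdev(sample_means))  # should be near sigma / sqrt(n)
print("sigma / sqrt(n):            ", sigma / math.sqrt(n))
```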
Other than these two limitations, the actual
distribution of the sample mean could be
quite wild, especially if n is small (you were
cheap and did not sample enough entries
from the population!)
Well … not quite as wild. Here is a fantastic
theorem:
Central Limit Theorem (How
Nature/God/Allah/Vishnu/the Great Spirit etc.
works)
No matter how wild the population of interest
may be (as long as its standard deviation σ
is finite), in the long run (if n is large enough)
the distribution of the statistic
x̄ (the sampling distribution of the mean)
becomes approximately normal, with mean μ
and standard deviation σ/√n.
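One way to watch the theorem at work is to compare simulated sample means with two well-known normal benchmarks: about 68.3% of a normal distribution lies within 1 standard deviation of its mean, and about 95.4% within 2. The sketch below assumes an exponential population with rate 1 (a deliberately skewed choice, not one from the slides); the function name `fraction_within`, the repetition count and the seed are likewise just illustrative.

```python
import math
import random

def fraction_within(n, k, repetitions=10_000, seed=2):
    """Fraction of simulated sample means that fall within k * sigma / sqrt(n)
    of mu, for a hypothetical exponential(1) population (mu = 1, sigma = 1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    sd_of_mean = sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(x_bar - mu) <= k * sd_of_mean:
            hits += 1
    return hits / repetitions

# Normal benchmarks: about 0.683 within 1 SD, about 0.954 within 2 SDs.
for n in (2, 30, 100):
    print(n, fraction_within(n, 1), fraction_within(n, 2))
```

With n = 2 the one-SD fraction should come out noticeably above 68%, and it should move toward the normal benchmark as n grows.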
FINALLY!!
We can use the Normal Tables to guess μ!
Unfortunately the validity of the guess
depends on how large n is, and there is no
sharp rule for how large is large enough,
even though usually
n ≥ 30 gives good guesses
(we will study next what “good guesses”
means).
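The normal-table lookup mentioned above can also be done in code. The numbers below (μ = 50, σ = 12, n = 36, a tolerance of 3 units) are hypothetical and chosen only so the arithmetic is easy to follow; the sketch treats x̄ as normal with mean μ and standard deviation σ/√n, which is only an approximation justified by the theorem when n is large enough.

```python
import math
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
mu = 50.0      # population mean (pretend it is known, to see how x-bar behaves)
sigma = 12.0   # population standard deviation
n = 36         # sample size

# Approximate distribution of the sample mean:
# normal with mean mu and standard deviation sigma / sqrt(n) = 12 / 6 = 2.
x_bar_dist = NormalDist(mu, sigma / math.sqrt(n))

# Probability that the sample mean lands within 3 units of mu.
# A normal table gives the same number for z = 3 / 2 = 1.5.
p = x_bar_dist.cdf(mu + 3) - x_bar_dist.cdf(mu - 3)
print(round(p, 4))   # about 0.8664
```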