
The Guessing Game
or What’s an Educated Guess?
Do not worry: this is not a presentation on the benefits
of ESP! The “guess” referred to in the title is not a
wild stab, but a reasoned, reasonable estimate of
what some unknown number could be, including
(eventually) a reasoned, reasonable estimate of the
chance of our estimating incorrectly.
Let’s review the situation:
There is a given population with unknown population
mean $\mu$, usually unknown population standard
deviation $\sigma$, and maybe some unknown parameter p
you would also like to know. The population is huge,
or maybe your testing is destructive, or prohibitively
expensive, so you sample; that is, you pick at random
n elements of the population, get n numbers
$x_1, x_2, x_3, \ldots, x_n$,
apply some arithmetic to the numbers, and get
a new number (a statistic)
$\hat{g} = \hat{g}(x_1, x_2, \ldots, x_n)$
(g for guess, “hat” for tradition!)
Note the following two facts:
1. The formula you use on the n numbers you
got can be as wild as you please (a free
country!)
2. Probabilities stay the same in each pick, and
probabilities multiply (the picks are independent).
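Fact 2 can be illustrated with a minimal Python sketch. It assumes each pick is independent with the same probabilities (sampling with replacement, or a population so large that this is effectively true). The names `dist`, `sample_probability` and `draw_sample` are hypothetical, and the population in `dist` is made up for illustration.

```python
import math
import random

# Hypothetical population: value -> proportion of the population (made up).
dist = {1: 0.5, 2: 0.3, 6: 0.2}

def sample_probability(sample, dist):
    """Probability of getting this exact ordered sample when picks are independent."""
    # Each pick has the same probabilities, so the probabilities multiply.
    return math.prod(dist[x] for x in sample)

def draw_sample(n, dist, rng):
    """Pick n values at random (with replacement) according to dist."""
    values = list(dist)
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=n)

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = draw_sample(3, dist, rng)
print(sample, sample_probability(sample, dist))
print(sample_probability((6, 6, 6), dist))  # 0.2 * 0.2 * 0.2, about 0.008
```

Any formula applied to `sample` afterwards is a statistic, as wild as you please (fact 1).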
Various Educated Guesses
Let me give you six methods you could use to
guess the population mean $\mu$:
1. The sample average: $\bar{x}$
2. The Bidding Selection Method BS:
Throw away the highest and the lowest
numbers, and average the rest.
3. The cheapie method CM:
Always pick the second number you got
4. The “glass is half-empty” method gL
gL = the lowest number you got.
5. The “glass is half-full” method gH
gH = the highest number you got.
6. Granny’s method GM:
GM = the best looking number you got.
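Five of the six methods are mechanical formulas, so they translate directly into code; Granny’s method is left out because “best looking” is not a formula. A minimal Python sketch, with hypothetical function names:

```python
from statistics import fmean

def sample_average(xs):
    """x-bar: the ordinary average of the sample."""
    return fmean(xs)

def bidding_selection(xs):
    """BS: drop one highest and one lowest value, average the rest (needs n >= 3)."""
    if len(xs) < 3:
        raise ValueError("BS needs at least 3 numbers")
    trimmed = sorted(xs)[1:-1]
    return fmean(trimmed)

def cheapie(xs):
    """CM: always the second number picked."""
    return xs[1]

def half_empty(xs):
    """gL: the lowest number in the sample."""
    return min(xs)

def half_full(xs):
    """gH: the highest number in the sample."""
    return max(xs)

sample = [3, 1, 4, 1, 5]
for name, g in [("x-bar", sample_average), ("BS", bidding_selection),
                ("CM", cheapie), ("gL", half_empty), ("gH", half_full)]:
    print(name, g(sample))
```

On this sample, x-bar is 2.8, BS averages 1, 3 and 4 to get 8/3, CM and gL are both 1, and gH is 5.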
OK, let’s discard Granny’s method; it’s very
subjective!
But what about the other five? Are there any
you would discard as bad guesses for the
population mean, and why?
Let’s list them:
$\bar{x}$, BS, CM, gL, gH
So, are there any of these statistics (that is
what they are: numbers associated with
samples!) you would not use as reasonable
guesses for the population mean?
The trouble is … we have not decided what
“reasonable” means!
Maybe we’ll have better luck deciding what
“unreasonable” means! (Without night there would be no day;
concepts are often defined by their limitations or
opposites.)
For example, I submit to you that gL and gH are
unreasonable guesses for the population mean.
Why?
Let’s do an example. Our population consists of
1,000 beanbags, some weighing …
We get the following 27 possible samples,
each with its “half-empty glass” statistic and
corresponding probability:
This new statistic has three values, 0, 2 and 7,
and its probability distribution is
(gotten by adding the probabilities in the
previous table):
Remember the wonderful secret: $E(\bar{x}) = \mu$.
No such luck this time; we get instead
$E(g_L) = 0 \cdot P(g_L = 0) + 2(0.216) + 7(0.027) = 0.432 + 0.189 = 0.621$,
far below the actual value of $\mu$!
That’s right, the statistic gL underestimates,
because on the average it gives a value much
lower than what it is supposed to estimate.
Similarly (check it out!)
the statistic gH overestimates, because on the
average it gives a value much higher than what it
is supposed to estimate.
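The beanbag recipe (list every possible sample, attach a probability to each, add up the probabilities of samples that give the same value, then take the expected value) can be sketched in Python by brute-force enumeration. The population `dist` below is a hypothetical one chosen for illustration, not the beanbag population from the slides, and the helper names are made up.

```python
import itertools
import math
from statistics import fmean

# Hypothetical population (NOT the beanbag data): value -> proportion.
dist = {1: 0.5, 2: 0.3, 6: 0.2}
mu = sum(x * p for x, p in dist.items())  # population mean

def sampling_distribution(statistic, dist, n):
    """Exact distribution of a statistic over all ordered samples of size n."""
    result = {}
    for sample in itertools.product(dist, repeat=n):
        prob = math.prod(dist[x] for x in sample)  # independent picks multiply
        value = statistic(sample)
        # Samples giving the same value have their probabilities added.
        result[value] = result.get(value, 0.0) + prob
    return result

def expected_value(distribution):
    """E(Y) = sum of value * probability."""
    return sum(v * p for v, p in distribution.items())

n = 3
samples = list(itertools.product(dist, repeat=n))
print(len(samples))  # 3 possible values, samples of size 3: 3**3 = 27
for name, g in [("x-bar", fmean), ("gL", min), ("gH", max)]:
    print(name, expected_value(sampling_distribution(g, dist, n)), "vs mu =", mu)
```

For this made-up population, $\mu = 2.3$, while $E(g_L) = 1.157$ and $E(g_H) = 3.827$; the sample average, on the other hand, comes out at $\mu$ (up to rounding).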
So what statistic Y are we looking for?
Obviously one that
neither underestimates nor overestimates,
that is, one that on average hits it on the nose.
This means that E(Y) = the parameter to be
estimated.
Any statistic Y that
neither underestimates
nor overestimates
the population parameter $\theta$ we are trying to
estimate
(that is, $E(Y) = \theta$)
is called an UNBIASED point estimator of $\theta$.
The word “point” here means we are
guessing one value (one point on the real
line) as opposed to guessing an interval as
we will do later.
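Checking the definition for two of our estimators takes only linearity of expectation and the fact that every single pick $x_i$ has expected value $\mu$. A short derivation:

```latex
\begin{aligned}
E(\bar{x}) &= E\Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big)
            = \frac{1}{n}\sum_{i=1}^{n} E(x_i)
            = \frac{1}{n}\cdot n\mu = \mu,\\
E(\mathrm{CM}) &= E(x_2) = \mu.
\end{aligned}
```

By contrast, $g_L = \min(x_1, \ldots, x_n)$ can never exceed any single pick, so it sits below $\mu$ on average unless every pick is certain to equal $\mu$.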
MVUEs
Let me let you in on a little secret:
the three point estimators of $\mu$
($\bar{x}$, BS and CM)
we met before are all unbiased (for BS, as long as
the population is symmetric about $\mu$). So …
which is the best of the three?
Or, more precisely,
how do we compare them?
(i.e., what does “best” mean?)
Quick Answer:
Best means the one (if it exists) that gives
us the highest probability of landing close to the true value.
Recall that variance measures how wildly
spread the values can be. The smaller the
variance, the more tightly packed the values
are (around their expected value). Therefore,
between two unbiased estimators we pick
the one with smaller variance. Between a
million unbiased estimators we pick the one
with the smallest variance (if it exists!) and
call it a M(inimum) V(ariance) U(nbiased) E(stimator), or MVUE.
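To compare the three unbiased estimators by their variance, a Monte Carlo sketch in Python can draw many samples and measure how widely each estimator’s values spread. It assumes a normal population with made-up values $\mu = 10$, $\sigma = 1$ and samples of size $n = 5$; the normal is symmetric, so BS is unbiased here too. The function names are hypothetical.

```python
import random
from statistics import fmean, pvariance

def bidding_selection(xs):
    """BS: drop one highest and one lowest value, average the rest."""
    return fmean(sorted(xs)[1:-1])

def simulate_variances(n=5, reps=20_000, mu=10.0, sigma=1.0, seed=1):
    """Monte Carlo estimate of the variance of x-bar, BS and CM.

    Assumes a normal population with mean mu and standard deviation sigma
    (made-up values chosen for illustration).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = {"x-bar": [], "BS": [], "CM": []}
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        values["x-bar"].append(fmean(xs))
        values["BS"].append(bidding_selection(xs))
        values["CM"].append(xs[1])
    # Spread of each estimator's values across all simulated samples.
    return {name: pvariance(v) for name, v in values.items()}

variances = simulate_variances()
print(variances)
```

Under these assumptions CM’s variance should come out near $\sigma^2 = 1$ and x-bar’s near $\sigma^2/n = 0.2$, with BS a little above x-bar.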
The power of the mean
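Fact: For a normal population, among (the millions
and millions of) all possible unbiased estimators of
the population mean $\mu$, the sample
mean $\bar{x}$ is the MVUE. For any population,
$\bar{x}$ has the smallest variance among the unbiased
estimators that are weighted averages of the sample.
(For once our intuition is correct!)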
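The weighted-average part of this fact is easy to check by hand. Every weighted average $w_1x_1 + \cdots + w_nx_n$ with weights summing to 1 is unbiased, and, assuming independent picks with common variance $\sigma^2$, the Cauchy-Schwarz inequality shows equal weights give the smallest variance:

```latex
\begin{aligned}
E\Big(\sum_{i=1}^{n} w_i x_i\Big) &= \mu\sum_{i=1}^{n} w_i = \mu,
\qquad
\operatorname{Var}\Big(\sum_{i=1}^{n} w_i x_i\Big) = \sigma^2\sum_{i=1}^{n} w_i^2,\\
1 = \Big(\sum_{i=1}^{n} w_i\Big)^2 &\le n\sum_{i=1}^{n} w_i^2
\quad\Longrightarrow\quad
\sigma^2\sum_{i=1}^{n} w_i^2 \ge \frac{\sigma^2}{n} = \operatorname{Var}(\bar{x}).
\end{aligned}
```

Equality holds exactly when every $w_i = 1/n$, which is $\bar{x}$ itself. CM, for instance, is the case $w_2 = 1$, with variance $\sigma^2$, n times that of $\bar{x}$.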
The weakness of the Variance
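Recall that when we defined the sample
variance we added a “correction factor”
n/(n-1),
that is, we wrote
$s^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$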
The correction is due to the fact that without it
the sample variance would underestimate the
population variance.
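Where the underestimate comes from can be seen in a short derivation, assuming independent picks with $E[(x_i-\mu)^2] = \sigma^2$, so that $\operatorname{Var}(\bar{x}) = \sigma^2/n$:

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2,\\
E\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2,\\
E\Big[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] &= \frac{n-1}{n}\,\sigma^2 < \sigma^2,
\qquad
E(s^2) = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2 = \sigma^2.
\end{aligned}
```

Measuring the spread around $\bar{x}$ instead of the unknown $\mu$ loses exactly one $\sigma^2$ on average, and the factor n/(n-1) puts it back.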
The correction makes $s^2$ unbiased, but raises its
own variance: multiplying by n/(n-1) > 1 multiplies the variance by $(n/(n-1))^2$.
The choices are:
1. On the nose but a little wilder ($s^2$, dividing by n-1)
2. Less wild but slightly off (dividing by n)
(Neither option wins on both counts, so there is no best guess this time!)
We will often need to guess the population
variance and, in the interest of correctness, we
will choose Option 1.
Actually, if n is large, either one is acceptable,
since the factor n/(n-1) is then very close to 1.
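The trade-off between the two options can be seen in a small Python simulation. It assumes a normal population with mean 0 and a made-up $\sigma = 2$ (so $\sigma^2 = 4$) and samples of size 5; `compare_variance_guesses` is a hypothetical name. The standard library’s `statistics.variance` divides by n-1 and `statistics.pvariance` divides by n, which matches the two options.

```python
import random
from statistics import fmean, pvariance, variance

def compare_variance_guesses(n=5, reps=10_000, sigma=2.0, seed=2):
    """Simulate the two ways of guessing sigma**2 from samples of size n.

    Assumes a normal population with mean 0 and standard deviation sigma
    (made-up values for illustration). Returns, for each option, the
    average guess and the variance of the guesses.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    corrected, uncorrected = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        corrected.append(variance(xs))     # option 1: s^2, divide by n - 1
        uncorrected.append(pvariance(xs))  # option 2: divide by n
    return {
        "option 1 (s^2)": (fmean(corrected), pvariance(corrected)),
        "option 2 (divide by n)": (fmean(uncorrected), pvariance(uncorrected)),
    }

results = compare_variance_guesses()
for name, (avg, spread) in results.items():
    print(f"{name}: average guess {avg:.3f}, variance of the guesses {spread:.3f}")
```

With n = 5, option 1 should average close to 4 and option 2 close to 3.2, while option 1’s guesses spread more. Raising n (say to 100) brings the two averages close together.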