The Sampling Distribution
The Sampling Distribution of a
Statistic
Recall that a statistic is simply a number that we somehow attach to a sample of some population. Here are examples of simple-minded statistics:
• The largest number in the sample.
• The smallest number in the sample.
• The range of the sample.
• The midpoint of the range.
• The median of the sample.
• The average of the sample.
Which statistic do we use? Obviously, that depends on which parameter of the population we want to estimate! (Duh!) For instance, we could use:
• The largest number in the sample to guess the maximum of the population.
• The smallest number in the sample to guess the minimum of the population.
• The range of the sample to guess the spread of the population.
• The range/8 to guess the standard deviation of the population.
• The average of the sample to guess the mean of the population.
Let’s do an example. Our population consists of 1,000 beanbags, some weighing
• zero ounces (filled with air),
• two ounces (filled with peas), and
• seven ounces (filled with whatever).
So we have a population that consists of 1,000 numbers: some 0’s, some 2’s and some 7’s.
We would like to guess the mean of the population, and maybe the standard deviation, but we have enough resources to sample only three members of the population.
(Beanbags aren’t cheap; the bags are made of gold.)
Let’s start by considering all the possible samples of three entries we can get (there are 3 × 3 × 3 = 27 of them), each with its resulting sample average.
So, if our sample is 2, 7, 7 we would guess 5.33, but if it is 2, 0, 7 we would guess 3.
You can see that our guess can be any one of these ten numbers:
0, 0.67, 1.33, 2, 2.33, 3, 3.67, 4.67, 5.33, 7
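The samples, their averages, and the distinct values our guess can take can all be enumerated by machine. Here is a minimal sketch in Python, assuming a sample is an ordered triple of weights (so there are 27 of them); the variable names are made up for illustration.

```python
from itertools import product

weights = [0, 2, 7]  # possible beanbag weights, in ounces

# Every ordered sample of three picks: 3 * 3 * 3 = 27 of them.
samples = list(product(weights, repeat=3))

# Attach the sample average to each sample.
averages = {s: sum(s) / 3 for s in samples}

# The distinct values the average can take, rounded to two decimals.
distinct_means = sorted({round(a, 2) for a in averages.values()})

for s in samples:
    print(s, round(averages[s], 2))
print(distinct_means)
```

The script prints the 27 samples with their averages, followed by the list of distinct averages.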
These numbers are just the values of a random
variable (they vary at random!), and if we knew
the probability distribution we could make some
progress.
Progress starts with naming things (yourself, this
land I claim in the name of …., the Fighting Irish,
etc.), so let’s name a few things.
• The random variable above is called the sample mean.
• The probability distribution of the sample mean is called the sampling distribution of the mean.
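Let’s return to our example. The values of the RV “sample mean” are:
0, 0.67, 1.33, 2, 2.33, 3, 3.67, 4.67, 5.33, 7
So we need to fill in the probability of each of these values.
Some blanks we can fill, sort of:
P(sample mean = 0) = F0 × F0 × F0 (only the sample 0, 0, 0 averages 0)
P(sample mean = 2) = F2 × F2 × F2 (only the sample 2, 2, 2 averages 2)
P(sample mean = 7) = F7 × F7 × F7 (only the sample 7, 7, 7 averages 7)
where
Fk = (# k’s)/1000 = p(k), k = 0, 2, 7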
But unless we know F0, F2, F7 we are apparently stuck!
Not quite.
Basic Assumption about
Sampling Procedure
When we filled the three blanks in the
previous slide we tacitly assumed that
• Probabilities stay the same in each pick
and
• Probabilities multiply
In the next slide we rephrase the two
statements above as follows:
Basic Assumption about
Sampling Procedure (cont’d)
Definition. A sample of size N consists of N entries picked from the population of interest in such a way that
1. Each pick is independent of all the others.
2. Each pick comes from the same population of interest.
(We usually assume that random sampling achieves 1 above; 2 requires care in defining the procedure.)
Let’s invent some numbers for F0, F2, and F7.
F0 = 0.1
F2 = 0.6
F7 = 0.3
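With these invented proportions we can mimic the sampling procedure itself. The sketch below is only an illustration: it assumes that picking with replacement is an acceptable way to model "independent picks from the same population", and the seed and variable names are arbitrary choices, not part of the slides.

```python
import random

# Invented proportions: 10% zeros, 60% twos, 30% sevens.
population = [0] * 100 + [2] * 600 + [7] * 300  # 1,000 beanbag weights

rng = random.Random(0)  # fixed, arbitrary seed so the run is repeatable

# Assumption of this sketch: draw WITH replacement, so the probabilities
# stay the same in each pick and the picks do not influence each other.
sample = rng.choices(population, k=3)
sample_mean = sum(sample) / len(sample)

print(sample, round(sample_mean, 2))
```

Running it again with a different seed gives a different sample and, usually, a different guess, which is exactly why the sample mean is a random variable.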
In the next slide we show, for each of
the 27 possible 3-samples,
the average and the probability of
fishing that particular sample from our
population.
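• Note how our two basic assumptions, that probabilities stay the same and that probabilities multiply, allow us to compute the probability of each sample. For instance, the sample 2, 2, 2 has probability 0.6 × 0.6 × 0.6 = 0.216. Now we can fill in the probability distribution table for the sample mean: we just add the probabilities of all the samples that have the same average.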
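The bookkeeping (multiply within a sample, then add across samples with the same average) can be sketched in a few lines of Python. This is an illustration using the invented values F0 = 0.1, F2 = 0.6, F7 = 0.3; exact fractions are used here only to avoid rounding noise, and the names are made up.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Invented population proportions F0, F2, F7.
F = {0: Fraction(1, 10), 2: Fraction(6, 10), 7: Fraction(3, 10)}

# Sampling distribution of the mean: sample average -> probability.
dist = defaultdict(Fraction)
for sample in product(F, repeat=3):                    # the 27 ordered samples
    prob = F[sample[0]] * F[sample[1]] * F[sample[2]]  # probabilities multiply
    dist[Fraction(sum(sample), 3)] += prob             # add up equal averages

for mean in sorted(dist):
    print(f"{float(mean):.2f}  {float(dist[mean]):.3f}")
```

The printed table should read:

sample mean   probability
0.00          0.001
0.67          0.018
1.33          0.108
2.00          0.216
2.33          0.009
3.00          0.108
3.67          0.324
4.67          0.027
5.33          0.162
7.00          0.027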
The Wonderful Secret
Regardless of how large or small the sample size N is, the expected value of the sample mean is exactly the mean of the population!
In other words,
E(sample mean) = mean of the population.
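For the beanbag example the claim can be checked by brute force. The sketch below again assumes the invented proportions F0 = 0.1, F2 = 0.6, F7 = 0.3 and independent picks; `expected_sample_mean` is a hypothetical helper written for this illustration. It is a numerical check for a few sample sizes, not a proof.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Invented population proportions F0, F2, F7.
F = {0: Fraction(1, 10), 2: Fraction(6, 10), 7: Fraction(3, 10)}

def expected_sample_mean(F, n):
    """Expected average of n independent picks, by listing every sample."""
    total = Fraction(0)
    for sample in product(F, repeat=n):
        prob = prod(F[x] for x in sample)         # probabilities multiply
        total += prob * Fraction(sum(sample), n)  # weight the average by it
    return total

# Mean of the population: each value times its proportion.
population_mean = sum(value * p for value, p in F.items())

for n in (1, 2, 3, 4):
    print(n, float(expected_sample_mean(F, n)), float(population_mean))
```

Each line should show the same number twice, 3.3, whatever the sample size. The enumeration visits 3 to the power n samples, so it is only practical for small n.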
Now do the following exercise:
There are 16,000 undergraduates at Podunk U.: 5,000 freshmen, 4,000 sophomores, 4,000 juniors and 3,000 seniors.
1. Pick a random sample of size 2 from the undergraduates of Podunk U. and average their number of years of enrollment.
2. Display the probability distribution table of the sample mean. (There are 16 possible distinct samples.)
3. Verify that the expected value of the sample mean equals the mean of the population.