Transcript 9.1

9-1: Sampling Distributions
Preparing for Inference!
Parameter: A number that describes
the population (usually not known)
 Statistic: A number that can be
computed from the sample data
without making use of any unknown
parameters.

Example

Sample surveys show that fewer people
enjoy shopping than in the past. A recent
survey asked a nationwide random
sample of 2500 adults if they agreed or
disagreed that “I like buying new clothes
but shopping is often frustrating and
time-consuming.” Of the respondents,
1650, or 66%, said they agreed.
Estimate the proportion of the population that
finds clothes shopping frustrating.
Call this population proportion p.
 The poll found that 1650 out of 2500
randomly selected adults agreed
with the statement that shopping is
frustrating.
 The proportion of the sample who
agreed was….
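
The sample proportion can be checked with a line of arithmetic. A minimal sketch, using the counts from the survey above (the variable names are only illustrative):

```python
# Counts taken from the survey described above.
agreed = 1650
sample_size = 2500

# The sample proportion p-hat is the statistic used to estimate the parameter p.
p_hat = agreed / sample_size
print(f"p-hat = {p_hat:.2f}")  # prints "p-hat = 0.66"
```

Here 0.66 is a statistic: it comes from the sample data alone, while p itself stays unknown.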

Sampling Variability
Sampling Variability: the value of a
statistic varies in repeated random
sampling.
 Simulation, Example 9.3 p. 565




Figure 9.1 (p.566)
Sampling distribution of p-hat
Histogram of values of p-hat from 1000 SRSs of
size 100 from a population
with p = .70
The sampling distribution is the ideal pattern
that would emerge if we
looked at all possible
samples of size 100 from
our population
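
A simulation like the one behind the figure can be sketched in a few lines. This is only an illustration, not the textbook's procedure: it assumes the population is large enough that each sampled individual can be treated as an independent draw with probability p = .70, and the function name `simulate_p_hats` and the seed are made up for this example.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Return num_samples values of p-hat, each from one simulated sample of size n.

    Assumption: the population is very large, so each individual is modeled
    as an independent draw that agrees with probability p.
    """
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

# 1000 samples of size 100 from a population with p = .70
p_hats = simulate_p_hats(p=0.70, n=100, num_samples=1000)

print("mean of the p-hat values:", round(statistics.mean(p_hats), 3))
print("std. dev. of the p-hat values:", round(statistics.stdev(p_hats), 3))
print("smallest and largest p-hat:", min(p_hats), max(p_hats))
```

A histogram of `p_hats` approximates the sampling distribution. The exact numbers change with the seed, which is sampling variability in action.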
Describing Sampling Distributions




Overall shape:
symmetric/approx. normal
Outliers/deviations from
overall pattern: None
Center: close to the true
value of p
Spread: the values of p-hat
have a large spread. Because
the distribution is close to
normal, we can
use sigma (the standard deviation) to describe the
spread.
Are you a Survivor Fan?





Suppose that the true
proportion of US adults
who watched Survivor II
is p = .37. The graph
shows the results of
drawing 1000 SRSs of
size n = 100 from a
population with p = .37.
Shape:
Center:
Spread:
Outliers/Deviations:






Top: Results of drawing
1000 SRSs of size n=1000
drawn from a population
with p = .37
Bottom: Results of drawing
1000 SRSs of size n=100
drawn from a population
with p=.37
What happened when we
took n = 1000 vs. n =
100?
Center: close to .37
Spread: small; range is
.321 to .421.
Shape: hard to see, since
values of p-hat cluster so
tightly about .37
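
The effect of sample size can be explored with a short simulation. This is a sketch under the assumption that the population is large enough to model each individual as an independent draw with p = .37. The helper `simulate_p_hats` and the seeds are illustrative, and the results will not match the slide's graphs exactly.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed):
    """Return num_samples simulated values of p-hat for samples of size n."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]

results = {}
for n in (100, 1000):
    # 1000 simulated samples for each sample size
    results[n] = simulate_p_hats(p=0.37, n=n, num_samples=1000, seed=n)
    values = results[n]
    print(
        f"n = {n}: center {statistics.mean(values):.3f}, "
        f"std. dev. {statistics.stdev(values):.3f}, "
        f"range {min(values):.3f} to {max(values):.3f}"
    )
```

Both sets of values center near .37, but the values for n = 1000 are packed much more tightly around it.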
Random sampling…
…gives us regular and predictable
shapes
 …patterns of behavior over many
repetitions
 …these distributions are
approximately normal.

Unbiased Statistic


Bias: Concerns the
center of the
sampling distribution
A statistic used to
estimate a parameter
is unbiased if the
mean of its sampling
distribution is equal to
the true value of the
parameter being
estimated.
Examples of Unbiased Estimators
If we draw an SRS from a population
in which 60% find shopping
frustrating, the mean of the
sampling distribution of p-hat is:
 If we draw an SRS from a population
in which 50% find shopping
frustrating, the mean of p-hat is:
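
Because p-hat is an unbiased estimator of p, the mean of its sampling distribution equals p itself, whatever p is. A rough check by simulation, assuming a very large population so that draws are independent (the sample size of 100, the helper name, and the seed are arbitrary choices for this sketch):

```python
import random
import statistics

def mean_of_p_hats(p, n=100, num_samples=2000, seed=1):
    """Average many simulated p-hat values to approximate the mean of the sampling distribution."""
    rng = random.Random(seed)
    p_hats = [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]
    return statistics.mean(p_hats)

for p in (0.60, 0.50):
    print(f"true p = {p:.2f}, simulated mean of p-hat = {mean_of_p_hats(p):.3f}")
```

The simulated means land close to .60 and .50, matching the true parameter in each case.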

The Stats have spoken



The approximate sampling distribution of
p-hat for samples of size 100 is close to
the normal distribution with mean .37
and std. dev. of .05
The approximate sampling distribution of
p-hat for samples of size 1000 is close
to the normal distribution with mean .37
and std. dev. of .01
Which is the better result?
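
The standard deviations can be compared against the usual formula for the standard deviation of p-hat, sqrt(p(1 - p)/n). That formula is not stated on these slides; it is brought in here as an assumption, and it holds when the population is much larger than the sample. A minimal sketch:

```python
import math

def p_hat_std_dev(p, n):
    """Standard deviation of p-hat for a sample of size n.

    Assumes the population is much larger than the sample.
    """
    return math.sqrt(p * (1 - p) / n)

for n in (100, 1000):
    print(f"n = {n}: std. dev. of p-hat is about {p_hat_std_dev(0.37, n):.3f}")
# prints:
# n = 100: std. dev. of p-hat is about 0.048
# n = 1000: std. dev. of p-hat is about 0.015
```

These values are roughly in line with the rounded figures quoted above. Both distributions have the same center, so the larger sample gives the better result because its values of p-hat vary less.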
Bull's-Eye Analogy



True value of
population parameter:
bull’s-eye; sample
statistic: arrow fired at
the target
Bias: our aim is off,
we consistently miss
the bull’s-eye in the
same direction
High Variability:
repeated shots are
widely scattered on
the target
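
In numerical terms, bias is how far the average of the estimates sits from the true value, and variability is how spread out the estimates are. The sketch below uses made-up numbers purely for illustration; the three lists and the true value of 0.50 are hypothetical, not data from the course.

```python
import statistics

def bias_and_spread(estimates, true_value):
    """Return (bias, spread): mean estimate minus true value, and sample std. dev."""
    bias = statistics.mean(estimates) - true_value
    spread = statistics.stdev(estimates)
    return bias, spread

true_value = 0.50  # hypothetical parameter (the bull's-eye)

# Hypothetical sets of repeated estimates (the arrows)
shots = {
    "low bias, low variability": [0.49, 0.50, 0.51, 0.50, 0.50],
    "high bias, low variability": [0.59, 0.60, 0.61, 0.60, 0.60],
    "low bias, high variability": [0.35, 0.65, 0.45, 0.55, 0.50],
}

for label, estimates in shots.items():
    bias, spread = bias_and_spread(estimates, true_value)
    print(f"{label}: bias = {bias:+.2f}, spread = {spread:.3f}")
```

A good estimator has both small bias and small spread, like arrows clustered tightly on the bull's-eye.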
In items 1–3, classify each underlined number as a parameter or
statistic. Give the appropriate notation for each.
1. Forty-two percent of today’s 15-year-old girls will get pregnant in
their teens.
2. A 1993 survey conducted by the Richmond Times-Dispatch one
week before election day asked voters which candidate for the
state’s attorney general they would vote for. Thirty-seven percent of
the respondents said they would vote for the Democratic candidate.
On election day, 41% actually voted for the Democratic candidate.
3. The National Center for Health Statistics reports that the mean
systolic blood pressure for males 35 to 44 years of age is 128 and
the standard deviation is 15. The medical director of a large
company looks at the medical records of 72 executives in this age
group and finds that the mean systolic blood pressure for these
executives is 126.07.
Below are histograms of the values taken by three sample
statistics in several hundred samples from the same
population. The true value of the population parameter is
marked on each histogram.



4. Which statistic has the
largest bias among these
three? Justify your answer.
5. Which statistic has the
lowest variability among
these three?
6. Based on the
performance of the three
statistics in many samples,
which is preferred as an
estimate of the parameter?
Why?
Activity 9A

Work in groups of 4 and turn in one paper
per group at the end of the period.