Sampling distribution

9-1: Sampling Distributions
Preparing for Inference!
Parameter: A number that describes the population (usually not known).
Statistic: A number that can be computed from the sample data without making use of any unknown parameters.

Symbols:
Example

Sample surveys show that fewer people
enjoy shopping than in the past. A recent
survey asked a nationwide random
sample of 2500 adults if they agreed or
disagreed that “I like buying new clothes
but shopping is often frustrating and
time-consuming.” Of the respondents,
1650, or 66%, said they agreed.
Example cont’d:
p-hat = 66% = statistic = sample proportion
Population = what we want to draw conclusions about = all adult US residents (18 years and older)
Parameter = p = % of all adult US residents who would agree

Sampling Variability
Sampling Variability: the value of a statistic varies in repeated random sampling.
Simulation, Example 9.3 p. 565

Figure 9.1 (p. 566)
Sampling distribution of p-hat
Histogram of values of p-hat from 1000 SRSs of size 100 from a population with p = .70.
The sampling distribution is the ideal pattern that would emerge if we looked at all possible samples of size 100 from our population; the histogram of 1000 samples approximates it.
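
The simulation behind Figure 9.1 can be imitated with a few lines of Python. This is only a sketch under stated assumptions: each respondent is modeled as an independent draw with success probability p = 0.70 (a reasonable stand-in for an SRS when the population is much larger than the sample), and the function name `simulate_p_hats` and the seed are hypothetical choices, not part of the textbook example.

```python
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Return `reps` sample proportions, each based on n independent draws.

    Assumption: independent draws approximate an SRS from a population
    that is much larger than the sample.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# 1000 samples of size 100 from a population with p = 0.70
p_hats = simulate_p_hats(p=0.70, n=100, reps=1000)
print("mean of the p-hats:", round(statistics.mean(p_hats), 3))
print("std dev of the p-hats:", round(statistics.stdev(p_hats), 3))
```

A histogram of `p_hats` should look roughly symmetric and centered near 0.70, which is the pattern the figure describes.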
Describing Sampling Distributions

Overall shape: symmetric/approx. normal
Outliers/deviations from overall pattern: none
Center: close to the true value of p
Spread: the values of p-hat have a large spread, but because the distribution is close to normal, we can use the standard deviation (sigma) to describe the spread.
Are you a Survivor Fan?





Suppose that the true proportion of US adults who watched Survivor II is p = .37. The graph shows the results of drawing 1000 SRSs of size n = 100 from a population with p = .37.
Shape:
Center:
Spread:
Outliers/Deviations:
Top: Results of drawing 1000 SRSs of size n = 1000 from a population with p = .37
Bottom: Results of drawing 1000 SRSs of size n = 100 from a population with p = .37
What happened when we took n = 1000 vs. n = 100?
Notes on top picture:
Center: close to .37
Spread: small; range is .321 to .421.
Shape: hard to see, since values of p-hat cluster so tightly about .37
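
The effect of sample size on spread can be checked with a rough simulation. The sketch below assumes independent draws with p = 0.37 as a stand-in for SRSs from a large population; the helper name `sample_proportions` and the seeds are hypothetical.

```python
import random
import statistics

def sample_proportions(p, n, reps, seed):
    """Simulate `reps` values of p-hat, each from n independent draws."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

small = sample_proportions(p=0.37, n=100, reps=1000, seed=1)
large = sample_proportions(p=0.37, n=1000, reps=1000, seed=2)

for label, p_hats in (("n = 100", small), ("n = 1000", large)):
    print(
        label,
        "| center:", round(statistics.mean(p_hats), 3),
        "| std dev:", round(statistics.stdev(p_hats), 4),
        "| range:", min(p_hats), "to", max(p_hats),
    )
```

Both sets of p-hats should center near .37, while the n = 1000 values should cluster far more tightly than the n = 100 values.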

Random sampling…
…gives us regular and predictable shapes
…patterns of behavior over many repetitions
…these distributions are approximately normal.

Unbiased Statistic


Bias: Concerns the center of the sampling distribution.
A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Examples of Unbiased
Estimators

If we draw an SRS from a population in which 60% find shopping frustrating, the mean of the sampling distribution of p-hat is: 0.60

If we draw an SRS from a population in which 50% find shopping frustrating, the mean of p-hat is: 0.50
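
That p-hat is unbiased can be verified numerically. The sketch below assumes the count of "agree" responses follows a binomial distribution (appropriate when the population is much larger than the sample) and computes the exact mean of p-hat by summing over every possible count. The sample size n = 100 and the function name `mean_of_p_hat` are hypothetical choices for illustration.

```python
from math import comb

def mean_of_p_hat(p, n):
    """Exact mean of p-hat = X / n when X is binomial(n, p)."""
    return sum(
        (k / n) * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )

for p in (0.60, 0.50):
    print(f"p = {p:.2f}: mean of p-hat = {mean_of_p_hat(p, n=100):.4f}")
```

Under this model the mean of the sampling distribution matches the population proportion for either value of p, which is what "unbiased" means.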
Variability of a statistic…
As long as the candy is well mixed (so that the scoop selects a random sample), the variability of the result depends only on the size of the scoop and not on the size of the container.
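
The scoop idea can be tried out in a small simulation. This is a sketch with made-up numbers: two "containers" of 10,000 and 1,000,000 candies, each 60% of one color, and a scoop of 100 candies drawn without replacement. The function name `scoop_sd` and the seeds are hypothetical, and the claim is only expected to hold when the container is much larger than the scoop.

```python
import random
import statistics

def scoop_sd(pop_size, p, n, reps, seed):
    """Std dev of p-hat over `reps` scoops of size n, drawn without replacement."""
    rng = random.Random(seed)
    successes_in_pop = round(pop_size * p)  # candies 0..successes_in_pop-1 count as "successes"
    p_hats = []
    for _ in range(reps):
        picked = rng.sample(range(pop_size), n)  # a simple random sample of candy indices
        p_hats.append(sum(1 for i in picked if i < successes_in_pop) / n)
    return statistics.stdev(p_hats)

sd_small_container = scoop_sd(pop_size=10_000, p=0.60, n=100, reps=1000, seed=3)
sd_large_container = scoop_sd(pop_size=1_000_000, p=0.60, n=100, reps=1000, seed=4)
print("std dev, container of 10,000:   ", round(sd_small_container, 4))
print("std dev, container of 1,000,000:", round(sd_large_container, 4))
```

The two standard deviations should come out nearly equal even though one container is 100 times larger than the other; changing the scoop size `n` is what moves them.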
Goal: low bias, low variability 
Take random samples with big n!
Bull’s-Eye Analogy

True value of population parameter: bull’s-eye; sample statistic: arrow fired at the target
Bias: our aim is off; we consistently miss the bull’s-eye in the same direction
High variability: repeated shots are widely scattered on the target
In items 1–3, classify each underlined number as a parameter or statistic. Give the appropriate notation for each.
1. Forty-two percent of today’s 15-year-old girls will get pregnant in their teens.
2. A 1993 survey conducted by the Richmond Times-Dispatch one week before election day asked voters which candidate for the state’s attorney general they would vote for. Thirty-seven percent of the respondents said they would vote for the Democratic candidate. On election day, 41% actually voted for the Democratic candidate.
3. The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 and the standard deviation is 15. The medical director of a large company looks at the medical records of 72 executives in this age group and finds that the mean systolic blood pressure for these executives is 126.07.
Below are histograms of the values taken by three sample statistics in several hundred samples from the same population. The true value of the population parameter is marked on each histogram.
4. Which statistic has the largest bias among these three? Justify your answer.
5. Which statistic has the lowest variability among these three?
6. Based on the performance of the three statistics in many samples, which is preferred as an estimate of the parameter? Why?