AP Statistics Section 5.1 B More on Sampling

Methods for sampling from large
populations spread out over a wide
area are usually more complex
than an SRS.
To select a stratified random sample,
1. Divide the population into groups of
individuals (strata) that are similar in some
way important to the response variable.
2. Choose an SRS in each stratum.
3. Combine the SRSs to form a full sample.
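The three steps above can be sketched in Python. This is only a minimal illustration, not a prescribed method: the function name stratified_sample, the stratum_of callback, and the toy list of schools are all assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum individuals from each stratum and combine them."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata.
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    # Steps 2 and 3: choose an SRS within each stratum, then combine the SRSs.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population: 6 public and 4 private schools.
schools = [("public", i) for i in range(6)] + [("private", i) for i in range(4)]
sample = stratified_sample(schools, stratum_of=lambda s: s[0], n_per_stratum=2, seed=1)
print(sample)
```

Whatever the seed, the sample contains exactly two schools from every stratum, which an SRS of four schools from the whole list would not guarantee.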
When defining strata, be sure that the individuals
in each group are likely to be similar.
Example: Define some strata for the
following populations:
high schools: public vs. private; DI, DII, etc.
music: jazz, country, rap
A stratified sample can give good
information about each stratum
separately as well as about the
overall population.
If the individuals in each stratum are less
varied than the population as a whole, a
stratified sample can produce better
information about the population than an
SRS of the same size. In the extreme case
where all individuals in each stratum are the
same, we would need only one
individual in each stratum to completely
represent the population.
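The extreme case can be checked with a small Python sketch. The three strata and their values below are hypothetical, and the sketch assumes the size of each stratum is known so that each sampled individual can be weighted by its stratum's share of the population.

```python
# Hypothetical population: three strata, each perfectly homogeneous.
strata = {
    "A": [10] * 50,
    "B": [20] * 30,
    "C": [40] * 20,
}
population = [x for members in strata.values() for x in members]
true_mean = sum(population) / len(population)

# Take ONE individual per stratum and weight it by the stratum's
# share of the population (assumes stratum sizes are known).
N = len(population)
estimate = sum(members[0] * len(members) / N for members in strata.values())
print(true_mean, estimate)
```

With only three individuals, the weighted estimate matches the population mean, because nothing varies inside any stratum.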
Cluster sampling involves
1. Dividing the population into groups
(clusters).
2. Randomly selecting some of the
clusters.
3. Using all individuals in the chosen clusters
to form the sample.
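A minimal Python sketch of these steps follows. The function name cluster_sample and the homeroom data are hypothetical, and step 1 (dividing the population into clusters) is assumed to be already done.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep every individual in them."""
    rng = random.Random(seed)
    # Step 2: randomly select some of the clusters (by name).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: every individual in a chosen cluster enters the sample.
    return chosen, [ind for name in chosen for ind in clusters[name]]

# Hypothetical clusters: homerooms and the students in them.
homerooms = {
    "room_101": ["Ana", "Ben", "Cal"],
    "room_102": ["Dee", "Eli"],
    "room_103": ["Fay", "Gus", "Hal", "Ivy"],
}
chosen, sample = cluster_sample(homerooms, n_clusters=2, seed=3)
print(chosen, sample)
```

Note that the sample size is not fixed in advance: it depends on how many individuals happen to be in the chosen clusters.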
Example: Suppose you work for the tax department for
Cuyahoga County and you want to investigate whether
waitresses/waiters are honestly claiming the tips they
make. Explain a method of creating a stratified sample
and a method of creating a cluster sample.
stratified
1. Group restaurants by price range.
2. Choose an SRS from each group of restaurants.
3. Combine the SRSs to make a sample.
cluster
1. Group restaurants by price range.
2. Randomly select some of the groups of restaurants.
3. Combine the chosen groups to make a sample.
Stratified sampling versus cluster
sampling: In stratified sampling, we
study a random sample in every
stratum. But, in cluster sampling,
we study all individuals in the
chosen clusters and none of the
individuals in other clusters.
Undercoverage occurs when some groups
in the population are left out of the
process used to choose a sample.
Example: What people would be left out in each
of the following surveys?
1. Surveys conducted by randomly selecting
telephone numbers.
people without telephones
2. Surveys conducted by randomly selecting
households.
homeless, college students
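To see why undercoverage matters, consider a toy Python sketch. All of the numbers are hypothetical and were chosen only so that people without phones answer differently from people with phones; the sketch compares the whole population with the part a telephone survey can reach.

```python
# Hypothetical population: (has_phone, supports_measure) for each person.
population = (
    [(True, True)] * 45 + [(True, False)] * 45     # phone owners: 50% support
    + [(False, True)] * 8 + [(False, False)] * 2   # no phone: 80% support
)

def support_rate(people):
    """Proportion of the given people who support the measure."""
    return sum(supports for _, supports in people) / len(people)

# The sampling frame of a telephone survey: only people who have a phone.
frame = [person for person in population if person[0]]
print(support_rate(population), support_rate(frame))
```

Even a perfect SRS drawn from this frame is aimed at 50% support rather than the population's 53%, because the people left out cannot be chosen at all.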
Nonresponse occurs when an individual chosen
for a sample can’t be contacted or does not
respond.
Nonresponse often reaches 30% or
more, even with careful planning
and several callbacks. A research
center found that out of 2,879
households called, 1,658 were
never at home, refused, or would
not finish the interview. That’s a
nonresponse rate of 58%.
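The 58% figure is simply the number of nonresponding households divided by the number called. A quick check in Python, with variable names chosen for this example:

```python
households_called = 2879
nonrespondents = 1658  # never at home, refused, or would not finish the interview

nonresponse_rate = nonrespondents / households_called
print(f"{nonresponse_rate:.0%}")
```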
There are other details that can
affect the results.
The behavior of the respondent or of the
interviewer can cause response bias in
sample results. Respondents may lie,
especially about illegal or unpopular
behavior. The race or sex of the interviewer
can influence responses to questions about
race relations or attitudes toward
feminism. Answers to questions that ask
respondents to recall past events are often
inaccurate because of faulty memory.
Example: One of the most frequently observed
survey measurement errors is the overreporting
of voting behavior. In a typical sample of 663
people after an election, 478 people (72%) said
that they voted, but only 371 people (56%)
actually did.
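The percentages in this example can be verified directly from the counts. The sketch below uses variable names chosen for illustration; the lower bound on overreporters assumes that everyone who actually voted also said so.

```python
sample_size = 663
said_voted = 478
actually_voted = 371

claimed_rate = said_voted / sample_size      # share who said they voted
actual_rate = actually_voted / sample_size   # share who really voted

# At least this many respondents claimed a vote they did not cast
# (assuming every actual voter reported voting).
overreporters = said_voted - actually_voted
print(f"{claimed_rate:.0%} claimed, {actual_rate:.0%} actual, {overreporters} overreported")
```

The gap of about 16 percentage points is response bias: it comes from how people answer, not from how the sample was chosen.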
The wording of questions is the
most important influence on the
answers given to a sample survey.
Confusing or leading questions can
introduce strong bias and even
minor changes in wording can
change a survey’s outcome.
Example: A survey conducted in 1992
for the American Jewish Committee
asked the following question: Does it
seem possible or does it seem
impossible to you that the Nazi
extermination of the Jews never
happened?
22% of the sample said “possible”.
When considering reports based
on surveys of large human
populations, insist on knowing the
exact questions asked, the rate of
nonresponse, and the date and
method of the survey before you
trust a poll result.
You can get more precise results from surveys by
using larger random samples. The Current
Population Survey’s sample of 50,000
households estimates the national
unemployment rate very accurately, whereas
Nightline’s voluntary response sample of
186,000 people is worthless. Using a probability
sampling method and taking care to deal with
practical difficulties reduce bias in a sample.
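The gain in precision from a larger random sample can be illustrated with the usual standard error formula for a sample proportion, sqrt(p(1 - p)/n). This is a sketch under stated assumptions: it treats the sample as an SRS (real surveys such as the Current Population Survey use more complex designs), and the proportion p = 0.05 is hypothetical.

```python
import math

def standard_error(p, n):
    """Approximate standard error of a sample proportion from an SRS of size n."""
    return math.sqrt(p * (1 - p) / n)

p = 0.05  # hypothetical true proportion, chosen only for illustration
for n in (500, 5_000, 50_000):
    print(n, round(standard_error(p, n), 4))
```

Multiplying the sample size by 100 cuts the standard error by a factor of 10. The formula says nothing about bias, which is why a voluntary response sample stays untrustworthy no matter how large it is.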