Lecture 3c - Sampling.


PPA 501 – ANALYTICAL
METHODS IN ADMINISTRATION
Lecture 3c – Sampling
INTRODUCTION – WHEN INFORMATION IS
UNAVAILABLE
If all possible information needed to solve an
administrative problem could be collected, there
would be no need for a sample.
 But, such data gathering is limited by time and
money.
 So most analysts and administrators use samples
and estimate effects based on probabilities.

REASONS FOR SAMPLING
Cost, time, accuracy, and the destructive nature
of the measurement process.
 Key concerns:
   Accuracy needed to make the decision.
   Cost of the wrong choice.
   How much more information is needed.
   What kinds of data are needed, and at what cost?
   Affordability of the extra cost.
   Sampling precision.

Sampling is based on probability: What is the
probability that I will be wrong if I generalize from
the sample to the population?

A high degree of precision is difficult and
expensive to achieve.
Doubling precision (halving the sampling error) requires
a fourfold increase in sample size, because sampling error
shrinks with the square root of the sample size.
 For many social science applications, that level of
accuracy is not necessary: Tracking trends is often
equally or more important.
 The survey or research process can also begin to
change people’s reactions, answers, behaviors, and so
on.
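
The fourfold rule follows from the way sampling error shrinks with the square root of the sample size. A worked version, assuming a simple random sample and using the standard error of the mean (sigma is the population standard deviation, n the sample size; the survey numbers are illustrative and not taken from the lecture):

```latex
\[
SE(n) = \frac{\sigma}{\sqrt{n}}
\qquad\Longrightarrow\qquad
SE(4n) = \frac{\sigma}{\sqrt{4n}}
       = \frac{1}{2}\,\frac{\sigma}{\sqrt{n}}
       = \frac{1}{2}\,SE(n)
\]
% Illustration: 95 percent margin of error for a proportion near 0.5
\[
1.96\sqrt{\frac{0.5 \times 0.5}{400}} \approx 0.049,
\qquad
1.96\sqrt{\frac{0.5 \times 0.5}{1600}} \approx 0.0245
\]
```

In this illustration, going from 400 to 1,600 respondents narrows the margin of error from roughly 4.9 to roughly 2.5 percentage points.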

SAMPLING METHODS
Probability or nonprobability sample.
 Single unit or cluster of units.
 Unstratified or stratified sample.
 Equal unit probability or unequal probability.
 Single stage or multi-stage sampling.

PROBABILITY OR NONPROBABILITY
SAMPLE?

A probability sample is one in which the sample
units (people, states, counties, etc.) are selected
at random, so that each unit has a known, nonzero
chance of being selected. In the simplest designs,
every unit has an equal chance.
Simple random samples.
 Systematic samples.

A nonprobability sample is one in which random
selection techniques are not used.
 The key difference is the generalizability of the
results to the larger population.
 The choice is usually based on cost versus value.
 Rule of thumb: the more diverse the population,
the more important representativeness becomes.
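
The two probability designs named above can be sketched in a few lines. This is a minimal illustration, not a survey tool: the household frame, the function names, and the seeds are hypothetical, and the systematic version assumes the frame is not ordered in a repeating pattern.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit from the frame."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame of 200 households.
frame = [f"household_{i:03d}" for i in range(1, 201)]
srs = simple_random_sample(frame, 20, seed=1)
systematic = systematic_sample(frame, 20, seed=1)
print(srs[:3])
print(systematic[:3])
```

Systematic selection is easier to carry out from a printed list, but if the list has a cycle that matches the interval (every tenth household is a corner lot, say), the sample can be unrepresentative.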

SINGLE UNIT OR CLUSTER SAMPLING?
A sampling unit is the basic element of the
population being sampled.
 In single unit sampling, each sampling unit is
selected independently.
 In cluster sampling, units are selected in groups.
 Cluster sampling reduces costs.
 However, diverse populations generate pressure
to guarantee representativeness by using single
unit sampling.
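
A short sketch of cluster selection, assuming a hypothetical frame of residential blocks that each contain five households. Whole blocks are drawn at random and every household in a chosen block enters the sample, which is where the cost savings come from.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical frame: 10 residential blocks with 5 households each.
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 6)]
    for b in range(1, 11)
}
cluster_units = cluster_sample(clusters, 3, seed=7)
print(len(cluster_units))  # 15 households, all from 3 blocks
```

Because the 15 households come from only 3 blocks, they tend to resemble one another more than 15 independently selected households would, which is the representativeness concern raised above.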

STRATIFIED OR UNSTRATIFIED SAMPLING
A sample stratum is a subgroup of the population,
defined by a characteristic of interest to the
researcher, from which units are sampled separately.
 Stratification can be used to ensure representativeness
or to ensure overrepresentation of a selected
subpopulation.

EQUAL UNIT OR UNEQUAL UNIT PROBABILITY

In combination with stratified sampling, unequal
unit sampling can be used to ensure an
overrepresentation of a research population of
interest.
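
A minimal sketch of stratified sampling with unequal selection probabilities, assuming a hypothetical population of 900 urban and 100 rural residents. Equal numbers are drawn from each stratum, so rural residents are overrepresented; the weight attached to each sampled unit (an assumption of this sketch, computed as stratum size divided by stratum sample size) records how many population members that unit stands for.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate random sample from each stratum and attach weights."""
    rng = random.Random(seed)
    sample = []
    for name, units in strata.items():
        n = n_per_stratum[name]
        weight = len(units) / n   # population units represented by each sampled unit
        for unit in rng.sample(units, n):
            sample.append((name, unit, weight))
    return sample

# Hypothetical population: 900 urban and 100 rural residents (IDs only).
strata = {"urban": list(range(900)), "rural": list(range(900, 1000))}
stratified = stratified_sample(strata, {"urban": 50, "rural": 50}, seed=3)
total_weight = sum(w for _, _, w in stratified)
print(len(stratified), total_weight)
```

The weights sum back to the population size, so weighted estimates describe the whole population while the oversampled stratum still has enough cases to analyze on its own.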
SINGLE STAGE VERSUS MULTISTAGE
SAMPLING


Used when sampling over a large geographic area.
Face-to-face surveying:
  Congressional districts.
  Census tracts.
  Residential blocks.
  Households.
  Residents (most recent birthday).
Telephone interviewing:
  Area codes.
  Prefixes.
  First two-digit clusters.
  Random assignment of last two digits (unlisted).
  Over-sampling to accommodate disconnects and
  commercial numbers.
  Residents (most recent birthday).
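
The telephone stages can be sketched as a nested selection. Everything concrete here is hypothetical: the area codes, the fictional 555 prefixes, the 50 percent working-number rate, and the function names. Choosing with equal probability at each stage is a simplification; it does not by itself give every number the same overall chance.

```python
import math
import random

def draw_phone_number(frame, rng):
    """One pass through the stages: area code, prefix, two-digit cluster, last two digits."""
    area = rng.choice(sorted(frame))
    prefix = rng.choice(sorted(frame[area]))
    cluster = rng.choice(frame[area][prefix])
    last_two = rng.randrange(100)   # random digits can reach unlisted numbers too
    return f"({area}) {prefix}-{cluster}{last_two:02d}"

def numbers_to_dial(target_completes, expected_working_rate):
    """Over-sample to allow for disconnected and commercial numbers."""
    return math.ceil(target_completes / expected_working_rate)

# Hypothetical frame: area code -> prefix -> list of first-two-digit clusters.
phone_frame = {
    "315": {"555": ["01", "02"]},
    "607": {"555": ["01"]},
}
rng = random.Random(11)
n_dial = numbers_to_dial(400, 0.5)
numbers = [draw_phone_number(phone_frame, rng) for _ in range(n_dial)]
print(n_dial, numbers[0])
```

The same number can be drawn twice in this sketch; a production design would drop duplicates and then apply the most-recent-birthday rule within each household reached.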





SAMPLE BIAS AND SAMPLING ERROR



The ultimate purpose of sampling is to generate a
sample that accurately reflects the research-relevant
characteristics of the population.
This purpose can be undermined by both sampling
and nonsampling error.
Sample bias.
  Conscious or unconscious bias in the selection of the
  sample.
  Overcome by random selection.
Sampling error.
  No sample ever exactly matches the population, but
  random sampling allows probability estimates of the
  match.
  Law of large numbers versus law of diminishing returns:
  larger samples reduce sampling error, but each added
  case reduces it by less.

NONSAMPLING ERROR

Sampling frame.

You should start with as complete a sampling frame
as possible.
Example: random digit dialing versus one-plus dialing
versus telephone directories.
 Example: residential survey versus telephone survey.


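Nonresponse error.
  Low response rates greatly increase the risk of a
  biased sample, because those who respond may differ
  from those who do not.
  Incentives and follow-up phone calls can be used to
  reduce nonresponse.
  There is no guarantee that they will reduce the bias.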
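
A small arithmetic sketch of why nonresponse can bias a sample regardless of its size. The population, the two groups, and the response rates are all hypothetical, chosen only to show the mechanism.

```python
def respondent_share(groups):
    """Share of respondents holding a trait, given each group's size and response rate.

    groups: list of (population_count, response_rate, has_trait) tuples.
    """
    responding = [(count * rate, has_trait) for count, rate, has_trait in groups]
    total = sum(expected for expected, _ in responding)
    with_trait = sum(expected for expected, has_trait in responding if has_trait)
    return with_trait / total

# Hypothetical population: half satisfied, half dissatisfied with a service.
groups = [
    (500, 0.6, True),    # satisfied residents, 60 percent respond
    (500, 0.2, False),   # dissatisfied residents, 20 percent respond
]
observed_share = respondent_share(groups)
true_share = 500 / 1000
print(round(observed_share, 2), true_share)
```

Under these assumed rates about three quarters of respondents are satisfied even though only half of the population is, and surveying more people with the same response pattern would not close the gap.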

SAMPLING DISTRIBUTIONS




The sampling distribution is a hypothetical
distribution that was developed by statisticians to
allow the estimation of the probability of a match
between the sample and the population.
The sampling distribution is a distribution composed
of the means of a very large number of samples drawn
from the population.
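This distribution is generally normal and has a mean
equal to the population mean and a standard deviation
equal to $\sigma / \sqrt{n}$ or, when $\sigma$ must be
estimated from the sample standard deviation $s$,
$s / \sqrt{n - 1}$. This is called the
standard error of the mean.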
Central limit theorem: as the sample size increases,
the sampling distribution of the sample mean approaches
a normal distribution, whatever the shape of the
population distribution. As a rule of thumb, the
approximation becomes reasonable at about n = 30.
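
A small simulation can make the sampling distribution concrete. It assumes a hypothetical, strongly skewed population (exponentially distributed values), draws 2,000 random samples of n = 30, and compares the spread of the sample means with the standard error formula using the population standard deviation. The population, seeds, and sample counts are illustrative choices, not part of the lecture.

```python
import random
import statistics

def sampling_distribution(population, n, n_samples, seed=None):
    """Means of many random samples of size n drawn from the population."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]

# Hypothetical skewed population of 10,000 values.
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sampling_distribution(population, n=30, n_samples=2_000, seed=1)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
theoretical_se = pop_sd / 30 ** 0.5      # sigma / sqrt(n)
observed_se = statistics.stdev(means)    # spread of the simulated sample means

print(round(pop_mean, 3), round(statistics.mean(means), 3))
print(round(theoretical_se, 3), round(observed_se, 3))
```

The mean of the sample means should land close to the population mean, and their standard deviation close to the theoretical standard error, even though the population itself is far from normal.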