Types of non-probability sampling
Investigating the Potential of
Using Non-Probability Samples
Debbie Cooper, ONS
Summary
• Project aims
• What constitutes non-probability sampling?
• Advantages of probability sampling
• Growing interest in non-probability sampling
• Types of non-probability sampling
• Key challenges
• Overcoming challenges
• When is the use of non-probability sampling justified?
• Recommendations
Project aims
1. Provide a concise review of the types of
non-probability samples
2. Highlight the key challenges associated with
non-probability sampling
3. Increase awareness of techniques available
to potentially overcome these challenges
4. Provide guidance to help inform decision-making
on whether a non-probability sample is justified
What constitutes non-probability
sampling?
• Non-probability sampling has two
distinguishing characteristics:
one cannot specify the probability of selection for
each unit that will be included in the sample
it is not possible to ensure that every unit in the
population has a nonzero probability of inclusion
(Frankfort-Nachmias and Nachmias, 1996)
Advantages of probability sampling
• the ability to calculate selection probabilities allows
researchers to create design weights which result in
an unbiased estimator
• allows for representativeness as each unit in the
target population has a nonzero probability of
selection
• allows for the estimation of sampling variability
BUT
• non-random nonresponse and undercoverage violate
the assumptions of probability sampling, giving them
a non-probability element
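To illustrate how known selection probabilities lead to design weights, here is a minimal sketch of a design-weighted (Horvitz-Thompson style) estimate of a population total. The function name and the sample values are hypothetical and are not taken from the presentation.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each sampled value
    by the inverse of its (known) selection probability."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0 < pi <= 1:
            raise ValueError("each selection probability must be in (0, 1]")
        total += y / pi  # design weight is 1 / pi
    return total


# Hypothetical sample: three units with known selection probabilities.
sampled_values = [20.0, 30.0, 50.0]
selection_probs = [0.1, 0.1, 0.5]
estimated_total = horvitz_thompson_total(sampled_values, selection_probs)
print(estimated_total)
```

With a non-probability sample the `selection_probs` are unknown, which is why this calculation cannot be carried out directly.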
Having said that...
• Methods developed to deal with coverage
and nonresponse issues in probability
sampling:
using multiple sampling frames
adjusting weights for nonresponse and, if relevant,
attrition
calibrating weights to population totals
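As an illustration of calibrating weights to population totals, the sketch below uses post-stratification, one of the simplest forms of calibration. It assumes a single categorical variable with known population totals; the function name and the data are hypothetical.

```python
from collections import defaultdict


def poststratify(groups, base_weights, population_totals):
    """Rescale weights so that, within each group, the weighted sample
    count equals the known population total for that group."""
    weighted_count = defaultdict(float)
    for group, weight in zip(groups, base_weights):
        weighted_count[group] += weight
    return [
        weight * population_totals[group] / weighted_count[group]
        for group, weight in zip(groups, base_weights)
    ]


# Hypothetical sample of five respondents with equal starting weights.
groups = ["young", "young", "old", "old", "old"]
base_weights = [10.0, 10.0, 10.0, 10.0, 10.0]
population_totals = {"young": 60.0, "old": 40.0}
calibrated = poststratify(groups, base_weights, population_totals)
print(calibrated)
```

After the adjustment, the weights of the "young" respondents sum to 60 and those of the "old" respondents sum to 40, matching the assumed population totals.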
Why is there growing interest in non-probability sampling?
• concerns about increasing nonresponse rates
• high costs associated with probability
sampling
• ease of carrying out web surveys
• sometimes no other option available (e.g.
hidden population)
Types of non-probability sampling
• Convenience/accidental sampling
• Purposive sampling
• Sample matching
• Chain referral methods
Convenience/accidental sampling
“Convenience sampling is a form of non-probability
sampling in which the ease with which potential
participants can be located or recruited is the primary
consideration.”
(Baker et al., 2013)
Types of convenience sampling:
mall-intercept sampling
volunteer sampling
Purposive sampling
• Consists of the researcher approaching
people who they decide are most appropriate
to participate in the study e.g. a sample of
experts on a particular topic
Sample matching
• involves selecting a sample that matches a
set of population characteristics of interest
• most common type of sample matching is
quota sampling
• good estimates of the population
characteristics used for matching need to be
available
Chain referral methods
• Tend to be used for researching rare or hard-to-reach populations.
Types of chain referral methods:
Snowball sampling
Respondent-driven sampling (RDS)
Non-probability sampling: key
challenges
1. Greater likelihood of selection bias
2. Impossible to utilise unbiased estimators and
associated quality measures
1. Selection bias
“The error introduced when the study population
does not represent the target population”
(Delgado-Rodriguez and Llorca, 2004)
Some causes:
undercoverage
volunteer bias
interviewer/researcher unconscious bias
2. Unbiased estimators and associated
quality measures
• Standard practice in official statistics is to use
probability sampling and design-based
estimation
• Advantages:
resulting estimator is unbiased
sampling variability can be estimated directly
These advantages are lost when non-probability
samples are used.
Overcoming challenges at:
1. Sampling stage
2. Weighting and estimation stage
1. Overcoming challenges at sampling
stage
• Main challenge at this stage is obtaining a
representative sample.
• Two popular non-probability sampling
strategies developed to obtain a
representative sample are:
Sample matching (quota sampling)
Respondent-Driven Sampling
Sample matching (quota sampling)
• Aim is to obtain responses from a specific
number of units that match the target
population
• Quota sampling:
interviewers asked to interview a certain number
of people with particular characteristics
final sample ‘mirrors’ the target population in
terms of these characteristics
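The quota-filling idea can be sketched as follows. This is a simplified illustration, assuming each approached person belongs to exactly one quota category; the function name, categories and quota sizes are hypothetical.

```python
def fill_quotas(approached, quotas):
    """Accept people in the order they are approached until every
    quota category has the required number of interviews."""
    remaining = dict(quotas)
    sample = []
    for person_id, category in approached:
        if remaining.get(category, 0) > 0:
            sample.append((person_id, category))
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas are full
    return sample, remaining


# Hypothetical stream of people approached by an interviewer.
approached = [(1, "f"), (2, "f"), (3, "m"), (4, "f"), (5, "m"), (6, "m")]
quotas = {"f": 2, "m": 2}
sample, remaining = fill_quotas(approached, quotas)
print(sample, remaining)
```

Note that the order of `approached` is decided by the interviewer, not by a random mechanism, so the sample mirrors the population only on the quota characteristics.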
Sample matching problems
• Choice of who to interview still in the hands of
the interviewer (unconscious bias)
• Undercoverage
• Nonresponse
• It is therefore recommended to use sample
matching together with weighting
Respondent-driven sampling (RDS)
• Used for hidden target populations
• Two distinct phases:
1. Initial sample (the seeds at Wave 0) selected
using convenience sampling
2. Rest of the sample is selected by following links
from previous respondents
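The two phases can be sketched as a small simulation. This is an illustrative toy, not a description of any real RDS fieldwork protocol: the social network, the number of coupons per respondent and the number of waves are all assumptions, and recruitment among eligible contacts is assumed to be random.

```python
import random


def rds_recruit(network, seeds, coupons=2, max_waves=3, rng=None):
    """Simulate respondent-driven recruitment.
    Returns each respondent's wave and who recruited them."""
    rng = rng or random.Random(0)
    wave_of = {seed: 0 for seed in seeds}  # Wave 0: the seeds
    recruiter_of = {}
    current = list(seeds)
    for wave in range(1, max_waves + 1):
        next_wave = []
        for person in current:
            # Contacts who have not yet taken part are eligible.
            candidates = [c for c in network.get(person, []) if c not in wave_of]
            rng.shuffle(candidates)
            for recruit in candidates[:coupons]:
                wave_of[recruit] = wave
                recruiter_of[recruit] = person
                next_wave.append(recruit)
        if not next_wave:
            break
        current = next_wave
    return wave_of, recruiter_of


# Hypothetical contact network (who knows whom).
network = {
    "a": ["b", "c", "d"], "b": ["a", "e"], "c": ["a", "f"],
    "d": ["a"], "e": ["b"], "f": ["c"],
}
wave_of, recruiter_of = rds_recruit(network, seeds=["a"])
print(wave_of)
```

Recording `recruiter_of` matters in practice because the estimators used with RDS rely on information about the recruitment chains and network sizes.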
2. Overcoming challenges at weighting
and estimation stage
• Different methods developed for use with the
various sampling techniques.
Estimators and quality measures when using RDS
Using weighting
Additional quality measures
Estimation and quality measures for
use with RDS
“For many years, researchers thought it was
impossible to make unbiased estimates from
this type of sample. However, it was recently
shown that if certain conditions are met and if
the appropriate procedures are used, then
the prevalence estimates from respondent-driven sampling are asymptotically unbiased.”
(Salganik, 2006)
• Heckathorn (2011) – describes various
estimators developed for use with RDS
Caution: these estimators require a number of
assumptions to be made and biased estimates
may result if assumptions are not met
• Chow and Thompson (2003) proposed a
Bayesian approach for estimation for use with
link-tracing designs
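As one concrete illustration, the sketch below implements an inverse-degree weighted proportion, an estimator commonly referred to in the RDS literature as RDS-II (Volz-Heckathorn). Whether this particular estimator is among those described in Heckathorn (2011) is not stated on the slide, so treat the choice as an assumption. It relies on respondents reporting their network size (degree) accurately, among other assumptions; the data are hypothetical.

```python
def rds_ii_proportion(has_trait, degrees):
    """Estimate the proportion with a trait, weighting each respondent
    by the inverse of their reported network size (degree)."""
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every degree must be positive")
    numerator = sum(1.0 / d for t, d in zip(has_trait, degrees) if t)
    denominator = sum(1.0 / d for d in degrees)
    return numerator / denominator


# Hypothetical sample: people with the trait have larger networks,
# so they are more likely to be recruited and get smaller weights.
has_trait = [True, True, False, False]
degrees = [10, 10, 2, 2]
estimate = rds_ii_proportion(has_trait, degrees)
naive = sum(has_trait) / len(has_trait)
print(naive, estimate)
```

In this toy example the unweighted proportion is 0.5, while the degree-weighted estimate is about 0.17, showing how strongly the result depends on the reported degrees.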
• Bootstrap method to construct confidence
intervals around estimates (Salganik, 2006)
• Salganik (2006) recommends a sample size twice as large as
that required under simple random sampling (SRS)
• Interval estimates when using Bayesian
approach proposed by Chow and Thompson
(2003)
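To show the general idea of a bootstrap interval, here is a sketch of a plain percentile bootstrap. It resamples respondents independently, so it is NOT the RDS-specific procedure of Salganik (2006), which resamples in a way that respects the recruitment process; the function name, data and settings are hypothetical.

```python
import random


def percentile_bootstrap_ci(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample with replacement,
    recompute the statistic, and take empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical sample: 30 of 100 respondents have the trait.
responses = [1] * 30 + [0] * 70
lower, upper = percentile_bootstrap_ci(responses, mean)
print(lower, upper)
```

Because independent resampling ignores the dependence between recruiters and recruits, an interval computed this way would typically be too narrow for RDS data.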
Using weighting with non-probability
sampling
• Propensity-score adjustments (PSA) to
approximate design-based approach
• Valliant and Dever (2011) approach:
construct pseudo design weights
use covariates from reference survey to adjust
these design weights
Caution: Lee (2006) found that although PSA
tends to reduce nonresponse bias it seems to
increase variance
Caution: Lee (2006) recommends that covariates
highly related to study outcomes should be used
in the PSA
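The idea of using a reference survey to build pseudo weights can be sketched as below. This is a deliberately simplified illustration and not the Valliant and Dever (2011) procedure: the propensity of being in the volunteer sample is estimated as a simple proportion within cells of one categorical covariate rather than with a regression model, and the inverse odds (1 - p) / p is used as the pseudo weight. All names and data are hypothetical.

```python
from collections import defaultdict


def cell_propensity_weights(volunteer_cells, reference_cells, reference_weights):
    """Pseudo weights for volunteers from cell-level propensities.
    p = volunteers in cell / (volunteers in cell + weighted reference cases in cell)."""
    volunteer_count = defaultdict(float)
    for cell in volunteer_cells:
        volunteer_count[cell] += 1.0
    reference_total = defaultdict(float)
    for cell, weight in zip(reference_cells, reference_weights):
        reference_total[cell] += weight
    weights = []
    for cell in volunteer_cells:
        p = volunteer_count[cell] / (volunteer_count[cell] + reference_total[cell])
        weights.append((1.0 - p) / p)  # inverse odds of being a volunteer
    return weights


# Hypothetical data: graduates are over-represented among volunteers.
volunteer_cells = ["grad", "grad", "grad", "nongrad"]
reference_cells = ["grad", "nongrad", "nongrad"]
reference_weights = [100.0, 150.0, 150.0]
pseudo_weights = cell_propensity_weights(
    volunteer_cells, reference_cells, reference_weights
)
print(pseudo_weights)
```

With one volunteer carrying a weight of 300 and three carrying about 33 each, the example also hints at the variance increase that Lee (2006) cautions about.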
Additional quality measures
• Credibility Intervals – popular with opt-in
panels
• Participation rates
Response rate = (no. of respondents) / (total no. of eligible units)
Participation rate = (no. with useable response) / (total no. of initial invitations)
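The two rates can be computed as in the following sketch. The function names and the counts are hypothetical; only the two ratios themselves come from the slide.

```python
def response_rate(respondents, eligible_units):
    """Respondents divided by the total number of eligible units."""
    if eligible_units <= 0:
        raise ValueError("eligible_units must be positive")
    return respondents / eligible_units


def participation_rate(useable_responses, initial_invitations):
    """Useable responses divided by the total number of initial invitations."""
    if initial_invitations <= 0:
        raise ValueError("initial_invitations must be positive")
    return useable_responses / initial_invitations


# Hypothetical counts for a probability survey and an opt-in panel.
survey_rr = response_rate(respondents=450, eligible_units=1000)
panel_pr = participation_rate(useable_responses=1200, initial_invitations=20000)
print(survey_rr, panel_pr)
```

The denominators differ: eligibility is defined relative to a sampling frame, whereas invitations are simply what was sent out, which is one reason separate terminology is used for opt-in panels.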
Quality measures for non-probability
sampling
• Essential for researchers to report on the
quality of their estimates
• Currently no widely acceptable framework for
assessing quality of estimates from non-probability samples – development needed
• Important to use different terminology to that
used for probability sampling
When is use of non-probability
sampling justified?
• Fitness for purpose
• ‘Modellers’ vs ‘Describers’
• No single correct approach
• Decision boils down to desired outcomes and
resources
• Communicate clearly the quality of estimates
and issues/limitations
Recommendations
• Fitness for purpose should be used to drive
survey design
• Non-probability sampling does not
necessarily equate to lack of quality
• Transparency is essential
Questions?
[email protected]