Sampling weights: an appreciation
(Session 19)
SADC Course in Statistics
Learning Objectives
By the end of this session, you will be able to
• explain the role of sampling weights in
estimating population parameters
• calculate sampling weights for very simple
sampling designs
• appreciate that calculating sampling
weights for complex survey designs is nontrivial and requires professional expertise
To put your footer here go to View > Header and Footer
2
What is meant by sampling weights?
• Real surveys are generally multi-stage
• At each stage, probabilities of selecting
units at that stage are not generally equal
• When a population parameter such as a mean
or proportion is to be estimated, results
from the lower levels (stages) need to be
scaled up from the sample to the population
• This scaling-up factor, applied to each unit
in the sample, is called its sampling weight.
A simple example
• Suppose, for example, that a simple random sample
of 500 HHs in a rural district (having 7349 HHs
in total) showed that 140 were living below the
poverty line
• Hence the estimated total number of HHs in the
population living below the poverty line
= (140/500)*7349 ≈ 2058
• Data for each HH was a 0/1 variable, with 1
allocated if the HH was below the poverty line.
• Multiplying this variable by 7349/500 ≈ 14.7 and
summing over the sample gives the same answer.
• i.e. the sampling weight for each HH = 14.7
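The weighting calculation in this example can be reproduced in a few lines of Python. This is a minimal sketch: the 0/1 poverty indicator is rebuilt from the counts on the slide (140 ones and 360 zeros), since the order of the households does not affect the total.

```python
# Simple random sample of n households from a district of N households
N = 7349  # households in the district
n = 500   # households sampled

# Sampling weight: each sampled HH "stands for" N/n households in the district
weight = N / n  # 14.698, shown rounded as 14.7 on the slide

# 0/1 poverty indicator rebuilt from the slide's counts: 140 poor, 360 not poor
poor = [1] * 140 + [0] * 360

# Estimated population total = sum over the sample of (weight * y)
total_poor = sum(weight * y for y in poor)
print(round(weight, 1), round(total_poor))  # 14.7 2058
```

The weighted sum equals (140/500)*7349, because every household carries the same weight.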
Why are weights needed?
• The above was a trivial example with equal
probabilities of selection
• In general, units in the sample have quite
different probabilities of selection, i.e. it is
rare to get a self-weighting design
• To allow for unequal probabilities of
selection, each unit is weighted by the
reciprocal of its probability of selection
• Thus sampling weight = 1/(probability of selection)
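A minimal Python sketch of weighting each unit by the reciprocal of its probability of selection and summing to estimate a total (this is the Horvitz-Thompson form of the estimator). The three units, their values and their selection probabilities are hypothetical, invented only to show the arithmetic.

```python
# Hypothetical sample: value y and probability of selection p for each unit
sample = [
    {"y": 12.0, "p": 0.10},
    {"y": 30.0, "p": 0.25},
    {"y": 5.0, "p": 0.02},
]

# Sampling weight = 1 / probability of selection
weights = [1 / u["p"] for u in sample]

# Estimated population total = sum over the sample of (weight * y)
estimated_total = sum(w * u["y"] for w, u in zip(weights, sample))
print(weights, estimated_total)
```

The third unit, with the smallest chance of selection, receives the largest weight (50), because it stands for many units that were not selected.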
Weights in stratified sampling
• Consider the “To the Woods” example data set
discussed in Session 10 (96 plots in region 1,
72 plots in region 2).
• Mean numbers of large trees per plot were:
– 97.875 in region 1, based on n1=8 plots
– 83.500 in region 2, based on n2=6 plots
• Hence the total number of large trees in the
forest can be estimated as
(96*97.875) + (72*83.5) = 15408
• So what are the sampling weights used for
each unit (plot)?
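The stratified total can be checked in Python. The figures are the ones quoted on this slide (plot counts 96 and 72, sample sizes 8 and 6, and the two stratum means); the estimate is the sum over strata of N_h times the stratum sample mean.

```python
# "To the Woods" figures: plots in each region (N), plots sampled (n),
# and mean number of large trees per sampled plot
strata = {
    "region 1": {"N": 96, "n": 8, "mean": 97.875},
    "region 2": {"N": 72, "n": 6, "mean": 83.500},
}

# Stratified estimate of the total: sum over strata of N_h * (stratum sample mean)
total = sum(s["N"] * s["mean"] for s in strata.values())
print(total)  # 15408.0
```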
Self-weighting again
• The sampling weights are the same for all
plots, whether in region 1 or region 2. Why
is this?
• What are the probabilities of selection here?
– In region 1, each unit is selected with prob=8/96=1/12
– In region 2, each unit is selected with prob=6/72=1/12
• Recall that a design in which the probabilities of
selection are equal for all units is
called a self-weighting design.
• So treating the sample as a simple random
sample should give the correct estimate of the mean.
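Exact fractions make the equal selection probabilities, and hence the common sampling weight, easy to see. A minimal sketch using the slide's region sizes; the sample total of 1284 trees is derived from the stated means (8 × 97.875 + 6 × 83.5), not taken from the raw plot data.

```python
from fractions import Fraction

# Plots in each region (N) and plots sampled (n)
design = {"region 1": (96, 8), "region 2": (72, 6)}

# Probability that any given plot is selected, and the resulting sampling weight
probs = {region: Fraction(n, N) for region, (N, n) in design.items()}
weights = {region: 1 / p for region, p in probs.items()}

for region in design:
    print(region, probs[region], weights[region])
# region 1 1/12 12
# region 2 1/12 12

# Trees counted on the 14 sampled plots, derived from the stratum means
sample_sum = 8 * 97.875 + 6 * 83.5  # 1284.0

# With a common weight of 12, the weighted sum reproduces the stratified total
estimated_total = weights["region 1"] * sample_sum
print(estimated_total)  # 15408.0
```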
Results for means
• The mean number of large trees per plot, using the
formula for stratified sampling, is
[(96/168)*97.875] + [(72/168)*83.5]
= 91.71
• Treating the 14 observations as if they
were drawn as a simple random sample
also gives 91.71.
• The variances, however, differ
– Variance of stratified sample mean = 1.28
– Variance of mean ignoring stratification = 2.18
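The two calculations of the mean can be compared directly. This sketch uses only the stratum sizes and means quoted above; the variances (1.28 and 2.18) need the individual plot counts from Session 10, so they are not recomputed here.

```python
N1, n1, ybar1 = 96, 8, 97.875  # region 1: plots, plots sampled, sample mean
N2, n2, ybar2 = 72, 6, 83.5    # region 2: plots, plots sampled, sample mean
N = N1 + N2  # 168 plots in the forest
n = n1 + n2  # 14 plots sampled

# Stratified estimator: stratum means weighted by each stratum's share of the population
strat_mean = (N1 / N) * ybar1 + (N2 / N) * ybar2

# Ignoring strata: total trees on the 14 sampled plots divided by 14
srs_mean = (n1 * ybar1 + n2 * ybar2) / n

print(round(strat_mean, 2), round(srs_mean, 2))  # 91.71 91.71
```

The two agree only because n_h/N_h is the same (1/12) in both regions; with unequal sampling fractions the unweighted mean would be biased.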
Results for means
• It is important to note that the weights used in
computing a mean, i.e.
– (96/168)*(1/8) = 1/14 for plots in region 1, &
– (72/168)*(1/6) = 1/14 for plots in region 2,
are not sampling weights
• Sampling weights are the multiplying
factors used when estimating a total
(here 96/8 = 72/6 = 12 for every plot).
• Essentially, a sampling weight is the number of
elements in the population that an individual
sampling unit represents.
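The link between the two kinds of weight can be sketched as follows. Dividing each sampling weight by the sum of the sampling weights (which estimates the population size, here 168 plots) gives the weights used for the mean.

```python
from fractions import Fraction

# Sampling weight for each of the 14 sampled plots: N_h / n_h = 12 in both regions
sampling_weights = [Fraction(96, 8)] * 8 + [Fraction(72, 6)] * 6

# The sampling weights add up to the (estimated) population size
N_hat = sum(sampling_weights)

# Weights for the mean = sampling weight / sum of all sampling weights
mean_weights = [w / N_hat for w in sampling_weights]

print(N_hat, mean_weights[0], mean_weights[-1])  # 168 1/14 1/14
```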
Other uses of weight
• Weights are also used to deal with non-response and missing values
• If measurements are not available for all
units, for whatever reason, the sampling
weights may need to be re-computed to allow for this.
• e.g. In conducting the Household Budget
Survey 2000/2001 in Tanzania, not all rural
areas planned in the sampling scheme were
visited. As a result, sampling weights had to
be re-calculated and used in the analysis.
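The slides do not say how the Tanzanian weights were re-calculated. As an illustration only, one common approach is a weighting-class non-response adjustment: within a class, respondents' weights are scaled up so that together they also represent the non-respondents. The function name `nonresponse_adjust` and the five households below are hypothetical.

```python
def nonresponse_adjust(units):
    """Hypothetical weighting-class adjustment within one adjustment class.

    units: list of (design_weight, responded) pairs.
    Returns adjusted weights: respondents are scaled up, non-respondents get 0.
    """
    total_w = sum(w for w, _ in units)      # weight of everyone sampled in the class
    resp_w = sum(w for w, r in units if r)  # weight of the respondents only
    factor = total_w / resp_w               # > 1 whenever someone did not respond
    return [w * factor if r else 0.0 for w, r in units]


# Hypothetical class: five sampled households with design weights 20 or 40
households = [(20, True), (20, True), (20, False), (40, True), (40, False)]
adjusted = nonresponse_adjust(households)
print(adjusted)  # [35.0, 35.0, 0.0, 70.0, 0.0]
```

The adjusted weights still sum to the total design weight of the class (140), so estimated totals are not shrunk by the missing households; the adjustment assumes respondents and non-respondents within a class are alike.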
Computation of weights
• The general approach is to find the probability of
selecting a unit at each stage of the sample
selection process (given the units selected at
the previous stages)
• e.g. in a 3-stage design, three sets of
probabilities will result
• The probability of selecting each final-stage unit
is then the product of these three
probabilities
• The reciprocal of this probability is
then the sampling weight
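A minimal sketch of the product rule for a hypothetical 3-stage design in which each stage uses simple random sampling (5 of 40 districts, then 4 of 20 villages in a selected district, then 10 households in a selected village). All numbers and the helper name `household_weight` are invented for illustration. Two households in villages of different sizes end up with different weights, which is one reason real multi-stage samples are rarely self-weighting.

```python
from fractions import Fraction
from math import prod


def household_weight(stage_probs):
    """Sampling weight = 1 / (product of the selection probabilities at each stage)."""
    return 1 / prod(stage_probs)


# Household in a selected village of 50 HHs (10 sampled)
hh_a = household_weight([Fraction(5, 40), Fraction(4, 20), Fraction(10, 50)])
# Household in another selected village of the same district, with 80 HHs (10 sampled)
hh_b = household_weight([Fraction(5, 40), Fraction(4, 20), Fraction(10, 80)])

print(hh_a, hh_b)  # 200 320
```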
Difficulties in computations
• Standard methods, as illustrated in textbooks
on sampling, often do not apply in real
surveys
• Complex sampling designs are common
• Computing correct probabilities of selection
can then be very challenging
• Professional assistance is usually needed to
determine the correct sampling weights and
to use them correctly in the analysis
Software for dealing with weights
• When analysing data from complex survey
designs, it is important to check that the
software can deal with sampling weights
• Packages such as Stata, SAS and Epi Info have
facilities for dealing with sampling weights
• However, users need to check that the
approaches used are appropriate for their
own survey design