Transcript Weights
Session 10
Sampling Weights:
an appreciation
Session Objectives
To provide you with an overview of the role of
sampling weights in estimating population parameters
To demonstrate computation of sampling weights for
a simple scenario
To highlight difficulties in calculating sampling weights
for complex survey designs and the need to seek
professional expertise for this purpose
To learn about file merging and continue with the ongoing project work
What are sampling weights?
Real surveys are generally multi-stage
At each stage, the probabilities of selecting units at that
stage are generally not equal
When a population parameter such as a mean or
proportion is to be estimated, results from lower
levels need to be scaled up from the sample to the
population
This scaling-up factor, applied to each unit in the
sample, is called its sampling weight.
A simple example
Suppose, for example, that a simple random sample
of 500 households (HHs) in a rural district (having 7349 HHs in
total) showed that 140 were living below the poverty line
Hence the estimated total in the population living below the poverty
line = (140/500)*7349 ≈ 2058
Data for each HH were a 0/1 variable, with 1 allocated
if the HH was below the poverty line.
Multiplying this variable by 7349/500 ≈ 14.7 and summing
would lead to the same answer,
i.e. the sampling weight for each HH = 14.7
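The two routes to the same total can be checked with a short Python sketch. Only the counts (500 sampled, 7349 in the district, 140 below the poverty line) come from the slide; the order in which the 0/1 values appear in the list is arbitrary and purely illustrative.

```python
# Simple random sample of n = 500 HHs from a district of N = 7349 HHs;
# 140 sampled HHs were below the poverty line (counts from the slide).
N = 7349
n = 500

# One 0/1 value per sampled HH: 1 = below the poverty line (order is illustrative)
y = [1] * 140 + [0] * (n - 140)

# Route 1: scale the sample proportion up to the population
total_from_proportion = (sum(y) / n) * N

# Route 2: give every HH the same sampling weight N/n, multiply and sum
weight = N / n
total_from_weights = sum(weight * yi for yi in y)

print(round(weight, 1))              # 14.7
print(round(total_from_proportion))  # 2058
print(round(total_from_weights))     # 2058
```

Both routes give the same figure because, with equal probabilities of selection, summing weighted 0/1 values is just the sample proportion multiplied by N.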
Why are weights needed?
The above was a trivial example with equal
probabilities of selection
In general, units in the sample have very different
probabilities of selection
To allow for unequal probabilities of selection, each
unit is weighted by the reciprocal of its probability of
selection
Thus, sampling weight = 1/(probability of selection)
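A minimal Python sketch of the rule "weight = 1/(probability of selection)", used to estimate a population total. This weighted sum is the standard Horvitz-Thompson form of the estimator; the function names and the three-unit sample are hypothetical and only illustrate the arithmetic.

```python
def sampling_weights(probs):
    """Return 1 / p for each unit's probability of selection p."""
    weights = []
    for p in probs:
        if not 0 < p <= 1:
            raise ValueError(f"probability of selection must be in (0, 1], got {p}")
        weights.append(1 / p)
    return weights


def estimate_total(values, probs):
    """Estimate a population total as the sum of weight * value over the sample."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    return sum(w * y for w, y in zip(sampling_weights(probs), values))


# Hypothetical sample of three units with unequal probabilities of selection
values = [3, 5, 2]
probs = [0.5, 0.25, 0.125]
print(sampling_weights(probs))        # [2.0, 4.0, 8.0]
print(estimate_total(values, probs))  # 42.0

# The district example: every HH has probability 500/7349 of selection
print(round(estimate_total([1] * 140 + [0] * 360, [500 / 7349] * 500)))  # 2058
```

The equal-probability district example is then just the special case where every unit gets the same weight.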
An example
Consider a conveniently rectangular forest with
a river running down the middle, thus dividing
the forest into Region 1 and Region 2.
Region 1 is divided into 96 strips, each 50 m x 50 m,
while Region 2 is divided into 72 strips.
Data are the number of small trees and the number of
large trees in each strip.
Aim: To find the total number of large trees, the total
number of small trees, and hence the total number of
trees in the forest.
Weights in stratified sampling
Each region can be regarded as a stratum: 8
strips were chosen from region 1 and 6 from region 2.
The mean numbers of large trees per strip were:
97.875 in region 1, based on n1=8
83.500 in region 2, based on n2=6
Hence the total number of large trees in the forest can be
computed as
(96*97.875) + (72*83.5) = 15408
So what sampling weight is used for each unit
(strip)?
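The stratified total can be reproduced in Python from the stratum sizes and means on this slide. The sketch also rewrites each term N_h * mean_h as (N_h/n_h) * (sum of the n_h sampled values), which exposes the per-strip multiplying factor N_h/n_h; the stratum sums are recovered as n_h * mean_h because the individual strip counts are not given here.

```python
# Stratum sizes N_h (strips), sample sizes n_h and sample means of large trees
strata = {
    "region 1": {"N": 96, "n": 8, "mean_large": 97.875},
    "region 2": {"N": 72, "n": 6, "mean_large": 83.500},
}

# Stratified estimate of the total: sum over strata of N_h * (stratum sample mean)
total_large = sum(s["N"] * s["mean_large"] for s in strata.values())

# Same total written with per-strip multiplying factors N_h / n_h:
# N_h * mean_h = (N_h / n_h) * (sum of the n_h sampled values in stratum h)
weights = {h: s["N"] / s["n"] for h, s in strata.items()}
stratum_sums = {h: s["n"] * s["mean_large"] for h, s in strata.items()}
total_via_weights = sum(weights[h] * stratum_sums[h] for h in strata)

print(total_large)        # 15408.0
print(weights)            # {'region 1': 12.0, 'region 2': 12.0}
print(total_via_weights)  # 15408.0
```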
Self-weighting
The sampling weights are the same for all strips,
whether in region 1 or region 2. Why is this?
What are the probabilities of selection here?
In region 1, each unit is selected with prob = 8/96 = 1/12
In region 2, each unit is selected with prob = 6/72 = 1/12
A design where probabilities of selection are equal for
all selected units is called a self-weighting design.
Regarding the sample as a simple random sample then
gives us the correct mean.
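A small Python check of the self-weighting property, using exact fractions so that 8/96 and 6/72 compare as equal without rounding. It assumes, as the slide's probabilities imply, that strips are drawn by simple random sampling within each region; `is_self_weighting` is a hypothetical helper name.

```python
from fractions import Fraction

# (population strips N_h, sampled strips n_h) in each stratum
strata = {"region 1": (96, 8), "region 2": (72, 6)}

# Probability that a given strip is selected, and the resulting sampling weight
probs = {name: Fraction(n, N) for name, (N, n) in strata.items()}
weights = {name: 1 / p for name, p in probs.items()}


def is_self_weighting(probs):
    """True when every unit has the same probability of selection."""
    return len(set(probs.values())) == 1


print(probs)                     # {'region 1': Fraction(1, 12), 'region 2': Fraction(1, 12)}
print(weights)                   # {'region 1': Fraction(12, 1), 'region 2': Fraction(12, 1)}
print(is_self_weighting(probs))  # True
```

Exact fractions are used so the equality test is not affected by floating-point rounding.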
Results for means
It is easy to see that the mean number of large trees
in the forest is
[(96/168)*97.875] + [(72/168)*83.5] = 91.71
Regarding the 14 observations as though they were
drawn as a simple random sample gives 91.71, i.e. the
same answer.
The results for variances, however, differ:
Variance of stratified sample mean = 1.28
Variance of mean ignoring stratification = 2.18
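The two means can be reproduced in Python from the stratum summaries alone. The variances cannot, because they need the individual strip counts, which are not listed here, so the sketch covers only the point estimates.

```python
# Stratum sizes, sample sizes and sample means of large trees per strip
N = {"region 1": 96, "region 2": 72}
n = {"region 1": 8, "region 2": 6}
mean = {"region 1": 97.875, "region 2": 83.5}

N_total = sum(N.values())  # 168 strips in the forest
n_total = sum(n.values())  # 14 sampled strips

# Stratified mean: stratum means weighted by each stratum's share of the population
stratified_mean = sum(N[h] / N_total * mean[h] for h in N)

# Pooled mean: treat the 14 observations as if they were a simple random sample
pooled_mean = sum(n[h] * mean[h] for h in n) / n_total

print(round(stratified_mean, 2), round(pooled_mean, 2))  # 91.71 91.71
```

The two agree only because the design is self-weighting; with unequal probabilities of selection the pooled mean would generally be biased.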
More on weights
It is important to note that the weights used in
computing a mean, i.e.
(96/168)*(1/8) = 1/14 for strips in region 1, &
(72/168)*(1/6) = 1/14 for strips in region 2,
are not sampling weights
Sampling weights refer to the multiplying factor used when
estimating a total.
Essentially, they represent the number of elements in
the population that an individual sampling unit
represents.
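The link between the two kinds of weight can be made explicit in Python: the weights used for the mean are the sampling weights rescaled so that they sum to 1 over the sample. The sketch uses the forest figures and exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

N = {"region 1": 96, "region 2": 72}
n = {"region 1": 8, "region 2": 6}
N_total = sum(N.values())  # 168

# Sampling weight: number of population strips each sampled strip represents
sampling_weight = {h: Fraction(N[h], n[h]) for h in N}        # 12 in both regions

# Weight applied to each observation when computing the mean
mean_weight = {h: Fraction(N[h], N_total) / n[h] for h in N}  # 1/14 in both regions

# Sum of sampling weights over all sampled strips (equals N_total for this design)
sum_of_weights = sum(sampling_weight[h] * n[h] for h in N)

# Mean weights are sampling weights divided by their total
assert all(mean_weight[h] == sampling_weight[h] / sum_of_weights for h in N)
print(sampling_weight, mean_weight, sum_of_weights)
```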
Other uses of weights
Weights are also used to deal with
non-response and missing values
If measurements on all units are not available
for some reason, the sampling weights may be
re-computed to allow for this.
e.g. in conducting the Household Budget Survey
2000/2001 in Tanzania, not all rural areas planned in
the sampling scheme were visited. As a result,
sampling weights had to be re-calculated and used in
the analysis.
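One simple way of re-computing weights after non-response is a weighting-class adjustment: within each stratum, the base weight N_h/n_h is multiplied by n_h/r_h, where r_h is the number of units actually measured. The Python sketch below assumes that non-respondents resemble respondents within each stratum; the scenario (one of the 8 selected strips in region 1 not measured) is hypothetical and is not the procedure used for the Tanzania survey.

```python
from fractions import Fraction


def adjusted_weights(N, n_selected, n_responded):
    """Per-stratum weights after a simple weighting-class non-response adjustment.

    Base weight N_h / n_h times n_h / r_h, which simplifies to N_h / r_h.
    Assumes non-respondents are like respondents within each stratum.
    """
    weights = {}
    for h in N:
        if n_responded[h] == 0:
            raise ValueError(f"no responding units in stratum {h!r}")
        base = Fraction(N[h], n_selected[h])
        weights[h] = base * Fraction(n_selected[h], n_responded[h])
    return weights


# Hypothetical: one of the 8 selected strips in region 1 could not be measured
N = {"region 1": 96, "region 2": 72}
n_selected = {"region 1": 8, "region 2": 6}
n_responded = {"region 1": 7, "region 2": 6}

w = adjusted_weights(N, n_selected, n_responded)
print(w)  # {'region 1': Fraction(96, 7), 'region 2': Fraction(12, 1)}
```

After the adjustment the design is no longer self-weighting, so the analysis must use the weights explicitly.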
Computation of weights
The general approach is to find the probability of
selecting a unit at every stage of the sample
selection process
e.g. in a 3-stage design, three sets of probabilities will
result
The probability of selecting each final-stage unit is then the
product of these three (conditional) probabilities
The reciprocal of this probability is then the
sampling weight
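The multi-stage calculation can be sketched in Python: multiply the conditional selection probabilities from each stage, then take the reciprocal. The design below is hypothetical (5 of 40 villages, 2 of 6 sub-villages, then 10 HHs per selected sub-village) and only illustrates how unequal final weights arise.

```python
from fractions import Fraction
from math import prod


def multistage_weight(stage_probs):
    """Overall selection probability and sampling weight of a final-stage unit.

    stage_probs: probability of selection at each stage, conditional on
    selection at the previous stage.
    """
    overall = prod(stage_probs, start=Fraction(1))
    return overall, 1 / overall


# Hypothetical HH in a sub-village of 120 HHs
hh_a = [Fraction(5, 40), Fraction(2, 6), Fraction(10, 120)]
# Hypothetical HH in a sub-village of 60 HHs (same first two stages)
hh_b = [Fraction(5, 40), Fraction(2, 6), Fraction(10, 60)]

print(multistage_weight(hh_a))  # (Fraction(1, 288), Fraction(288, 1))
print(multistage_weight(hh_b))  # (Fraction(1, 144), Fraction(144, 1))
```

Selecting a fixed number of HHs from sub-villages of different sizes is what makes the final weights differ here.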
Difficulties in computations
Standard methods, as illustrated in textbooks on
sampling, often do not apply in real surveys
Complex sampling designs are common
Computing correct probabilities of selection can then
be very challenging
Professional assistance is usually needed to determine
the correct sampling weights and to use them correctly in
the analysis
Software for dealing with weights
When analysing data from complex survey
designs, it is important to check that the software
can deal with sampling weights
Packages such as Stata, SAS and Epi-info have facilities
for dealing with sampling weights
However, you need to be careful that the approaches used
are appropriate for your own survey design
Note: The above discussion was aimed at providing you with
an overview of sampling weights. See the next slide for the
work for the remainder of this session.
Practical work
To understand how files may be merged, work
through sections 10.5 and 10.6 of the Stata Guide.
Now move on to your project work and practise file
merging to address objectives 4 and 5 of your task.
A description of the work you should undertake is
provided in the handout titled Practical 10.