Transcript Weights

Session 10
Sampling Weights:
an appreciation
1
Session Objectives

To provide you with an overview of the role of
sampling weights in estimating population parameters

To demonstrate computation of sampling weights for
a simple scenario

To highlight difficulties in calculating sampling weights
for complex survey designs and the need to seek
professional expertise for this purpose

To learn about file merging and continue with the ongoing project work
2
What are sampling weights?

Real surveys are generally multi-stage

At each stage, probabilities of selecting units at that
stage are not generally equal

When population parameters like a mean or
proportion are to be estimated, results from lower
levels need to be scaled up from the sample to the
population

This scaling-up factor, applied to each unit in the
sample, is called its sampling weight.
3
A simple example

Suppose, for example, that a simple random sample
of 500 households (HHs) in a rural district (having 7349 HHs in
total) showed 140 were living below the poverty line

Hence the estimated total in the population living below the
poverty line = (140/500)*7349 ≈ 2058

Data for each HH was a 0/1 variable, with 1 allocated
if the HH was below the poverty line.

Multiplying this variable by 7349/500 ≈ 14.7 and summing
would lead to the same answer,

i.e. sampling weight for each HH = 14.7
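
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the figures (7349, 500, 140) come from the slide, while the variable names and the ordering of the 0/1 values are assumptions made for illustration.

```python
# Figures from the slide: a simple random sample of 500 households
# out of 7349, of which 140 were below the poverty line.
N_HOUSEHOLDS = 7349
n_sample = 500
n_poor = 140

# One 0/1 value per sampled household (1 = below the poverty line).
# The order of the values is arbitrary and does not affect the sum.
below_poverty = [1] * n_poor + [0] * (n_sample - n_poor)

# Under simple random sampling every household is selected with
# probability n/N, so every household carries the same weight N/n.
weight = N_HOUSEHOLDS / n_sample

# Weighted sum of the 0/1 variable = estimated population total.
estimated_total = sum(weight * y for y in below_poverty)

print(f"weight per household: {weight:.3f}")
print(f"estimated households below the poverty line: {estimated_total:.0f}")
```

The unrounded weight is 14.698; the slide rounds it to 14.7, and both lead to an estimate of about 2058 households.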
4
Why are weights needed?

Above was a trivial example with equal
probabilities of selection

In general, units in the sample have very different
probabilities of selection

To allow for unequal probabilities of selection, each
unit is weighted by the reciprocal of its probability of
selection

Thus sampling weight = 1/(probability of selection)
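
As a rough illustration of the reciprocal rule, the sketch below weights each unit by 1/(probability of selection) and sums the weighted values to estimate a total. The three units, their probabilities, and their values are invented for illustration and do not come from any survey.

```python
# Hypothetical sample, invented for illustration: each unit has its own
# probability of selection ("prob") and an observed value ("y").
sample = [
    {"prob": 0.10, "y": 4},
    {"prob": 0.25, "y": 7},
    {"prob": 0.50, "y": 3},
]

# Sampling weight = reciprocal of the probability of selection.
for unit in sample:
    unit["weight"] = 1 / unit["prob"]

# Estimated population total = sum of (weight * observed value).
estimated_total = sum(unit["weight"] * unit["y"] for unit in sample)

for unit in sample:
    print(f"prob = {unit['prob']:.2f}  weight = {unit['weight']:.1f}  y = {unit['y']}")
print(f"estimated total: {estimated_total:.1f}")
```

Units that were unlikely to be selected get large weights, because each one stands in for many similar units that were not sampled.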
5
An example

Consider a conveniently rectangular forest with
a river running down the middle, thus dividing
the forest into Region 1 and Region 2.

Region 1 is divided into 96 strips, each 50m x 50m,
while Region 2 is divided into 72 strips.

Data are the number of small trees and the number of
large trees in each strip.

Aim: To find the total number of large trees, the total
number of small trees, and hence the total number of
trees in the forest.
6
Weights in stratified sampling

Each region can be regarded as a stratum: 8
strips were chosen from region 1 and 6 from region 2.

Mean numbers of large trees per strip were:
- 97.875 in region 1, based on n1=8
- 83.500 in region 2, based on n2=6

Hence total number of large trees in the forest can be
computed as
(96*97.875) + (72*83.5) = 15408

So what are the sampling weights used for each unit
(strip)?
7
Self-weighting

The sampling weights are the same for all strips,
whether in region 1 or region 2. Why is this?

What are the probabilities of selection here?
- In region 1, each unit is selected with prob=8/96
- In region 2, each unit is selected with prob=6/72

A design where probabilities of selection are equal for
all units in the population is called a self-weighting design.

Regarding the sample as a simple random sample then
gives us the correct mean.
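
The selection probabilities, the resulting sampling weights, and the estimated total for the forest example can be worked through as follows. This is a sketch: the stratum sizes, sample sizes, and means are taken from the slides, and the dictionary layout and names are assumptions.

```python
from fractions import Fraction

# Stratum sizes (N), sample sizes (n) and sample means from the slides.
strata = {
    "region 1": {"N": 96, "n": 8, "mean_large": 97.875},
    "region 2": {"N": 72, "n": 6, "mean_large": 83.5},
}

weights = {}
total_large_trees = 0.0
for name, s in strata.items():
    # Probability that any one strip in this stratum is selected.
    prob = Fraction(s["n"], s["N"])
    # Sampling weight = reciprocal of that probability (= N/n).
    weights[name] = 1 / prob
    # Sum of large trees over the sampled strips in this stratum.
    sample_sum = s["n"] * s["mean_large"]
    total_large_trees += float(weights[name]) * sample_sum
    print(f"{name}: prob = {prob}, weight = {weights[name]}")

print(f"estimated total number of large trees: {total_large_trees:.0f}")
```

Both regions give a probability of 1/12 and hence a weight of 12, which is why the design is self-weighting, and the weighted sum reproduces the total of 15408.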
8
Results for means

Easy to see that the mean number of large trees
in the forest is
[(96/168)*97.875 ] + [(72/168)*83.5] = 91.71

Regarding the 14 observations as though they were
drawn as a simple random sample gives 91.71, i.e. the
same answer.

The results for variances, however, differ:
- Variance of stratified sample mean = 1.28
- Variance of mean ignoring stratification = 2.18
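
The agreement between the two means can be confirmed numerically. The sketch below uses only the stratum sizes and sample means given on the slides; the variable names are assumptions. The variances quoted above depend on the individual strip counts, which are not shown on the slides, so they are not recomputed here.

```python
# Stratum sizes, sample sizes and sample means from the slides.
N1, n1, mean1 = 96, 8, 97.875
N2, n2, mean2 = 72, 6, 83.5
N = N1 + N2

# Stratified estimate: each stratum mean weighted by its population share.
stratified_mean = (N1 / N) * mean1 + (N2 / N) * mean2

# Estimate treating all 14 strips as one simple random sample.
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(f"stratified mean: {stratified_mean:.2f}")
print(f"pooled mean:     {pooled_mean:.2f}")
```

The two agree because n1/N1 equals n2/N2, so the sample shares 8/14 and 6/14 coincide with the population shares 96/168 and 72/168.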
9
More on weights

Important to note that the weights used in
computing a mean, i.e.
- (96/168)*(1/8) = 1/14 for strips in region 1, and
- (72/168)*(1/6) = 1/14 for strips in region 2,
are not sampling weights

Sampling weights refer to the multiplying factor when
estimating a total.

Essentially they represent the number of elements in
the population that an individual sampling unit
represents.
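
One way to see the distinction is to compute both kinds of weight for the forest example. The sketch below uses exact fractions; the stratum and sample sizes are from the slides, and the names are assumptions.

```python
from fractions import Fraction

# (stratum size N_h, sample size n_h) for regions 1 and 2, from the slides.
strata = [(96, 8), (72, 6)]
N = sum(N_h for N_h, _ in strata)

mean_weights = []
sum_of_sampling_weights = 0
for N_h, n_h in strata:
    # Sampling weight: population strips represented by one sampled strip.
    sampling_weight = Fraction(N_h, n_h)
    # Weight applied to one strip when computing the overall mean.
    mean_weight = Fraction(N_h, N) * Fraction(1, n_h)
    mean_weights.append(mean_weight)
    # Each of the n_h sampled strips contributes its sampling weight.
    sum_of_sampling_weights += n_h * sampling_weight
    print(f"sampling weight = {sampling_weight}, weight in mean = {mean_weight}")

print(f"sum of sampling weights over the sample: {sum_of_sampling_weights}")
```

The sampling weights (12 per strip) add up to 168, the number of strips in the forest, whereas the weights used for the mean (1/14 per strip) add up to 1.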
10
Other uses of weights

Weights are also used to deal with
non-responses and missing values

If measurements on all units are not available
for some reason, the sampling weights may be
re-computed to allow for this.

e.g. In conducting the Household Budget Survey
2000/2001 in Tanzania, not all rural areas planned in
the sampling scheme were visited. As a result,
sampling weights had to be re-calculated and used in
the analysis.
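
A minimal sketch of one simple way to re-compute a weight is given below: the original weight is inflated by the ratio of planned to achieved sample size within a stratum. This is a generic adjustment shown for illustration only, not necessarily the method used in the Tanzanian survey. The function name and the scenario (only 6 of the 8 planned strips in region 1 of the forest example being measured) are hypothetical.

```python
def adjust_for_nonresponse(base_weight, n_planned, n_achieved):
    """Inflate a sampling weight so the achieved units also stand in
    for the planned units that could not be measured (hypothetical helper)."""
    if n_achieved <= 0:
        raise ValueError("at least one unit must have been measured")
    return base_weight * n_planned / n_achieved

# Hypothetical scenario: region 1 has 96 strips and 8 were planned,
# but only 6 could actually be measured.
base_weight = 96 / 8
adjusted_weight = adjust_for_nonresponse(base_weight, n_planned=8, n_achieved=6)

print(f"original weight: {base_weight:.1f}")
print(f"adjusted weight: {adjusted_weight:.1f}")
```

The adjustment assumes that the unmeasured units resemble the measured ones in the same stratum. If that is doubtful, more careful methods and professional advice are needed.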
11
Computation of weights

General approach is to find the probability of
selecting a unit at every stage of the sample
selection process

e.g. in a 3-stage design, three sets of probabilities will
result

Probability of selecting each final stage unit is then the
product of these three probabilities

The reciprocal of the above probability is then the
sampling weight
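
The steps above can be sketched for a 3-stage design as follows. The stages and all the counts (districts, villages, households) are hypothetical numbers chosen for illustration, and the sketch assumes equal-probability selection within each stage.

```python
from fractions import Fraction
from math import prod

# Hypothetical 3-stage design, equal-probability selection at each stage:
#   stage 1: 10 of 50 districts selected
#   stage 2: 5 of 20 villages selected within a chosen district
#   stage 3: 12 of 120 households selected within a chosen village
stage_probs = [Fraction(10, 50), Fraction(5, 20), Fraction(12, 120)]

# Probability of selecting a final-stage unit = product over the stages.
overall_prob = prod(stage_probs)

# Sampling weight = reciprocal of the overall selection probability.
sampling_weight = 1 / overall_prob

print(f"overall probability of selection: {overall_prob}")
print(f"sampling weight: {sampling_weight}")
```

In a real survey the probabilities at each stage usually differ from one unit to the next, so the product has to be formed separately for each final-stage unit.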
12
Difficulties in computations

Standard methods, as illustrated in textbooks on
sampling, often do not apply in real surveys

Complex sampling designs are common

Computing correct probabilities of selection can then
be very challenging

Usually professional assistance is needed to determine
the correct sampling weights and to use them correctly in
the analysis
13
Software for dealing with weights

When analysing data from complex survey
designs, it is important to check that the software
can deal with sampling weights

Packages such as Stata, SAS, Epi-info have facilities
for dealing with sampling weights

However, care is needed to ensure that the approaches used
are appropriate for your own survey design

Note: The above discussion was aimed at providing you with
an overview of sampling weights. See the next slide for
the work for the remainder of this session.
14
Practical work

To understand how files may be merged, work
through sections 10.5 and 10.6 of the Stata Guide.

Now move to your project work and practice file
merging to address objectives 4 and 5 of your task.

A description of the work you should undertake is
provided in the handout titled Practical 10.
15