Indirect Sampling

Jerilyn Boykin and Zhongxue Chen

Introduction: What is indirect sampling?
Generalized Weight Sharing: A Unified Method
Some specific cases:
Cross Sectional Estimation (Ernst, 1989)
Multiplicity Estimation (Sirken, 1970)
Frames Containing Unknown Amount of Duplicity (Rao, 1968)
Indirect Sampling

Why Indirect Sampling




Population $U^B$: frame is not available;
Population $U^A$: frame is available;
There is a relationship (link) between these two populations.
The Generalized Weight Sharing Method (GWSM) is a unified method for indirect sampling developed by Lavallee (2002).
GWSM: Notation





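Population: $U^A$ and $U^B$
Number of units: $N^A$ and $N^B$
Label of units: $j$ (in $U^A$) and $i$ (in $U^B$)
Selected sample: $s^A$ and $s^B$
Link matrix: $\Theta^{AB} = [\theta_{ji}^{AB}]_{N^A \times N^B}$
$\theta_{ji}^{AB} > 0$ if unit $j$ of $U^A$ is related to unit $i$ of $U^B$
$\theta_{ji}^{AB} = 0$ if unit $j$ of $U^A$ is not related to unit $i$ of $U^B$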
GWSM: Sampling

Sample $s^A$ from $U^A$, where $\pi_j^A > 0$ is the selection probability.
For each $j$ in $s^A$, identify the units $i$ in $U^B$ such that $\theta_{ji}^{AB} > 0$.
The set $s^B = \{ i \in U^B ;\ \exists\, j \in s^A,\ \theta_{ji}^{AB} > 0 \}$
Want to estimate $T^B = \sum_{i=1}^{N^B} y_i$, the total over $U^B$.
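The step from the selected sample in the first population to the indirectly selected units in the second can be sketched in Python. This is a minimal illustration under assumptions: the link matrix values are made up, and the names `theta` and `indirect_sample` are hypothetical, not taken from the slides.

```python
# Made-up link matrix: rows are units j of U^A, columns are units i of U^B.
# theta[j][i] > 0 means unit j of U^A is related to unit i of U^B.
theta = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]


def indirect_sample(theta, s_a):
    """Return the units i of U^B linked to at least one unit j in s_a."""
    n_b = len(theta[0])
    return sorted(i for i in range(n_b) if any(theta[j][i] > 0 for j in s_a))


# Selecting only unit j = 1 of U^A reaches units i = 0 and i = 1 of U^B.
print(indirect_sample(theta, s_a=[1]))  # prints [0, 1]
```

No frame for the second population is needed: only the links of the sampled units have to be followed.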
GWSM: Estimation
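Let $\theta_{i}^{AB} = \sum_{j=1}^{N^A} \theta_{ji}^{AB}$
and $\tilde{\theta}_{ji}^{AB} = \theta_{ji}^{AB} / \theta_{i}^{AB}$.
Then
$T^B = \sum_{i=1}^{N^B} y_i = \sum_{i=1}^{N^B} \left( \sum_{j=1}^{N^A} \tilde{\theta}_{ji}^{AB} \right) y_i$
$= \sum_{i=1}^{N^B} \sum_{j=1}^{N^A} \tilde{\theta}_{ji}^{AB} y_i$
$= \sum_{j=1}^{N^A} \sum_{i=1}^{N^B} \tilde{\theta}_{ji}^{AB} y_i$
$= \sum_{j=1}^{N^A} \left( \sum_{i=1}^{N^B} \tilde{\theta}_{ji}^{AB} y_i \right)$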
GWSM: Estimation

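Horvitz-Thompson:
Let $t_j = 1$ if $j \in s^A$, and $t_j = 0$ otherwise.
HT-estimator:
$\hat{T}^B = \sum_{j=1}^{N^A} \frac{t_j}{\pi_j^A} \left( \sum_{i=1}^{N^B} \tilde{\theta}_{ji}^{AB} y_i \right)$
$= \sum_{j=1}^{N^A} \sum_{i=1}^{N^B} \frac{t_j}{\pi_j^A} \tilde{\theta}_{ji}^{AB} y_i$
$= \sum_{i=1}^{N^B} \left( \sum_{j=1}^{N^A} \frac{t_j}{\pi_j^A} \tilde{\theta}_{ji}^{AB} \right) y_i$
$= \sum_{i=1}^{N^B} w_i y_i$, where $w_i = \sum_{j=1}^{N^A} \frac{t_j}{\pi_j^A} \tilde{\theta}_{ji}^{AB}$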
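The weight-sharing step can be illustrated with a short Python sketch. It is only a toy example under stated assumptions: the link matrix, the values of y, and the equal selection probabilities (simple random sampling of 2 units out of 4) are made up, and the function names `gwsm_weights` and `gwsm_estimate` are hypothetical.

```python
from itertools import combinations

# Made-up link matrix theta[j][i]: 4 units in U^A (rows), 3 units in U^B (columns).
theta = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
y = [10, 20, 30]              # made-up values y_i for the units of U^B
pi_a = [0.5, 0.5, 0.5, 0.5]   # selection probabilities: SRS of 2 units out of 4


def gwsm_weights(theta, s_a, pi_a):
    """Weight of each unit i of U^B: sum over sampled j of (theta_ji / theta_i) / pi_j."""
    n_a, n_b = len(theta), len(theta[0])
    # Column totals of the link matrix; each must be positive for the method to apply.
    col_tot = [sum(theta[j][i] for j in range(n_a)) for i in range(n_b)]
    return [
        sum(theta[j][i] / col_tot[i] / pi_a[j] for j in s_a)
        for i in range(n_b)
    ]


def gwsm_estimate(theta, s_a, pi_a, y):
    """Estimate of the total of y over U^B as the weighted sum of y_i."""
    w = gwsm_weights(theta, s_a, pi_a)
    return sum(w_i * y_i for w_i, y_i in zip(w, y))


print(gwsm_weights(theta, [1, 2], pi_a))      # [1.0, 2.0, 1.0]
print(gwsm_estimate(theta, [1, 2], pi_a, y))  # 80.0

# Averaging the estimate over all possible samples recovers the true total.
all_samples = list(combinations(range(4), 2))
mean_est = sum(gwsm_estimate(theta, s, pi_a, y) for s in all_samples) / len(all_samples)
print(mean_est, sum(y))
```

Units of the second population that are not linked to any sampled unit receive weight zero, so in practice y is only needed for the units that were actually reached.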
GWSM: Variance

Variance:
Let $Z_j = \sum_{i=1}^{N^B} \tilde{\theta}_{ji}^{AB} y_i$.
Then $\hat{T}^B = \sum_{j=1}^{N^A} \frac{t_j}{\pi_j^A} Z_j$
and $V(\hat{T}^B) = \sum_{j=1}^{N^A} \sum_{k=1}^{N^A} \frac{\pi_{jk}^A - \pi_j^A \pi_k^A}{\pi_j^A \pi_k^A} Z_j Z_k$,
where $\pi_{jk}^A$ is the second-order inclusion probability.
GWSM: Variance Estimation

Variance estimation (Horvitz-Thompson):
$\hat{V}(\hat{T}^B) = \sum_{j=1}^{N^A} \sum_{k=1}^{N^A} \frac{\pi_{jk}^A - \pi_j^A \pi_k^A}{\pi_{jk}^A \pi_j^A \pi_k^A} Z_j Z_k t_j t_k$
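As a check on the variance and its Horvitz-Thompson estimator, the following Python sketch evaluates both for a tiny example and compares them with a brute-force enumeration of all samples. The assumptions are loud ones: the Z values are made up, the design in the first population is taken to be simple random sampling without replacement (which fixes the first- and second-order probabilities), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

z = [5, 25, 15, 15]   # made-up Z_j values for the 4 units of U^A
n = 2                 # sample size in U^A (SRS without replacement assumed)
N = len(z)
pi1 = Fraction(n, N)                       # first-order inclusion probability
pi2 = Fraction(n * (n - 1), N * (N - 1))   # second-order probability, j != k


def joint(j, k):
    """Second-order inclusion probability; equals pi1 when j == k."""
    return pi1 if j == k else pi2


def ht_variance(z):
    """Variance formula: double sum over all units of U^A."""
    return sum(
        (joint(j, k) - pi1 * pi1) / (pi1 * pi1) * z[j] * z[k]
        for j in range(N) for k in range(N)
    )


def ht_variance_estimate(z, s_a):
    """Variance estimator: double sum over sampled units, divided by the joint probability."""
    return sum(
        (joint(j, k) - pi1 * pi1) / (joint(j, k) * pi1 * pi1) * z[j] * z[k]
        for j in s_a for k in s_a
    )


samples = list(combinations(range(N), n))
estimates = [sum(z[j] / pi1 for j in s) for s in samples]
mean = sum(estimates) / len(samples)
brute_variance = sum((e - mean) ** 2 for e in estimates) / len(samples)
mean_var_est = sum(ht_variance_estimate(z, s) for s in samples) / len(samples)

print(ht_variance(z), brute_variance, mean_var_est)
```

Exact fractions are used so that the formula, the enumeration, and the average of the variance estimates can be compared without rounding error.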
Specific Examples



Monroe G. Sirken 1970, Multiplicity Estimation
J.N.K. Rao 1968, Sampling a Frame with an Unknown Amount of Duplicity
L.R. Ernst 1989, Longitudinal Household Surveys
Household Surveys with Multiplicity (Sirken, 1970)

Estimate the number of individuals in a population with a certain attribute
Complete frame is not available
Sample households report information about their own residents as well as other persons who live elsewhere, such as relatives or neighbors
Multiplicity Rule



Other persons are specified by a “multiplicity rule” adopted in the survey
Example: “siblings report each other”
The total number of households in the population reporting an individual is referred to as that individual's multiplicity
Under the sibling rule, the multiplicity of a person is the number of different households in which he or one of his siblings is a resident.
Some Notation….





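Let $I_\alpha$ ($\alpha = 1, \ldots, N$) denote the individuals and $H_i$ ($i = 1, \ldots, M$) the households.
Consider the conventional survey indicator variable:
$v_{\alpha,i} = 1$ if $I_\alpha$ is a resident of $H_i$, and $0$ otherwise,
and the indicator of reports under the multiplicity rule:
$\delta_{\alpha,i} = 1$ if $I_\alpha$ is not a resident of $H_i$ but is reported by $H_i$, and $0$ otherwise.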
Some Notation….

Number of individuals reported by $H_i$ in the conventional survey:
$\lambda_i = \sum_{\alpha=1}^{N} v_{\alpha,i}$
Weighted number of individuals reported by $H_i$ in the survey with multiplicity:
$t_i = \sum_{\alpha=1}^{N} \frac{1}{s_\alpha} (\delta_{\alpha,i} + v_{\alpha,i})$
where
$s_\alpha = \sum_{i=1}^{M} (\delta_{\alpha,i} + v_{\alpha,i})$,
the multiplicity of $I_\alpha$, is the number of households reporting $I_\alpha$.
Some Notation….

Notice that the variate $t_i$ based on the multiplicity survey requires the multiplicity $s_\alpha$ of every individual reported by household $H_i$.
The Estimators

Assume a sample of $m$ households without replacement, then
$\hat{N}_\lambda = \frac{M}{m} \sum_{i=1}^{m} \lambda_i$
is the estimate of $N$ derived from the conventional survey, and
$\hat{N}_t = \frac{M}{m} \sum_{i=1}^{m} t_i$
is the estimate from the survey with multiplicity.
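The two household variates and the two estimators can be computed for a toy population as follows. This Python sketch rests on made-up data: five individuals in four households, with three of them siblings under a "siblings report each other" rule. The names `v`, `d`, `lam`, `t` and `expand` are hypothetical labels for the residence indicator, the report indicator, the two household counts and the expansion estimator.

```python
from fractions import Fraction

# v[a][i] = 1 if individual a is a resident of household i (made-up data).
v = [
    [1, 0, 0, 0],  # I_0 lives in H_0
    [1, 0, 0, 0],  # I_1 lives in H_0
    [0, 1, 0, 0],  # I_2 lives in H_1
    [0, 0, 1, 0],  # I_3 lives in H_2
    [0, 0, 0, 1],  # I_4 lives in H_3
]
# d[a][i] = 1 if individual a is not a resident of household i but is reported by it.
# I_0, I_1 and I_2 are siblings: H_0 reports I_2, and H_1 reports I_0 and I_1.
d = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
N, M = len(v), len(v[0])

# Multiplicity of each individual: number of households reporting that individual.
s = [sum(v[a][i] + d[a][i] for i in range(M)) for a in range(N)]
# Conventional count per household: residents only.
lam = [sum(v[a][i] for a in range(N)) for i in range(M)]
# Multiplicity-weighted count per household: each report weighted by 1 / multiplicity.
t = [sum(Fraction(v[a][i] + d[a][i], s[a]) for a in range(N)) for i in range(M)]


def expand(values, sample):
    """Expansion estimator (M / m) * sum of the household variate over the sample."""
    return Fraction(M, len(sample)) * sum(values[i] for i in sample)


sample = [0, 2]  # a sample of m = 2 households
print(s, lam, t)
print(expand(lam, sample), expand(t, sample))  # 6 5
```

Both variates sum to the population size over all households, which is why both expansion estimators are unbiased for it; the multiplicity-weighted counts are simply less variable across households in this example.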
Variance

The variances of $\hat{N}_\lambda$ and $\hat{N}_t$ are
$Var(\hat{N}_\lambda) = \frac{M - m}{M - 1} \frac{M^2}{m} Var(\lambda)$ and $Var(\hat{N}_t) = \frac{M - m}{M - 1} \frac{M^2}{m} Var(t)$

It follows that $Var(\hat{N}_t) = Var(\hat{N}_\lambda)(1 - \Delta)$, where
$\Delta = \frac{Var(\hat{N}_\lambda) - Var(\hat{N}_t)}{Var(\hat{N}_\lambda)}$
is the relative gain in sampling efficiency resulting from the survey with multiplicity.
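The relative gain can be evaluated numerically once the household variates are known. The Python sketch below uses made-up household counts for four households and a sample of two; the population variance is taken with divisor M, which is the convention that makes the variance formula above exact for sampling without replacement. Function names are hypothetical.

```python
from fractions import Fraction

# Made-up household variates for M = 4 households.
lam = [2, 1, 1, 1]                               # conventional counts
t = [Fraction(3, 2), Fraction(3, 2), 1, 1]       # multiplicity-weighted counts


def pop_variance(x):
    """Population variance of a household variate, with divisor M."""
    M = len(x)
    mean = sum(x) / Fraction(M)
    return sum((xi - mean) ** 2 for xi in x) / M


def estimator_variance(x, m):
    """Variance of (M / m) * sample sum under sampling without replacement."""
    M = len(x)
    return Fraction(M - m, M - 1) * Fraction(M * M, m) * pop_variance(x)


m = 2
var_conventional = estimator_variance(lam, m)
var_multiplicity = estimator_variance(t, m)
gain = (var_conventional - var_multiplicity) / var_conventional
print(var_conventional, var_multiplicity, gain)  # 1 1/3 2/3
```

In this toy case the multiplicity survey removes two thirds of the sampling variance, because reports about the sibling group are spread evenly over the households that can report it.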
Surveys with Multiplicity


Household surveys with multiplicity are applicable whenever multiplicity rules can be devised that produce estimates having smaller MSEs than those from conventional surveys
Non-sampling error may be a problem
Sampling Theory When Frame Contains Unknown Amount of Duplicity (Rao, 1968)

Arose in connection with a sample survey of beef cattle producers
A beef cattle producing operation could be operated by an individual or a partnership
A frame of operations was not available
The available frame was a list of addresses of individuals believed to be beef cattle producers
Rao, 1968



Questionnaire mailed to a random sample of addresses, then to a random sub-sample of non-respondents
Respondents identified as partnerships were asked to give names and addresses of partners and to complete only 1 questionnaire for the partnership
Names and addresses were used to determine the number of times an operation was in the list frame
Some Notation…



$n_1$ is the number of names in the sample that respond to the mail questionnaire
$n_2$ is the number in the nonresponse group
Data are obtained by direct interview for a random subsample $r_2$ of nonrespondents
Some Notation…




$M$ is the unknown number of beef operations covered by the list frame
$Y$ is the population total of a character attached to beef operations
$y_i$ is the total attached to the $i$th operation
$N$ is the number of addresses on the list frame and $a_i$ ($\geq 1$) is the number of times the $i$th operation is listed on the list frame.
Some Notation…

Let $y'_j$ and $a'_j$ denote the $y$-value and the $a$-value of the sample operation contactable via the $j$th sample address
The Estimator

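Using the Hansen-Hurwitz estimator,
$\hat{X} = \frac{N}{n} \left[ \sum_{j=1}^{n_1} x_j + \frac{n_2}{r_2} \sum_{j=1}^{r_2} x_j \right]$,
where $n = n_1 + n_2$ is the number of sampled addresses, and the fact that
$\sum_{j=1}^{N} \frac{y'_j}{a'_j} = \sum_{i=1}^{M} a_i \frac{y_i}{a_i} = Y$,
an unbiased estimate of $Y$ can be obtained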
The Estimator

The unbiased estimator of $Y$ is
$\hat{Y} = \frac{N}{n} \left[ \sum_{j=1}^{n_1} \frac{y'_j}{a'_j} + \frac{n_2}{r_2} \sum_{j=1}^{r_2} \frac{y'_j}{a'_j} \right] = \frac{N}{n} \left[ \sum_{i=1}^{v_1} t_i \frac{y_i}{a_i} + \frac{n_2}{r_2} \sum_{i=1}^{v_2} t_i^* \frac{y_i}{a_i} \right]$
where $v_1$ and $v_2$ denote the number of distinct operations in the sample and subsample, and
$t_i$ and $t_i^*$ are the number of times the $i$th operation appears in the sample and subsample
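The estimator can be traced through a small numerical example in Python. Everything in the sketch is an assumption made for illustration: the three operations, their totals and listing counts, the six-address frame, and the names `operations`, `frame`, `x_value` and `rao_estimate`. It assumes at least one nonrespondent is interviewed, so that the subsample size is not zero.

```python
from fractions import Fraction

# Made-up operations: name -> (y_i, a_i), where a_i is the number of listings.
operations = {"A": (100, 2), "B": (60, 1), "C": (90, 3)}
# Made-up list frame: each address points to the operation reachable through it.
frame = ["A", "A", "B", "C", "C", "C"]


def x_value(address):
    """y-value divided by a-value of the operation behind a frame address."""
    y, a = operations[frame[address]]
    return Fraction(y, a)


def rao_estimate(respondents, subsample, n2):
    """(N / n) * [sum over mail respondents + (n2 / r2) * sum over interviewed subsample]."""
    N = len(frame)
    n = len(respondents) + n2
    r2 = len(subsample)  # assumed to be at least 1
    first = sum(x_value(j) for j in respondents)
    second = Fraction(n2, r2) * sum(x_value(j) for j in subsample)
    return Fraction(N, n) * (first + second)


Y = sum(y for y, _ in operations.values())
# Dividing by the listing count removes the effect of duplication on the frame total.
print(sum(x_value(j) for j in range(len(frame))) == Y)  # True

# Sample of n = 3 addresses: address 0 responds by mail, addresses 2 and 3 do not,
# and address 3 is the one nonrespondent followed up by interview.
print(rao_estimate(respondents=[0], subsample=[3], n2=2))  # 220
```

The first printed line checks the identity that makes the estimator work: summing the ratio over all addresses counts each operation exactly once.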
Variance

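The variance for the estimator with multiplicity is given by
$V(\hat{Y}) = \frac{N(N - n)}{n(N - 1)} \left[ \sum_{j=1}^{N} \left( \frac{y'_j}{a'_j} \right)^2 - \frac{1}{N} \left( \sum_{j=1}^{N} \frac{y'_j}{a'_j} \right)^2 \right] + \frac{N N_2 (k - 1)}{n(N_2 - 1)} \left[ \sum_{j=1}^{N_2} \left( \frac{y'_j}{a'_j} \right)^2 - \frac{1}{N_2} \left( \sum_{j=1}^{N_2} \frac{y'_j}{a'_j} \right)^2 \right]$
where $N_2$ is the number of addresses on the list frame that belong to the nonresponse group and $k = n_2 / r_2$.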
The Estimator



Estimators that do not depend on $t_i$ and $t_i^*$ may be obtained
Concept of sufficiency in sampling theory
Very cumbersome for moderate to large sample sizes
Cross Sectional Estimation from Longitudinal Household Survey, Ernst (1989)

What happens to households and families over time?
Composition of households and families can change over time
What weighting procedures should be used to obtain unbiased estimates?
Ernst (1989)






Take a month to be a basic unit of time
$C_T$ denotes a cross sectional universe of households
$P_T$ is the set of units residing in a household in $C_T$
Several rounds of interviews, at each month $T$ or interval of months
Initial sample is taken at month $B$
Final interview month for the sample panel is month $E$
Ernst (1989)



An individual in a chosen household at month $B$ is an “original sample person”
For each month $T$, the sample consists of all original sample people in $P_T$ plus all other people residing with an original sample person
The latter group of people are “associated sample people”
Longitudinal Household (LHH)

Each LHH is of the form
$LHH = \{A_b, A_{b+1}, \ldots, A_e\}$
where $A_b$ is a given household at month $b$
The definition has two parts:
For any $A_t$, specify which $A_{t+1}$, if any, can be in the same LHH
What kind of LHHs can exist in $L$
LHH

The paper considers the restriction that $L$ consists of a cohort of LHHs:
those in existence at month $B$, the initial LHHs,
LHHs formed after month $B$, those generated by initial LHHs
Obtaining Weights
Let $X = \sum_{i=1}^{N} x_i$ be the parameter of interest
The $i$th unit has a known positive probability $p_i$ of being chosen
$X$ would be estimated by $\hat{X} = \sum_{i=1}^{N} w_i x_i$, where
$w_i = 1 / p_i$ if the $i$th unit is in sample, and $w_i = 0$ otherwise.
Obtaining Weights



Subsequent LHHs would only be in sample if at least one household member is an original sample person
To use the regular estimator we need to know those probabilities
“Operationally impossible” to determine this probability, since it would require one to:
Determine the 1st round HH for each member of the current HH
Compute the probability that at least one 1st round HH was selected
Obtaining Weights




In order for the estimator to be unbiased it is only necessary that $E(w_i) = 1$
Let $M$ be the individuals in $A_t$
Let $p_j$ denote the probability that the $j$th individual's household is in sample at month $B$
Their associated weight is
$w'_j = 1 / p_j$ if the individual's HH is in sample at month $B$, and $w'_j = 0$ otherwise.
Obtaining Weights


For the $i$th LHH associate a set of constants $\alpha_{ij}$, independent of $w'_j$, with $\sum_j \alpha_{ij} = 1$
The weight of the $i$th LHH is
$w_i = \sum_j \alpha_{ij} w'_j$
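A small Python sketch shows why a household weight built as a fixed-coefficient average of its members' person weights has expectation one. The setup is hypothetical: a current household of three people, two from one first-round household and one from another, with made-up selection probabilities, equal coefficients of one third, and independent selection of the two first-round households assumed only to make the enumeration simple. The function name `lhh_weight` is also an assumption.

```python
from fractions import Fraction
from itertools import product

# Made-up month-B selection probabilities of two first-round households.
p_hh = {"H1": Fraction(1, 10), "H2": Fraction(1, 5)}
# First-round household of each of the three members of the current household.
origin = ["H1", "H1", "H2"]
# Constants for this household: equal shares, fixed in advance, summing to 1.
alpha = [Fraction(1, 3)] * 3


def lhh_weight(alpha, origin, p_hh, selected_hh):
    """Household weight: sum of alpha_j times the person weight of member j."""
    total = Fraction(0)
    for a_j, hh in zip(alpha, origin):
        # Person weight is 1 / p_j if the member's month-B household was selected.
        w_j = 1 / p_hh[hh] if hh in selected_hh else 0
        total += a_j * w_j
    return total


# If only H1 was selected at month B, two of the three members carry weight 10.
print(lhh_weight(alpha, origin, p_hh, {"H1"}))  # 20/3

# Expected weight over all selection outcomes (independence assumed here).
expected = Fraction(0)
for in1, in2 in product([True, False], repeat=2):
    prob = (p_hh["H1"] if in1 else 1 - p_hh["H1"]) * (p_hh["H2"] if in2 else 1 - p_hh["H2"])
    selected = {h for h, flag in (("H1", in1), ("H2", in2)) if flag}
    expected += prob * lhh_weight(alpha, origin, p_hh, selected)
print(expected)  # 1
```

Only each member's own month-B selection probability is used, so the probability that at least one first-round household was selected never has to be computed.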
References





Lavallee, P. and Caron, P. Estimation using the generalised weight share method: the case of record linkage. http://www.statcan.ca/english/ads/12-001XXPB/pdf/27_2_lavallee_e.pdf
Deville, J.-C. and Maumy, M. A new survey methodology for describing tourism activities and expenses. http://www.tourismforum.scb.se/papers/PapersSelected/CS/Paper33FRANCE/Deville_Maumy_article.pdf
Ernst, L.R. (1989). Weighting issues for longitudinal household and family estimates. In Panel Surveys.
Rao, J.N.K. (1968). Some nonresponse sampling theory when the frame contains an unknown amount of duplication. Journal of the American Statistical Association.
Sirken, M.G. (1970). Household surveys with multiplicity. Journal of the American Statistical Association.