STRATIFIED SAMPLING - National Institute of Statistical


HOW TO CHOOSE THE NUMBER OF CALL ATTEMPTS IN A TELEPHONE SURVEY IN THE PRESENCE OF NONRESPONSE AND MEASUREMENT ERRORS
Annica Isaksson
Linköping University, Sweden
Peter Lundquist
Statistics Sweden
Daniel Thorburn
Stockholm University, Sweden
~ Draft version ~
The Problem
Consider a telephone survey of individuals in which at most A call attempts are to be made to each sampled individual.
HOW SHALL A BE CHOSEN?
This question is part of a larger problem: designing efficient call-scheduling algorithms.
Prerequisites
- Single-occasion survey
- Direct sampling from a frame with good population coverage
- Estimation of a population total $t$ by the direct weighting estimator

$$\hat{t}^{*}_{ycA} = \sum_{k \in r_A} \frac{y_k}{\pi_k\,\hat{\theta}_{kA}}$$

where $y_k$ is the observed value for individual $k$ (a proxy for the true value $\mu_k$), $r_A$ is the response set after $A$ call attempts, $\pi_k$ is the inclusion probability for individual $k$, and $\hat{\theta}_{kA}$ is the estimated response probability for individual $k$ after $A$ call attempts.
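As a minimal sketch of the estimator, assume that $\hat{\theta}_{kA}$ is the observed response rate $m_h/n_h$ (respondents over sampled individuals) in individual k's group, in the spirit of the response homogeneity groups defined below. The function name and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def direct_weighting_estimate(sample):
    """Direct weighting estimate of a population total after A call attempts.

    `sample` is a list of dicts with the hypothetical keys
      "group":     response group of individual k
      "pi":        inclusion probability pi_k
      "responded": True if k is in the response set r_A
      "y":         observed value y_k (unused for nonrespondents)
    Assumption: theta_hat_kA is the response rate m_h / n_h of k's group.
    """
    n_h = defaultdict(int)  # sampled individuals per group
    m_h = defaultdict(int)  # respondents per group
    for unit in sample:
        n_h[unit["group"]] += 1
        if unit["responded"]:
            m_h[unit["group"]] += 1

    total = 0.0
    for unit in sample:
        if unit["responded"]:  # sum over k in r_A only
            theta_hat = m_h[unit["group"]] / n_h[unit["group"]]
            total += unit["y"] / (unit["pi"] * theta_hat)
    return total
```

A respondent in a group where only half of the sampled individuals responded is weighted up by a factor 2 on top of the design weight $1/\pi_k$.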
The Survey as a Three-Stage Process
- Stage 1: Sample selection.
- Stage 2: Contact and response. At most A call attempts are made. Individuals respond in accordance with an unknown response distribution.
- Stage 3: Measurement. Observed values are related to the true values according to a measurement error model.
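The three stages can be simulated in a few lines. This is only a sketch under assumptions that are ours, not the paper's: simple random sampling at stage 1, a constant per-attempt response probability at stage 2, and normally distributed interviewer effects and response errors at stage 3, with a hypothetical round-robin assignment of individuals to interviewers.

```python
import random


def simulate_survey(mu, n, A, p_attempt, sd_interviewer, sd_error,
                    n_interviewers=5, seed=1):
    """Simulate one hypothetical survey; return (k, a_k, y_k) per respondent.

    mu             -- true values mu_k for the population U
    n              -- sample size (stage 1, SRS without replacement)
    A              -- maximum number of call attempts (stage 2)
    p_attempt      -- assumed response probability at any single attempt
    sd_interviewer -- sd of the interviewer effect b (stage 3)
    sd_error       -- sd of the response error eps (stage 3)
    """
    rng = random.Random(seed)
    # Stage 1: sample selection.
    sample = rng.sample(range(len(mu)), n)
    # One interviewer effect per interviewer and attempt number.
    b = [[rng.gauss(0.0, sd_interviewer) for _ in range(A)]
         for _ in range(n_interviewers)]
    respondents = []
    for k in sample:
        # Stage 2: contact and response, at most A attempts.
        for a in range(1, A + 1):
            if rng.random() < p_attempt:
                # Stage 3: measurement, y_k = mu_k + b_{i(k),a} + eps_{k,a}.
                i = k % n_interviewers  # hypothetical interviewer assignment
                y = mu[k] + b[i][a - 1] + rng.gauss(0.0, sd_error)
                respondents.append((k, a, y))
                break
    return respondents
```

Setting both standard deviations to zero switches off measurement error, so every recorded value equals the true value.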
Response Model
The sample can be divided into $H_s$ response homogeneity groups (RHGs) such that, for all A and given the sample:
- all individuals within the same group have the same probability of responding;
- individuals respond independently of each other;
- individuals respond independently of each other after different numbers of call attempts.
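Under the RHG model, the response probability after A attempts is common to all members of group h; call it $\theta_{hA}$. If one additionally assumes (our assumption, purely for illustration) that a member of RHG h responds at each attempt with the same probability $p_h$, then $\theta_{hA} = 1 - (1 - p_h)^A$. The sketch compares this closed form with a simulation in which individuals respond independently of each other.

```python
import random


def theta(p_h, A):
    """Probability of responding within A attempts, assuming a constant
    per-attempt response probability p_h (hypothetical geometric model)."""
    return 1.0 - (1.0 - p_h) ** A


def simulated_response_rate(p_h, A, n_h, seed=0):
    """Share of n_h independent individuals in RHG h who respond within A
    attempts; any() stops at the first successful attempt."""
    rng = random.Random(seed)
    responded = 0
    for _ in range(n_h):
        if any(rng.random() < p_h for _ in range(A)):
            responded += 1
    return responded / n_h
```

Under this assumption, $\theta_{hA}$ increases with A, which is why extra call attempts buy response at a cost.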
Measurement Error Model
For an individual $k$ in RHG $h$, given the sample and that the individual responds at call attempt $a$,

$$y_k = \mu_k + \sum_{a=1}^{A} v_{k,a}\left(b_{i(k),a} + \varepsilon_{k,a}\right)$$

where
- $\mu_k$ is the true value for individual $k$;
- $v_{k,a}$ indicates whether individual $k$ responds at attempt $a$ (that is, whether $a = a_k$);
- $b_{i(k),a}$ is a random interviewer effect with expectation 0 and variance $\sigma^2_{ba}$;
- $\varepsilon_{k,a}$ is a random response error with expectation 0 and variance $\sigma^2_{\varepsilon a}$.
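The indicator $v_{k,a}$ picks out exactly one term of the sum, so $y_k = \mu_k + b_{i(k),a_k} + \varepsilon_{k,a_k}$. The first function below evaluates the sum literally. The second is a Monte Carlo check that, if $b$ and $\varepsilon$ are drawn independently (an assumption; the model only fixes their means and variances), the measurement error has mean 0 and variance $\sigma^2_{ba} + \sigma^2_{\varepsilon a}$. Normal draws and the function names are illustrative choices.

```python
import random
from statistics import fmean, pvariance


def observed_value(mu_k, a_k, A, b_effects, eps_errors):
    """Evaluate y_k = mu_k + sum_{a=1}^{A} v_{k,a} (b_{i(k),a} + eps_{k,a}).

    a_k        -- attempt (1..A) at which individual k responds
    b_effects  -- b_{i(k),a} for a = 1..A, for k's interviewer i(k)
    eps_errors -- eps_{k,a} for a = 1..A
    """
    y = mu_k
    for a in range(1, A + 1):
        v = 1 if a == a_k else 0  # response indicator v_{k,a}
        y += v * (b_effects[a - 1] + eps_errors[a - 1])
    return y


def error_moments(sd_b, sd_eps, draws=50_000, seed=0):
    """Monte Carlo mean and variance of y_k - mu_k for a respondent, assuming
    b and eps are independent normal draws (an illustrative assumption)."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sd_b) + rng.gauss(0.0, sd_eps) for _ in range(draws)]
    return fmean(errors), pvariance(errors)
```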
Bias and Variance
There is bias only if the RHG model does not hold:

$$B(A) = E_p\left[\sum_{h=1}^{H_s} (n_h - 1)\,\frac{S_{s_h A}}{\bar{\theta}_{s_h A}}\right]$$

where $n_h$ is the number of sampled individuals in RHG $h$, $S_{s_h A}$ is the sample covariance, within RHG $h$, between the response probabilities $\theta_{kA}$ and the design-weighted true values $\mu_k/\pi_k$, and $\bar{\theta}_{s_h A}$ is the average response probability within the RHG.

The variance, $V(A)$, is derived in the paper.
Cost Function
$C = C_0 + C(A)$
where $C(A)$ is composed of:
- Starting costs (tracking, letter of introduction…)
- Contact costs (calls without an answer, talking to individuals other than the one selected, booking an interview for another time…)
- Interview costs (interviewing, editing…)
All costs are assumed to be constant within each RHG.
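A hypothetical cost model makes the structure of $C(A)$ concrete: a fixed starting cost per sampled individual, a cost per call attempt actually made, and an interview cost incurred only on response. With a constant per-attempt response probability $p$ (again our assumption), attempt $a$ is made with probability $(1-p)^{a-1}$, so the expected number of calls is $\sum_{a=1}^{A}(1-p)^{a-1}$. All parameter names are illustrative.

```python
def expected_cost_per_sampled(A, p_attempt, c_start, c_call, c_interview):
    """Expected C(A) per sampled individual in one RHG (hypothetical model).

    c_start     -- starting cost (tracking, introduction letter, ...)
    c_call      -- contact cost per call attempt made
    c_interview -- interview cost, paid only if the individual responds
    p_attempt   -- assumed constant per-attempt response probability
    """
    # Attempt a is made only if there was no response at attempts 1..a-1.
    expected_calls = sum((1 - p_attempt) ** (a - 1) for a in range(1, A + 1))
    prob_response = 1 - (1 - p_attempt) ** A
    return c_start + c_call * expected_calls + c_interview * prob_response
```

With p = 0.5, a starting cost of 10, a call cost of 2 and an interview cost of 20, the expected cost is 22 for A = 1 and 28 for A = 2.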
Choosing the Optimum A
Consider one RHG $h$. The optimum number of call attempts is the number $A_h$ that minimizes the function

$$\frac{E(n_h)}{n}\,B^{2}(A_h)\,(C - C_0) + V(A_h)\,E[C(A_h)]$$

where $C(A_h)$ is the marginal cost for RHG $h$.
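Given tabulated values of $B(A)$, $V(A)$ and $E[C(A)]$ for one RHG, the optimum can be found by evaluating the function for every candidate A and taking the minimizer. This sketch takes the function to be $\frac{E(n_h)}{n}B^2(A)(C - C_0) + V(A)\,E[C(A)]$; the function name and the input lists are hypothetical.

```python
def optimum_attempts(bias, var, exp_cost, share, budget):
    """Return the A (1-based) minimizing
        share * B(A)**2 * budget + V(A) * E[C(A)]
    where share = E(n_h) / n and budget = C - C_0.
    bias, var and exp_cost are lists indexed by A - 1.
    """
    def objective(i):
        return share * bias[i] ** 2 * budget + var[i] * exp_cost[i]

    best_index = min(range(len(var)), key=objective)
    return best_index + 1
```

In a toy case with B = (3, 1, 0.5, 0.4), V = (10, 6, 5, 4.9), E[C] = (1, 2, 3, 4), share 0.5 and budget 1, the function values are 14.5, 12.5, 15.125 and 19.68, so the optimum is A = 2: the bias reduction from a second attempt outweighs its cost, but further attempts do not.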
A Case Study: the Swedish LFS
- Target population: Swedish residents aged 15–74
- Frame: the Swedish Population Register
- Monthly panel survey of ~21,500 individuals; each individual is observed every quarter for two years
- Stratified SRS, with stratification by gender, age and county (144 strata in all)
- Data collected by telephone
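In a stratified SRS everyone in stratum $g$ has the same inclusion probability, $\pi_k = n_g/N_g$, where $N_g$ and $n_g$ are the population and sample sizes of the stratum. A minimal sketch follows; the stratum labels and sizes are made up, and the full-response total is the Horvitz-Thompson ($\pi$) estimator, i.e. the direct weighting estimator with every $\hat{\theta}_{kA} = 1$.

```python
def inclusion_probabilities(strata):
    """pi_k = n_g / N_g for every k in stratum g of a stratified SRS.

    `strata` maps a (hypothetical) stratum label to (N_g, n_g).
    """
    return {g: n_g / N_g for g, (N_g, n_g) in strata.items()}


def pi_estimate(values_by_stratum, pi):
    """Full-response total: sum over strata and sampled k of y_k / pi_k."""
    return sum(y / pi[g] for g, ys in values_by_stratum.items() for y in ys)
```

With $y_k = 1$ for every sampled individual, the estimate equals the population size, which is a quick check of the weights.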

Our Data
LFS data from March–December 2007, supplemented with:
- Annual salary 2006 according to the Swedish Tax Register (our y)
- Process data from WinDati (WD)
Note: we do not know the number of call attempts, only the number of ‘WD events’.
Data Processing and Estimation
- Reduced target population: Swedish residents aged 16–64
- Each monthly sample is viewed as an SRS
- Process data are used to estimate:
  - marginal costs
  - response and contact probabilities
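Because each monthly sample is treated as an SRS, one simple estimate of the response probability after A WD events is the share of sampled individuals who had responded by event A. The sketch assumes the process data have been reduced to one entry per individual (the WD event at which the individual responded, or None for nonrespondents); that data layout and the function name are our assumptions.

```python
def cumulative_response_rates(events_to_response, max_events):
    """Estimated response probability after A WD events, A = 1..max_events.

    events_to_response -- per sampled individual, the WD event number at
                          which the individual responded, or None
    """
    n = len(events_to_response)
    rates = []
    for A in range(1, max_events + 1):
        responded = sum(1 for e in events_to_response if e is not None and e <= A)
        rates.append(responded / n)
    return rates
```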
Measurement Error Model Parameters
$S_U^2$ is estimated by the 10-month average sample variance.

ICC $\rho_y$ | $S_U^2$ | $\sigma_b^2$
.002 ("low") | 55,267,619,616 | 110,979,155
.040 ("high") | 55,267,619,616 | 2,402,939,983

Biemer and Trewin (1997):

$$\rho_y = \frac{\sigma_b^2}{S_U^2 + \sigma_b^2 + \sigma_\varepsilon^2}$$

Parameter values: $b$-parameter (interviewer effect) = −1, 0, 1; a second parameter = 0.
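Rearranging the Biemer and Trewin (1997) formula for the intraclass correlation $\rho_y$ gives $\sigma_b^2$ once $\rho_y$ and $S_U^2$ are fixed and $\sigma_\varepsilon^2$ is specified. The slide's $\sigma_b^2$ values are reproduced, up to rounding, if one assumes $\sigma_\varepsilon^2 = \sigma_b^2$, in which case $\sigma_b^2 = \rho_y S_U^2/(1 - 2\rho_y)$. The slide does not state this equality; it is our inference from the numbers, and the function names are hypothetical.

```python
def icc(s2_u, s2_b, s2_eps):
    """Intraclass correlation rho_y = sigma_b^2 / (S_U^2 + sigma_b^2 + sigma_eps^2)."""
    return s2_b / (s2_u + s2_b + s2_eps)


def interviewer_variance(rho, s2_u):
    """Solve icc(s2_u, s2_b, s2_eps) = rho for s2_b, ASSUMING s2_eps = s2_b
    (inferred from the parameter table, not stated in the source)."""
    return rho * s2_u / (1 - 2 * rho)
```

For example, `interviewer_variance(0.002, 55_267_619_616)` gives about 110,979,156, and `interviewer_variance(0.040, 55_267_619_616)` gives about 2,402,939,983.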
Illustrations
- One RHG (women), one ICC level (low)
- Unbiased or biased estimator of $t$ = total annual salary 2006
- Three curves representing different values of the $b$-parameter (interviewer effect)
- One curve for no measurement errors
- Each curve represents a 10-month average
- The optimum A (optimum number of WD events) is the one at which the curve reaches its minimum
No Bias, Low ICC
[Figure: curves mopt_PRD, mopt_1_lo, mopt0_lo and mopt1_lo plotted against the # of WD events (1–31)]
Bias, Low ICC
[Figure: curves mopt_PRD, mopt_1_lo, mopt0_lo and mopt1_lo plotted against the # of WD events (1–31)]