
USE OF PROCESS DATA TO DETERMINE THE NUMBER OF CALL ATTEMPTS IN A TELEPHONE SURVEY

Annica Isaksson, Linköping University, Sweden
Peter Lundquist, Statistics Sweden
Daniel Thorburn, Stockholm University, Sweden

Q2008
The Problem
Consider a telephone survey of individuals in which at most A call attempts are made to each sampled individual.
HOW SHALL A BE CHOSEN?
This is part of the larger problem of designing efficient call-scheduling algorithms.
Prerequisites
- Single-occasion survey
- Direct sampling from a frame with good population coverage
- Estimation of a population total $t_\mu$ by the direct weighting estimator

$$\hat{t}_{y}^{A} = \sum_{k \in r^{A}} \frac{y_k}{\pi_k \, \hat{\theta}_{k}^{A}}$$

where
- $y_k$ = observed value for individual $k$ (a proxy for the true value $\mu_k$)
- $r^{A}$ = response set after $A$ call attempts
- $\pi_k$ = inclusion probability for individual $k$
- $\hat{\theta}_{k}^{A}$ = estimated response probability for individual $k$ after $A$ call attempts
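A minimal Python sketch of the direct weighting estimator follows. The slides do not say how $\hat{\theta}_k^{A}$ is obtained; purely for illustration we assume it is the observed response rate within the individual's response homogeneity group (RHG), one common choice. The function name `direct_weighting_estimate` and the record layout are hypothetical.

```python
from collections import defaultdict


def direct_weighting_estimate(sample):
    """Direct weighting estimate of a population total (illustrative sketch).

    sample: list of dicts, one per sampled individual k, with keys
      'pi'        inclusion probability pi_k
      'rhg'       label of k's response homogeneity group (hypothetical field)
      'responded' True if k responded within A call attempts
      'y'         observed value y_k (unused for nonrespondents)
    Assumption: theta_hat_k^A is estimated by the response rate of k's RHG.
    """
    n_h = defaultdict(int)  # sampled individuals per RHG
    m_h = defaultdict(int)  # respondents per RHG after A attempts
    for unit in sample:
        n_h[unit['rhg']] += 1
        if unit['responded']:
            m_h[unit['rhg']] += 1

    total = 0.0
    for unit in sample:  # sum over the response set r^A
        if unit['responded']:
            theta_hat = m_h[unit['rhg']] / n_h[unit['rhg']]
            total += unit['y'] / (unit['pi'] * theta_hat)
    return total


# Usage: four individuals sampled with pi_k = 0.25, in two RHGs.
sample = [
    {'pi': 0.25, 'rhg': 'a', 'responded': True, 'y': 10.0},
    {'pi': 0.25, 'rhg': 'a', 'responded': False, 'y': None},
    {'pi': 0.25, 'rhg': 'b', 'responded': True, 'y': 20.0},
    {'pi': 0.25, 'rhg': 'b', 'responded': True, 'y': 30.0},
]
print(direct_weighting_estimate(sample))  # 280.0
```

The respondent in RHG a carries the weight of both sampled members of that group, since its estimated response probability is 0.5.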
The Survey as a Three-Stage Process
- Stage 1: Sample selection.
- Stage 2: Contact and response. At most A call attempts are made. Individuals respond in accordance with an unknown response distribution.
- Stage 3: Measurement. Observed values are related to the true values according to a measurement error model.
Response Model
Let $\theta_{k|s}^{A}$ be the probability that individual $k$ responds within $A$ call attempts, given the sample $s$. The sample can be divided into $H_s$ response homogeneity groups (RHGs) such that, for all $A$ and given the sample,
- all individuals within the same group have the same probability of responding;
- individuals respond independently of each other;
- individuals respond independently of each other after different numbers of call attempts.
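The RHG model can be illustrated with a small simulation. The slides leave the attempt-by-attempt response mechanism unspecified, so here we assume, hypothetically, that every individual in RHG $h$ responds at attempt $a$ with a conditional probability $p_{h,a}$ common to the whole group, independently of everyone else. The group's response probability after $A$ attempts is then $1 - \prod_{a=1}^{A}(1 - p_{h,a})$. Function names are illustrative.

```python
import random


def response_probability(p_by_attempt, A):
    """theta_h^A: probability of responding within A attempts.

    p_by_attempt[a-1] is the (hypothetical) conditional probability that an
    individual in RHG h responds at attempt a, given no earlier response.
    """
    p_no_response = 1.0
    for p in p_by_attempt[:A]:
        p_no_response *= 1.0 - p
    return 1.0 - p_no_response


def simulate_response_attempt(p_by_attempt, A, rng):
    """Attempt at which one individual responds, or None if there is no
    response within A attempts. Separate calls are independent draws,
    matching the assumption that individuals respond independently."""
    for a, p in enumerate(p_by_attempt[:A], start=1):
        if rng.random() < p:
            return a
    return None


# Usage: the same p for everyone in the group, so theta_h^A is shared by the RHG.
p_h = [0.5, 0.3, 0.2]
rng = random.Random(2008)
draws = [simulate_response_attempt(p_h, 2, rng) for _ in range(10_000)]
share = sum(d is not None for d in draws) / len(draws)
print(response_probability(p_h, 2), round(share, 2))
```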
Measurement Error Model
For an individual $k$ in RHG $h$, given the sample and given that the individual responds at call attempt $a = a_k$,

$$y_k = \mu_k + \sum_{a=1}^{A} v_{k,a}\left(b_{i(k),a} + \varepsilon_{k,a}\right)$$

where
- $\mu_k$ = true value for individual $k$
- $v_{k,a}$ indicates whether individual $k$ responds at attempt $a = a_k$
- $b_{i(k),a}$ = random interviewer effect with expectation 0 and variance $\sigma_{ba}^{2}$
- $\varepsilon_{k,a}$ = random response error with expectation 0 and variance $\sigma_{\varepsilon a}^{2}$
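As a sketch of the measurement error model, the code below draws observed values. Because $v_{k,a}$ is 1 only at the attempt $a_k$ where individual $k$ responds, just one interviewer effect and one response error enter $y_k$. Interviewer effects are drawn once per interviewer and attempt and shared by that interviewer's respondents. The normal distribution, the helper names and the attempt-dependent standard deviation functions `sd_b` and `sd_eps` are illustrative assumptions; the slide fixes only the zero expectations.

```python
import random


def draw_interviewer_effects(interviewers, A, sd_b, rng):
    """One effect b_{i,a} per interviewer i and attempt a, shared by all
    individuals that interviewer i reaches at attempt a (normal: assumption)."""
    return {(i, a): rng.gauss(0.0, sd_b(a))
            for i in interviewers for a in range(1, A + 1)}


def observed_value(mu_k, interviewer_k, a_k, A, b, sd_eps, rng):
    """y_k = mu_k + sum_{a=1}^{A} v_{k,a} * (b_{i(k),a} + eps_{k,a}).

    a_k is the attempt at which k responds (None if no response within A),
    interviewer_k plays the role of i(k), and sd_eps(a) is the assumed
    standard deviation of the response error at attempt a.
    """
    if a_k is None or a_k > A:
        return None  # nonrespondent: no observed value
    y = mu_k
    for a in range(1, A + 1):
        v = 1 if a == a_k else 0  # indicator v_{k,a}
        if v:
            eps = rng.gauss(0.0, sd_eps(a))  # response error eps_{k,a}
            y += v * (b[(interviewer_k, a)] + eps)
    return y


# Usage: two interviewers, at most three attempts, invented standard deviations.
rng = random.Random(1)
b = draw_interviewer_effects(['i1', 'i2'], A=3, sd_b=lambda a: 1000.0, rng=rng)
y = observed_value(250_000.0, 'i1', 2, 3, b, sd_eps=lambda a: 1000.0, rng=rng)
```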
Bias and Variance
Bias if the RHG model does not hold:

$$B(\hat{t}_{y}^{A}) \approx E_p\left( \sum_{h=1}^{H_s} (n_h - 1)\, \frac{S_{s_h}^{A}}{\bar{\theta}_{s_h}^{A}} \right)$$

where
- $S_{s_h}^{A}$ = sample covariance, within RHG $h$, between the response probabilities and the design-weighted true values $\mu_k/\pi_k$
- $\bar{\theta}_{s_h}^{A}$ = average response probability within RHG $h$
- $n_h$ = number of sampled individuals in RHG $h$

The variance of $\hat{t}_{y}^{A}$ is derived in the paper.
Cost Function

$$C = C_0 + C_1(A,n) + C_2(A,n) + C_3(A,n)$$

$$C_1(A,n) = \sum_{h=1}^{H_s} n_h\, C_{\text{start},h}$$

$$C_2(A,n) = \sum_{h=1}^{H_s} C_{\text{contact},h} \sum_{a=1}^{A} \left( n_h - n_h^{a-1} \right)$$

$$C_3(A,n) = \sum_{h=1}^{H_s} m_h(A)\, C_{\text{interview},h}$$
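A sketch of the cost function for given unit costs. We read the contact term as charging $C_{\text{contact},h}$ for every attempt actually made, so that at attempt $a$ only individuals in RHG $h$ whose cases are still open after $a - 1$ attempts are called; this reading, the input layout and the name `total_cost` are our assumptions.

```python
def total_cost(C0, groups, A):
    """C = C0 + C1(A, n) + C2(A, n) + C3(A, n) for at most A call attempts.

    groups: one dict per RHG h (hypothetical layout) with keys
      'n'           sample size n_h
      'closed'      closed[j] = cases closed after j attempts, closed[0] = 0
      'respondents' m_h(A), respondents after A attempts
      'c_start', 'c_contact', 'c_interview'  unit costs
    """
    C1 = sum(g['n'] * g['c_start'] for g in groups)
    # Assumed reading: attempt a is made only to cases still open after a - 1 attempts.
    C2 = sum(g['c_contact'] * sum(g['n'] - g['closed'][a - 1] for a in range(1, A + 1))
             for g in groups)
    C3 = sum(g['respondents'] * g['c_interview'] for g in groups)
    return C0 + C1 + C2 + C3


# Usage: one RHG of 10 people; 4 cases closed after attempt 1, 7 after attempt 2.
group = {'n': 10, 'closed': [0, 4, 7], 'respondents': 7,
         'c_start': 1.0, 'c_contact': 2.0, 'c_interview': 5.0}
print(total_cost(100.0, [group], A=2))  # 177.0
```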
Optimum A for RHG h

$$\text{MSE}(\hat{t}_{yh}^{A_h}) = B^{2}(\hat{t}_{yh}^{A_h}) + V(\hat{t}_{yh}^{A_h})$$

Assume:
- $B(\hat{t}_{yh}^{A_h}) = B(A_h)$ and $V(\hat{t}_{yh}^{A_h}) = V(A_h)/E n_h$
- a share $E n_h / n$ of the costs is allocated to RHG $h$:

$$\frac{E n_h}{n}\,(C - C_0) = EC(A_h)\, E n_h$$
Optimum A for RHG h: Result
The optimum number of call attempts for RHG $h$ is the value of $A_h$ that minimizes the function

$$\frac{E n_h}{n}\, B^{2}(A_h)\,(C - C_0) + V(A_h)\, EC(A_h)$$
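The result can be applied by evaluating the objective for each candidate $A_h$ and keeping the minimizer. In the sketch, $B(A_h)$, $V(A_h)$ and $EC(A_h)$ are user-supplied functions (in practice they would have to be estimated, for example from process data); the curves in the usage example are invented only to show the mechanics, and `optimal_attempts` is a hypothetical name.

```python
def optimal_attempts(candidates, B, V, EC, share_h, C, C0):
    """Return the A_h in candidates that minimises
    share_h * B(A_h)**2 * (C - C0) + V(A_h) * EC(A_h),
    where share_h = E n_h / n. B, V and EC are user-supplied callables."""
    def objective(A_h):
        return share_h * B(A_h) ** 2 * (C - C0) + V(A_h) * EC(A_h)
    return min(candidates, key=objective)


# Usage with invented curves: bias falls with A, expected cost rises with A.
best = optimal_attempts(range(1, 21),
                        B=lambda A: 10.0 / A,
                        V=lambda A: 50.0,
                        EC=lambda A: 1.0 + 0.5 * A,
                        share_h=0.5, C=1100.0, C0=1000.0)
print(best)  # 7
```

With these curves and no bias at all ($B \equiv 0$), the minimizer moves to the smallest candidate, since extra attempts would then add cost without reducing bias.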
Our Data
LFS (Labour Force Survey) data from March to December 2007, supplemented with:
- annual salary 2006 according to the Swedish Tax Register (our $y$)
- process data from WinDati (WD)

Note: not all WD events are call attempts.
Data Processing and Estimation
- Each monthly sample is treated as a simple random sample (SRS).
- Parameter: $t_\mu$ = total annual salary 2006.
- Bias within RHG $h$ and month $l$ is estimated by $\hat{t}_{hl}^{A} - \hat{t}_{hl}$.
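A sketch of the bias estimate. We assume, as our reading of the slide, that $\hat t_{hl}$ uses all sampled individuals in RHG $h$ and month $l$ (possible because register salaries are available for the whole sample), while $\hat t_{hl}^{A}$ uses only those who responded within $A$ attempts, with $\hat\theta$ set to the RHG response rate. Under SRS every unit has weight $N/n$. Scaling by $\hat t_{hl}$ and averaging over months, as in the relative-bias chart, is also an assumption; the function names are hypothetical.

```python
from statistics import mean


def estimated_bias(y, response_attempt, A, N, n):
    """Return (t_hat^A_hl - t_hat_hl, t_hat_hl) for one RHG h and month l.

    y:                register values for all sampled individuals in RHG h
    response_attempt: attempt at which each responded (None = never)
    Assumes an SRS of size n from N, so every weight 1/pi_k equals N / n.
    """
    w = N / n
    respondents = [yk for yk, a in zip(y, response_attempt)
                   if a is not None and a <= A]
    if not respondents:
        raise ValueError("no respondents within A attempts")
    theta_hat = len(respondents) / len(y)   # RHG response rate (assumption)
    t_full = w * sum(y)                     # all sampled individuals
    t_A = w * sum(respondents) / theta_hat  # respondents within A attempts
    return t_A - t_full, t_full


def average_relative_bias(months, A, N, n):
    """Mean over months of 100 * (t_hat^A - t_hat) / t_hat, in percent.
    months: list of (y, response_attempt) pairs; N and n held fixed here."""
    rel = []
    for y, attempts in months:
        bias, t_full = estimated_bias(y, attempts, A, N, n)
        rel.append(100.0 * bias / t_full)
    return mean(rel)


# Usage: the same invented month twice; the highest earner responds late.
months = [([100.0, 200.0, 300.0, 400.0], [1, 2, None, 5])] * 2
print(average_relative_bias(months, A=2, N=1000, n=100))  # -40.0
```

Under SRS the weight $N/n$ cancels in the relative bias, so only the composition of the respondents matters.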
Relative Bias, Monthly Averages
[Figure: relative bias (%) plotted against the number of WinDati events, separately for males and females]
Measurement Error Model Parameters
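Intraclass correlation, ICC (Biemer and Trewin, 1997):

$$\rho_y = \frac{\sigma_b^{2}}{S_U^{2} + \sigma_b^{2} + \sigma_\varepsilon^{2}} = .002$$

Parameter values:
- $S_U^{2}$ = 55,267,619,616
- $\gamma_b$ = −1; 0; 1
- $\gamma_\varepsilon$ = 0
- $\sigma_b^{2} = \sigma_\varepsilon^{2}$ = 110,979,155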
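A quick arithmetic check of these values, assuming $\sigma_b^2 = \sigma_\varepsilon^2 = \sigma^2$: solving $\rho_y = \sigma^2/(S_U^2 + 2\sigma^2)$ for $\sigma^2$ gives $\sigma^2 = \rho_y S_U^2/(1 - 2\rho_y)$, which with $S_U^2$ = 55,267,619,616 and $\rho_y = .002$ is about 110,979,155.9, matching the stated 110,979,155 up to rounding. The helper names are illustrative.

```python
def icc(sigma2_b, sigma2_eps, S2_U):
    """Intraclass correlation rho_y = sigma_b^2 / (S_U^2 + sigma_b^2 + sigma_eps^2)."""
    return sigma2_b / (S2_U + sigma2_b + sigma2_eps)


def equal_variance_for_icc(rho, S2_U):
    """sigma^2 = sigma_b^2 = sigma_eps^2 giving ICC rho (assumes equal variances).
    From rho = s / (S2_U + 2 s):  s = rho * S2_U / (1 - 2 rho)."""
    return rho * S2_U / (1.0 - 2.0 * rho)


S2_U = 55_267_619_616
s2 = equal_variance_for_icc(0.002, S2_U)
print(round(s2, 1), round(icc(s2, s2, S2_U), 6))
```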
No Bias, ICC = .002
[Figure: results plotted against the number of WinDati events, for the case of no measurement errors and for $\gamma_b$ = −1, 0 and 1]
Bias, ICC = .002
[Figure: results plotted against the number of WinDati events, for the case of no measurement errors and for $\gamma_b$ = −1, 0 and 1]
Tentative Results
- Efficient planning requires high-quality data on processes and costs.
- Perhaps the choice of A should be based on variance rather than MSE.
Discussion and Future Work
- Do the results hold for other study variables and in other survey settings?
- Improved models for measurement errors, response and costs?
- Develop a planning tool?
Thank you for your attention!
Annica Isaksson
Peter Lundquist