Hidiroglou_Yung - American Statistical Association

Treatment Of Unit Non-response
In Establishment Surveys
ICES-III: June 18-21, 2007
M.A. Hidiroglou
Wesley Yung
Statistics Canada
Outline
1. Why is it a problem?
2. Causes
3. Measurement
4. Follow-up
5. Score Function
6. Adjusting for nonresponse
7. Weight adjustment
8. Imputation
9. Summary
Why is it a Problem?
Bias
Non-respondents differ from respondents in
the characteristics measured
Sampling variance
Increased
Reduced effective sample size
Causes
Frame quality
Contact information
name, address, telephone number and fax number
Classification (industry/geography)
Over-coverage: sampled unit is out of scope for the
survey and does not respond
Under-coverage: units declared out-of-scope are not
contacted
Causes, cont.
Questionnaire
Design and layout
Coverage: complex businesses
Language
Length / time to fill out
Causes, cont.
Data collection method
Did not adjust to respondent’s preferred
contact mode
Mail, personal interview, telephone interview,
computer assisted interviewing, etc
Causes, cont.
Contact: Agency and respondent
Lack of communication and follow-up
Too much contact: editing checks
Timing
Best day and time
Fiscal year end
Causes, cont.
Contact: Agency and respondent
Data availability
Response load
Who else is asking?
Legal obligations for respondents and
statistical agency
Confidentiality protection
Measurement
Compile non-response rates
Refusals
Non-contact
Out-of-scope
Seasonality /death status (unknown)
Mail returns
Other reasons
Follow Up
Follow-up non-respondents
All and/or targeted sub-group
Effective way to increase the response rate
Follow Up, cont.
Prioritise follow-up
Who?
Target large or significant units first
Non-responding births
Delinquent businesses
How?
Score-function
Follow Up, cont.
Annual business census type surveys
Split non-respondents into take-all and take-some strata
Boundary: $b = \bar{x} + \left( c^2 N \bar{x}^2 + S^2 \right)^{0.5}$
Select with certainty the $t_a$ units with $x_k > b$
Select the $n - t_a$ remaining units from the take-some stratum
[Figure: units ordered from largest to smallest, split into response and non-response, with the follow-up group marked]
Follow Up, cont.
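Hansen-Hurwitz (1946)
Initial sample: $N \rightarrow n$
$n = n_r + n_{\bar{r}}$
Follow-up sample of non-respondents $s_{1\bar{r}}$: $n_{\bar{r}} \rightarrow n_{1\bar{r}}$
Estimator
$\hat{Y} = \frac{N}{n} \left( \sum_{s_r} y_i + \frac{n_{\bar{r}}}{n_{1\bar{r}}} \sum_{s_{1\bar{r}}} y_i \right)$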
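The Hansen-Hurwitz estimator can be sketched in a few lines, assuming simple random sampling at both phases and full response in the follow-up subsample. The function name and all numbers are hypothetical and only illustrate the arithmetic.

```python
def hansen_hurwitz_total(N, n, y_resp, n_nonresp, y_followup):
    """Hansen-Hurwitz estimate of a population total.

    N: population size, n: initial sample size
    y_resp: values reported by the initial respondents (s_r)
    n_nonresp: number of initial non-respondents
    y_followup: values obtained from the followed-up subsample of non-respondents
    """
    if len(y_resp) + n_nonresp != n:
        raise ValueError("respondents plus non-respondents must equal n")
    if n_nonresp > 0 and not y_followup:
        raise ValueError("need at least one follow-up value")
    followup_part = 0.0
    if n_nonresp > 0:
        # Each followed-up unit represents n_nonresp / n_followup non-respondents.
        followup_part = n_nonresp / len(y_followup) * sum(y_followup)
    return N / n * (sum(y_resp) + followup_part)


# Hypothetical example: 10 units sampled from 1000, 6 respond, 2 of the 4
# non-respondents are followed up successfully.
estimate = hansen_hurwitz_total(
    N=1000, n=10, y_resp=[12, 15, 9, 14, 10, 11], n_nonresp=4, y_followup=[20, 24]
)
print(estimate)
```

The followed-up values are weighted up by the inverse of the follow-up sampling fraction, so the non-respondent stratum is represented without assuming it resembles the respondents.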
Score Function
Basic idea
Follow-up non-responding units that have most
impact on estimates
Adaptation of Latouche and Berthelot (1992),
McKenzie (2001), and Hedlin (2003).
Score Function, cont.
Key steps
1. Define and compute score function from
past values
2. Determine score cut-off: minimize absolute
standard bias
3. Follow-up units above score cut-off
Score Function, cont.
1. Define and compute score function
Use past data at time $t_{past}$ (say)
Sample $s(t_{past})$ splits into:
$s_{RESP}(t_{past})$, units that respond: $y_j(t_{past})$
$\bar{s}_{RESP}(t_{past})$, units that do not respond: $y_j^{imp}(t_{past})$
Follow-up everybody: $\hat{Y}^{RESPF}_{s(t_{past})} = \sum_{j \in s(t_{past})} w_j(t_{past}) \, y_j(t_{past})$
Compute score function using non-responding units $\bar{s}_{RESP}(t_{past})$:
$score_j(t_{past}) = \frac{ w_j(t_{past}) \left| y_j(t_{past}) - y_j^{imp}(t_{past}) \right| }{ \hat{Y}^{FULL}_{s(t_{past})} } \times 100$
Score Function, cont.
2. Determine score cut-off
Rank scores $score_j(t_{past})$ from highest to lowest
Follow-up highest $B$ scores, $1 \le B \le n_{\bar{s}_{RESP}}$
- response set: $s_{RESPB}(t_{past})$
- non-response set: $\bar{s}_{RESPB}(t_{past})$
New estimate: $\hat{Y}^{RESPB}_{s(t_{past})} = \sum_{j \in s_{RESPB}} w_j(t_{past}) \, y_j(t_{past}) + \sum_{j \in \bar{s}_{RESPB}} w_j(t_{past}) \, y_j^{imp}(t_{past})$
Score Function, cont.
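2. Determine score cut-off
Absolute standard bias: $ASB(B) = \left| \hat{Y}^{RESPF}_{s(t_{past})} - \hat{Y}^{RESPB}_{s(t_{past})} \right| / \mathrm{s.e.}\left( \hat{Y}^{RESPF}_{s(t_{past})} \right)$
Score cut-off: $score_{CUT}(t_{past})$ where $ASB(CUT) \le A$
Reasonable value for $A$ = 0.10
If cv = 2%, then $ASB(CUT)$ = 0.10 corresponds to a bias of 0.10 x 2% = 0.2% of the estimate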
3. Follow-up units above score cut-off
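The three steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the score denominator is the full follow-up estimate, that the standard error of that estimate is supplied by the user, and that the cut-off is the smallest B whose absolute standard bias does not exceed A. The function name and data are hypothetical.

```python
def score_cutoff(resp_total, nonresp, se_full, A=0.10):
    """Find how many past non-respondents to follow up.

    resp_total: weighted total of the units that responded without follow-up
    nonresp: list of (w, y, y_imp) for past non-respondents, where y is the value
             obtained by full follow-up and y_imp the value that would be imputed
    se_full: standard error of the full follow-up estimate (assumed known)
    Returns (B, score of the B-th ranked unit, ASB(B)).
    """
    if not nonresp:
        raise ValueError("no non-respondents to score")
    y_full = resp_total + sum(w * y for w, y, _ in nonresp)
    # Step 1: score each non-respondent by its weighted impact, in percent.
    scored = sorted(
        ((100.0 * w * abs(y - y_imp) / y_full, w, y, y_imp) for w, y, y_imp in nonresp),
        reverse=True,
    )
    # Step 2: follow up the top B scores, impute the rest, and compare.
    for B in range(1, len(scored) + 1):
        followed = sum(w * y for _, w, y, _ in scored[:B])
        imputed = sum(w * y_imp for _, w, _, y_imp in scored[B:])
        asb = abs(y_full - (resp_total + followed + imputed)) / se_full
        if asb <= A:
            return B, scored[B - 1][0], asb
    return len(scored), scored[-1][0], 0.0


# Hypothetical past occasion with four non-respondents.
past_nonresp = [(10, 50, 30), (5, 20, 22), (20, 10, 9), (2, 100, 60)]
B, cut_score, asb = score_cutoff(resp_total=1000, nonresp=past_nonresp, se_full=120)
print(B, cut_score, round(asb, 4))
```

Step 3 then amounts to following up, on the current occasion, every non-respondent whose score is at or above `cut_score`.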
Score Function, cont.
Score-function (Latouche and Berthelot 1992)
$score_k(t) = \sum_{q=1}^{Q} \frac{ w_k(t) \, I_q \left| x^{imp}_{k,q}(t) - x_{k,q}(t-1) \right| }{ \sum_{s(t-1)} w_k(t) \, x_{k,q}(t-1) }$
$w_k(t)$ = survey weight at time $t$
$I_q$ = importance of variable $q$
Establish threshold based on ASB
Follow-up k-th unit if
$score_k(t) > threshold$
Score Function, cont.
[Figure: absolute standard bias plotted against the number of recontacts, with the cut-off marked]
Weight Adjustment, cont.
Select sample $s$ of size $n$: design weights $w_i$
Portion of sampled units that respond: $s_r$ ($n_r$)
Portion of sampled units that does not respond: $s_{\bar{r}}$ ($n_{\bar{r}}$)
Adjusting for nonresponse
Two options
1. Weight adjustment:
Inverse of response probability
Use of auxiliary data
2. Imputation:
Impute for missing values to get a full data matrix
Weight Adjustment
Used to reduce bias due to non-response
Depends on the probability to respond $\theta_i$
Assumes $\theta_i$ independent of variable of interest, $y$
Ignorable non-response
Respondents behave same as non-respondents
Weight Adjustment, cont.
If $\theta_i$ known, then adjustment is $1/\theta_i$
Unbiased estimator is
$\hat{Y} = \sum_{s_r} w_i y_i / \theta_i$
However, $\theta_i$ not known
Use estimates $\hat{\theta}_i$: may be biased
If $\hat{\theta}_i$ are ‘good’, then estimates are
approximately unbiased
Weight Adjustment, cont.
Let true response mechanism be
$\Pr(k \in s_r \mid s) = \theta_k$
and
$\Pr(k, \ell \in s_r \mid s) = \theta_{k\ell}$, $k \neq \ell$
If assume missing at random:
Bias for estimated total $\hat{Y}$: [equation not recoverable from the transcript; its surviving terms are $N$, $\sum_U \theta_k$, $y_k - \bar{y}_U$ and $\sum_{s_r} y_k / \theta_k$]
Weight Adjustment, cont.
How to estimate (approximate) $\hat{\theta}_i$?
Auxiliary variables
Logistic regression
Auxiliary data (discrete, continuous)
Weight Adjustment, cont.
Logistic regression
Define indicator response variable
$\delta_i = 1$ if unit $i$ responds, $\delta_i = 0$ otherwise
Probability that unit $i$ responds
$\theta_i = \Pr(\delta_i = 1 \mid z_i'\beta) = \left[ 1 + \exp(-z_i'\beta) \right]^{-1}$
Equivalent to:
$\ln\left( \frac{\theta_i}{1 - \theta_i} \right) = z_i'\beta$
$z_i = (1, z_{i1}, \ldots, z_{ip})'$: auxiliary data
$\beta$ = a vector of logistic regression coefficients
Weight Adjustment, cont.
Logistic regression
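Solve for $\hat{\beta}$
$\sum_{i \in s} w_i \delta_i z_i = \sum_{i \in s} w_i \left( 1 + e^{-z_i'\hat{\beta}} \right)^{-1} z_i$
Response probability adjusted weight
$\tilde{w}_i = w_i / \hat{\theta}_i$ where $\hat{\theta}_i = \left( 1 + e^{-z_i'\hat{\beta}} \right)^{-1}$
Reweighted estimator:
$\hat{Y}_{LR} = \sum_{s_r} \tilde{w}_i y_i$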
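A weighted logistic fit and the resulting reweighted estimator can be sketched in plain Python. This is an illustration under stated assumptions: a single scalar auxiliary variable plus an intercept, Newton-Raphson iterations started at zero, and made-up data. In practice a statistical package would be used for the fit.

```python
import math


def fit_response_model(z, delta, w, iterations=25):
    """Weighted logistic fit of the response indicator on (1, z) by Newton-Raphson.

    z: scalar auxiliary value per sampled unit (hypothetical single covariate)
    delta: 1 if the unit responded, 0 otherwise
    w: design weights
    Returns (b0, b1), the fitted intercept and slope.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, di, wi in zip(z, delta, w):
            theta = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            resid = wi * (di - theta)           # contribution to the score equations
            info = wi * theta * (1.0 - theta)   # contribution to the information matrix
            g0 += resid
            g1 += resid * zi
            h00 += info
            h01 += info * zi
            h11 += info * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: z is a centred, rescaled size measure; y is observed
# only for respondents.
z = [-1.5, -1.3, -1.1, -0.9, -0.5, 0.0, 0.5, 1.0, 1.5, 2.1]
delta = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
w = [5.0] * 10
y = [None, None, 8.0, None, 11.0, None, 15.0, 18.0, 22.0, 30.0]

b0, b1 = fit_response_model(z, delta, w)
theta_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * zi))) for zi in z]
# Reweighted estimator: each respondent carries w_i / theta_hat_i.
y_lr = sum(wi / ti * yi for wi, ti, di, yi in zip(w, theta_hat, delta, y) if di == 1)
print(round(b0, 3), round(b1, 3), round(y_lr, 1))
```

At convergence the weighted residuals sum to zero against each column of the auxiliary data, which is the estimating equation being solved.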
Weight Adjustment, cont.
Example: Logistic regression
[Figure: probability of response plotted against x-values (40 to 80), showing response status, theta hat and mean theta hat]
127 sampled businesses
71 businesses respond
Same $\theta$: 0.56
Weight Adjustment, cont.
Example: Logistic regression
[Figure: response indicator (0 or 1) plotted against x-values for two response patterns, one with 71 respondents and one with 55 respondents]
Weight Adjustment, cont.
Example: Logistic regression
[Figure: probability of response plotted against x-values (40 to 80), showing response status, theta hat and mean theta hat]
127 sampled businesses
55 businesses respond
Same $\theta$: 0.43
Weight Adjustment, cont.
Discrete (Count Adjustment)
Assume that $\theta_i = \theta$ and $\theta_{ij} = \theta_i \theta_j$ for all $i$ and $j$
That is, everyone has the same probability of
response and the probability of response is
independent between individuals (Uniform
Response Mechanism)
Estimate of $\theta$ is
$\hat{\theta} = \sum_{s_r} w_i / \sum_{s} w_i$
Weight Adjustment, cont.
Discrete (Count Adjustment)
Non-response adjustment is
$\left( \sum_{s} w_i \right) / \left( \sum_{s_r} w_i \right)$
Non-response adjusted estimator is
$\hat{Y} = \left( \sum_{s} w_i / \sum_{s_r} w_i \right) \sum_{s_r} w_i y_i$
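As a small worked illustration of the count adjustment, assuming a uniform response mechanism over the whole sample, the estimated response rate and the adjusted total can be computed as below. The function name and the numbers are hypothetical.

```python
def count_adjusted_total(w_sample, w_resp, y_resp):
    """Count-adjusted estimate of a total.

    w_sample: design weights of all sampled units (s)
    w_resp: design weights of the responding units (s_r)
    y_resp: reported values, in the same order as w_resp
    Returns (adjusted total, estimated response probability).
    """
    theta_hat = sum(w_resp) / sum(w_sample)
    adjustment = sum(w_sample) / sum(w_resp)  # equals 1 / theta_hat
    total = adjustment * sum(w * y for w, y in zip(w_resp, y_resp))
    return total, theta_hat


# Hypothetical sample of 8 units with weight 10 each, of which 5 respond.
total, theta_hat = count_adjusted_total(
    w_sample=[10.0] * 8, w_resp=[10.0] * 5, y_resp=[3, 5, 4, 6, 2]
)
print(total, theta_hat)
```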
Weight Adjustment, cont.
Continuous (Auxiliary Data)
Suppose we have auxiliary data xi and the
known population total X
Estimate $\theta$ by either
$\hat{\theta}_1 = \sum_{s_r} w_i x_i / \sum_{s} w_i x_i$
or $\hat{\theta}_2 = \sum_{s_r} w_i x_i / X$
Under a Uniform Response Mechanism
(URM), $\hat{\theta}_1$ and $\hat{\theta}_2$ provide approximately
unbiased estimates
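The two auxiliary-data estimates of the response probability can be sketched as follows, assuming x is available for every sampled unit and the population total X is known. The function name and data are hypothetical. Dividing the respondent weights by the second estimate reproduces the known total X, which the last lines check.

```python
def auxiliary_response_rates(w, x, responded, X):
    """Return (theta1, theta2) from auxiliary data.

    w, x: design weights and auxiliary values for all sampled units
    responded: True/False flag per sampled unit
    X: known population total of x
    """
    resp_wx = sum(wi * xi for wi, xi, r in zip(w, x, responded) if r)
    theta1 = resp_wx / sum(wi * xi for wi, xi in zip(w, x))  # sample-based denominator
    theta2 = resp_wx / X                                     # known-total denominator
    return theta1, theta2


# Hypothetical sample of four units, one of which does not respond.
w = [10.0, 10.0, 10.0, 10.0]
x = [2.0, 4.0, 6.0, 8.0]
responded = [True, False, True, True]
X = 210.0

theta1, theta2 = auxiliary_response_rates(w, x, responded, X)
# Adjusting respondent weights by 1 / theta2 recovers the known total of x.
calibrated_x = sum(wi / theta2 * xi for wi, xi, r in zip(w, x, responded) if r)
print(theta1, theta2, calibrated_x)
```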
Weight Adjustment, cont.
Continuous (Auxiliary Data)
Note that $\hat{\theta}_1$ leads to a two-phase estimator
and $\hat{\theta}_2$ to the well known ratio estimator
$\hat{\theta}_2$ calibrates to the known total X
Weight Adjustment, cont.
Continuous (Auxiliary Data)
If we have marginal totals for 2 auxiliary
variables, X and Z, one can use raking
         M     F     Total
15-30    ?     ?     Z1
30-65    ?     ?     Z2
65+      ?     ?     Z3
Total    X1    X2
Weight Adjustment, cont.
Continuous (Auxiliary Data)
Raking assumes that $\theta_{ijk} = \theta_{jk}$ and $\theta_{jk} = \alpha_j \beta_k$
Raking is an iterative procedure
Rake to one margin then the other
At convergence, get adjustment so that marginal
totals are met
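The iterative procedure can be sketched as iterative proportional fitting on a table of weighted respondent totals. This is a minimal illustration with hypothetical cell values and margins, assuming every cell is positive and the two sets of margins sum to the same grand total.

```python
def rake(table, row_totals, col_totals, tol=1e-10, max_iter=100):
    """Rake a two-way table of weighted respondent totals to known margins.

    table: list of rows of positive cell totals
    row_totals, col_totals: known marginal totals
    Returns the raked table; the cell adjustment is raked cell / original cell.
    """
    cells = [row[:] for row in table]
    for _ in range(max_iter):
        # Rake to the row margin.
        for i, target in enumerate(row_totals):
            factor = target / sum(cells[i])
            cells[i] = [c * factor for c in cells[i]]
        # Rake to the column margin.
        for j, target in enumerate(col_totals):
            factor = target / sum(row[j] for row in cells)
            for row in cells:
                row[j] *= factor
        # Columns now match; stop once the rows still match as well.
        if all(abs(sum(row) - t) <= tol * t for row, t in zip(cells, row_totals)):
            break
    return cells


# Hypothetical 3 x 2 table (age group by sex) and margins Z1..Z3, X1..X2.
observed = [[30.0, 20.0], [50.0, 40.0], [10.0, 25.0]]
raked = rake(observed, row_totals=[60.0, 100.0, 40.0], col_totals=[110.0, 90.0])
adjustments = [[r / o for r, o in zip(rr, oo)] for rr, oo in zip(raked, observed)]
print(adjustments)
```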
Weight Adjustment, cont.
Continuous (Auxiliary Data)
Generalized Regression (GREG) estimator
Weight adjustment not really an estimate of
response probability
Can show that bias is function of response
probability and predictive power of X
Unbiased under URM
Weight Adjustment, cont.
Continuous (Auxiliary Data)
Weight adjustment
$a_i = 1 + (X - \hat{X}_r)' \left( \sum_{s_r} w_i x_i x_i' \right)^{-1} x_i$
$\hat{X}_r = \sum_{s_r} w_i x_i$
Adjusted estimator:
$\hat{Y} = \sum_{s_r} w_i a_i y_i$
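For a single auxiliary variable the matrix inverse reduces to a scalar division, which makes the adjustment easy to sketch. The example below is an illustration under that simplifying assumption, with hypothetical data. It also checks that the adjusted weights reproduce the known total X.

```python
def greg_adjustments(w, x, X):
    """GREG adjustment factors a_i for one auxiliary variable.

    w, x: design weights and auxiliary values of the respondents (s_r)
    X: known population total of x
    """
    x_hat_r = sum(wi * xi for wi, xi in zip(w, x))   # respondent-based estimate of X
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))  # scalar version of sum w x x'
    return [1.0 + (X - x_hat_r) * xi / sxx for xi in x]


# Hypothetical respondents.
w = [10.0, 10.0, 10.0]
x = [2.0, 6.0, 8.0]
y = [5.0, 14.0, 20.0]
X = 210.0

a = greg_adjustments(w, x, X)
y_greg = sum(wi * ai * yi for wi, ai, yi in zip(w, a, y))
calibrated_x = sum(wi * ai * xi for wi, ai, xi in zip(w, a, x))
print(y_greg, calibrated_x)
```

With several auxiliary variables the same formula applies with `x` as a vector per unit and a linear solve in place of the division.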
Weight Adjustment, cont.
Weighting Classes
Assumption of URM very strong and
somewhat unrealistic
Usually define weighting classes
Mutually exclusive and exhaustive groups C1, C2,
…,CR
Assume URM within each class
Weight Adjustment, cont.
How to define weighting classes?
Using auxiliary data to group units so that
within the weighting class $\theta_i = \theta_r$
Using auxiliary data and logistic regression
models
Obtain $\hat{\theta}_i$ for all $i$
Form groups so that $\hat{\theta}_i \approx \theta_r$
Weight Adjustment, cont.
Weighting Classes
If weighting class variable is good at predicting y
and non-response, bias and variance will be
reduced
If weighting class variable unrelated to non-response but is good predictor of y, no bias
reduction but variance reduced
If weighting class variable unrelated to y, no bias
reduction. Variance could increase if weighting
class variable good predictor of non-response!
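A weighting-class adjustment applies the count adjustment separately within each class. The sketch below assumes the classes are already defined and that every class contains at least one respondent. The function name, class labels and values are hypothetical.

```python
from collections import defaultdict


def weighting_class_total(units):
    """Estimate a total with a count adjustment inside each weighting class.

    units: list of (class_label, weight, responded, y); y is ignored for
    non-respondents. Assumes every class has at least one respondent.
    """
    w_all = defaultdict(float)
    w_resp = defaultdict(float)
    for label, w, responded, _ in units:
        w_all[label] += w
        if responded:
            w_resp[label] += w
    total = 0.0
    for label, w, responded, y in units:
        if responded:
            # Respondents are weighted up by the inverse class response rate.
            total += w * (w_all[label] / w_resp[label]) * y
    return total


# Hypothetical sample: class A has 2 of 4 units responding, class B has 2 of 3.
sample = [
    ("A", 10.0, True, 3.0), ("A", 10.0, True, 5.0),
    ("A", 10.0, False, None), ("A", 10.0, False, None),
    ("B", 20.0, True, 10.0), ("B", 20.0, True, 14.0),
    ("B", 20.0, False, None),
]
class_total = weighting_class_total(sample)
print(class_total)
```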
Imputation
Usually used for item non-response
Can be used for unit non-response also
Several methods available
Deductive imputation
Class mean imputation
Cold-deck imputation (earlier survey/ historical)
Imputation
Hot-deck imputation (current survey)
Random overall imputation
Random imputation classes
Sequential hot deck
Distance function matching
Regression imputation
Simplest example is ratio
Imputation, cont.
For business surveys, most commonly used
methods involve auxiliary data
Historical data
If data available from previous time period, use it with a trend
(last month / last year)
If none available, use a mean imputation
Administrative data (i.e. tax)
Use tax data with or without an adjustment
At Statistics Canada, annual tax data are used as direct replacements,
and monthly tax data are adjusted before use
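The historical-data rule can be sketched as a trend (ratio) imputation with a mean fallback. This is an illustration only: the trend is taken here as the ratio of current to previous totals among units reporting on both occasions, which is one common choice and an assumption of this sketch. Unit identifiers and values are hypothetical.

```python
def trend_impute(current, previous):
    """Impute missing current values from historical data.

    current: dict unit -> reported value, or None if the unit did not respond
    previous: dict unit -> value from the previous period (may lack some units)
    Units with a previous value get previous * trend; others get the respondent mean.
    """
    reported = {u: v for u, v in current.items() if v is not None}
    matched = [u for u in reported if u in previous]
    trend = sum(reported[u] for u in matched) / sum(previous[u] for u in matched)
    resp_mean = sum(reported.values()) / len(reported)
    completed = {}
    for unit, value in current.items():
        if value is not None:
            completed[unit] = value
        elif unit in previous:
            completed[unit] = trend * previous[unit]  # historical value with trend
        else:
            completed[unit] = resp_mean               # no history: mean imputation
    return completed


# Hypothetical data: units c and d did not respond; d has no history.
current = {"a": 110.0, "b": 220.0, "c": None, "d": None}
previous = {"a": 100.0, "b": 200.0, "c": 50.0}
completed = trend_impute(current, previous)
print(completed)
```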
Summary
Reduce non-response at front-end
Frame
Contact vehicle
Editing
Measure non-response
Follow-up selectively and representatively
Adjust for non-response
Model (Weighting /imputing / Logistic Regression)
Homogeneous classes
References
Bethlehem, J.G. (1988). Reduction of Nonresponse Bias through Regression Estimation. Journal of Official Statistics, Vol. 4,
No. 3, 251-260.
Cochran, W.G. (1977): Sampling Techniques. Third Edition, Wiley, New York.
Cornish J. (2004). Response Problems In Surveys: improving response and minimising the load for UNSD. Regional
Seminar on 'Good Practices in the Organization and Management of Statistical Systems’ for ASEAN countries,
Yangon Myanmar, 11-13 December 2002.
DeLeeuw, Edith D (ed) (1999). Special issues on Survey Nonresponse Journal of Official Statistics 15, 2.
Dillman, D. A. Procedures for Conducting Government-Sponsored Establishment Surveys: Comparisons of the Total
Design Method (TDM), a Traditional Cost- Compensation Model, and Tailored Design, Washington State
University.
Ekholm, A. and Laaksonen, S. (1991). Weighting via Response Modeling in the Finnish Household Budget Survey.
Journal of Official Statistics, 7, 325–337.
Elliott, M.R., Little, R.J.A., and Lewitzky, S. (2000). Subsampling Callbacks to Improve Survey Efficiency. Journal of the
American Statistical Association, 95, 730–738.
Groves R M, Dillman D A, Eltinge J L & Little R J A (eds), Survey Nonresponse, 2002, Chichester: Wiley
Hansen, M. H., and Hurwitz, W. N. (1946), The Problem of Nonresponse in Sample Surveys, Journal of the American
Statistical Association, 41, 517–529.
Hedlin, D. (2003). Score Functions to Reduce Business Survey Editing at the U.K. Office for National Statistics. Journal of
Official Statistics, Vol. 19, No. 2, 177-199.
Hidiroglou, M.A., Drew, D.J., and Gray, G.B. (1993). A Framework for Measuring and Reducing Nonresponse in
Surveys. Survey Methodology, 19, 81-94.
International Conference on Survey Nonresponse (1999). http://jpsm.umd.edu/icsn/papers/Index.htm.
Kalton G. and Flores-Cervantes I. (2003). Weighting Methods. Journal of Official Statistics, Vol.19, No.2, 2003. pp. 81-97
Laaksonen, S. and Chambers, R. (2006). Survey Estimation under Informative Nonresponse with Follow-up. Journal of
Official Statistics, Vol. 22, No. 1, 2006, 81–95.
Latouche, M. and Berthelot, J.-M., (1992). Use of a Score Function to Prioritize and Limit Recontacts in Editing Business
Surveys. Journal of Official Statistics, Vol.8, No.3, 1992. 389-400.
Lawrence, D. and McKenzie, R. (2000). The General Application of Significance Editing. Journal of Official Statistics,
Vol. 16, No. 3, 243-253.
Little, R. (1986). Survey Nonresponse Adjustments for Estimates of Means. International Statistical Review, 54, 139–157.
Lundstrom Sixten and Särndal C.-E. (1999). Calibration as a Standard Method for Treatment of Nonresponse. Journal of
Official Statistics, Vol. 15, No. 2, 1999, 305-327.
Lynn, P. and Clarke, P. (2002). Separating Refusal Bias and Non-contact Bias: Evidence from UK National Surveys. The
Statistician, 51, Part 3, 319-333.
Madow, W.G., Nisselson, H., and Olkin, I. (eds.) (1983): Incomplete Data in Sample Surveys. Vol. 1: Report and Case
Studies. Academic Press, New York.
McKenzie, Richard. (2000). A Framework for Priority Contact of Non Respondents. In the Proceedings of The Second
International Conference on Establishment Surveys, Buffalo, New York. 473 - 482.
Rao, J.N.K. (1973). Double sampling for stratification and survey. Biometrika, Vol. 60, No. 1, 125-133.
Särndal, C.-E. and Swensson, B. (1987). A General View of Estimation for Two Phases of Selection with Applications to
Two-Phase Sampling and Nonresponse. International Statistical Review, 55, 279–294.
Strauss, E.E., and Hidiroglou, M.A. (1984). A Follow-up Procedure for Business Census Type Surveys. In Topics in
Applied Statistics. Y.P. Chaubey and T.D. Dwivedi ed., 447-453. Published by Concordia University, Montréal.
Valliant R. (2004) The Effect of Multiple Weighting Steps on Variance Estimation Journal of Official Statistics, Vol.20,
No.1, 1-18.
Wang, J.E. (2004). Non-response in the Norwegian Business Tendency Survey. Statistics Norway Department of
Economic Statistics.
Score Function, cont
No follow-up on occasion $t-a$
$\hat{Y}_{NO\_FU}(t-a) = \sum_{j \in RESP} w_j(t-a) \, y_j(t-a) + \sum_{j \in \overline{RESP}_1} w_j(t-a) \, y_j^{IMP}(t-a)$
Partial follow-up on occasion $t-a$
$\hat{Y}_{PART\_FU}(t-a) = \sum_{j \in RESP} w_j(t-a) \, y_j(t-a) + \sum_{j \in FU} w_j(t-a) \, y_j(t-a) + \sum_{j \in \overline{RESP}_2} w_j(t-a) \, y_j^{IMP}(t-a)$
Full follow-up on occasion $t-a$
$\hat{Y}_{FULL\_FU}(t-a) = \sum_{j \in RESP_{FULL}} w_j(t-a) \, y_j(t-a)$