Optimizing CATI Call Scheduling
International Total Survey Error
Workshop
Hidiroglou, M.A.,
with Choudhry, G.H., Laflamme, F.
Statistics Canada
Statistics Canada • Statistique Canada
Outline
1. Introduction
2. Current CATI workload allocation
3. Optimizing CATI workload
4. Application and results
5. Conclusions
6. Future Work
1. Introduction
Background
• Paradata research focussed on improving the current data collection
process and practices for CATI surveys
• Identified several opportunities for improvement
• Implemented some of them (e.g. responsive collection design, time
slice, cap on calls, etc.)
➢ Active management used to monitor data collection suggested that
resources were not always optimally used throughout the collection period
1. Introduction (cont’d)
Paradata Sources
➢ Blaise Transaction History (BTH) file: Record created each time an open
case is closed
- Survey, cycle, Regional Office (RO), interviewer ID
- Date, start and end times of the call
- Duration of the call and associated time slice (morning, afternoon, early and
late evening shifts)
- Outcome code (e.g. complete, appointment, no contact)
➢ Interviewer payroll information: data collection costs
- Total payroll hours represent the hours charged to the survey
➢ Historical information since 2003 for all surveys
➢ Updated on a daily basis
1. Introduction (cont’d)
Lessons Learned
• Significant effort (calls and time spent) on cases for which an
interview is not conducted at the first contact
• Substantive efforts spent close to the end of data collection yield
relatively small marginal returns
• Interviewer staffing levels not always optimally allocated with respect
to workload and expected productivity
➢ Develop a draft framework to improve the cost-efficiency of data
collection
2. Current CATI workload allocation
➢ Many input sources (Excel spreadsheets) used for scheduling interviewers:
whole operation is manual
➢ Annual workload: Annual staffing needs are planned for each regional
office according to its capacity and the number of interviewers required for
all surveys
➢ Monthly scheduling: Determine number of hours for each interviewer
according to
• type of survey (e.g. social, business or agricultural)
• duration of survey
• current period (end of survey, Labour Force Survey week, etc.)
• number of interviewers required per day
2. Current CATI workload allocation (cont’d)
➢ Constraints for each interviewer
• Time of day: weekday and weekend
• Interviewer training
• Vacation, sick leave; preferred working hours
• Union constraints: minimum and maximum work hours per week
➢ Monthly spreadsheet updated on a daily basis
3. Optimizing CATI workload
➢ Use existing BTH records for a given survey
• Compute probability of completing a questionnaire for each time slice
• Predict these probabilities using linear or logistic regression
➢ Output estimated model parameters
• Estimated parameters from linear or logistic regression model
• Predicted probabilities of completing a questionnaire
➢ Input predicted probabilities into optimization
• Determine optimal number of calls by time slice subject to cost and / or
operational constraints
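The first step above (an observed completion probability per time slice) can be sketched as follows. This is only an illustration: the record layout `(time_slice, outcome_code)` and the sample data are hypothetical, and defining the probability as completed calls divided by all calls in the slice is an assumption, not a definition given in the slides.

```python
from collections import defaultdict

# Hypothetical, simplified call records: (time_slice, outcome_code).
# Real BTH records carry more fields (survey, RO, interviewer ID, times).
calls = [
    (1, "complete"), (1, "no contact"), (1, "appointment"), (1, "complete"),
    (2, "no contact"), (2, "complete"), (2, "no contact"), (2, "no contact"),
    (3, "no contact"), (3, "no contact"),
]


def completion_probabilities(records):
    """Return {time_slice: completed calls / total calls} (assumed definition)."""
    totals = defaultdict(int)
    completes = defaultdict(int)
    for time_slice, outcome in records:
        totals[time_slice] += 1
        if outcome == "complete":
            completes[time_slice] += 1
    return {s: completes[s] / totals[s] for s in sorted(totals)}


p_observed = completion_probabilities(calls)
print(p_observed)
```

These observed proportions would then serve as the response variable for the regression step.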
3. Optimizing CATI workload (cont’d)
➢ Regression Models
• Linear:
$p_s = \beta_0 + \sum_{i=1}^{n} \beta_i x_{is} + \gamma C_s + e_s$ for $s = 1, \ldots, S$
• Logistic:
$\ln\left( \frac{p_s}{1 - p_s} \right) = \beta_0^* + \sum_{t=1}^{3} \beta_t^* x_{ts} + \gamma^* C_s$ for $s = 1, \ldots, S$
• C_s = average cumulative number of calls up to and
including time slice s
• x_ts = time of call (morning, afternoon, early and late evening)
• p_s = probability of a completed questionnaire within time
slice s
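To make the difference between the two model forms concrete, the sketch below evaluates both for one time-of-day setting as the cumulative call count C_s grows. Every number is invented for illustration. The slope on C_s (named `gamma` here), the use of three indicators with the fourth period as baseline, and the choice of late evening as the third indicator are assumptions rather than details confirmed by the slides.

```python
import math

# All coefficient values below are made up for illustration only.
LINEAR = {"b0": 0.30, "b": (0.02, 0.00, 0.04), "gamma": -0.05}
LOGISTIC = {"b0": -0.80, "b": (0.10, 0.00, 0.20), "gamma": -0.30}


def linear_p(x, c_s, m=LINEAR):
    """Linear form: b0 + sum_i b_i * x_i + gamma * C_s (error term omitted)."""
    return m["b0"] + sum(b * xi for b, xi in zip(m["b"], x)) + m["gamma"] * c_s


def logistic_p(x, c_s, m=LOGISTIC):
    """Logistic form: invert ln(p / (1 - p)) = b0 + sum_t b_t * x_t + gamma * C_s."""
    eta = m["b0"] + sum(b * xt for b, xt in zip(m["b"], x)) + m["gamma"] * c_s
    return 1.0 / (1.0 + math.exp(-eta))


# Three time-of-day indicators; the fourth period is the assumed baseline.
late_evening = (0, 0, 1)
for c_s in (1.0, 4.0, 8.0):
    print(c_s, round(linear_p(late_evening, c_s), 3),
          round(logistic_p(late_evening, c_s), 3))
```

With these made-up values the linear prediction drops below zero once C_s is large, while the logistic prediction always stays strictly between 0 and 1.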
3. Optimizing CATI workload (cont’d)
➢ Optimization
➢ Total data collection cost: $g(c) = \sum_{s=1}^{S} \left[ t_1 p_s c_s + t_2 (1 - p_s) c_s \right]$
• t_1 and t_2: unit costs for productive / non-productive calls
• p_s: predicted probabilities from the model
• Call vector $c = (c_1, c_2, \ldots, c_S)$ minimizes g subject to
- Number of calls c_s for each time slice s is greater than or equal to 0
- Expected response rate $\sum_{s=1}^{S} p_s c_s / n$ equal to a pre-specified response
rate R.
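A minimal sketch of this minimization follows, under two simplifying assumptions that are not in the slides: the probabilities p_s are held fixed (in the regression models they depend on the cumulative calls C_s), and each time slice has a hypothetical upper limit `cap[s]` on calls, standing in for an operational constraint. With those assumptions the problem is a linear programme with a single equality constraint, so a greedy fill by cost per expected completion solves it. The function name and all input values are illustrative.

```python
def optimal_calls(p, t1, t2, n, R, cap):
    """Minimise sum_s [t1*p_s + t2*(1 - p_s)] * c_s
    subject to sum_s p_s * c_s = n * R and 0 <= c_s <= cap[s].

    With p_s held fixed this is a linear programme with one equality
    constraint, so filling slices in increasing order of cost per
    expected completion is optimal (continuous knapsack argument).
    """
    target = n * R  # expected number of completed questionnaires required
    unit_cost = [t1 * ps + t2 * (1 - ps) for ps in p]  # expected cost per call
    order = sorted((s for s in range(len(p)) if p[s] > 0),
                   key=lambda s: unit_cost[s] / p[s])
    calls = [0.0] * len(p)
    remaining = target
    for s in order:
        if remaining <= 1e-12:
            break
        calls[s] = min(cap[s], remaining / p[s])
        remaining -= p[s] * calls[s]
    if remaining > 1e-9:
        raise ValueError("response-rate target unreachable under the caps")
    total_cost = sum(k * c for k, c in zip(unit_cost, calls))
    return calls, total_cost


# Illustrative inputs only: three time slices, 100 sampled units, R = 50%.
p = [0.40, 0.25, 0.10]
calls, cost = optimal_calls(p, t1=20.0, t2=2.0, n=100, R=0.5, cap=[100, 100, 100])
print(calls, cost)
```

Because the cost per expected completion is t_1 + t_2 (1 - p_s) / p_s, slices with higher p_s are always filled first in this simplified setting. Without the caps, all calls would go to the single best slice, which is why some form of operational limit is needed for a usable schedule.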
4. Application and results
➢ Survey of Labour and Income Dynamics (SLID)
➢ Longitudinal survey that interviews the same people each year
for six consecutive years
• Sample size ~ 34,000
• Number of calls ~ 400,000
• Response rate: approximately 71%
➢ Measures: Employment and Unemployment Dynamics; Life Cycle Labour
Market Transitions; Job Quality; Family Economic Mobility; Dynamics of
Low Income; Life Events and Family Changes
➢ Data obtained via CATI over 28 collection days (112 time slices)
4. Application and results (cont’d)
[Figure: probability of completing a questionnaire (vertical axis, 0.050 to 0.400) by time slice (horizontal axis, 1 to 44)]
4. Application and results (cont’d)
➢ Regression
• Probability of completing a questionnaire decreases over time
• Intercept and continuous variable significant for all regional
offices and both regression types (linear / logistic)
• Logistic regression provides the better fit: 0 ≤ p_s ≤ 1
• Best time period for calls depends on regional office:
o Late evening
o Morning, early and late evening
o All time periods good
4. Application and results (cont’d)
Table 1: Average absolute relative deviation (ARD),
defined as the average of $ARD_s = |1 - \hat{p}_s / p_s|$
over time slices with 50 or more calls

RO               Linear Regression   Logistic Regression
Edmonton         34.5                23.6
Halifax          28.2                25.8
Sherbrooke       24.0                23.3
Sturgeon Falls   27.1                22.7
Toronto          28.5                24.9
Winnipeg         26.7                26.7
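The fit measure in Table 1 can be sketched as below. The data are invented. Taking the ratio as predicted over observed is an assumption about the garbled formula, and so is skipping slices whose observed probability is zero.

```python
def average_ard(observed, predicted, n_calls, min_calls=50):
    """Average of |1 - predicted / observed| over slices with enough calls."""
    ards = [abs(1 - ph / p)
            for p, ph, n in zip(observed, predicted, n_calls)
            if n >= min_calls and p > 0]
    if not ards:
        raise ValueError("no time slice meets the minimum number of calls")
    return sum(ards) / len(ards)


# Illustrative values only; the last slice is excluded (fewer than 50 calls).
observed = [0.40, 0.20, 0.10, 0.05]
predicted = [0.36, 0.22, 0.10, 0.09]
n_calls = [500, 300, 120, 30]
ard = average_ard(observed, predicted, n_calls)
print(round(100 * ard, 1))
```

Restricting the average to slices with at least 50 calls keeps slices with very few calls, where the observed proportion is unstable, from dominating the measure.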
4. Application and results (cont’d)
➢ Number of calls made - Edmonton
[Figure: cumulative number of calls (vertical axis, 0 to 40,000) by time slice (horizontal axis, 0 to 112); series: Actual calls, Optimum - Linear Regression, Optimum - Logistic Regression]
4. Application and results (cont’d)
➢ Time spent (minutes) - Edmonton
[Figure: cumulative time spent in minutes (vertical axis, 0 to 180,000) by time slice (horizontal axis, 0 to 112); series: Actual calls, Optimum - Linear Regression, Optimum - Logistic Regression]
4. Application and results (cont’d)
➢ Number of questionnaires completed - Edmonton
[Figure: cumulative questionnaires completed (vertical axis, 0 to 3,500) by time slice (horizontal axis, 0 to 112); series: Actual calls, Optimum - Linear Regression, Optimum - Logistic Regression]
4. Application and results (cont’d)
➢ Optimization
Table 2: Savings in terms of time spent

RO               Linear Regression   Logistic Regression
Edmonton         11.2                17.2
Halifax          5.9                 11.5
Sherbrooke       22.6                23.3
Sturgeon Falls   17.6                17.2
Toronto          17.2                14.0
Winnipeg         0.3                 0.1
Canada           12.9                14.9
5. Conclusions
➢ The model predicted the probabilities of
completing a questionnaire by time slice quite well
➢ Using the predicted probabilities, the optimum
schedule showed cost savings when operational constraints
were not considered
➢ Additional constraints will reduce cost savings
5. Conclusions (cont’d)
➢ Workload should be spread uniformly throughout the collection
period
➢ Optimum number of calls may vary by time of day
➢ Logistic model slightly better than linear model
6. Future Work
➢ Simulate and optimize concurrent collection for multiple surveys
and interviewers
➢ Individual assignment of interviewers to time shifts within the
day needs to reflect:
• Legal, ergonomic, and operating constraints
• Minimum and maximum number of days that interviewers work within
the week
• Shift duration per day (no more or less than a fixed number of hours),
including starting time range of each interviewer
• Number of shifts within a day should be reasonable
6. Future Work (cont’d)
➢ How do we do the above?
• Extend optimization to include mix of interviewers and
surveys
• Translate the number of calls within each shift and survey into
number of required interviewers
• Use commercial software such as XIMES to account for
constraints, and schedule each interviewer by time shift
(Gartner, Musliu, and Slany 2001)
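The step of translating calls per shift into a number of interviewers could look like the sketch below. The conversion rule (total call minutes divided by shift length, rounded up) and all input values are assumptions made for illustration; the slides do not specify how this translation would be done.

```python
import math


def interviewers_needed(calls, minutes_per_call, shift_minutes):
    """Smallest whole number of interviewers able to cover the calls,
    assuming each interviewer works the full shift on this survey."""
    return math.ceil(calls * minutes_per_call / shift_minutes)


# Hypothetical shift: 350 planned calls, 4.5 minutes per call on average,
# and a 4-hour (240-minute) shift.
needed = interviewers_needed(calls=350, minutes_per_call=4.5, shift_minutes=240)
print(needed)
```

An average time per call could in principle be estimated from historical call durations, but breaks, training and idle time between calls would all have to be accounted for in a real schedule.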
Gartner, J., Musliu, N., and Slany, W. (2001). Rota: a research
project on algorithms for workforce scheduling and shift
design optimization. AI Communications, 14, 83-92.
For more information, please contact
➢ Mike Hidiroglou
• [email protected]
➢ G. Hussain Choudhry
• [email protected]
➢ François Laflamme
• [email protected]