Dealing with Item Non-response in a Catering Survey

Pauli Ollila
Statistics Finland
Kaija Saarni
Finnish Game and Fisheries Research Institute
Asmo Honkanen
Finnish Game and Fisheries Research Institute
The Finnish Catering Survey
• Studied the use of fish, crawfish, red
deer, elk and reindeer in the catering
sector during 2005.
• Carried out by Finnish Game and
Fisheries Research Institute together
with the interview organisation of
Statistics Finland
• Computer assisted telephone interviews
at the beginning of 2006
• Population 14740, sample size 2263,
stratification by “portion classes” (7).
• Respondents 1741, unit non-response
498, over-coverage 24
Information on amounts
• The questionnaire was divided
into three sections: fish, crawfish
and game (red deer, elk, reindeer)
• Among other questions, every
section asked for amounts in
kilograms, both as totals and
broken down by category
(type of product, species) and
origin (domestic/imported)
• The amounts in categories could
also be given as percentages
EXAMPLE: Question 8a
What was the total amount of fish as raw material you used in 2005?
________________ kg
MEMO
Unit
Name
Contact details
1(1)
30.5.2007
In addition, estimate the form in which the fish raw material was delivered to you. (If you
cannot estimate the distribution in kilograms, estimate the proportion of the total in percent.)

Form of delivery              kg / year      %
1. Fresh whole/gutted
2. Fresh fillet
3. Frozen whole/gutted
4. Frozen fillet
5. Other frozen products
6. Prepared
7. Canned
8. Salted or spiced
9. Smoked
10. In other form
Total                                      100 %
The quality of response
• It was obvious that some respondents could not provide full
and exact information for these questions due to various
reasons.
• For example, the amounts given in the classifying questions
contradicted the answers to the overall questions. Further, the
questions on domestic and foreign fish gave results that differed
from the overall fish consumption question.
• A lot of editing work was carried out at the Finnish Game
and Fisheries Research Institute to clean the data
(e.g. deducing values from related questions) and to
convert the percentage information into kilograms.
• Some contradictory and insufficient responses could
not be resolved and were left for statistical
processing.
• For example, comparing the total kilograms with the sum of the
category kilograms gave the following numbers of respondents:

                          sums   no total /   categ.    categ. sum   categ. sum   all
                          ok     zero total   missing   more         less
Fish                      1270       26         237         96          112       1741
Foreign fish              1059       93         344         92          153       1741
Crawfish                  1646       79          14          2            0       1741
Red deer, elk, reindeer   1560      172           8          1            0       1741

NOTE: A difference of less than 10 % between the total kilograms and the
sum of the category kilograms was allowed in the interview situation.
Item non-response
• The most common case of item non-response: the
category kilograms are entirely missing although the
overall total exists.
• The sum of the existing category kilograms may
either exceed or fall below the overall total given in
the response.
• In principle, the latter case can be regarded as
item non-response.
• However, it is not clear how many categories are
affected by item non-response, or whether the existing
category amounts are simply partly erroneous.
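The response patterns described above can be sketched as a small classification routine. This is only an illustration, not the procedure used in the survey: the function name, the labels and the data layout are hypothetical, and the 10 % tolerance is assumed here to be relative to the reported overall total.

```python
def classify_response(total_kg, category_kg, tolerance=0.10):
    """Classify one respondent's answer to an amount question.

    total_kg    -- reported overall total, or None if it was not given
    category_kg -- list of category amounts, None for unanswered categories
    tolerance   -- allowed relative difference (assumed relative to the total)
    """
    if total_kg is None or total_kg == 0:
        return "no total / zero total"
    if all(kg is None for kg in category_kg):
        # Overall total exists, but every category amount is missing.
        return "categories missing"
    category_sum = sum(kg for kg in category_kg if kg is not None)
    if abs(category_sum - total_kg) < tolerance * total_kg:
        return "sums ok"
    if category_sum > total_kg:
        return "category sum more"
    return "category sum less"


# Hypothetical respondents, for illustration only.
examples = [
    (100, [60, 45]),      # within the tolerance
    (100, [None, None]),  # full missingness of the categories
    (100, [50, None]),    # category sum below the total
    (100, [80, 40]),      # category sum above the total
]
labels = [classify_response(total, cats) for total, cats in examples]
```

Note that a record labelled "category sum less" may hide either missing categories or erroneous amounts; the sketch cannot tell these apart, which is exactly the ambiguity noted in the last bullet.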
How to correct?
• How to treat full missingness of the category sums?
• How to deal with category sums not matching the
overall sum (mismatch sums)?
Alternatives for dealing with the problems
• Donor imputation
• Mean imputation
• Regression imputation
• Weight adjustments
The method used in the final statistical processing was chosen from
these alternatives, which were considered in the following forms:
Corrections considered: donor imputation
Full missingness of the category sums
- A donor is selected within the stratum (“portion category”), and
its category percentages are applied to the recipient's overall total
to create the imputed values.
- Nearest-neighbour class criteria: “number of kitchen staff” and
“number of days serving fish”.
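A minimal sketch of the donor step described above, under stated assumptions: the field names are hypothetical, and the distance used here (sum of absolute differences in kitchen staff and fish-serving days) is only a stand-in for the class-based nearest-neighbour criterion, whose exact definition the slides do not give.

```python
def impute_from_donor(recipient, candidates):
    """Fill a recipient's missing category amounts from a donor's distribution.

    Each record is a dict with the (hypothetical) keys
    "stratum", "kitchen_staff", "fish_days", "total_kg" and "category_kg".
    """
    # Only complete respondents in the same stratum may act as donors.
    pool = [
        c for c in candidates
        if c["stratum"] == recipient["stratum"]
        and c["category_kg"] is not None
        and sum(c["category_kg"]) > 0
    ]
    if not pool:
        return None  # no usable donor in this stratum

    def distance(c):
        # Assumed distance; the survey used class-based criteria instead.
        return (abs(c["kitchen_staff"] - recipient["kitchen_staff"])
                + abs(c["fish_days"] - recipient["fish_days"]))

    donor = min(pool, key=distance)
    donor_sum = sum(donor["category_kg"])
    # Apply the donor's category shares to the recipient's overall total.
    return [kg * recipient["total_kg"] / donor_sum for kg in donor["category_kg"]]


# Hypothetical data, for illustration only.
recipient = {"stratum": 1, "kitchen_staff": 3, "fish_days": 2,
             "total_kg": 200, "category_kg": None}
candidates = [
    {"stratum": 1, "kitchen_staff": 10, "fish_days": 5, "total_kg": 100, "category_kg": [10, 90]},
    {"stratum": 1, "kitchen_staff": 4, "fish_days": 2, "total_kg": 100, "category_kg": [30, 70]},
    {"stratum": 2, "kitchen_staff": 3, "fish_days": 2, "total_kg": 100, "category_kg": [50, 50]},
]
imputed = impute_from_donor(recipient, candidates)
```

Because the imputed amounts are proportions of the recipient's own total, they always add up to that total.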
Mismatch sums
- When the category sum is lower than the overall sum, imputation
is hard to apply: there is no information on which
category or categories should receive the imputed values, and the
mismatch may still remain. In the opposite case imputation
is not applicable.
- To retain the distribution information on categories, the
category amounts are scaled up or down proportionally by the ratio

$$
r_i = \frac{y_{\text{overall},i}}{\sum_{\text{category}} y_{\text{category},i}}
$$
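As a worked illustration of this ratio, the following sketch rescales one respondent's category amounts so that they add up to the reported overall total while keeping their relative shares. The function name and the example figures are hypothetical.

```python
def rescale_categories(total_kg, category_kg):
    """Scale category amounts by r = total / sum(categories)."""
    category_sum = sum(category_kg)
    if category_sum == 0:
        raise ValueError("cannot rescale: the category amounts sum to zero")
    ratio = total_kg / category_sum
    return [ratio * kg for kg in category_kg]


# Hypothetical respondent: total 120 kg, but the categories sum to only 100 kg.
rescaled = rescale_categories(120, [30, 50, 20])
```

Here the ratio is 1.2, so each category is inflated by 20 % and the category shares (30 %, 50 %, 20 %) are unchanged.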
Corrections considered: group mean imputation
Full missingness of the category sums
- Group means of the percentages are used for every amount category.
“Portion categories” and “number of days serving fish” were used as
groups.
Mismatch sums (as in donor imputation)
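A minimal sketch of group mean imputation for the full-missingness case, assuming that each complete respondent's category amounts have already been expressed as percentages. The record layout and the group labels are hypothetical.

```python
from collections import defaultdict


def group_mean_percentages(complete_records):
    """Mean percentage per category within each group of complete respondents."""
    grouped = defaultdict(list)
    for rec in complete_records:
        grouped[rec["group"]].append(rec["percentages"])
    return {
        group: [sum(column) / len(column) for column in zip(*rows)]
        for group, rows in grouped.items()
    }


def impute_group_mean(recipient, means):
    """Convert the group's mean percentages into kilograms for one recipient."""
    return [p * recipient["total_kg"] / 100.0 for p in means[recipient["group"]]]


# Hypothetical group key: (portion category, class of fish-serving days).
complete = [
    {"group": ("50-99", "weekly"), "percentages": [20.0, 80.0]},
    {"group": ("50-99", "weekly"), "percentages": [40.0, 60.0]},
]
means = group_mean_percentages(complete)
imputed = impute_group_mean({"group": ("50-99", "weekly"), "total_kg": 500}, means)
```

Every recipient in a group receives the same distribution, which is why this kind of imputation tends to reduce the variation between respondents.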
Corrections considered: regression imputation
Full missingness of the category sums
- The percentages in categories were modelled with various
auxiliary variables, e.g. “number of kitchen staff” and “number
of days serving fish”, separately for each “portion category” (only
for kitchens that had served fish). No better explanatory
variables were available for all observations.
Mismatch sums (as in donor imputation)
Corrections considered: weight adjustments
Full missingness of the category sums & mismatch sums
- The category results are corrected by adjusting the
weight, separately for each question involving
amounts, by the ratio

$$
\frac{\sum_{i \in s} w_i \, y_{\text{overall},i}}{\sum_{i \in s} \left( w_i \sum_{\text{category}} y_{\text{category},i} \right)}
$$

i.e. the weighted overall total sum divided by the
weighted sum of the category sums.
- Separate weights cause inconsistencies: statistics based
on variables with no item non-response differ depending
on whether they are computed with the normal weights or
the adjusted weights. Practical problems in
tabulation and analysis may also occur.
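The ratio above can be sketched as follows. This is an illustrative reading rather than the production code: the record layout is hypothetical, and missing category amounts are assumed to contribute zero to the denominator.

```python
def item_nonresponse_factor(records):
    """Weighted overall total divided by the weighted sum of the category sums."""
    weighted_total = sum(r["weight"] * r["total_kg"] for r in records)
    weighted_categories = sum(
        # Missing category amounts (None) are assumed to count as zero.
        r["weight"] * sum(kg for kg in r["category_kg"] if kg is not None)
        for r in records
    )
    if weighted_categories == 0:
        raise ValueError("no category amounts were reported")
    return weighted_total / weighted_categories


def adjusted_weights(records):
    """Question-specific weights: original weight times the inflating factor."""
    factor = item_nonresponse_factor(records)
    return [r["weight"] * factor for r in records]


# Hypothetical respondents: the second one gave a total but no categories.
records = [
    {"weight": 10.0, "total_kg": 100.0, "category_kg": [60.0, 40.0]},
    {"weight": 5.0, "total_kg": 200.0, "category_kg": [None, None]},
]
factor = item_nonresponse_factor(records)
new_weights = adjusted_weights(records)
```

Because every weight for a given question is multiplied by the same factor, the category shares estimated from the responding units are unchanged; only their level is lifted to match the weighted overall total. For the species-of-fish example reported later in the slides, the same ratio would be 14036226 / 11703710, roughly 1.199.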
Actions at that time
• Due to lack of time at the estimation phase, the
weight adjustments were chosen: a conservative and
quick solution that brought all the information on
amounts into line with each other (a kind of calibration).
• The purposes of the catering survey were purely
descriptive, and results were produced only at the general
level and for some simple classes (e.g. region).
• Complex cross-tabulations and analyses were not
conducted.
WHAT DID THE SUBSEQUENT TESTS WITH THE
CORRECTION ALTERNATIVES REVEAL?
Subsequent test experiences
• The inflating item non-response factor in the weight adjustments
varied from 1.00689 to 1.47618.
• Practical choice: mean and regression imputation were carried out for
all classes except the biggest one, which was given the value 100 %
minus the sum of the other percentage estimates. This ensured that
the percentage estimates together did not exceed 100 %.
• Regression imputation performed so poorly (e.g. negative
percentage values) that it was not considered further.
• Only weight adjustment replicates the original distribution of the
classification amounts.
• The standard deviations are affected by all methods.
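The practical choice in the second bullet can be illustrated with a short sketch, in which the biggest class is not imputed directly but receives whatever remains of 100 %. The function name and the example percentages are hypothetical.

```python
def complete_with_residual(other_estimates, biggest_class):
    """Give the biggest class 100 % minus the sum of the other estimates.

    other_estimates -- dict of imputed percentages for all classes
                       except the biggest one
    """
    completed = dict(other_estimates)
    completed[biggest_class] = 100.0 - sum(other_estimates.values())
    return completed


# Hypothetical imputed percentages for two smaller classes.
shares = complete_with_residual({"Salmon": 15.0, "Tuna": 10.0}, "Rainbow trout")
```

The completed percentages add up to 100 % by construction. The residual is only sensible when the other estimates are themselves plausible, which is one reason why negative regression estimates were a problem.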
The inconsistency problem with weight
adjustments (example: portion classes)
Weighted counts by portion class, with the row percentage in parentheses:

                         1-49          50-99         100-199       200-499       500-999      1000-        all
Original                 4593 (31.16)  3522 (23.89)  2986 (20.26)  2317 (15.72)  923 (6.26)   399 (2.71)   14740
Fish                     4845 (32.22)  3716 (24.71)  2858 (19.00)  2268 (15.08)  926 (6.16)   427 (2.84)   15039
Imported fish            5147 (33.15)  3847 (24.77)  2931 (18.87)  2261 (14.56)  920 (5.93)   422 (2.72)   15529
Species of foreign fish  5089 (32.63)  3820 (24.49)  3020 (19.37)  2288 (14.67)  954 (6.12)   425 (2.73)   15596
Domestic fish            5969 (33.04)  4431 (24.52)  3473 (19.22)  2670 (14.78)  1052 (5.82)  473 (2.62)   18069

totals rounded to integers
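One way to read the size of the inconsistency is to compare each adjusted total in the "all" column with the original total of 14740. The short calculation below uses only the totals reported in the table; the variable and function names are hypothetical.

```python
ORIGINAL_TOTAL = 14740  # "all" column, row "Original"

adjusted_totals = {
    "Fish": 15039,
    "Imported fish": 15529,
    "Species of foreign fish": 15596,
    "Domestic fish": 18069,
}


def overshoot_percent(adjusted_total, original_total=ORIGINAL_TOTAL):
    """Relative excess of an adjusted total over the original total, in percent."""
    return 100.0 * (adjusted_total / original_total - 1.0)


overshoots = {
    question: round(overshoot_percent(total), 1)
    for question, total in adjusted_totals.items()
}
```

The excess ranges from about 2 % for the fish question to about 23 % for the domestic fish question, while the percentage distribution over portion classes shifts by at most a couple of percentage points.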
The distribution problem
(example: species of fish, overall total 14036226)

                          no correction      weight adjustment  donor imputation   group mean imputation
                          amount        %    amount        %    amount        %    amount        %
Salmon                     1777326  15.19     2131542  15.19     2357228  16.76     2086781  14.87
Rainbow trout              3348190  28.61     4015476  28.61     3778266  26.86     3905681  27.83
Baltic herring              936373   8.00     1122990   8.00     1119551   7.96     1144067   8.15
European whitefish          289291   2.47      346946   2.47      338279   2.40      335130   2.39
Pikeperch                   282990   2.42      339389   2.42      338290   2.40      318147   2.27
Vendace                     184875   1.58      221720   1.58      214977   1.53      217264   1.55
Perch                       143687   1.23      172324   1.23      164907   1.17      161378   1.15
Herring                     208967   1.79      250613   1.79      242182   1.72      245398   1.74
Cod and other whitefish    2989394  25.54     3585172  25.54     3688276  26.22     3820289  27.22
Tuna                       1224018  10.46     1467962  10.46     1445786  10.28     1429165  10.18
Other                       318597   2.72      382093   2.72      348485   2.48      372927   2.66
Total                     11703710  100.0    14036226  100.0    14036226  100.0    14036226  100.0
Weighted standard deviation changes
(example: species of fish)

                          respondents          weight        donor        regression
                          without correction   adjustments   imputation   imputation
Salmon                          1220              1336          1673         1293
Rainbow trout                   5324              5830          5346         5348
Baltic herring                   630               690           700          695
European whitefish               278               305           305          288
Pikeperch                        399               437           424          409
Vendace                          203               222           214          207
Perch                            181               198           192          182
Herring                          229               251           241          236
Cod and other whitefish         2103              2304           929         2305
Tuna                             867               950           528          889
Other                            524               574            73          528
Conclusions
• The level of inconsistency caused by the weight adjustment
method was not serious.
• Both donor and mean imputation had a slight, but not notable,
effect on the distribution of amounts.
• It is clear that the weighted standard deviations were
inflated by the weight adjustments, but donor imputation
tended to produce more varying standard deviation figures
between amount categories. As expected, mean imputation
had a diminishing effect on variation.
• Current recommendation: the Banff package for statistical
editing and imputation (by Statistics Canada, built
in the SAS environment)