Estimating the Level of Underreporting of Expenditures
Estimating the Level of Underreporting of
Expenditures among Expenditure Reporters:
A Further Micro-Level Latent Class Analysis
Clyde Tucker
Bureau of Labor Statistics
Paul P. Biemer
Research Triangle Institute
Brian Meekins
Bureau of Labor Statistics
DISCLAIMER: The views expressed in this presentation are those of the authors and do not
necessarily represent the views of the Bureau of Labor Statistics or the Department of Labor.
Outline
• Background
• Research Goals
• Past Approach
• Past Results
• Current Approach
• Results
• Future Research
U.S. Consumer Expenditure (CE)
Interview Survey
• ~ 6,000 households/year
– Interviewed every 3 months about the prior 3
months' expenditures
– 5 consecutive interviews for each
household
• 6 years of CE data: 1996 – 2002
Research Goals
This multi-year project has three goals:
1. Identify patterns of underreporting of
expenditures in different commodities
2. Identify the characteristics of respondents
contributing most to the underreports
3. Use the knowledge gained to design new
procedures for overcoming
underreporting
Phases of Research
• Phase 1 (2003)
– Markov LCA on macro level data
– Non-reporters only
• Phase 2 (2004)
– (Ordered) LCA on micro level data for total consumer
expenditures
– Reporters only
• Phase 2 (current)
– (Ordered) LCA on micro level data for separate commodity
expenditures
– Examination of possible causal linkages to respondent
characteristics
– Reporters only
• Phase 3 (future)
– Combine the macro and micro analyses
– All sample
– Produce overall estimates of underreporting by category and
respondent characteristics
Phase 1: Approach
• Used 4 consecutive CE interviews
“Since the 1st of (month, 3 months ago), have
you (or any members of your household) had
any expenses for __________?”
• Used 1st and 2nd order Markov LCA to fit models
to dichotomous response to screening question
• Explored effect on underreporting of:
– family size, income, age, family type, gender,
education, record use, interview length
Phase 1: Design
• Obtained estimates of false negative probability
i.e. P(no purchase reported | made a purchase)
• Produced estimates for each commodity of:
– True proportion of “purchasers”
– Accuracy rate
i.e. P(report a purchase | truly made a purchase)
• Used these estimates to examine relationships
between demographic variables and probability of
accurate reporting
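The two probabilities on this slide are complements of each other, which a short sketch can make concrete. The function names and the numbers in the usage note are illustrative only, and the second function rests on an added assumption that is not in the slides: that nobody reports a purchase they did not make (no false positives).

```python
def accuracy_rate(false_negative_prob: float) -> float:
    """P(report a purchase | made a purchase) = 1 - P(no purchase reported | made a purchase)."""
    if not 0.0 <= false_negative_prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - false_negative_prob


def implied_true_share(observed_share: float, false_negative_prob: float) -> float:
    """Back out the true share of purchasers from the observed share of reporters.

    Assumption (not from the slides): no false positives, so
    observed_share = true_share * accuracy_rate.
    """
    accuracy = accuracy_rate(false_negative_prob)
    if accuracy == 0.0:
        raise ValueError("accuracy rate is zero; true share is not identified")
    return observed_share / accuracy
```

For example, with a hypothetical false negative probability of 0.2 and a hypothetical observed share of 0.4, `implied_true_share(0.4, 0.2)` divides 0.4 by an accuracy rate of 0.8.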
Phase 1: Conclusions
• Model fit was adequate for all commodities
• Levels of underreporting vary by commodity
• Variables found to be positively related to
accurate reporting included:
• Education
• Family Size
• Income
• Use of Records
• Length of Interview
• The effect of age was highly variable
Phase 2 in 2004
• Differences between Phase 1 and Phase 2:
– Used only Interview 2 data, not Markov LCA
– Micro level analysis
– Reporters only
– Latent variable represents level of
underreporting, as opposed to purchasing
status as in Phase 1
Approach
• Analysis Plan
– Ran both ordered and unordered latent class
models.
• Order was determined based on
theoretical relationship between values of
indicators and level of underreporting.
– Ran all combinations of indicators in groups
of 3
• Using only reporters
• Using only 2nd interview data
Application of Model
• For the final model:
– Each combination of indicator values was assigned
to a latent class
– The probability of being in that class given the
value of the indicators was used to assign
classes
– Each respondent was assigned to a latent
class given the value of their indicator
variables
– Expenditure means were found for each latent
class.
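The assignment steps above can be sketched as a modal (highest posterior probability) classification followed by a per-class mean. This is a minimal illustration, not the authors' implementation: the class sizes and conditional probabilities below are made-up values, the model is assumed to satisfy local independence, and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical parameters for a 3-class model with 3 indicators, each coded 1-3.
# CLASS_PRIORS[c] is the class size; COND_PROBS[c][j][v] is
# P(indicator j = v | class c). None of these values come from the slides.
CLASS_PRIORS = [0.5, 0.3, 0.2]
_TABLES = [
    {1: 0.7, 2: 0.2, 3: 0.1},
    {1: 0.2, 2: 0.6, 3: 0.2},
    {1: 0.1, 2: 0.2, 3: 0.7},
]
COND_PROBS = [[table] * 3 for table in _TABLES]


def posterior(pattern):
    """P(class | indicator pattern) by Bayes' rule under local independence."""
    joint = []
    for prior, indicator_tables in zip(CLASS_PRIORS, COND_PROBS):
        p = prior
        for table, value in zip(indicator_tables, pattern):
            p *= table[value]
        joint.append(p)
    total = sum(joint)
    return [p / total for p in joint]


def assign_class(pattern):
    """Return the 1-based class with the highest posterior probability."""
    probs = posterior(pattern)
    return probs.index(max(probs)) + 1


def class_means(records):
    """records: iterable of (indicator pattern, expenditure). Mean expenditure per class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pattern, expenditure in records:
        c = assign_class(pattern)
        sums[c] += expenditure
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}
```

Because every respondent with the same indicator pattern receives the same class, the assignment can be computed once per pattern and then looked up for each respondent.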
Summary of Findings in 2004
• Levels of underreporting were found to vary by
interview level characteristics including:
1. Number of contacts
2. Missing income data
3. Type and frequency of records used
4. Length of interview
• Total expenditure means for respondents
assigned to each latent class confirmed this
Current Phase 2
• Using same general methodology as 2004
• Refine indicators
• Apply methodology to separate commodity
categories
• Identify best model for each commodity and
assign respondents to latent classes
• Examine the pattern of mean expenditures for
each latent class to confirm results
• Run demographic analysis to identify
characteristics of members of each latent class
Indicators
• Interview level indicators considered:
1. Number of contacts
2. Ratio of respondents/household members
3. Missing income data
4. Type and frequency of records used
5. Length of interview
6. Ratio of expenditures in last month to quarter
7. Combination of type of record and interview
length
Indicator Coding
• #contacts (1=0-2; 2=3-5; 3=6+)
• Resp/hh size (1= <.5; 2= .5+)
• Income missing (1=present; 2=missing)
• Records use (1=never; 2=single type or
sometimes; 3=multiple types and always)
• Interview length (1= <45; 2=45-90; 3= 90+)
• Month3 expn/all (1= <.25; 2= .25-.5; 3= +.5)
• Combined records and length (1= poor; 2= fair;
3=good)
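The numeric codings above amount to binning each raw value at fixed cut points. A minimal sketch follows, with several assumptions that are not in the slides: interview length is taken to be in minutes, a value that falls exactly on a boundary is placed in the higher category (as the ".5+" and "90+" labels suggest), and the dictionary keys are hypothetical names. The categorical indicators (income missing, records use, combined records and length) are left out because the slides do not give a rule for deriving them.

```python
from bisect import bisect_right

# Lower bounds of categories 2, 3, ... for each numeric indicator.
# Boundary handling (ties go to the higher category) is an assumption.
CUT_POINTS = {
    "contacts": [3, 6],           # 1 = 0-2, 2 = 3-5, 3 = 6+
    "resp_ratio": [0.5],          # 1 = <.5, 2 = .5+
    "length": [45, 90],           # 1 = <45, 2 = 45-90, 3 = 90+ (minutes assumed)
    "month3_ratio": [0.25, 0.5],  # 1 = <.25, 2 = .25-.5, 3 = above .5
}


def code_value(name, value):
    """Map a raw value to its 1-based category for the named indicator."""
    return bisect_right(CUT_POINTS[name], value) + 1


def code_indicators(**raw):
    """Code several raw indicator values at once, keyed by indicator name."""
    return {name: code_value(name, value) for name, value in raw.items()}
```

A call such as `code_indicators(contacts=4, resp_ratio=0.5, length=30, month3_ratio=0.6)` returns one category code per indicator.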
Demographic Coding
• CU size (1=1; 2=2; 3=3+)
• Age (1= <30; 2= 30-49; 3=50+)
• Education (1=< H.S.; 2= H.S.+)
• Income rank (1= <=.25; 2=.25-.75 and
missing; 3=+.75)
• Race (1= White; 2= Other)
• Tenure (1= renter; 2= owner)
• Urban (1= urban; 2= rural)
LCM Fit by Commodity
Commodity                 BIC          Diss Index
Kid’s Clothing            -7.4029      0.0040
Women’s Clothing          -23.1575     0.0221
Men’s Clothing            -244.0034    0.0258
Furniture                 -239.6923    0.0327
Electricity               -8.6450      0.0021
Minor Vehicle Repairs     -221.5239    0.0306
Kitchen Accessories       -119.7008    0.0288
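For readers unfamiliar with the dissimilarity index, the sketch below uses its usual definition: half the sum of absolute differences between observed and model-expected cell counts, divided by the sample size. It is the share of cases that would have to move to another cell for the model to fit exactly. The slides do not show how the index was computed, so this is the standard formula rather than the authors' code, and the counts in the usage note are hypothetical.

```python
def dissimilarity_index(observed, expected):
    """Standard dissimilarity index: sum(|observed - expected|) / (2 * N).

    observed and expected are cell counts over the same cross-classification
    of the indicators; N is the total observed count.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    n = sum(observed)
    if n <= 0:
        raise ValueError("total observed count must be positive")
    return sum(abs(o - e) for o, e in zip(observed, expected)) / (2.0 * n)
```

With hypothetical observed counts `[50, 30, 20]` and expected counts `[45, 35, 20]`, the absolute differences sum to 10 over a doubled sample size of 200.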
Expenditure Means by Latent Class
Value of Latent Class
Commodity                            1 = Poor     2            3 = Good
Kid’s Clothing               Mean    44.90(a)     59.62(b)     71.09(c)
                             n       24,666       12,001       6,331
Women’s Clothing             Mean    99.00(b)     148.08(a)    152.94(a)
                             n       21,316       11,281       10,401
Men’s Clothing               Mean    78.98(b)     107.04(a)    105.46(a)
                             n       36,080       842          6,076
Furniture                    Mean    117.25(a)    66.22(b)     266.63(c)
                             n       23,437       16,315       3,246
Electricity                  Mean    230.47(a)    198.87(b)    223.30(c)
                             n       32,905       4,377        5,716
Minor Vehicle Expenditures   Mean    39.47(a)     57.03(b)     82.28(c)
                             n       9,864        26,288       6,846
Kitchen Accessories          Mean    23.27(b)     52.51(a)     47.58(a)
                             n       26,589       2,934        13,475
Proportional Odds Model Results for Minor Vehicle
Variable       Exp(b)    Pr(χ²)
Famsize 1      .801      <.0001
Famsize 2      1.063     <.0001
Age 1          .862      <.0001
Age 2          1.033     .0293
Educ           .984      .1492
Inclass 1      .776      <.0001
Inclass 2      .762      <.0001
Race           1.056     <.0001
Tenure         .909      <.0001
urban          1.104     <.0001
Max-rescaled R²: .0566
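The Exp(b) column reports odds ratios, which are often easier to read as a percent change in odds. The sketch below converts a few of the reported values. It does not settle interpretation: the slides do not state the reference category for each variable or the direction in which the latent class outcome is ordered, so the sign of each effect should be read with that caveat. The function names are hypothetical.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Percent change in the cumulative odds implied by an odds ratio Exp(b)."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return (odds_ratio - 1.0) * 100.0


def log_odds_coefficient(odds_ratio):
    """Recover the coefficient b from Exp(b)."""
    return math.log(odds_ratio)


# Three odds ratios taken from the table above.
REPORTED = {"Famsize 1": 0.801, "Age 2": 1.033, "Educ": 0.984}

for name, odds_ratio in REPORTED.items():
    change = percent_change_in_odds(odds_ratio)
    b = log_odds_coefficient(odds_ratio)
    print(f"{name}: b = {b:.3f}, change in odds = {change:+.1f}%")
```

An odds ratio below 1 gives a negative percent change and one above 1 gives a positive change, relative to whichever category the model treats as the reference.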
Future Research
– Other categories and total expenditures
– Add a Markov component
– Combine the macro and micro analyses
(underreporting for both reporters and
nonreporters)
– Produce overall estimates of underreporting
by category and respondent characteristics