Analyzing Survey Error with Latent Class Models

Paul Biemer
RTI International and University of North Carolina
March 18, 2005
What is Latent Class Analysis?
• Special case of log-linear analysis with latent variables
• Latent variables are constructs that are measured imperfectly by indicator variables
• Traditional LCA assumes local independence
  - i.e., P(A and B | X) = P(A | X) P(B | X) for latent variable X and indicators A and B
• LCA models contain
  - Structural component – describes the relationships among latent variables and covariates
  - Measurement component – describes the relationships among the indicators, latent variables, and covariates
Uses of Latent Class Analysis in Survey Research
• Substantive researchers focus on the structural component of the latent class model (LCM)
  - Errors treated as nuisance parameters
• Survey methods researchers focus on the measurement component
  - Estimate components of total survey error
  - Evaluation of questionnaires and alternative survey designs
  - Population size estimation
  - Compensation for missing data
  - Survey bias adjustment
Objective of LCA for Measurement Error Analysis
• Obtain estimates of classification error for a categorical survey variable
  - e.g., false positive and false negative error rates
• Why are these LCA estimates useful?
  - Quantify the measurement error in the data
  - Identify the correlates of measurement error
  - Trace error to its root causes
  - Eliminate the cause through redesign
Example – Estimating the Error in Survey Measurements of Marijuana Use
Three Indicators of Marijuana Use
Indicator A – How long has it been since you last used marijuana or hashish?
  A = “Yes” if the response indicates use in the last 12 months
  A = “No” otherwise
Indicator B – Now think about the past 12 months from your 12-month reference date through today. On how many days in the past 12 months did you use marijuana or hashish?
  B = “Yes” if response is 1 or more days
  B = “No” otherwise
Indicator C – a composite variable based on 7 questions, such as
  • used in last 12 months?
  • spent a great deal of time getting it, using it, or getting over its effects?
  • used drug much more often or in larger amounts than intended?
  C = “Yes” if response is positive to any question suggesting use in last 12 months
  C = “No” otherwise
Statistical Framework
NOTATION
X = true drug use status (1 if use, 2 if no use), an unknown latent variable
A, B, and C are 3 dichotomous indicators of X
$P(A=a, B=b, C=c) = \sum_x P(X=x)\,P(A=a \mid X=x)\,P(B=b \mid X=x)\,P(C=c \mid X=x)$
or
$\pi_{abc} = \sum_x \pi_x\,\pi_{a|x}\,\pi_{b|x}\,\pi_{c|x}$
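
As a rough illustration of the formula above, the following sketch computes the cell probabilities of the A×B×C table from a set of latent class parameters. The function name, the 0-based category coding, and all parameter values are assumptions made for the example; they are not taken from the study.

```python
from itertools import product

def cell_probabilities(pi_x, pi_a, pi_b, pi_c):
    """Return {(a, b, c): pi_abc} under local independence.

    pi_x[x] is P(X = x); pi_a[x][a] is P(A = a | X = x), and likewise
    for pi_b and pi_c. Categories are 0-based list indices here, which
    is an illustrative coding (the slides code categories as 1 and 2).
    """
    table = {}
    for a, b, c in product(range(2), repeat=3):
        # Sum the joint probability over the unobserved classes x.
        table[(a, b, c)] = sum(
            pi_x[x] * pi_a[x][a] * pi_b[x][b] * pi_c[x][c]
            for x in range(len(pi_x))
        )
    return table

# Hypothetical parameters: 20% true users, each indicator fairly accurate.
pi_x = [0.2, 0.8]
indicator = [[0.9, 0.1], [0.05, 0.95]]  # rows: class x, columns: response
probs = cell_probabilities(pi_x, indicator, indicator, indicator)
```

The eight values in `probs` sum to 1, and only these eight observable probabilities are available to estimate the latent parameters.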
Log-linear Formulation of the Latent Class Model
$\pi_{abc} = \sum_x \pi_{abcx} = \sum_x \pi_x\,\pi_{a|x}\,\pi_{b|x}\,\pi_{c|x}$
is equivalent to
$\log m_{abcx} = u + u_x^X + u_a^A + u_b^B + u_c^C + u_{ax}^{AX} + u_{bx}^{BX} + u_{cx}^{CX}$
in which
$m_{abcx} = n\,\pi_{abcx}$
i.e., hierarchical log-linear model (LLM) {AX BX CX}
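
One way to see the equivalence is to take logarithms of the expected count for the unobserved four-way table. The step below is a sketch that uses only the symbols defined on this slide.

```latex
\begin{align*}
m_{abcx} &= n\,\pi_x\,\pi_{a|x}\,\pi_{b|x}\,\pi_{c|x} \\
\log m_{abcx} &= \log n + \log \pi_x + \log \pi_{a|x}
                 + \log \pi_{b|x} + \log \pi_{c|x}
\end{align*}
```

Each term on the right depends on at most one indicator together with x. A term such as $\log \pi_{a|x}$ can therefore be split into a main effect $u_a^A$, an interaction $u_{ax}^{AX}$, and a part absorbed into $u_x^X$, while no AB, AC, BC, or higher-order indicator interactions appear. This is the log-linear expression of local independence.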
Estimation
Use MLE to obtain estimates of $\pi_x$, $\pi_{a|x}$, $\pi_{b|x}$, and $\pi_{c|x}$ from the multinomial likelihood of the A×B×C classification table
$L(\pi_{ABC}) = C \prod_a \prod_b \prod_c (\pi_{abc})^{m_{abc}}$
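
The slides do not say which algorithm was used to maximize the likelihood. A common choice for latent class models is the EM algorithm, and the following is a minimal sketch of it for three dichotomous indicators and two classes. The function name, the 0/1 response coding, the starting values, and the fixed iteration count are all assumptions made for illustration.

```python
import random

def fit_lca_em(counts, n_iter=500, seed=0):
    """Fit a 2-class latent class model to a 2x2x2 table by EM.

    counts maps (a, b, c), each coded 1 ("Yes") or 0 ("No"), to a cell count.
    Returns (pi_x, cond), where pi_x[x] = P(X = x) and
    cond[j][x] = P(indicator j = 1 | X = x) for j = 0, 1, 2 (A, B, C).
    """
    rng = random.Random(seed)
    pi_x = [0.5, 0.5]
    # Illustrative starting values: class 0 begins as the "user" class.
    cond = [[rng.uniform(0.6, 0.9), rng.uniform(0.1, 0.4)] for _ in range(3)]
    n = sum(counts.values())
    for _ in range(n_iter):
        class_total = [0.0, 0.0]
        ones = [[0.0, 0.0] for _ in range(3)]
        for cell, n_cell in counts.items():
            # E-step: joint P(X = x, a, b, c) under local independence.
            joint = []
            for x in (0, 1):
                p = pi_x[x]
                for j, y in enumerate(cell):
                    p *= cond[j][x] if y == 1 else 1.0 - cond[j][x]
                joint.append(p)
            total = joint[0] + joint[1]
            for x in (0, 1):
                # Expected number of people in this cell who belong to class x.
                w = n_cell * joint[x] / total
                class_total[x] += w
                for j, y in enumerate(cell):
                    if y == 1:
                        ones[j][x] += w
        # M-step: update parameters from the expected complete-data counts.
        pi_x = [class_total[x] / n for x in (0, 1)]
        cond = [[ones[j][x] / class_total[x] for x in (0, 1)] for j in range(3)]
    return pi_x, cond
```

With class 0 read as the user class, `cond[j][1]` is the false positive rate of indicator j and `1 - cond[j][0]` is its false negative rate. A production fit would add a convergence check and several random starts, since EM can stop at a local maximum.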
Some Results
(modeling details in Biemer and Wiesen, 2000)
• LCA models were fit to three years of data from the National Survey on Drug Use and Health
• Discovered several important anomalies in the estimates of marijuana use
  - Low-frequency marijuana users tended to answer negatively to the frequency question
  - Composite variable was subject to false positives as a result of a questionnaire problem that was subsequently corrected
False Positive Error Rates Under Model 1 (percent)
Indicator of Past Year Use    1994    1995    1996
P(A = 1 | X = 2)              0.03    0.01    0.08
P(B = 1 | X = 2)              0.73    0.78    0.84
P(C = 1 | X = 2)              4.07    1.17    1.36
Estimates of False Negative Error Rates (percent)
Indicator of Past Year Use    1994    1995    1996
P(A = 2 | X = 1)              7.29    8.96    8.60
P(B = 2 | X = 1)              1.17    0.90    1.39
P(C = 2 | X = 1)              6.60    5.99    7.59
Frequency of Use for Persons Responding ‘No’ to A
Days used in past 12 months    Percent
More than 300 days                5.84
201 to 300 days                   0.96
101 to 200 days                   0.93
51 to 100 days                    1.45
25 to 50 days                     2.96
12 to 24 days                     4.76
6 to 11 days                      6.06
3 to 5 days                      18.41
1 to 2 days                      58.62
Other Applications
Nonsampling Error Research
• Identifying flawed questions and other questionnaire problems
• Estimating census undercount in a capture-recapture framework
• Characterizing respondents, interviewers, and questionnaire elements that contribute to survey error
• Adjusting for nonresponse and missing data in surveys
Other Applications (cont’d)
Substantive Research
• Causal modeling
• Log-linear analysis compensating for measurement error
• Cluster analysis
• Variable reduction and scale construction
Importance of Model Validity Depends Upon the Application
• In the previous example, validity was “proven” by the model’s ability to identify real questionnaire problems.
• In other applications, this type of validation may be quite difficult.
• Further, LCA methodology is being pushed to adjust reported survey estimates for misclassification bias, for example:
  - Unemployment rate
  - Expenditures
  - Total population size in a census
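
To make the idea of a misclassification adjustment concrete, here is a minimal sketch for a single dichotomous item. It relies on the identity P(report yes) = p(1 − FN) + (1 − p)FP, solved for the true prevalence p. The function name and every number are hypothetical, and the correction is only as good as the error rates supplied to it, which is why model validity matters here.

```python
def adjust_prevalence(observed_rate, false_pos, false_neg):
    """Correct an observed proportion for misclassification.

    observed_rate: proportion reporting "yes"
    false_pos:     P(report yes | truly no)
    false_neg:     P(report no  | truly yes)
    All three are proportions between 0 and 1, not percentages.
    """
    denom = 1.0 - false_pos - false_neg
    if denom <= 0:
        raise ValueError("error rates too large for the correction to be defined")
    return (observed_rate - false_pos) / denom

# Hypothetical inputs: 11% report use, with FP = 2% and FN = 8%.
adjusted = adjust_prevalence(0.11, 0.02, 0.08)
```

If the error rates come from a misspecified latent class model, the adjusted estimate can be more biased than the unadjusted one.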
Some Issues for Future Research
Investigating the Validity of LCA Estimates
• Robustness of the estimates of classification error probabilities to violations of the model assumptions
  - Local dependence
  - Unobserved heterogeneity
  - Dependent classification errors
  - Unequal probability sampling
  - Sample clustering
Some Issues for Future Research (cont’d)
• Robustness of the model fit statistics
  - $L^2$ and $X^2$
• Convergence problems
  - Local maxima
  - Boundary solutions
• Bias in the estimated standard errors of the estimates
  - Effects of weighting
  - Clustered samples
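
For reference, the two fit statistics named above are the likelihood-ratio chi-square $L^2$ and the Pearson chi-square $X^2$. The sketch below computes both from observed and model-expected cell counts using their standard definitions. The function name is illustrative, and no weighting or clustering adjustment is attempted.

```python
import math

def fit_statistics(observed, expected):
    """Return (L2, X2) for observed and model-expected cell counts.

    L2 = 2 * sum(n * log(n / m))   likelihood-ratio chi-square
    X2 = sum((n - m)**2 / m)       Pearson chi-square
    Cells with n = 0 contribute 0 to L2 by the usual convention.
    """
    l2 = 0.0
    x2 = 0.0
    for n_cell, m_cell in zip(observed, expected):
        if n_cell > 0:
            l2 += 2.0 * n_cell * math.log(n_cell / m_cell)
        x2 += (n_cell - m_cell) ** 2 / m_cell
    return l2, x2
```

Both statistics assume simple random sampling, so their reference distributions may not hold for weighted or clustered survey data, which is the robustness concern raised on this slide.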
Some Recent Literature
• Asparouhov, T., Muthen & Muthen (2004). “Weighting for Unequal Probability of Selection in Latent Variable Modeling,” Mplus Web Notes: No. 7, Version 3.
• Patterson, B., Dayton, M., and Graubard, B. (2002). “Latent Class Analysis of Complex Sample Survey Data: Application to Dietary Data,” JASA, Vol. 97, No. 459, pp. 721-741.
• Vermunt, J. and Magidson, J. (2001). “Latent Class Analysis with Sampling Weights,” presented at the Sixth Annual Meeting of the Methodology Section of the American Sociological Association, University of Minnesota.
• Biemer, P., Brown, G., and Judson, D. (2004). “Robustness of LCA Estimates of Population Size to Model Failure,” unpublished Census Bureau project reports.