General Latent Variable Modeling Approaches to Measurement Issues using Mplus
Rich Jones
Psychometrics Workshop
Friday Harbor, San Juan Island, WA
August 24, 2005
Overview
• Part 1
– IRT overview
– DIF overview
• Part 2
– IRT via Factor Analysis
– Factor analysis and general latent variable models for
measurement issues using Mplus
– Limitations of Mplus approach
• Part 3
– Applied Example
• Part 4 (time permitting)
– Bells and Whistles
– Discussion
Part 1a
IRT overview
Semantics
• Multiple Fields, Conflicting Language
– Educational Testing, Psychological Measurement,
Epidemiology & Biostatistics, Psychometrics &
Structural Equation Modeling
• Characteristics of People
– ability, trait, state, construct, factor level, item
response
• Characteristics of Items
– difficulty, severity, threshold, location
– discrimination, sensitivity, factor loading,
measurement slope
Key Ideas of IRT
• Persons have a certain ability or trait
• Items have characteristics
– difficulty (how hard the item is)
– discrimination (how well the item measures the ability)
– (I won’t talk about guessing)
• Person ability and item characteristics are
estimated simultaneously and expressed on a
unified metric
• Interval-level measure of ability or trait
• Used to be hard to do
Some Things You
Can Do with IRT
1. Refine measures
2. Identify ‘biased’ test items
3. Adaptive testing
4. Handle missing data at the item level
5. Equate measures
Latent Ability / Trait
• Symbolized with θ_i or η_i
• Assumed to be continuously, and often normally,
distributed in the population
• The more of the trait a person has, the more
likely they are to ...whatever...(endorse the
symptom, get the answer right etc.)
• The latent trait is that unobservable, hypothetical
construct presumed to be measured by the test
(assumed to “cause” item responses)
Item Characteristic Curve
• The fundamental conceptual unit of
IRT
• Relates item responses to ability
presumed to cause them
• Represented with cumulative
logistic or cumulative normal forms
Item Response Function
$P(y_{ij}=1\mid\theta_i) = F[a_j(\theta_i - b_j)]$
Example of an Item Characteristic Curve
[Figure: item characteristic curve. Vertical axis: Probability of Correct Response (0.00 to 1.00). Horizontal axis: Latent Trait Level (-3 to 3), shown together with the latent ability distribution (latent trait density).]
Example of an Item Characteristic Curve
A Person with High Ability Has a High Probability of Responding Correctly
[Figure: item characteristic curve. Vertical axis: Probability of Correct Response (0.00 to 1.00). Horizontal axis: latent ability (-3.0 to 3.0).]
Example of an Item Characteristic Curve
A Person with Low Ability Has a Low Probability of Responding Correctly
[Figure: item characteristic curve. Vertical axis: Probability of Correct Response (0.00 to 1.00). Horizontal axis: latent ability (-3.0 to 3.0).]
Example of an Item Characteristic Curve
Item Difficulty: The level of ability at which a person has a 50% probability of responding correctly.
[Figure: item characteristic curve. Vertical axis: Probability of Correct Response (0.00 to 1.00). Horizontal axis: latent ability (-3.0 to 3.0).]
Example of Two ICCs that Differ in Difficulty
[Figure: two item characteristic curves. Vertical axis: Probability of Correct Response (0.00 to 1.00). Horizontal axis: latent ability (-3.0 to 3.0).]
Example of an Item Characteristic Curve
Item Discrimination: How well the item separates persons of high and low ability; proportional to the slope of the ICC at the point of inflection.
[Figure: item characteristic curve. Vertical axis: Probability of Correct Response (0.00 to 1.00). Horizontal axis: latent ability (-3.0 to 3.0).]
Example of Two ICCs that Differ in Discrimination
[Figure: two item characteristic curves. Vertical axis: Probability of Correct Response (0.00 to 1.00). Horizontal axis: latent ability (-3.0 to 3.0).]
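Item Response Function
Logistic model:
$$P(Y_{ij}=1\mid\theta_i) = \frac{e^{Da_j(\theta_i-b_j)}}{1+e^{Da_j(\theta_i-b_j)}} = \frac{1}{1+e^{-Da_j(\theta_i-b_j)}}$$
Cumulative normal probability model:
$$P(Y_{ij}=1\mid\theta_i) = \int_{-\infty}^{a_j(\theta_i-b_j)} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$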
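The two forms of the item response function can be compared numerically. The sketch below is illustrative only: the function names are made up for this example, and the value D = 1.7 is an assumption (the slide writes only "D"; 1.7 is the conventional scaling constant that brings the logistic curve close to the normal ogive).

```python
import math

D = 1.7  # assumed scaling constant; the slide only names it "D"

def icc_logistic(theta, a, b, d=D):
    """Logistic item response function: 1 / (1 + exp(-D a (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))

def icc_normal(theta, a, b):
    """Cumulative normal item response function: Phi(a (theta - b))."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical item: discrimination a = 1.2, difficulty b = 0.5
for theta in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    print(theta, round(icc_logistic(theta, 1.2, 0.5), 3),
          round(icc_normal(theta, 1.2, 0.5), 3))
```

At theta = b both curves give a probability of 0.5, which matches the definition of item difficulty given earlier.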
Extra Credit
one way to get estimates of underlying ability
Remember Bayes Theorem
$$P(AB) = P(A)\,P(B\mid A)$$
$$P(AB) = P(B)\,P(A\mid B)$$
$$P(A\mid B) = \frac{P(A)\,P(B\mid A)}{P(B)}$$
Extra Credit
one way to get estimates of underlying ability
Bayes modal estimates of latent ability (η)
(modal a posteriori [MAP] estimates)
likelihood function for response pattern U
given ability η:
$$g(U\mid\eta) = \prod_{i=1}^{p} P_i^{\,y_i}\, Q_i^{\,1-y_i}$$
a posteriori likelihood function of η given
pattern U, with $\phi(\eta)$ denoting the prior density of η:
$$g(\eta\mid U) = \frac{\phi(\eta)\, g(U\mid\eta)}{g(U)}$$
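A minimal sketch of a MAP estimate follows, assuming a standard normal prior for η, a two-parameter logistic form for each P_i, and Q_i = 1 - P_i. The item parameters and function names are hypothetical, and a simple grid search stands in for whatever optimizer real software would use.

```python
import math

def p_correct(eta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (eta - b)))

def log_posterior(eta, responses, items):
    """log prior (standard normal, up to a constant) + log g(U | eta)."""
    total = -0.5 * eta * eta
    for y, (a, b) in zip(responses, items):
        p = p_correct(eta, a, b)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def map_estimate(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid value of eta that maximizes the posterior (the posterior mode)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda eta: log_posterior(eta, responses, items))

# Hypothetical (a, b) pairs for four items
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0), (1.2, 0.5)]
print(map_estimate([1, 1, 1, 1], items))
print(map_estimate([1, 1, 0, 0], items))
print(map_estimate([0, 0, 0, 0], items))
```

Because g(U) does not depend on η, it can be ignored when locating the mode. The prior also keeps the estimate finite for all-correct or all-incorrect patterns, where the likelihood alone has no maximum.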
Part 1b
DIF Overview
Identify Biased Test Items
Differential Item Functioning (DIF)
• Differences in the likelihood of error on a given item
may be due to
– group differences in ability
– item bias
– both
• IRT can parse this out
• Item Bias = Differential Item Functioning +
Rationale
• Most workers in IRT identify DIF when two
groups do not have the same ICC
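The idea that DIF means group-specific ICCs can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the talk: the parameters are hypothetical, and the labels follow the common IRT convention that a difference in difficulty alone is uniform DIF while a difference in discrimination is non-uniform DIF.

```python
import math

def icc(theta, a, b):
    """Logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def classify_dif(ref, focal, tol=1e-9):
    """Compare (a, b) for a reference and a focal group."""
    (a_r, b_r), (a_f, b_f) = ref, focal
    if abs(a_r - a_f) > tol:
        return "non-uniform DIF"  # curves cross: gap changes sign over theta
    if abs(b_r - b_f) > tol:
        return "uniform DIF"      # curves shifted: gap keeps one sign
    return "no DIF"               # same ICC, even if ability distributions differ

def icc_gap(theta, ref, focal):
    """Reference-minus-focal probability at the same ability level."""
    return icc(theta, *ref) - icc(theta, *focal)

print(classify_dif((1.0, 0.0), (1.0, 0.0)))
print(classify_dif((1.0, 0.0), (1.0, 0.5)))
print(classify_dif((1.0, 0.0), (2.0, 0.0)))
```

Group differences in the ability distribution do not enter `classify_dif` at all, which is the sense in which IRT separates ability differences from item-level differences.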
Example of group heterogeneity but no DIF
[Figure: item characteristic curves (0.00 to 1.00) and latent trait densities by group, plotted against Latent Trait Level (-3 to 3).]
Example of group heterogeneity and uniform DIF
[Figure: item characteristic curves (0.00 to 1.00) and latent trait densities by group, plotted against Latent Trait Level (-3 to 3).]
Example of group heterogeneity and non-uniform DIF
[Figure: item characteristic curves (0.00 to 1.00) and latent trait densities by group, plotted against Latent Trait Level (-3 to 3).]
Part 2
IRT and Factor Analysis
IRT and Factor Analysis
• IRT describes a class of statistical models
• IRT models can be estimated using factor
analysis
– Appropriate routines for ordinal dependent
variables (tetrachoric/polychoric correlation
coefficients)
• Factor analysis models can be extended in
very general ways using structural
equation modeling techniques / software
Mplus
• www.statmodel.com
• Used to be LISCOMP, owes lineage to LISREL
• Does just about everything other continuous
latent variable / structural equation software
implements (LISREL, EQS, AMOS, CALIS)
• Plus, very general latent variable modeling
– Continuous latent variables (latent traits)
– Categorical latent variables (latent classes, mixtures)
– Missing data
– Estimation with data from complex designs
• Expensive, demo version available
Mplus approach to IRT Model
• One or Two-parameter IRT models (not explicit)
– Discrimination ≈ Factor loadings/slopes
– Difficulty ≈ Item thresholds
• Two estimation methods
– Weighted Least Squares
• Limited information
• Multivariate probit (theta or delta parameterization)
• Latent response variable formulation (Assume underlying
continuous variables)
– Maximum Likelihood
• Full information
• Multivariate logistic
• Conditional probability formulation
– More experience, fit statistics with WLS
– Some model types require ML, others WLS
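Latent Response Variable
Formulation (picture)
[Figure: density of the latent response variable y* (horizontal axis -4 to 4, density 0 to .4), divided at the threshold τ into a region labeled y=0 and a region labeled y=1.]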
Latent Response Variable
Formulation (words)
• Assume observed ordinal (dichotomous) y
has a corresponding underlying continuous,
normal but unobservable (latent) form (y*)
• When a person’s value for y* exceeds
some threshold (τ), y=1 is observed;
otherwise, y=0 is observed
• Analysis is focused on the relationships among
the y* and on estimating the thresholds (τ)
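As a small illustration of how a threshold relates to observed data, the sketch below recovers τ from the proportion of y=1 responses. It assumes y* is standard normal in the population being analyzed; the function name is hypothetical.

```python
from statistics import NormalDist

def threshold_from_proportion(p_endorse):
    """P(y=1) = P(y* > tau) = 1 - Phi(tau), so tau = Phi^{-1}(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p_endorse)

# An item endorsed by 80% of respondents has a low (negative) threshold;
# one endorsed by 20% has a high (positive) threshold.
for p in (0.8, 0.5, 0.2):
    print(p, round(threshold_from_proportion(p), 3))
```

This is the univariate step behind tetrachoric/polychoric analysis: thresholds come from the margins, and the correlations among the y* come from the joint response tables.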
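Latent Response Variable
Formulation (equation)
(dichotomous case)
$$y_i = \begin{cases} 1 & \text{if } y_i^* > \tau \\ 0 & \text{otherwise} \end{cases} \qquad (40)^{\dagger}$$
$$y_i^* = \gamma x_i + \delta_i \qquad (42)^{\dagger}$$
$$P(y_i=1\mid x) = P(y_i^* > \tau\mid x) = 1 - P(y_i^* \le \tau\mid x) \qquad (43)^{\dagger}$$
$$P(y_i=1\mid x) = 1 - F\!\left[(\tau - \gamma x)\,V(\delta)^{-1/2}\right] = F\!\left[(-\tau + \gamma x)\,V(\delta)^{-1/2}\right] \qquad (45)^{\dagger 1}$$
If we standardize to $V(\delta_i)=1$,
$$P(y_i=1\mid x) = 1 - F[\tau - \gamma x] = F[-\tau + \gamma x]$$
1 page 4 in Special Topics in Latent Variable Modeling Using Mplus (2003), which is the day 3 hand-out from the Mplus Short Course series, available for purchase at www.statmodel.com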
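The latent response variable formulation can be checked by simulation. The sketch below assumes F is the standard normal distribution function, a residual variance standardized to 1, and hypothetical values for the threshold (`tau`) and slope (`gamma`). It draws y*, applies the threshold, and compares the observed proportion of y=1 with the closed-form probability.

```python
import math
import random

def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_y1(x, tau, gamma):
    """Closed form with V(delta) = 1: P(y=1 | x) = F[-tau + gamma * x]."""
    return phi_cdf(-tau + gamma * x)

def simulate_p_y1(x, tau, gamma, n=100_000, seed=1):
    """Draw y* = gamma * x + delta, delta ~ N(0, 1); record y = 1 when y* > tau."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if gamma * x + rng.gauss(0.0, 1.0) > tau)
    return hits / n

tau, gamma, x = 0.5, 0.8, 1.0  # hypothetical values
print(round(p_y1(x, tau, gamma), 4), round(simulate_p_y1(x, tau, gamma), 4))
```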
Conditional Probability
Formulation
$$P(y_i=1\mid x) = F[\beta_0 + \beta_1 x] \qquad (21)^{\dagger}$$
Recall that the LRV formulation specified
$$P(y_i=1\mid x) = F[-\tau + \gamma x] \qquad (45)^{\dagger 1}$$
when we standardize to $V(\delta_i)=1$, so we see that
$$\beta_0 = -\tau, \qquad \beta_1 = \gamma$$
† Equation number from Mplus Short Course Handouts, Special Topics in Latent Variable Modeling Using Mplus (2003)
1 page 4 in Special Topics (2003)
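Factor Analysis Model
$$y^* = \lambda\eta + \varepsilon$$
[Path diagram: a single factor η measured by y1*, y2*, y3*, y4*, with loadings λ1 to λ4 and residuals ε1 to ε4.]
$$\mathrm{VAR}(y^*) = \lambda\,\psi\,\lambda' + \mathrm{VAR}(\varepsilon), \qquad \mathrm{VAR}(\eta) = \psi$$
assuming $\mathrm{VAR}(\eta) = 1$,
$$a = \frac{\lambda}{\sqrt{1-\lambda^2}}, \qquad b = \frac{\tau}{\lambda}$$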
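The conversion from factor-analytic to IRT parameters is simple arithmetic. The sketch below uses hypothetical values and function names, and assumes the standardization on the slide (VAR(η) = 1, y* with unit variance, normal ogive metric). It also checks that the factor-model probability and the IRT form a(η - b) give the same curve.

```python
import math

def loading_to_irt(lam, tau):
    """Convert a standardized loading and threshold to IRT a and b."""
    a = lam / math.sqrt(1.0 - lam ** 2)
    b = tau / lam
    return a, b

def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_factor(eta, lam, tau):
    """P(y=1 | eta) = P(lam * eta + eps > tau), with VAR(eps) = 1 - lam^2."""
    return phi_cdf((lam * eta - tau) / math.sqrt(1.0 - lam ** 2))

def p_irt(eta, a, b):
    """Normal ogive item response function."""
    return phi_cdf(a * (eta - b))

a, b = loading_to_irt(0.6, 0.3)  # hypothetical loading and threshold
print(a, b)
print(p_factor(1.0, 0.6, 0.3), p_irt(1.0, a, b))
```

With a loading of 0.6 and a threshold of 0.3, the residual standard deviation is 0.8, so a is 0.6/0.8 and b is 0.3/0.6.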
Factor Analysis Model
[Figure: the same model and a, b conversion as on the previous slide, shown with the implied item characteristic curve: P(y=1|η) (0.00 to 1.00) plotted against η (-3 to 3).]
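Factor Analysis with Covariates
MIMIC Model
Multiple Indicators, Multiple Cause
[Path diagram: covariate x1 predicts the factor η1 (with residual ζ1) and has a direct effect on an indicator; η1 is measured by y1*, y2*, y3*, y4* with loadings λ1 to λ4 and residuals ε1 to ε4.]
$$y^* = \lambda\eta + \kappa x + \varepsilon$$
assuming $\mathrm{VAR}(\eta) = 1$ and the mean of η fixed at 0,
$$a = \frac{\lambda}{\sqrt{1-\lambda^2}}, \qquad b = \frac{\tau - \kappa x}{\lambda}$$
κ is sufficient to describe
uniform DIF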
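The role of the direct effect can be made concrete with a short sketch. Here `kappa` stands for the direct effect of the covariate on the item (the symbol is an assumption about the slide's notation), and all numeric values are hypothetical. The difficulty shifts with x while the discrimination is untouched, which is what makes the DIF uniform.

```python
import math

def mimic_irt_params(lam, tau, kappa, x):
    """IRT a and b for an item with a direct covariate effect, given x."""
    a = lam / math.sqrt(1.0 - lam ** 2)  # does not depend on x
    b = (tau - kappa * x) / lam          # shifts by -kappa / lam per unit of x
    return a, b

lam, tau, kappa = 0.6, 0.3, 0.24  # hypothetical values
a0, b0 = mimic_irt_params(lam, tau, kappa, x=0)
a1, b1 = mimic_irt_params(lam, tau, kappa, x=1)
print("group x=0:", a0, b0)
print("group x=1:", a1, b1)
print("difficulty shift:", b1 - b0)
```

A positive direct effect lowers the difficulty for the x=1 group at every level of the latent trait; a direct effect of zero means no DIF on that item.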
Multiple Group CFA
[Path diagram: the same one-factor model (η measured by y1*, y2*, y3*, y4*, with loadings λ1 to λ4 and residuals ε1 to ε4) drawn separately for group = 0 and group = 1.]
Multiple Group (MG) MIMIC
[Path diagram: the MIMIC model (covariate x1, factor η1 with residual ζ1, indicators y1* to y4*) drawn separately for group = 0 and group = 1.]
MIMIC and MG-MIMIC Model
• Disadvantages
– Not so good for factor score generation
– Not exactly the IRT model
• different conceptualization of NU-DIF
• Some work to get a’s, b’s, and standard errors
– Relatively little experience / literature in field
– Confusing / overlapping measurement
noninvariance literature from SEM field
MIMIC and MG-MIMIC Model
• Advantages
– Can be easy to estimate, good for modeling
– No need to equate parameters
– No data re-arrangements required, missing data tricks
– Simultaneous analysis/evaluation of all items and
possible sources of model mis-fit (including potential
DIF or bias)
– Multiple independent variables (with DIF)
– Y’s and X’s can be categorical or continuous
– Anchor items not necessary, but...
– Embed in more complex models
– Complementary measurement noninvariance literature
from SEM field
MIMIC Model: how to do it
From within Stata using runmplus.ado

    runmplus y1-y4 x1, categorical(y1-y4) type(meanstructure) model(eta by y1-y4*; eta@1; eta on x1*; y1 on x1*;)

Mplus syntax file

    Title:     MIMIC model
    Data:      File is __000001.dat ;
    Variable:  Names are y1 y2 y3 y4 x1;
               categorical= y1-y4 ;
    Analysis:  type= meanstructure ;
    MODEL:     eta by y1-y4* ;
               eta@1 ;
               eta on x1* ;
               y1 on x1* ;
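To experiment with the model above without real data, one option is to simulate a small data set with the same layout (four binary items y1 to y4 followed by one covariate x1). The sketch below is only an illustration: the function names, loadings, thresholds, and effect sizes are all hypothetical, and the whitespace-delimited text layout is an assumption about what a free-format data file would look like. Only y1 receives a direct effect of x1, mirroring the `y1 on x1*` line.

```python
import math
import random

def simulate_mimic(n=1000, seed=2005,
                   loadings=(0.7, 0.6, 0.8, 0.5),
                   thresholds=(0.0, 0.3, -0.2, 0.5),
                   gamma=0.5, kappa_y1=-0.4):
    """Simulate rows [y1, y2, y3, y4, x1] from a MIMIC model with DIF on y1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)                  # binary covariate
        eta = gamma * x1 + rng.gauss(0.0, 1.0)  # eta on x1, residual variance 1
        row = []
        for j, (lam, tau) in enumerate(zip(loadings, thresholds)):
            direct = kappa_y1 * x1 if j == 0 else 0.0  # direct effect on y1 only
            resid_sd = math.sqrt(1.0 - lam ** 2)
            ystar = lam * eta + direct + rng.gauss(0.0, resid_sd)
            row.append(1 if ystar > tau else 0)  # latent response thresholded
        row.append(x1)
        rows.append(row)
    return rows

def to_text(rows):
    """Format rows as whitespace-delimited text, one record per line."""
    return "\n".join(" ".join(str(v) for v in row) for row in rows)

rows = simulate_mimic()
print(to_text(rows[:3]))
```

The text returned by `to_text` could then be written to a file and named in the Data section of the syntax.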
Some Applied Examples
and Technical Articles
• Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations. Meetings of Psychometric Society (1989, Los Angeles, California and Leuven, Belgium). Psychometrika, 54(4), 557-585.
• McArdle, J., & Prescott, C. (1992). Age-based construct validation using structural equation modeling. Experimental Aging Research, 18(3), 87-116.
• Gallo, J. J., Anthony, J. C., & Muthén, B. O. (1994). Age differences in the symptoms of depression: a latent trait analysis. Journals of Gerontology, 49(6), 251-264.
• Salthouse, T., Hancock, H., Meinz, E., & Hambrick, D. (1996). Interrelations of age, visual acuity, and cognitive functioning. Journal of Gerontology: Psychological Sciences, 51B(6), P317-P330.
• Grayson, D. A., Mackinnon, A., Jorm, A. F., Creasey, H., & Broe, G. A. (2000). Item bias in the Center for Epidemiologic Studies Depression Scale: effects of physical disorders and disability in an elderly community sample. The Journals of Gerontology. Series B, Psychological Sciences and Social Sciences, 55(5), 273-282.
• Jones, R. N., & Gallo, J. J. (2002). Education and sex differences in the Mini Mental State Examination: Effects of differential item functioning. The Journals of Gerontology. Series B, Psychological Sciences and Social Sciences, 57B(6), P548-558.
• Macintosh, R., & Hashim, S. (2003). Variance Estimation for Converting MIMIC Model Parameters to IRT Parameters in DIF Analysis. Applied Psychological Measurement, 27(5), 372-379.
• Rubio, D.-M., Berg-Weger, M., Tebb, S.-S., & Rauch, S.-M. (2003). Validating a measure across groups: The use of MIMIC models in scale development. Journal of Social Service Research, 29(3), 53-68.
• Fleishman, J. A., & Lawrence, W. F. (2003). Demographic variation in SF-12 scores: true differences or differential item functioning? Med Care, 41(7 Suppl), III75-III86.
• Jones, R. N. (2003). Racial bias in the assessment of cognitive functioning of older adults. Aging & Mental Health, 7(2), 83-102.
Part 3
An Applied Example
Jones, R. N. (2003). Racial bias in the assessment of cognitive functioning of
older adults. Aging & Mental Health, 7(2), 83-102.
Acknowledgement: R03 AG017680
Example: Racial bias in TICS
(HRS/AHEAD)
• Nationally representative, very large
sample (N=15,257)
• Over-sample of Black or African-Americans (N=2,090)
• Assessment of cognition
• Very adequate assessment of SES
(education, income, occupation)
Objective
• Evaluate the extent to which item level performance
is due to test-irrelevant variance due to race (White,
non-Hispanic vs. Black or African-American
participants)
• Control for main and potentially differential effects of
background variables
– Sex, Age
– Educational attainment
– Household income, occupation groups
– Health Conditions and Health Behaviors
TICS/AHEAD Measure of
Cognitive Function (Herzog 1997)
Item (Points)
• Orientation to time (weekday, day, month, year) (4)
• Name President, Vice-President (2)
• Name two objects (cactus, scissors) (2)
• Count Backwards from 20 (1)
• Serial Sevens (5)
• Immediate recall (10 nouns) (10)
• Delayed free-recall (10 nouns, 5 min delay) (10)
Background Variables
• Sex
• Age (9 groups)
• Education (6 groups)
• Household Income (5 groups)
• ‘Highest’ household occupation (8 groups)
• Health Conditions (HBP, DM, heart, stroke, arthritis, pulmonary, cancer)
• Health Behaviors (current smoking, drinking [three groups])
2,1
1,1
age 60-64
age 65-69
age 75-79
age 80-84
age 85-90
age 90+
y2

1,14
grade 13+
y3
$10k-<20k
clerical
sales
crafts
operatives
service, labor
HBP
diabetes
heart
stroke
arthritis
lung disease
1,18
h
1
cancer

1,25
6,14
name objects

4

y4
1,21
missing

     
managers
$40k or more
*
$5k -<10k
 

count bckwds
5
*
income<$5k
presidents

3
*
grade 9-11
1

grade 8
time

2

grade 1-7
y1
1,9
1,10
grade 0
*
age 55-59
*
age 50-54
1
1,5

female
y5

1,32
delayed recall
smoker
never drink
drinker
serial sevens
1,35

6
2,1
1,1
age 60-64
age 65-69
age 75-79
age 80-84
age 85-90
age 90+

1,14
grade 13+
y3
$10k-<20k
clerical
sales
crafts
operatives
service, labor
HBP
diabetes
heart
stroke
arthritis
lung disease
1,18
h
1
cancer

1,25
6,14



never drink
count bckwds

serial sevens
1,32
6

1,35
drinker
name objects
5
y5
delayed recall
smoker
presidents
4

y4
1,21
missing
  
     
managers
$40k or more
*
$5k -<10k
*
income<$5k
time
3
*
grade 9-11
y2

grade 8

2
1

grade 1-7
y1
1,9
1,10
grade 0
1
1,5
*
age 55-59
*
age 50-54

female
White (not Hispanic)
Black or African American
2,1
1,1
age 60-64
age 65-69
age 75-79
age 80-84
age 85-90
age 90+

1,14
grade 13+
y3
$10k-<20k
clerical
sales
crafts
operatives
service, labor
HBP
diabetes
heart
stroke
arthritis
lung disease
1
cancer
1,25
6,14



never drink
drinker
name objects
count bckwds
5
y5

serial sevens
1,32
delayed recall
smoker
presidents
4

y4
1,21
missing

     
managers
$40k or more
  
*
$5k -<10k
h
*
income<$5k
1,18
time
3
*
grade 9-11
y2

grade 8

2
1

grade 1-7
y1
1,9
1,10
grade 0
1
1,5
*
age 55-59
*
age 50-54

female
1,35

6
Results
• All items show DIF by race, some by sex,
age, education
• Effect of covariates (age, occupation, income,
smoking status) significantly different across
racial group
• Greater variance in latent cognitive function
for Black or African-American participants
• No significant race difference in mean latent
cognition after adjusting for
measurement differences
Jones. Aging Ment Health, 2003; 7:83-102.
Differences in Underlying Ability
between Whites and African Americans
• 60% is due to measurement differences (DIF,
item bias)
• 12% is due to main effect of background
variables
• 7% is due to structural differences (i.e.,
interactions of group and background variables)
• What remains (about .2 SD) is not significantly
different from no difference
Jones. Aging Ment Health, 2003; 7:83-102.
Differences in Underlying Ability
ignoring measurement bias
[Figure: Baseline model-implied distribution of cognitive functioning trait. Density (0.00 to 0.50) against Cognitive Function Level (-3 to 3), with separate curves for White and Black or African American participants. HRS/AHEAD data (N=15,257); Jones (2003).]
Jones. Aging Ment Health, 2003; 7:83-102.
Differences in Underlying Ability
after controlling for measurement bias
[Figure: Final model-implied distribution of cognitive functioning trait. Density (0.00 to 0.50) against Cognitive Function Level (-3 to 3), with separate curves for White and Black or African American participants. HRS/AHEAD data (N=15,257); Jones (2003).]
Jones. Aging Ment Health, 2003; 7:83-102.
Differences in Underlying Ability
after controlling for measurement bias
interaction with age group
[Figure: Model-Implied Age Differences in Latent Cognitive Function. Vertical axis from -3 to 3; horizontal axis: Age (50 to 90); separate lines for White and B/Af. Am. participants.]
Jones. Aging Ment Health, 2003; 7:83-102.
Name Vice-President (Whites and Black or African-Americans)
[Figure: item characteristic curves by group (0.00 to 1.00) and latent trait density, plotted against Level of Cognitive Function (-4 to 4). B/A-A shown with dashed line. Jones (2003); data from HRS-AHEAD study (n=15,257).]
Second Word Recognition (Whites and Black or African-Am.)
[Figure: item characteristic curves by group (0.00 to 1.00) and latent trait density, plotted against Level of Cognitive Function (-4 to 4). B/A-A shown with dashed line. Jones (2003); data from HRS-AHEAD study (n=15,257).]
Fifth Serial Subtraction (Whites and Black or African-Am.)
[Figure: item characteristic curves by group (0.00 to 1.00) and latent trait density, plotted against Level of Cognitive Function (-4 to 4). B/A-A shown with dashed line. Jones (2003); data from HRS-AHEAD study (n=15,257).]
Model Fit / Parsimony
• Model fitting accomplished more than
shifting group differences in mental status
to item-level
• New model provides greater fit to
observed data using fit statistics that
reward model parsimony
Part 4
Bells and Whistles
Discussion
Latent Growth Model
[Path diagram: repeated measures y loading on latent growth factors η, with a covariate x.]
h =  + x + 
y =  + x + 
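Multiple Indicator
Latent Growth Model
[Path diagram: at each of four occasions, four indicators y measure a latent factor η; two further factors η summarize growth across occasions; a covariate x is included.]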
y = h +  x + 
h =  + h + x + 
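Measurement Mixture Models
[Diagram: background variables and cardio/cerebrovascular disease or risk factors (x1 to x7) relate to a latent class variable and to a latent factor η1 measured by y1, y2, ..., yp. The diagram is labeled with a pattern mixture part and a measurement part.]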
General Latent Variable Framework for Probing Mechanisms
Linking Impairments, Mobility, and Discrete Outcomes
[Diagram in three labeled parts, with Covariates (e.g., age, sex, depression, self-efficacy, mental state):
I. Latent Class (Profile Mixture) Model of Impairments: Leg Power, Trunk Endurance, Leg Pain, Back Pain, Leg Strength, Balance, Aerobic Capacity, Obesity, Range of Motion, and Peripheral Sensory Loss define Impairment classes.
II. Random Coefficient (latent growth) Model of Mobility Change: Intercept and Slope for SPPB (baseline), SPPB (year 1), and SPPB (year 2), with Growth Trajectory classes.
III. Discrete Time Survival Model of Distal Outcome: y1, y2, y3, ..., yt (e.g., falls, morbidity, mortality, disability).]
Part 4b
Discussion