Class 7
Measurement Issues in Research with Diverse
Populations Including Health Disparities
Research
November 5, 2009
Anita L. Stewart
Institute for Health & Aging
University of California, San Francisco
1
Overview of Class 7
Background: culture-specific versus generic measures
Conceptual and psychometric adequacy and equivalence
– Adequacy in one group
– Equivalence across groups
2
Background
U.S. population becoming more diverse
Minority groups are being included in
research due to:
– NIH mandate (1993 – women and
minorities)
– Health disparities initiatives
3
Types of Diverse Groups
Health disparities research focuses on differences in health between …
– Minority vs. non-minority
– Lower income vs. others
– Lower education vs. others
– Limited English Proficiency (LEP) vs. others
– … and many others
4
Measurement Implications of Research
in Diverse Groups
Most self-reported measures were developed and tested in mainstream, well-educated groups
Little information is available on appropriateness, reliability, validity, and responsiveness in diverse groups
– Although this is changing rapidly
5
Measurement Adequacy vs. Measurement
Equivalence
Adequacy - within a “diverse” group
– concepts are appropriate and relevant
– psychometric properties meet minimal criteria
» Good variability
» Reliable and valid
» Sensitive to change over time
Equivalence - between “diverse” groups
– conceptual and psychometric properties are comparable
6
Why Not Use Culture-Specific
Measures?
Measurement goal is to identify measures that can be used across all groups in one study, yet maintain sensitivity to diversity and have minimal bias
Most health disparities studies compare mean scores across diverse groups
7
Generic/Universal vs Group-Specific
(Etic versus Emic)
Concepts unlikely to be defined exactly the same way across diverse ethnic groups
Generic/universal (etic)
– features of a concept that are appropriate across groups
Group-specific (emic)
– idiosyncratic or culture-specific portions of a concept
8
Etic versus Emic (cont.)
Goal in health disparities research with more than one group:
– identify the generic/universal portions of a concept that are applicable across all groups
For within-group studies:
– the culture-specific portion is also relevant
9
Conceptual and Psychometric
Adequacy and Equivalence
Adequacy in 1 Group
– Conceptual: concept meaningful within one group
– Psychometric: psychometric properties meet minimal standards within one group
Equivalence Across Groups
– Conceptual: concept equivalent across groups
– Psychometric: psychometric properties invariant (equivalent) across groups
11
Left Side of Matrix: Adequacy in a
Single Group
Adequacy in 1 Group
– Conceptual: concept meaningful within one group
– Psychometric: psychometric properties meet minimal standards within one group
Equivalence Across Groups
– Conceptual: concept equivalent across groups
– Psychometric: psychometric properties invariant (equivalent) across groups
12
Right Side of Matrix: Equivalence in
More Than One Group
Adequacy in 1 Group
– Conceptual: concept meaningful within one group
– Psychometric: psychometric properties meet minimal standards within one group
Equivalence Across Groups
– Conceptual: concept equivalent across groups
– Psychometric: psychometric properties invariant (equivalent) across groups
13
Approaches to Explore Conceptual
Adequacy in Diverse Groups
Literature reviews of concepts and measures
In-depth interviews and focus groups
– discuss concepts, obtain their views
Expert consultation from diverse groups
– review concept definitions
– rate relevance of items
15
Basis: Published Review - Physical
Activity Measures for Minority Women
WHI convened experts to identify issues in measuring PA in minority and older women
Some conclusions:
– Add culturally sensitive activities (e.g., walking for transportation and errands)
– Measure intermittent activities
– Phrases “leisure time, free time, spare time” (used to denote non-occupational activities) not understood
Review can help select appropriate measures and adapt as needed
LC Masse et al., J Women’s Health, 1998;7:57-67.
16
Basis: Published Review - Measures of
Dietary Intake in Minority Populations
Reviewed food frequency questionnaires for appropriateness for minority populations
– method of development, minority-group specific features, reliability, validity, and systematic bias
Group differences that could affect scores:
– Portion sizes differ
– Missing common foods of minority groups/cultures
Would underestimate total intake and nutrients
RJ Coates et al. Am J Clin Nutr; 1997;65(suppl):1108S-15S.
17
A Structured Method for Examining
Conceptual Relevance
Compiled set of 33 HRQL items
Assessed relevance to older African Americans
After each question, asked “how relevant is this question to the way you think about your health?”
– Response scale: 0-10 scale with endpoints labeled
– 0=not at all relevant, 10=extremely relevant
Cunningham WE et al., Qual Life Res, 1999;8:749-768.
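The 0-10 relevance ratings described above could be summarized per item roughly as follows. This is a minimal sketch, not the analysis used in the cited study: the item labels and ratings are invented for illustration, and `mean_relevance` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical 0-10 relevance ratings (0 = not at all relevant,
# 10 = extremely relevant). Labels and values are invented for illustration.
ratings = {
    "spirituality": [10, 9, 8, 10],
    "physical_functioning": [4, 6, 5, 3],
    "hopefulness": [9, 8, 9, 10],
}

def mean_relevance(ratings_by_item):
    """Return (item, mean rating) pairs sorted from most to least relevant."""
    summary = [(item, mean(values)) for item, values in ratings_by_item.items()]
    return sorted(summary, key=lambda pair: pair[1], reverse=True)

ranked = mean_relevance(ratings)
for item, score in ranked:
    print(f"{item}: {score:.2f}")
```

Ranking items by mean rating is only one possible summary; medians or the share of respondents rating an item above some threshold would serve the same purpose.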
18
HRQL Relevance Results
Most relevant items:
– Spirituality, weight-related health, hopefulness
– Spirituality items
Least relevant items:
– Physical functioning, role limitations due to emotional problems
19
Qualitative Research: Expert Panel
Reviewed Spanish FACT-G
Functional Assessment of Cancer Therapy – General (FACT-G)
Bilingual/bicultural panel reviewed items for conceptual relevance to Hispanics
– One item had low relevance (I worry about dying)
» Added new item "I worry my condition will get worse"
– One domain missing – spirituality
» Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts
D Cella et al. Med Care 1998: 36;1407
20
Example of Inadequate Concept
Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g.,
– access, technical care, communication, continuity, interpersonal style
In minority and low income groups, additional relevant domains include, e.g.,
– discrimination by health professionals
– sensitivity to language barriers
MN Fongwa et al., Ethnicity Dis, 2006;16(3):948-955.
21
Measuring Park/Recreation Environments
in Low-Income Communities
New policy focus on how environments promote physical activity
– Many good new measures
None considered concerns or environments of lower-income minority communities
MF Floyd et al. Am J Prev Med, 2009;36:S156-S160.
22
Measuring Park/Recreation Environments
in Low-Income Communities (cont)
Recommendations:
In low-income communities of color:
– Identify and address most salient environmental needs
– Incorporate research on preferred recreational activities
– Ensure representation of perceptions of residents
MF Floyd et al. Am J Prev Med, 2009;36:S156-S160.
23
Psychometric Adequacy in any Group
Minimal standards:
– Sufficient variability
– Minimal missing data
– Adequate reliability/reproducibility
– Evidence of construct validity
– Evidence of sensitivity to change
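Two of the standards above, variability and missing data, can be checked with very little code. The sketch below is an illustration under stated assumptions: `describe_item` is a hypothetical helper, missing answers are assumed to be coded as `None`, and the responses are invented.

```python
from statistics import stdev

def describe_item(responses):
    """Summarize one item: percent missing and spread of the observed answers.

    `responses` is a list of numeric answers; None marks a missing answer.
    """
    observed = [r for r in responses if r is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed answers")
    pct_missing = 100 * (len(responses) - len(observed)) / len(responses)
    return {
        "pct_missing": pct_missing,
        "sd": stdev(observed),   # sample standard deviation
        "min": min(observed),
        "max": max(observed),
    }

# Invented responses on a 1-5 scale, one of them missing.
summary = describe_item([1, 2, None, 4, 5])
print(summary)
```

Running the same summary separately within each group shows whether an item has a narrow range or heavy missing data in one group but not another.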
24
Conceptual Equivalence Across
Groups
Adequacy in 1 Group
– Conceptual: concept meaningful within one group
– Psychometric: psychometric properties meet minimal standards within one group
Equivalence Across Groups
– Conceptual: concept equivalent across groups
– Psychometric: psychometric properties invariant (equivalent) across groups
26
Conceptual Equivalence
Is the concept relevant, familiar, acceptable to all diverse groups being studied?
Is the concept defined the same way in all groups?
– all relevant “domains” included (none missing)
– interpreted similarly
27
Obtain Perspective of All Diverse Groups on
Concept
Develop concept
Obtain perspectives of
diverse groups
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
28
Example: Developing Concept of
Interpersonal Processes of Care
Inputs:
– IPC Version I framework in Milbank Quarterly
– 19 new focus groups: African American, Latino, and White adults
– Literature review of quality of care in diverse groups
Result: IPC II conceptual framework
29
IPC-II Conceptual Framework: 91 items
I. COMMUNICATION
– General clarity
– Elicitation/responsiveness
– Explanations of processes, condition, self-care, meds
– Empowerment
II. DECISION MAKING
– Responsive to patient preferences
– Consider ability to comply
III. INTERPERSONAL STYLE
– Respectfulness
– Courteousness
– Perceived discrimination
– Emotional support
– Cultural sensitivity
30
IPC-II Conceptual Framework (cont)
IV. OFFICE STAFF
Respectfulness
Discrimination
V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS
MD’s and office staff’s sensitivity to language
31
Psychometric Equivalence
Adequacy in 1 Group
– Conceptual: concept meaningful within one group
– Psychometric: psychometric properties meet minimal standards within one group
Equivalence Across Groups
– Conceptual: concept equivalent across groups
– Psychometric: psychometric properties invariant (equivalent) across groups
32
Psychometric or Measurement
Equivalence
When comparing groups (as in health disparities research):
– Measures should have similar or equivalent measurement properties in all diverse groups of interest in your study
» e.g., English and Spanish, African Americans and Caucasians
33
Psychometric Equivalence Across
Groups
Psychometric characteristics should be “equivalent” across all groups:
– Sufficient variability
– Minimal missing data
– Reliability/reproducibility
– Construct validity
– Sensitivity to change
34
Bias (Systematic Error) - A Special
Concern
Observed group mean differences in a measure can be due to:
– Culturally- or group-mediated differences in true score (true differences)
-- OR --
– Bias - systematic differences between observed scores not attributable to true scores
35
Random versus Systematic Error
Observed item score = true score + error
– Random error: relevant to reliability
– Systematic error: relevant to validity
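A small simulation can make the distinction concrete. In the sketch below every number is an assumption chosen for illustration: both groups share the same true score, both get random error, and only group B gets a constant systematic error. The random error averages out in the group mean; the systematic error does not.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed so the sketch is reproducible

TRUE_SCORE = 50.0        # hypothetical true score, identical in both groups
SYSTEMATIC_SHIFT = 5.0   # hypothetical systematic error applied to group B only
N = 10_000               # respondents per group

# Group A: observed score = true score + random error
group_a = [TRUE_SCORE + rng.gauss(0, 10) for _ in range(N)]
# Group B: observed score = true score + random error + systematic error
group_b = [TRUE_SCORE + rng.gauss(0, 10) + SYSTEMATIC_SHIFT for _ in range(N)]

mean_a = mean(group_a)
mean_b = mean(group_b)
print(round(mean_a, 1), round(mean_b, 1))
```

The observed difference between the two means is close to the systematic shift even though the true scores are identical, which is why a biased measure can produce a group difference that is not real.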
36
Bias (Systematic Error) - A Special
Concern (cont)
Measurement bias may make group comparisons invalid
Bias can be due to group differences in:
– the meaning of concepts or items
– the extent to which measures represent a concept
– cognitive processes of responding
– use of response scales
– appropriateness of data collection methods
37
Bias or “Systematic Difference”?
Bias refers to “deviation from true score”
Cannot speak of a measure being “biased” in one group compared to another without knowing true score
Preferred term: differential “item” functioning (DIF)
– Item (or measure) that has a different meaning in one group than another
38
Item Equivalence
Differential Item Functioning (DIF)
– Items are non-equivalent if they are differentially related to the underlying trait
Meaning of response categories is similar across groups
Distance between response categories is similar across groups
39
Methods for Identifying Differential
Item Functioning (DIF)
Item Response Theory (IRT)
Examines each item in relation to underlying latent trait
Tests if responses to one item predict the underlying latent “score” similarly in two groups
– if not, items have “differential item functioning”
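A full IRT analysis needs specialized software, so the sketch below swaps in a simpler, non-IRT screen for the same idea: the Mantel-Haenszel common odds ratio. Respondents are stratified by total score (a stand-in for the latent trait), and the odds of endorsing the item are compared between groups within strata. The function name and all counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for endorsing an item, pooled over total-score strata.

    Each stratum is a tuple (ref_yes, ref_no, focal_yes, focal_no): counts of
    reference- and focal-group respondents who did or did not endorse the item,
    among respondents with similar total scores.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_yes, ref_no, focal_yes, focal_no in strata:
        n = ref_yes + ref_no + focal_yes + focal_no
        if n == 0:
            continue  # skip empty strata
        numerator += ref_yes * focal_no / n
        denominator += ref_no * focal_yes / n
    if denominator == 0:
        raise ValueError("odds ratio undefined (denominator is zero)")
    return numerator / denominator

# Invented counts for illustration only.
no_dif = [(20, 20, 10, 10), (30, 10, 15, 5)]
with_dif = [(20, 20, 15, 5), (30, 10, 18, 2)]

or_no_dif = mantel_haenszel_odds_ratio(no_dif)
or_with_dif = mantel_haenszel_odds_ratio(with_dif)
print(or_no_dif, or_with_dif)
```

An odds ratio near 1 means that respondents at the same score level endorse the item at similar rates in both groups; a value far from 1 flags the item for closer review.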
40
Example of Effect of Biased Items
5 CES-D items administered to Black and White men
– 1 item subject to differential item functioning (bias)
5-item scale including the biased item suggested that Black men had more somatic symptoms than White men (p < .01)
4-item scale excluding biased item showed no differences
S Gregorich, Med Care, 2006;44:S78-S94.
41
Equivalence of Response Choices:
Spanish and English Self-rated Health
English       Spanish
Excellent     Excelente
Very good     Muy buena
Good          Buena
Fair          Regular
Poor          Mala
“Regular” in Spanish may be closer to “good” in English, thus is not comparable to the meaning of “fair”
42
Equivalence of Response Choices:
Spanish and English Self-rated Health
English       Spanish
Excellent     Excelente
Very good     Muy buena
Good          Buena
Fair          Regular (pasable?)
Poor          Mala
“Regular” in Spanish may be closer to “good” in English, thus is not comparable to the meaning of “fair”
43
Equivalence of Reliability?? No!
Difficult to compare reliability because it depends on the distribution of the construct in a sample
– Thus lower reliability in one group may simply reflect poorer variability
More important is the adequacy of the reliability in both groups
– Reliability meets minimal criteria within each group
44
Example: Adequacy of Reliability of
Spanish SF-36 in Argentinean Sample
SF-36 scale                     Coefficient alpha
Physical functioning            .85
Role limitations - physical     .84
Bodily pain                     .80
General health perceptions      .69
Vitality                        .82
Social functioning              .76
Role limitations - emotional    .75
Mental health                   .84
F Augustovski et al, J Clin Epid, 2008, in press;
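For readers who want to compute coefficient alpha themselves, the standard formula is alpha = k/(k-1) × (1 − sum of item variances / variance of total score), where k is the number of items. The sketch below implements that formula; the function name and the tiny data set are illustrative assumptions, not data from the Argentinean study.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for one scale.

    `item_scores` is a list of items; each item is a list of scores, one per
    respondent, with respondents in the same order for every item.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total scale score for each respondent
    totals = [sum(person) for person in zip(*item_scores)]
    sum_item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Invented scores: two items answered by four respondents.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
print(round(alpha, 2))
```

Computing alpha separately within each group, rather than in the pooled sample, matches the point above that reliability should meet minimal criteria within each group.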
45
Equivalence of Criterion Validity
Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.
– a measure predicts utilization in both groups
– a cutpoint on a screening measure has the same specificity and sensitivity in identifying a condition in both groups
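The cutpoint comparison can be sketched as follows. The scores, condition flags, and cutpoint are invented, and `sensitivity_specificity` is a hypothetical helper that assumes each group contains at least one person with and one without the condition.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Sensitivity and specificity when `score >= cutpoint` counts as a positive screen."""
    pairs = list(zip(scores, has_condition))
    true_pos = sum(1 for s, c in pairs if c and s >= cutpoint)
    false_neg = sum(1 for s, c in pairs if c and s < cutpoint)
    true_neg = sum(1 for s, c in pairs if not c and s < cutpoint)
    false_pos = sum(1 for s, c in pairs if not c and s >= cutpoint)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

CUTPOINT = 16  # hypothetical screening cutpoint
condition = [True, True, True, False, False, False]

# Invented screening scores for two groups, same condition pattern in each.
group_1 = sensitivity_specificity([20, 18, 10, 5, 16, 3], condition, CUTPOINT)
group_2 = sensitivity_specificity([22, 12, 11, 4, 6, 2], condition, CUTPOINT)
print(group_1, group_2)
```

In this invented example the same cutpoint misses more true cases in the second group, which is the kind of non-equivalence the slide describes.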
46
Equivalence of Construct Validity
Are hypothesized patterns of associations confirmed in both groups?
– Example: Scores on the Spanish version of the FACT-G had similar relationships with other health measures as scores on the English version
Primarily tested through subjectively examining pattern of correlations
Can also test using confirmatory factor analysis (CFA)
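Examining the pattern of correlations usually starts by computing the same correlation within each group and placing the results side by side. The sketch below does this with a hand-written Pearson correlation; the scores are invented and the variable names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores on the measure of interest and on a related health measure.
r_english = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_spanish = pearson_r([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(round(r_english, 2), round(r_spanish, 2))
```

Whether two such correlations count as similar is a judgment call in this approach, which is why the slide describes it as subjective and points to CFA as the formal alternative.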
47
Equivalence of Construct Validity of
Spanish SF-36 in Argentinean Sample
Compared Spanish SF-36 construct validity test results to U.S. English SF-36 results
Tested several previously tested hypotheses (which were confirmed):
– PCS decreases with age and # of diseases
– Relationship of PCS and MCS with utilization
– Known groups validity (scores lower for those with various diseases)
48
Equivalence of Factor Structure
Factor structure is similar in new group to structure in original groups in which measure was tested
– measurement model is the same across groups
Methods
– Specify the number of factors you are looking for
– Determine if the hypothesized model fits the data
49
How Evidence for Equivalence of
Factor Structure is Obtained
Subjectively
– visually compare factor pattern matrices across “group-specific” exploratory factor analysis solutions
Empirically
– confirmatory factor analysis of data that
includes multiple groups
– studies of psychometric invariance
50
Empirical Examination of Equivalence
of Factor Structure
Psychometric invariance (equivalence)
Important properties of theoretically-based factor structure (measurement model) do not vary across groups (are invariant)
– measurement model is the same across groups
Empirical comparison across groups using confirmatory factor analysis
– Not simply by examination
51
Confirmatory Factor Analysis
Hierarchical Tests of Equivalence
Across all groups – a sequential process:
Same number of factors or dimensions
Same items on same factors
Same factor loadings
No bias on any item across groups
Same residuals on items
No item or scale bias AND same residuals
52
Measurement or Psychometric
Invariance
Gregorich, S.E. Do self-report instruments allow
meaningful comparisons across population
groups? Testing measurement invariance using
the confirmatory factor analysis framework.
Med Care, 2006;44 (11, supp 3):S78-S94.
53
Criteria for Evaluating Invariance Across
Groups: Technical Terms
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Scalar or Strong Factorial Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances are unbiased
Strict Factorial Invariance: Both scalar and residual criteria are met
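The sequential logic of these criteria can be written down as a small decision function. This sketch does not fit any CFA model: it assumes the pass/fail result of each test is already known, and it encodes one reading of the list above, namely that dimensional, configural, and metric invariance are strictly sequential, that scalar and residual invariance each build on metric invariance, and that strict invariance requires both. The function name is hypothetical.

```python
def invariance_levels_established(passed):
    """Return the invariance levels supported by a set of test outcomes.

    `passed` maps a level name to True/False for the corresponding CFA test.
    A missing entry is treated as a failed test.
    """
    established = []
    for level in ("dimensional", "configural", "metric"):
        if not passed.get(level, False):
            return established  # later levels assume this one holds
        established.append(level)
    scalar = passed.get("scalar", False)
    residual = passed.get("residual", False)
    if scalar:
        established.append("scalar")
    if residual:
        established.append("residual")
    if scalar and residual:
        established.append("strict")
    return established

levels = invariance_levels_established(
    {"dimensional": True, "configural": True, "metric": True,
     "scalar": True, "residual": False}
)
print(levels)
```

In practice each True/False value would come from comparing nested multi-group CFA models, as described in the Gregorich reference.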
54
Dimensional Invariance of CES-D
Definition: same number of factors observed in all groups
Original 4 CES-D factors
– Somatic symptoms
– Depressive affect
– Interpersonal behavior
– Positive affect
LS Radloff, The CES-D scale: A self-report depression scale for research in
the general population, Applied Psychol Measurement, 1977;1:385-401.
55
No Evidence of Dimensional Invariance
Hispanic EPESE (n=2,536) and a study of older Mexican
Americans (n=330)
2 factors in both studies
– Depression (somatic symptoms, depressive affect, and interpersonal
behavior)
– Well-being
TQ Miller et al., J Gerontol: Soc Sci 1997;520:S259
American Indian adolescents (n=179)
3 factors
– Depressed affect
– Somatic symptoms and reduced activity
– Positive affect
SM Manson et al., Psychol Assessment 1990;2:231-237
56
Configural Invariance
Assumes: dimensional invariance is found (same number of factors)
Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups
CES-D example
– 4 factors found in Anglos, Blacks, and Chicanos
– Same items loaded on each factor in all groups
RE Roberts et al., Psychiatry Research, 1980;2:125-134
57
Configural Invariance
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met
58
Metric Invariance or Factor Pattern
Invariance
Assumes: dimensional and configural invariance are found
Definition: Item loadings are the same across groups
– i.e., the correlation of each item with its factor is the same in all groups
59
Metric Invariance
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met
60
Metric Invariance Example from
Interpersonal Processes of Care
Out of 91 items, factor structure of 29 items met criteria of dimensional, configural, and metric invariance across 4 groups
– Spanish-speaking Latinos, English speaking Latinos, African Americans, Whites
Dimensional
– Similar factor structure across all 4 groups
Configural
– Same items loaded on each factor in all 4 groups
Metric
– Same item loadings in all 4 groups
61
Seven “Metric Invariant” Scales:
Same Item Loadings Across Groups
I. COMMUNICATION
Hurried communication
Elicited concerns, responded
Explained results, medications
II. DECISION MAKING
Patient-centered decision-making
III. INTERPERSONAL STYLE
Compassionate, respectful
Discriminated
Disrespectful office staff
62
Strong Factorial Invariance or Scalar
Invariance
Assumes: dimensional, configural, and metric invariance are found
Definition: Observed scores are unbiased, i.e., means can be compared across groups
Requires test of equivalence of mean scores across groups using confirmatory factor analysis
63
Strong Factorial Invariance
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met
64
Seven “Scalar Invariant” (Unbiased) IPC
Scales (18 items)
I. COMMUNICATION
Hurried communication – lack of clarity
Elicited concerns, responded
Explained results, medications – explained results
II. DECISION MAKING
Patient-centered decision-making – decided together
III. INTERPERSONAL STYLE
Compassionate, respectful–(subset) compassionate, respectful
Discriminated – discriminated due to race/ethnicity
Disrespectful office staff
65
Equivalence of Spanish and English Hospital
Quality of Care Survey (H-CAHPS®)
Tested 7 subscales
– Nurse communication, MD communication, communication about meds, nursing services, discharge information, pain control, and physical environment
Report on translation/adaptation, pretesting, item-scale correlations, internal consistency reliability, and construct validity
CFA methods compared factor structure between Spanish and English groups
MP Hurtado et al. Health Serv Res, 2005;40-6, Part II:2140-2161
66
Psychometric or Measurement
Equivalence: Second Meaning
Measurement properties of a measure in your diverse group are similar to original (mainstream) groups on which the measures were developed
Subjective comparison and evaluation
67
Mixed Methods for Assessing Equivalence
Use qualitative and quantitative methods in tandem to address issues of cultural equivalence
68
Mixed Methods: Developing IPC
Measure of “Cultural Sensitivity”
Initial concept and items from qualitative work
1st survey: In psychometric analyses, did not meet minimal criteria
Second version of concept and items
– new qualitative work, results of first study
2nd survey: In psychometric analyses, measure again did not meet minimal criteria
Analyzed focus group data in more depth
– cultural sensitivity is multidimensional
3rd survey: testing multidimensional measures of cultural sensitivity
69
Conclusions
Measurement in health disparities and minority health research is a relatively new field
Encourage testing and reporting on adequacy and equivalence of measures tested in any diverse population
As evidence grows, concepts and measures that work better across diverse groups will be identified
70
Resource: Reviews of Measures for
Diverse Populations
Multicultural measurement in older populations, JH Skinner et al (eds), Springer Publishing Co: NY, 2002
– ALSO published as: Measurement in older ethnically diverse populations, J Mental Health Aging, Vol 7, Spring 2001
Reviews measures that have been used cross-culturally in: acculturation, socio-economic status, social supports, cognition, health and functional capacity, depression, health locus of control, health-related quality of life, and religiosity
71
Resource: Special Journal Issue
Measurement in a multi-ethnic society
– Med Care, Vol 44, November 2006
– Qualitative and quantitative methods in addressing measurement in diverse populations
72
Resource: Clinical Research with
Diverse Communities
Epi 222, Spring
Course Director: Eliseo Pérez-Stable, MD
Thursdays 2:45-4:15
– China Basin
Summary and syllabus for 2008:
http://www.epibiostat.ucsf.edu/courses/schedule/diverse_pops.html
73
Epi 222 Provides Overview Of….
Meaning of race, ethnicity, social class and culture
Multi-level factors that are mechanisms of health disparities
Methodological and measurement considerations in research in ethnically diverse populations
Qualitative methods in developing and pre-testing instruments
Strategies for recruiting ethnically diverse populations and for expanding the role of communities
74
Homework for Next Week
For those interested in studying any diverse population group:
– Finish matrix: complete rows 27-34
» Translations, equivalence across diverse groups, acceptability for your population
For everyone:
– Complete row 34: can measure be modified
75
Next Week (Class 8)
Pretesting measures and creating a questionnaire
76