Lecture 5 - Reliability and validity of scales

1. Describe the applications of the following types of measurement:
- Impairment, disability, handicap, quality of life, attitudes, behaviour
- Generic versus disease-specific health status and quality of life scales
2. Define the following terms, giving examples of each:
- Response bias
- Social desirability
3. In relation to scales, define the following terms:
- Test-retest reliability
- Inter-rater reliability
- Internal consistency
Scales
• Single- vs multi-item scales
• Items are intended to sample the content of
the underlying construct
• Items summarized in various ways:
– sum or average of responses to individual items
– item weighting or other algorithm
– profiles/sub-scale scores
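The three ways of summarizing items listed above can be sketched in a few lines of Python. This is only an illustration: the item names, the weights, and the sub-scale groupings are hypothetical and do not come from any published instrument.

```python
from statistics import mean

# Hypothetical responses of one person to a 4-item scale (each item scored 0-3).
responses = {"item1": 2, "item2": 3, "item3": 1, "item4": 0}

# Hypothetical item weights and sub-scale membership (assumptions for illustration).
weights = {"item1": 1.0, "item2": 2.0, "item3": 1.0, "item4": 0.5}
subscales = {"physical": ["item1", "item2"], "mental": ["item3", "item4"]}


def sum_score(resp):
    """Simple additive score: sum of all item responses."""
    return sum(resp.values())


def mean_score(resp):
    """Average of all item responses."""
    return mean(resp.values())


def weighted_score(resp, item_weights):
    """Weighted sum: each response is multiplied by its item weight."""
    return sum(item_weights[item] * value for item, value in resp.items())


def profile(resp, groups):
    """Profile: one summed score per sub-scale instead of a single total."""
    return {name: sum(resp[i] for i in items) for name, items in groups.items()}


total = sum_score(responses)
average = mean_score(responses)
weighted = weighted_score(responses, weights)
sub_scores = profile(responses, subscales)
```

A single total is easiest to report, while a profile keeps the sub-scale information that a total hides.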
International Classification of Impairments,
Disabilities, and Handicaps (ICIDH)
• IMPAIRMENT:
– ...loss or abnormality of psychological, physiological,
or anatomical structure or function.
• DISABILITY:
– ...restriction or lack (resulting from an impairment) of
ability to perform an activity …
• HANDICAP:
– ...disadvantage... resulting from an impairment or
disability, that limits or prevents the fulfillment of a role
….that is normal for that individual….
Quality of life (QoL)
• Definition
– “individuals’ perception of their position in life in the
context of the culture and value systems in which they
live and in relation to their goals, expectations,
standards, and concerns” (WHO QOL group, 1995)
• Domains
– physical, psychological, level of independence, social
relationships, environment, and
spirituality/religion/personal beliefs
Health-related quality of life
(HRQoL)
• Dimensions of QoL related to health
• Related terms:
– health status
– functional status
• Usually includes:
– physical health/function
– mental health/function
– social health/function
Selection of measures:
Appropriateness
• Purpose:
– describe health of population
– evaluate effects of interventions (change over
time)
– compare groups at point in time
– predict outcomes
• Areas of function covered
• Level of health
• Generic/global or specific
Generic vs specific
• Generic
– comparisons across populations and problems
– robust and generalizable
– measurement properties better understood
• Disease-specific
– shorter
– more relevant and appropriate
– sensitive to change
Practical considerations
• Mode of administration
– self-administered (in-person, mail)
– interviewer (face-to-face, telephone)
– informant or proxy
• Respondent burden
Example of single-item measure of
HRQoL: the EuroQol
“thermometer”
• EITHER: visual analogue scale
• OR: “Now, to help people say how good or
bad their health state is, let’s say the best
state you can imagine is 100 and the worst
state you can imagine is 0.
In your opinion, how good or bad is your
health today? Please use a number.”
Example of Disability Scale: OARS
ADL scale
• Measures basic and instrumental activities of daily
living (ADL)
• 14 items: e.g., bathing, dressing, money
management, house-cleaning
• Based on self-report and/or judgement
• Response scale:
– Completely independent (2)
– Needs some help (1)
– Completely dependent (0)
Example of measure of health status/
HRQoL: SF-36
• Generic measure of health status
• 36 items, self-report
• Scoring:
– 8 specific sub-scales (e.g., physical function,
mental health, vitality)
– 2 component summary scores: physical and
mental health
• Sample items:
During the past 2 weeks, did you have any of the following problems with your work or other regular daily activities
as a result of your physical health?
                                                           YES   NO
3a. Accomplished less than you would like                   1     0
3b. Were limited in the kind of work or other activities    1     0
How much bodily pain did you have during
the past 2 weeks? Was there no pain, very mild pain… (etc)
– None (1)
– Very mild (2)
– Mild (3)
– Moderate (4)
– Severe (5)
– Very severe (6)
Example of specific scale: Geriatric
Depression Scale
• 15- or 30-item self-report scale
• Response options: yes/no
• Sample items:
– Do you feel happy most of the time?
– Do you feel that your life is empty?
Response bias
• Examples:
– Recall
– Acquiescence
– Social desirability
• Factors affecting response bias:
– Question wording/response scale
– Characteristics of subjects (age, education, etc.)
– Mode of data collection (questionnaire,
interview, telephone vs face-to-face)
Social desirability
• Tendency to give answers to questions that are
perceived to be more socially desirable than the
true answer
• Different from deliberate distortion (“faking
good”)
• Depends on:
– Individual characteristics (age, sex, cultural
background)
– Specific question
Social desirability
• Measures of social desirability (SD)
– SD scales (e.g., Jackson SD scale, Crowne & Marlowe
SD scale)
– individual tendency to SD bias
• Prevention
– phrasing of questions
– questionnaire mode
– training of interviewers
Reliability of scales
• Internal consistency
• Test-retest reliability
• Inter-rater and intra-rater reliability
Example: Delirium Index (DI)
• Delirium = acute confusional state
• Characterized by acute onset and fluctuations
• Risk factors:
– Predisposing: age, dementia, disability, comorbidity, etc.
– Precipitating: infections, medications, environment
• DI: observer-rated measures of severity of 7 symptoms of
delirium:
– inattention, disorganized thinking
– altered consciousness, disorientation
– memory impairment, perceptual disturbances
– psychomotor agitation or retardation
Administration and scoring
• Administered by research assistant based on
patient observation
• Each symptom rated on 4-point scale:
0 = absent
1 = mild
2 = moderate
3 = severe
• Total score: sum of the 7 symptom ratings, range 0 to 21
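The scoring rule above (seven symptoms, each rated 0 to 3, total from 0 to 21) can be written as a short check. This is a minimal sketch, not the official scoring procedure: the dictionary keys are this example's own names for the seven symptoms, the total is assumed to be a plain sum, and how the real instrument treats missing ratings is not covered in the lecture, so the sketch simply rejects them.

```python
# The seven DI symptoms as listed in the lecture; the key names are this sketch's own.
DI_SYMPTOMS = (
    "inattention",
    "disorganized_thinking",
    "altered_consciousness",
    "disorientation",
    "memory_impairment",
    "perceptual_disturbances",
    "psychomotor_agitation_or_retardation",
)


def di_total(ratings):
    """Sum the seven symptom ratings (0 = absent ... 3 = severe) into a 0-21 total.

    Assumption: incomplete or out-of-range ratings are rejected rather than imputed.
    """
    missing = [s for s in DI_SYMPTOMS if s not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    for symptom in DI_SYMPTOMS:
        if ratings[symptom] not in (0, 1, 2, 3):
            raise ValueError(f"{symptom}: rating must be 0, 1, 2 or 3")
    return sum(ratings[s] for s in DI_SYMPTOMS)


# Hypothetical patient: severe inattention, all other symptoms mild.
example = {symptom: 1 for symptom in DI_SYMPTOMS}
example["inattention"] = 3
example_total = di_total(example)
```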
Evaluation of performance of DI
• What aspects should be evaluated?
• How?
Internal consistency
• Relevant to additive scales (that sum or
average items)
• Split-half reliability:
– correlation between scores on arbitrary half of
measure with scores on other half
• Coefficient alpha (Cronbach)
– estimates the average split-half correlation over
all possible ways of dividing the scale
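Both statistics above can be computed directly from a respondents-by-items table. The sketch below uses the usual variance form of coefficient alpha, alpha = k/(k-1) × (1 − sum of item variances / variance of total score), and one arbitrary split for the split-half correlation. The data are hypothetical, and the helper for dropping an item is an illustrative addition that mirrors the kind of "alpha without one item" comparison often reported.

```python
from statistics import mean, variance

# Hypothetical data: 6 respondents (rows) x 4 items (columns), each item scored 0-3.
data = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 1, 2],
    [3, 2, 3, 3],
    [1, 0, 1, 0],
    [2, 3, 2, 3],
]


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Coefficient alpha from item variances and the variance of the total score."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def split_half_correlation(rows, half):
    """Correlate the score on the items in `half` with the score on the rest."""
    k = len(rows[0])
    other = [j for j in range(k) if j not in half]
    a = [sum(row[j] for j in half) for row in rows]
    b = [sum(row[j] for j in other) for row in rows]
    return pearson(a, b)


def alpha_without_item(rows, j):
    """Alpha recomputed after dropping item j (0-based column index)."""
    return cronbach_alpha([row[:j] + row[j + 1:] for row in rows])


alpha = cronbach_alpha(data)
r_half = split_half_correlation(data, half=[0, 1])
alpha_drop_last = alpha_without_item(data, 3)
```

A single split-half correlation depends on which items end up in each half, which is the motivation for summarizing over all splits with alpha.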
Example
• Internal consistency of Delirium Index scale
to measure symptoms of delirium:
– Cronbach’s alpha for entire scale:
– ….without perceptual disturbance:
Test-retest reliability (stability)
• Scale is repeated
– short-term
• for constructs that fluctuate, 2 weeks often used to
reduce effects of memory and true change
– long-term
• for constructs that should not fluctuate (e.g.,
personality traits)
• Some measure of variability vs stability of 2
scores is computed
Mean within-patient standard deviation
in DI score during 1st week in hospital
[Bar chart. Y-axis: mean within-patient standard deviation, 0 to 3. Groups: Delirium+dementia (n=157), Delirium (n=57), Dementia (n=55), Neither (n=41)]
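A mean within-patient standard deviation of the kind plotted here can be obtained by computing the SD of each patient's repeated scores and then averaging those SDs across patients. The sketch below assumes that definition; the patient labels and scores are hypothetical and are not the study data.

```python
from statistics import mean, stdev

# Hypothetical repeated DI totals (0-21) for four patients during one week.
repeated_scores = {
    "patient_a": [10, 12, 8, 11],
    "patient_b": [3, 3, 4, 3],
    "patient_c": [15, 9, 12, 6],
    "patient_d": [0, 0, 0, 0],
}


def within_patient_sd(scores_by_patient):
    """SD of each patient's own repeated scores (needs at least 2 scores)."""
    return {
        pid: stdev(scores)
        for pid, scores in scores_by_patient.items()
        if len(scores) >= 2
    }


def mean_within_patient_sd(scores_by_patient):
    """Average of the per-patient SDs: one number summarizing score stability."""
    return mean(within_patient_sd(scores_by_patient).values())


sds = within_patient_sd(repeated_scores)
group_mean_sd = mean_within_patient_sd(repeated_scores)
```

A larger value means scores fluctuate more within the same patient. For a condition that fluctuates by definition, that reflects true change as well as measurement error.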
Inter- and intra-rater reliability
Inter-rater reliability
• For scales requiring rater skill, judgment
• 2 or more independent raters of same event
Intra-rater reliability
• Independent rating by same observer of
same event
Measures of inter- and intra-rater
reliability: continuous data
• Measures of correlation
– Correlation graph (scatter diagram)
– Correlation coefficients
• Measures of pairwise comparison
Correlation coefficients
• Pearson’s r
– assesses linear association, not systematic
differences between 2 sets of observations
– sensitive to range of values, especially outliers
• Spearman’s rho (rank correlation)
– ordinal or rank order correlation
– less influenced by outliers
– doesn’t assess systematic differences
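The points above can be checked numerically. In this sketch, with hypothetical ratings, a second rater who always scores 4 points higher still gives a Pearson correlation of 1, and making one value more extreme changes Pearson's r but leaves the rank-based coefficient untouched. The rank correlation is computed as Pearson's r on average ranks, which is the standard definition when ties are present.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Systematic difference: rater B is always 4 points higher than rater A.
rater_a = [2, 5, 7, 9, 12, 14]
rater_b = [score + 4 for score in rater_a]
r_shifted = pearson(rater_a, rater_b)

# Outlier sensitivity: only the last value differs between the two y lists.
x = [1, 2, 3, 4, 5, 6]
y_mild = [2, 1, 4, 3, 6, 7]
y_outlier = [2, 1, 4, 3, 6, 60]
pearson_mild, pearson_outlier = pearson(x, y_mild), pearson(x, y_outlier)
spearman_mild, spearman_outlier = spearman(x, y_mild), spearman(x, y_outlier)
```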
Correlation coefficients
• Intra-class correlation coefficient (ICC)
– Estimate of proportion of total measurement
variability due to between-individuals (vs error
variance)
– Equivalent to weighted kappa (quadratic weights) under
certain conditions, with the same range of values
– Reflects true agreement, including systematic
differences
– Affected by range of values - if less variation
between individuals, ICC will be lower
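Several ICC forms exist and the lecture does not say which one is meant, so the sketch below assumes the simplest: the one-way random-effects ICC for single ratings, (MSB − MSW) / (MSB + (k − 1) × MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of ratings per subject. All ratings are hypothetical. It illustrates two of the points above: a constant offset between raters lowers the ICC, and the same rating error yields a lower ICC when subjects differ less from one another.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC for single ratings.

    `ratings` is a list of subjects, each a sequence of k ratings.
    Assumption: every subject has the same number of ratings (k >= 2).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


rater_a = [2, 5, 7, 9, 12, 14]

# Perfect agreement: both raters give identical scores.
icc_identical = icc_one_way([(s, s) for s in rater_a])

# Systematic difference: rater B is always 4 points higher than rater A.
icc_offset = icc_one_way([(s, s + 4) for s in rater_a])

# Same size of disagreement (1 point per pair), different spread between subjects.
wide_range = [(2, 3), (6, 5), (10, 11), (14, 13), (18, 19)]
narrow_range = [(9, 10), (11, 10), (10, 11), (12, 11), (11, 12)]
icc_wide = icc_one_way(wide_range)
icc_narrow = icc_one_way(narrow_range)
```

In the offset example the two raters are perfectly linearly related, yet the ICC falls well below 1, because the constant 4-point gap counts as disagreement.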
Inter-rater reliability
• Intraclass correlation coefficient (ICC):
n = 26 patients (39 pairs of ratings)
ICC = 0.98 (SD 0.06)
Examples for discussion
• What aspects of reliability should be
measured for the following scales:
– EuroQol VAS
– SF-36
– Geriatric Depression Scale