Epidemiologic Methods - SF Coordinating Center Study

Epidemiologic Methods - Fall 2002

Lecture 1: Understanding Measurement: Reproducibility & Validity
Lecture 2: Study Design
Lecture 3: Measures of Disease Occurrence I
Lecture 4: Measures of Disease Occurrence II
Lecture 5: Measures of Disease Association I
Lecture 6: Measures of Disease Association II
Lecture 7: Bias in Clinical Research: Selection and Measurement Bias
Lecture 8: Confounding and Interaction I: General Principles
Lecture 9: Confounding and Interaction II: Assessing Interaction
Lecture 10: Confounding and Interaction III: Stratified Analysis
Lecture 11: Conceptual Approach to Multivariable Analysis I
Lecture 12: Conceptual Approach to Multivariable Analysis II
Course Administration
• Format
– Lectures: Tuesdays 8:15 am, except for Dec. 10 at 1:30 pm
– Small Group Sections: Tuesdays 1:00 pm, except for the last
section on Dec. 3, which meets from 10:30 to 11:30. Sections begin next week.
• Content: Overview and discussion of lectures, and review of assignments.
• Textbooks
– Epidemiology: Beyond the Basics by Szklo and Nieto (S & N).
– Multivariable Analysis: A Practical Guide for Clinicians by M. Katz
• Grading
– Based on points achieved on homework (~80%) & final (~20%).
– Late assignments are not accepted.
• Missed sessions
– All material distributed in class is posted on website.
Definitions of Epidemiology
• The study of the distribution and
determinants (causes) of disease
– e.g. cardiovascular epidemiology
• The method used to conduct human subject
research
– the methodologic foundation of any research
where individual humans or groups of humans
are the unit of observation
Understanding Measurement:
Aspects of Reproducibility and Validity
• Review Measurement Scales
• Reproducibility vs Validity
• Reproducibility
– importance
– sources of measurement variability
– methods of assessment
• by variable type: interval vs categorical
Clinical Research
Sample → Measure → (Intervene) → Analyze → Infer

"A study can only be as good as the data . . ."
- Martin Bland
Measurement Scales
Scale                     Example
Interval
  continuous              weight
  discrete                WBC count
Categorical
  ordinal                 tumor stage
  nominal                 race
  dichotomous             death
Reproducibility vs Validity
• Reproducibility
– the degree to which a measurement provides the
same result each time it is performed on a given
subject or specimen
• Validity
– from the Latin validus - strong
– the degree to which a measurement truly
measures (represents) what it purports to
measure (represent)
Reproducibility vs Validity
• Reproducibility
– aka: reliability, repeatability, precision, variability,
dependability, consistency, stability
• Validity
– aka: accuracy
Relationship Between Reproducibility and Validity
[Figure: target diagrams illustrating the four combinations of good or poor reproducibility with good or poor validity]
Why Care About Reproducibility?
Impact on Validity
• Mathematically, the upper limit of a measurement’s
validity is a function of its reproducibility
• Consider a study to measure height in the community:
– Assume the measurement has imperfect reproducibility: if we measure height twice on a given person, we get two different values, so at least 1 of the 2 values must be wrong (imperfect validity)
– If the study measures everyone only once, these errors, despite being random, will lead to biased inferences when using the measurements (i.e., they lack validity)
Truth:
           Good B-Ball   Poor B-Ball   Total
>6 ft      10            30            40
<6 ft      10            50            60
Total      20            80            100

Truth: Prevalence Ratio = (10/40) / (10/60) = 1.5

With 10% misclassification of height (10% of each cell is misclassified into the other height row: 1 of 10 in each Good B-Ball cell, and 3 of 30 and 5 of 50 in the Poor B-Ball cells):

Observed:
           Good B-Ball   Poor B-Ball   Total
>6 ft      10            32            42
<6 ft      10            48            58
Total      20            80            100

Observed: Prevalence Ratio = (10/42) / (10/58) = 1.38

Even though the misclassification is random, the observed prevalence ratio (1.38) is biased toward the null relative to the truth (1.5).
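To make the arithmetic above easy to check, here is a minimal Python sketch (not part of the course materials); the prevalence_ratio helper is a hypothetical name used only for illustration.

# Minimal sketch; counts are exactly those in the tables above.
def prevalence_ratio(tall_good, tall_poor, short_good, short_poor):
    """Prevalence ratio of being a good B-Ball player, >6 ft vs <6 ft."""
    p_tall = tall_good / (tall_good + tall_poor)
    p_short = short_good / (short_good + short_poor)
    return p_tall / p_short

# Truth
print(prevalence_ratio(10, 30, 10, 50))            # 1.5

# 10% misclassification: 10% of each cell switches height rows
print(prevalence_ratio(10 - 1 + 1, 30 - 3 + 5,
                       10 - 1 + 1, 50 - 5 + 3))    # ~1.38, biased toward 1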
Impact of Reproducibility on Statistical Precision
• Classical Measurement Theory:
– observed value (O) = true value (T) + measurement error (E)
– If we assume E is random and normally distributed:
E ~ N(0, σ²_E)
[Figure: histogram of measurement errors, normally distributed around 0]
Impact of Reproducibility on Statistical Precision
– observed value (O) = true value (T) + measurement error (E)
– E is random and ~ N(0, σ²_E)
• When measuring a group of subjects, the variability of observed values is a combination of the variability in their true values and the variability in the measurement error:
σ²_O = σ²_T + σ²_E
Why Care About Reproducibility?
σ²_O = σ²_T + σ²_E
• More measurement error means more variability in observed measurements
– e.g., measure height in a group of subjects:
– If no measurement error, the spread of observed heights reflects only the true between-subject differences
– If measurement error, the spread of observed heights is wider
[Figure: distributions of observed height with and without measurement error]
Why Care About Reproducibility?
σ²_O = σ²_T + σ²_E
• More variability of observed measurements has profound influences on statistical precision/power:
– Descriptive studies: wider confidence intervals
– RCTs: power to detect a treatment difference is reduced
– Observational studies: power to detect an influence of a particular risk factor upon a given disease is reduced
Mathematical Definition of Reproducibility
• Reproducibility = σ²_T / σ²_O = σ²_T / (σ²_T + σ²_E)
• Varies from 0 (poor) to 1 (optimal)
• As σ²_E approaches 0 (no error), reproducibility approaches 1
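A brief simulation can make the variance decomposition concrete. The sketch below is not from the lecture: the sample size, mean, true variance, and error variance are arbitrary assumptions chosen only to illustrate σ²_O = σ²_T + σ²_E and the reproducibility ratio.

# Minimal simulation sketch; all numbers here are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
var_T, var_E = 100.0, 25.0                   # assumed true and error variances

T = rng.normal(170.0, np.sqrt(var_T), n)     # true values (e.g., heights in cm)
E = rng.normal(0.0, np.sqrt(var_E), n)       # E ~ N(0, var_E)
O = T + E                                    # observed = true + error

print(round(O.var(), 1))                     # ~125 = var_T + var_E
print(round(var_T / (var_T + var_E), 2))     # reproducibility ~0.80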
Power
[Figure: statistical power as a function of measurement reproducibility (Phillips and Smith, J Clin Epi 1993)]
Sources of Measurement Error
• Observer
• within-observer (intrarater)
• between-observer (interrater)
• Instrument
• within-instrument
• between-instrument
Sources of Measurement Error
• e.g. plasma HIV viral load
– observer: measurement to measurement
differences in tube filling, time before processing
– instrument: run to run differences in reagent
concentration, PCR cycle times, enzymatic
efficiency
Within-Subject Variability
• Although not the fault of the measurement process,
moment-to-moment biological variability can have the
same effect as errors in the measurement process
• Recall that:
– observed value (O) = true value (T) + measurement error (E)
– T = the average of measurements taken over time
– E is always in reference to T
– Therefore, lots of moment-to-moment within-subject biologic variability will serve to increase the variability in the error term and thus increase overall variability, because σ²_O = σ²_T + σ²_E
Assessing Reproducibility
Depends on measurement scale
• Interval Scale
– within-subject standard deviation
– coefficient of variation
• Categorical Scale
– Cohen’s Kappa
Reproducibility of an Interval Scale
Measurement: Peak Flow
• Assessment requires
>1 measurement per subject
• Peak Flow Rate in 17 adults
(Bland & Altman)
Subject   Meas. 1   Meas. 2
1         494       490
2         395       397
3         516       512
4         434       401
5         476       470
6         557       611
7         413       415
8         442       431
9         650       638
10        433       429
11        417       420
12        656       633
13        267       275
14        478       492
15        178       165
16        423       372
17        427       421
Assessment by Simple Correlation
[Figure: scatter plot of Meas. 2 vs. Meas. 1 for the peak flow data]
Pearson Product-Moment Correlation Coefficient
• r (rho) ranges from -1 to +1
• r = Σ(X - X̄)(Y - Ȳ) / √[ Σ(X - X̄)² Σ(Y - Ȳ)² ]
• r describes the strength of linear association
• r² = proportion of variance (variability) of one variable accounted for by the other variable
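As a check on the r = 0.98 quoted below for these data, a short Python sketch (not from the lecture) computing the Pearson coefficient for the 17 peak-flow pairs tabulated earlier:

# Minimal sketch: Pearson r for the 17 peak-flow replicate pairs listed above.
import numpy as np

meas1 = np.array([494, 395, 516, 434, 476, 557, 413, 442, 650,
                  433, 417, 656, 267, 478, 178, 423, 427], dtype=float)
meas2 = np.array([490, 397, 512, 401, 470, 611, 415, 431, 638,
                  429, 420, 633, 275, 492, 165, 372, 421], dtype=float)

r = np.corrcoef(meas1, meas2)[0, 1]   # Pearson product-moment correlation
print(round(r, 2))                     # ~0.98, as quoted for these data
print(round(r**2, 2))                  # r^2: shared variance between the replicates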
[Figure: example scatter plots illustrating r = 1.0, r = -1.0, r = 0.8, and r = 0.0]
Correlation Coefficient for Peak Flow Data
r (meas. 1, meas. 2) = 0.98
Limitations of Simple Correlation for
Assessment of Reproducibility
• Depends upon range of data
– e.g. Peak Flow
• r (full range of data) = 0.98
• r (peak flow <450) = 0.97
• r (peak flow >450) = 0.94
Limitations of Simple Correlation for
Assessment of Reproducibility
• Depends upon ordering of data
• Measures linear association only
[Figure: scatter plot of Meas. 2 vs. Meas. 1 illustrating these limitations]
Limitations of Simple Correlation for
Assessment of Reproducibility
• Gives no meaningful parameter using the
same scale as the original measurement
Within-Subject Standard Deviation

Subject   Meas. 1   Meas. 2   Mean   s
1         494       490       492    2.83
2         395       397       396    1.41
3         516       512       514    2.83
...       ...       ...       ...    ...
15        178       165       172    9.19
16        423       372       398    36.06
17        427       421       424    4.24

• Mean within-subject standard deviation (s_w):
s_w = √( Σ s_i² / n ) = √( (2.83² + ... + 4.24²) / 17 ) = 15.3 l/min
Computationally easier with ANOVA table:
Analysis of Variance
Source           SS           df   MS           F        Prob > F
------------------------------------------------------------------
Between groups   441598.529   16   27599.9081   117.80   0.0000
Within groups    3983.00      17   234.294118
------------------------------------------------------------------
Total            445581.529   33   13502.4706

• Σ s_i² = within-group sum of squares
• Σ s_i² / 17 = within-group mean square = 234
• Mean within-subject standard deviation (s_w):
s_w = √(within-group mean square) = √234 = 15.3 l/min
sw: Further Interpretation
• If we assume that replicate results:
– are normally distributed
– mean of replicates estimates the true value
[Figure: normal curve of replicate values centered on the true value, with standard deviation s_w]
• 95% of replicates are within (1.96)(s_w) of the true value
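A short Python sketch (not from the lecture) that reproduces s_w and the (1.96)(s_w) interpretation from the replicate pairs tabulated earlier:

# Minimal sketch: within-subject SD (s_w) for the peak-flow replicate pairs.
import numpy as np

meas1 = np.array([494, 395, 516, 434, 476, 557, 413, 442, 650,
                  433, 417, 656, 267, 478, 178, 423, 427], dtype=float)
meas2 = np.array([490, 397, 512, 401, 470, 611, 415, 431, 638,
                  429, 420, 633, 275, 492, 165, 372, 421], dtype=float)

# Per-subject variance of the two replicates (s_i^2), then their mean
# = the within-group mean square from the ANOVA table (~234).
s_i_sq = np.var(np.column_stack([meas1, meas2]), axis=1, ddof=1)
within_ms = s_i_sq.mean()

s_w = np.sqrt(within_ms)                 # ~15.3 l/min
print(round(within_ms, 1), round(s_w, 1))
print(round(1.96 * s_w, 1))              # ~30 l/min: 95% of replicates lie this close to the true value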
sw: Peak Flow Data
• If we assume that replicate results:
– are normally distributed
– mean of replicates estimates the true value
• s_w = 15.3 l/min, so (1.96)(s_w) = (1.96)(15.3) = 30
• 95% of replicates are within (1.96)(15.3) = 30 l/min of the true value
sw: Further Interpretation
• Difference between any 2 replicates for the same person: diff = meas1 - meas2
• Because var(diff) = var(meas1) + var(meas2):
s²_diff = s_w² + s_w² = 2s_w²
s_diff = √(s²_diff) = √(2s_w²) = √2 s_w ≈ 1.41 s_w
sw: Difference Between Two Replicates
• If we assume that the differences:
– are normally distributed and the mean of the differences is 0
– s_diff estimates their standard deviation
[Figure: normal curve of between-replicate differences centered at 0, with standard deviation s_diff]
• The difference between 2 measurements for the same subject is expected to be less than (1.96)(s_diff) = (1.96)(1.41)s_w = 2.77s_w for 95% of all pairs of measurements
sw: Further Interpretation
• For Peak Flow data:
• The difference between 2 measurements for the same subject is expected to be less than 2.77s_w = (2.77)(15.3) = 42.4 l/min for 95% of all pairs
• Bland and Altman refer to this as the "repeatability" of the measurement
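The repeatability figure follows directly from s_w; a minimal sketch of the arithmetic, assuming the s_w = 15.3 l/min derived above:

# Minimal sketch: Bland-Altman "repeatability" from the within-subject SD.
import math

s_w = 15.3                          # within-subject SD for peak flow, l/min
s_diff = math.sqrt(2) * s_w         # SD of the difference between two replicates (~1.41 s_w)
repeatability = 1.96 * s_diff       # = 2.77 s_w

print(round(s_diff, 1))             # ~21.6 l/min
print(round(repeatability, 1))      # ~42.4 l/min; 95% of replicate pairs differ by less than this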
One Common Underlying sw
• The within-subject standard deviation approach is appropriate only if there is one s_w, i.e., s_w does not vary with the true underlying value
[Figure: within-subject standard deviation plotted against subject mean peak flow; Kendall's correlation coefficient = 0.17, p = 0.36]
Another Interval Scale Example
• Salivary cotinine in children (Bland-Altman)
• n = 20 participants measured twice
Subject   Trial 1   Trial 2
1         0.1       0.1
2         0.2       0.1
3         0.2       0.3
...       ...       ...
18        4.9       1.4
19        4.9       3.9
20        7.0       4.0
Cotinine: Absolute Difference vs. Mean
[Figure: absolute difference between trials plotted against subject mean cotinine; Kendall's tau = 0.62, p = 0.001]
Logarithmic Transformation
Subject   Trial 1   Trial 2   Log trial 1   Log trial 2
1         0.1       0.1       -1            -1
2         0.2       0.1       -0.69897      -1
3         0.2       0.3       -0.69897      -0.52288
...       ...       ...       ...           ...
18        4.9       1.4       0.690196      0.146128
19        4.9       3.9       0.690196      0.591065
20        7.0       4.0       0.845098      0.60206
Log Transformed: Absolute Difference vs. Mean
[Figure: absolute difference of log cotinine plotted against subject mean log cotinine; Kendall's tau = 0.07, p = 0.7]
sw for log-transformed cotinine data
• s_w = √0.0305 = 0.175
• Back-transforming to the native scale:
antilog(s_w) = antilog(0.175) = 10^0.175 = 1.49
Coefficient of Variation
• On the natural scale, there is not one common within-subject standard deviation for the cotinine data
• Therefore, there is not one absolute number that can represent how far any replicate is expected to be from the true value or from another replicate
• Instead, the within-subject standard deviation varies with the level of the measurement, and it is reasonable to express the within-subject standard deviation as a percentage of the level:
antilog(s_w) - 1 ≈ within-subject standard deviation / within-subject mean = coefficient of variation
Cotinine Data
• Coefficient of variation = 1.49 - 1 = 0.49
• At any level of cotinine, the within-subject
standard deviation of repeated measures is
49% of the level
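A minimal Python sketch (not from the lecture) of the log-transform-then-back-transform calculation. Only the six cotinine pairs shown in the table above are used, so the resulting s_w will not equal the 0.175 obtained from all 20 subjects; the final line back-transforms the slide's full-data value.

# Minimal sketch: coefficient of variation via log10-transformed replicates.
# Uses only the six pairs tabulated above (illustration only).
import numpy as np

trial1 = np.array([0.1, 0.2, 0.2, 4.9, 4.9, 7.0])
trial2 = np.array([0.1, 0.1, 0.3, 1.4, 3.9, 4.0])

logs = np.log10(np.column_stack([trial1, trial2]))
s_i_sq = np.var(logs, axis=1, ddof=1)          # per-subject variance on the log scale
s_w_log = np.sqrt(s_i_sq.mean())               # within-subject SD on the log10 scale

print(round(s_w_log, 3), round(10 ** s_w_log - 1, 2))   # CV = antilog(s_w) - 1

# Back-transforming the full-data value quoted on the slide:
print(10 ** 0.175 - 1)                         # ~0.496, i.e., roughly 49% of the level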
Coefficient of Variation for Peak Flow Data
• By definition, when the within-subject standard deviation is not proportional to the mean value, as in the Peak Flow data, there is not a constant ratio between the within-subject standard deviation and the mean
• Therefore, there is not one common coefficient of variation
• Estimating the "average" coefficient of variation is not very meaningful
Peak Flow Data: Use of
Coefficient of Variation when sw is Constant
Mean of replicates   s_w    C.V.
100                  15.3   0.153
200                  15.3   0.077
300                  15.3   0.051
400                  15.3   0.038
500                  15.3   0.031
600                  15.3   0.026
700                  15.3   0.022
Which Index to Use?
• Within-subject standard deviation constant over the range of measurement: use the "common" within-subject standard deviation (and its derivatives)
• Within-subject standard deviation proportional to the magnitude of the measurement: use the coefficient of variation
• Within-subject standard deviation neither constant nor proportional: use a family of coefficients of variation over the range of measurement
Reproducibility of Categorical Measurements: Kappa Statistic
• Agreement above that expected by chance:
kappa = (observed agreement - chance agreement) / (1 - chance agreement)
• (observed agreement - chance agreement) is the amount of agreement above chance
• If the maximum amount of agreement is 1.0, then (1 - chance agreement) is the maximum amount of agreement above chance that is possible
• Therefore, kappa is the ratio of "agreement beyond chance" to "maximal possible agreement beyond chance"
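A minimal Python sketch of the kappa calculation for a dichotomous measurement rated twice. The 2x2 counts are hypothetical, chosen only to illustrate the formula; they are not from the course.

# Minimal sketch: Cohen's kappa for two ratings of the same subjects.
import numpy as np

# Hypothetical counts: rows = rating 1 (yes/no), columns = rating 2 (yes/no)
table = np.array([[40.0, 10.0],
                  [ 5.0, 45.0]])
n = table.sum()

observed = np.trace(table) / n            # observed agreement (diagonal proportion)
p_row = table.sum(axis=1) / n
p_col = table.sum(axis=0) / n
chance = float(np.sum(p_row * p_col))     # agreement expected by chance alone

kappa = (observed - chance) / (1 - chance)
print(round(observed, 2), round(chance, 2), round(kappa, 2))   # 0.85, 0.50, 0.70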
Sources of Measurement Variability:
Which to Assess?
• Observer
• within-observer (intrarater)
• between-observer (interrater)
• Instrument
• within-instrument
• between-instrument
• Subject
• within-subject
• Which to assess depends upon the use of the measurement
and how/when the measurement will be made:
– For clinical use: all of the above are needed
– For research: depends upon logistics of study (e.g.,
within-observer and within-instrument only are needed if
just one person/instrument used throughout study)
Assessing Validity
• Measures can be assessed for validity in 3 ways:
– Content validity
• Face
• Sampling
– Construct validity
– Empirical validity (aka criterion)
• Concurrent (i.e. when gold standards are present)
– Interval scale measurement: 95% limits of agreement
– Categorical scale measurement: sensitivity & specificity
• Predictive
Conclusions
• Measurement reproducibility plays a key role in determining validity
and statistical precision in all different study designs
• When assessing reproducibility, for interval scale measurements:
• avoid correlation coefficients
• use within-subject standard deviation if constant
• or coefficient of variation if within-subject sd is proportional to
the magnitude of measurement
• For categorical scale measurements, use Kappa
• What is acceptable reproducibility depends upon desired use
• Assessment of validity depends upon whether or not gold standards
are present, and can be a challenge when they are absent
Assessing Validity - With Gold Standards
• A new and simpler device to measure peak flow becomes
available (Bland-Altman)
Subject   Gold std   New
1         494        512
2         395        430
3         516        520
...       ...        ...
15        178        259
16        423        350
17        427        451
Plot of Difference vs. Gold Standard
[Figure: difference (new device - gold standard) plotted against the gold-standard value]
Examine the Differences
[Figure: the difference plot with selected differences labeled, e.g., d1 = -81, d2 = 7, d3 = -35]
Are the Differences Normally Distributed?
[Figure: histogram of the differences between the new device and the gold standard]
• The mean difference describes any systematic difference between the gold standard and the new device:
d̄ = (1/n) Σ d_i = (1/n)[(512 - 494) + ... + (451 - 427)] = -2.3
• The standard deviation of the differences:
s_d = √( Σ (d_i - d̄)² / (n - 1) ) = 38.8
• 95% of differences will lie between -2.3 ± (1.96)(38.8), or from -78 to 74 l/min
• These are the 95% limits of agreement
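A minimal Python sketch of the 95% limits-of-agreement calculation. Only the six gold-standard/new pairs shown in the table above are available here, so the printed limits will not match the -78 to 74 l/min obtained from all 17 subjects; the procedure is the same.

# Minimal sketch: Bland-Altman 95% limits of agreement (illustration only;
# uses just the six pairs tabulated above, not the full 17-subject data).
import numpy as np

gold = np.array([494, 395, 516, 178, 423, 427], dtype=float)
new  = np.array([512, 430, 520, 259, 350, 451], dtype=float)

d = new - gold                      # d_i = new device - gold standard
d_bar = d.mean()                    # mean difference (systematic bias)
s_d = d.std(ddof=1)                 # SD of the differences

lower, upper = d_bar - 1.96 * s_d, d_bar + 1.96 * s_d
print(round(d_bar, 1), round(s_d, 1))
print(round(lower, 1), round(upper, 1))   # 95% limits of agreement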