TESTING A TEST

Ian McDowell
Department of Epidemiology &
Community Medicine
November, 2004
A Lab Report
(Montfort Hospital Biochem Lab)
The Challenge of Clinical
Measurement
• Diagnoses are based on information, from formal
measurements or from your clinical judgment
• This information is seldom perfectly accurate:
– Random errors can occur
– Biases in judgment or measurement can occur
– Due to biological variability, this patient may not fit the
general rule
– Diagnosis (e.g., hypertension) involves a categorical
judgment; this often requires dividing a continuous score
(blood pressure) into categories. Choosing the cutting-point
may be arbitrary
Therefore…
• You need to be aware …
– That diagnosis is a matter of probabilities
– That using a quantitative approach is better than
just guessing!
– That you will ultimately become familiar with
the typical accuracy of measurements in your
chosen clinical field
– Of some of the ways to describe the accuracy of
a measurement
– That the principles apply to both diagnostic and
screening tests
Attributes of Tests or Measures
• Cost, Safety, Acceptability, etc.
• Reliability: reproducibility; this considers
chance or random errors
• Validity: Does it measure what it is supposed
to measure?
By extension, what diagnostic conclusion can I
draw from a particular score on the test?
Validity may be affected by bias, or systematic
errors
Reliability and Validity
[Figure: dot patterns illustrating the four combinations of low or high reliability with low or high validity]
Ways of Assessing Validity
• Face, Content validity: does it make
clinical or biological sense? Does it include
the relevant symptoms?
• Criterion: comparison to a “gold standard”
definitive measure
– Expressed as sensitivity and specificity
• Construct validity (this is used with
abstract themes, such as “quality of life” for
which there is no definitive standard)
“Gold Standards”
Sensitivity and specificity are judged against
• More definitive (but expensive or invasive)
tests, such as a complete work-up,
Or against
• Eventual outcome (for screening tests, when
workup of well patients is unethical)
2 x 2 Table for Testing a Test
                       Gold standard
                  Disease         Disease
                  Present         Absent
Positive test     a (TP)          b (FP)
Negative test     c (FN)          d (TN)
Validity:         Sensitivity     Specificity
                  = a/(a+c)       = d/(b+d)
TP = true positive; FP = false positive…
A Bit More on Sensitivity
= Ability to detect disease when it is present
• a/(a+c) = TP/(TP+FN)
• Mnemonics:
a sensitive person is one who can detect
your feelings
(1 – seNsitivity) = false Negative rate (i.e.,
How many cases are missed by the
screening test?)
• Cf. power of a statistical test (1 – β)
…and More on Specificity
Ability to detect absence of disease when it is
truly absent (can it detect non-disease?)
• d/(b+d) = TN/(FP+TN)
• Mnemonics:
– a specific test would identify only that type of
disease. “Nothing else looks like this”
– (1- sPecificity) = false Positive rate (How many
are falsely classified as having the disease?)
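As a minimal sketch of the formulas above, the four quantities can be computed directly from the cells of the 2 x 2 table. The function name `accuracy_from_counts` and the counts are hypothetical, chosen only for illustration.

```python
def accuracy_from_counts(a, b, c, d):
    """Return sensitivity, specificity and their complements.

    a = true positives,  b = false positives,
    c = false negatives, d = true negatives.
    """
    sensitivity = a / (a + c)   # TP / (TP + FN)
    specificity = d / (b + d)   # TN / (FP + TN)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
    }


# Hypothetical counts, not taken from any real study
result = accuracy_from_counts(a=90, b=20, c=10, d=180)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```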
Clinical applications
• A specific test can be useful to rule in a
disease. If the result on a highly specific test is
positive, you can be fairly confident the patient has
the condition (“SpPin”)
• A sensitive test can be useful for ruling a
disease out. A negative result on a very
sensitive test reassures you that the patient
does not have the disease (“SnNout”)
The Selection of a Cutting Point
[Figure: overlapping score distributions for the well population (healthy scores) and the sick population (pathological scores); moving the cut-point one way increases sensitivity, moving it the other way increases specificity]
Crucial issue: changing the cut-point can improve sensitivity or specificity, but at the expense of the other
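The trade-off can be seen with a small sketch. The scores below are made up, and the code assumes that higher scores indicate disease and that a score at or above the cut-point counts as a positive test.

```python
# Hypothetical test scores; higher scores are assumed to indicate disease
well_scores = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
sick_scores = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]


def sens_spec_at(cut, sick, well):
    """Treat score >= cut as a positive test; return (sensitivity, specificity)."""
    sensitivity = sum(s >= cut for s in sick) / len(sick)
    specificity = sum(s < cut for s in well) / len(well)
    return sensitivity, specificity


for cut in (4, 6, 8):
    sens, spec = sens_spec_at(cut, sick_scores, well_scores)
    print(f"cut-point {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up scores, a low cut-point catches every sick patient but mislabels half of the well ones, while a high cut-point does the reverse.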
Problems with Wrong Results
• False Positives can arise due to other factors (such
as taking other medications, diet, etc.). They entail
cost and danger of investigations, labeling, worry
– This is similar to Type I or alpha error in a test of
statistical significance: the possibility of falsely
concluding that there is an effect of an intervention.
• False Negatives imply missed cases, so potentially
bad outcomes if untreated
– cf Type II or beta error: the chance of missing a true
difference
The Crucial Point: Predictive Values
• Sensitivity & specificity are characteristics
of the test
• But the clinician, of course, gets the test
result and does not know if this person is a
true positive or a false positive (or a true or
false negative). Hmmm…
• How do we assess the predictive value of a
positive or negative result?
Predictive Values
• Based on rows, not columns
        D+    D-
  T+    a     b
  T-    c     d
• PPV = a/(a+b); interprets positive test
• NPV = d/(c+d); interprets negative test
• Immediately useful to clinician: they tell us about
the population and thus the patient
• Depend upon prevalence of disease, so must be
determined for each clinical setting
• As prevalence goes down, PPV goes down and
NPV rises
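The dependence on prevalence can be sketched numerically. The function `predictive_values` and the figures of 90% sensitivity and 90% specificity are assumptions for illustration only; the cells are expressed as proportions of the population rather than counts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence                # cell a
    fn = (1 - sensitivity) * prevalence          # cell c
    fp = (1 - specificity) * (1 - prevalence)    # cell b
    tn = specificity * (1 - prevalence)          # cell d
    ppv = tp / (tp + fp)   # a / (a + b)
    npv = tn / (tn + fn)   # d / (c + d)
    return ppv, npv


# Same hypothetical test, three hypothetical settings
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```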
Same Test, Two Clinical Situations
A. Referral hospital:
Prevalence = 55/165 = 33%
        D+    D-
  T+    50    10
  T-     5   100
Sensitivity = 50/55 = 91%
Specificity = 100/110 = 91%
PPV = 50/60 = 83%
NPV = 100/105 = 95%
B. Primary Care:
Prevalence = 55/1155 = 5%
        D+    D-
  T+    50    100
  T-     5   1000
Sensitivity = 50/55 = 91%
Specificity = 1000/1100 = 91%
PPV = 50/150 = 33%
NPV = 1000/1005 = 99.5%
Practical Question:
“Doctor, what’s my likelihood of having
the disease?”
To answer this question
• You need to have a general idea of the sensitivity
& specificity of the test
• To interpret the results, you also need to know
roughly the prevalence of the condition in your
practice. You can then work out the PPV and
answer the patient’s question.
“Give me a break, dude … Surely there is an easier
way to bring all this together?”
Prevalence of Disease
• We have seen how this influences the
interpretation of a test score
• Before you do the test, prevalence gives
your best guess about the probability that
the patient has the disease
• Also known as Pretest Probability of
Disease: (a+c) / N in the 2 x 2 table
(cells a, b, c, d; N = a+b+c+d)
• Or, can be expressed as odds of
disease: (a+c) / (b+d)
Estimating predictive values for a specific
setting is called ‘calibrating’ the test
You could:
– Apply the test and a definitive test to a
consecutive series of patients (rarely
feasible)
– Calculate from Bayes’s Theorem (ouch!)
– Draw a hypothetical table (maybe?)
– Use a nomogram (tell me how)
Calibration by hypothetical table
Fill cells in the following order:
                      “Truth”
              Disease       Disease
              Present       Absent        Total    PV
Test Pos      4th           7th           8th      10th
Test Neg      5th           6th           9th      11th
Total         2nd           3rd           1st
              (from         (from
              sensitivity)  specificity)
The 2nd and 3rd cells (column totals) come from prevalence.
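The fill order can be followed step by step in a short sketch. The function name `hypothetical_table`, the table size of 1000 and the example figures (sensitivity 90%, specificity 80%, prevalence 10%) are assumptions for illustration.

```python
def hypothetical_table(sensitivity, specificity, prevalence, n=1000):
    """Fill a hypothetical 2 x 2 table in the order given on the slide."""
    total = n                          # 1st: grand total (any convenient number)
    diseased = prevalence * total      # 2nd: from prevalence
    healthy = total - diseased         # 3rd: the remainder
    tp = sensitivity * diseased        # 4th: from sensitivity
    fn = diseased - tp                 # 5th: by subtraction
    tn = specificity * healthy         # 6th: from specificity
    fp = healthy - tn                  # 7th: by subtraction
    test_pos = tp + fp                 # 8th: row total
    test_neg = fn + tn                 # 9th: row total
    ppv = tp / test_pos                # 10th: predictive value of a positive
    npv = tn / test_neg                # 11th: predictive value of a negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "PPV": ppv, "NPV": npv}


table = hypothetical_table(sensitivity=0.90, specificity=0.80, prevalence=0.10)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```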
Combining Sensitivity and Specificity:
Receiver Operating Characteristic Curves
Work out Sen and Spec at every possible cut-point, then plot these.
Area under the curve indicates the information provided by the test
[Figure: ROC curve; vertical axis Sensitivity (0 to 1), horizontal axis 1-Specificity (= false positives) (0 to 1)]
Note: the theme of sensitivity & (1-specificity) will appear again!
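A minimal sketch of the procedure: compute sensitivity and 1-specificity at every cut-point, then estimate the area under the curve with the trapezoidal rule. The scores are made up, and the code assumes that a score at or above the cut-point is a positive test.

```python
# Hypothetical scores; higher scores are assumed to indicate disease
well_scores = [1, 2, 3, 4]
sick_scores = [3, 4, 5, 6]


def roc_points(sick, well):
    """Return sorted (1 - specificity, sensitivity) pairs, one per cut-point."""
    cuts = sorted(set(sick) | set(well))
    cuts.append(max(cuts) + 1)  # a cut above every score: nobody tests positive
    points = []
    for cut in cuts:
        sensitivity = sum(s >= cut for s in sick) / len(sick)
        false_positive_rate = sum(s >= cut for s in well) / len(well)
        points.append((false_positive_rate, sensitivity))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal-rule area under the sorted ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(sick_scores, well_scores)
auc = area_under_curve(points)
print(points)
print(f"AUC = {auc:.3f}")
```

An area of 0.5 corresponds to a test that provides no information, and 1.0 to a test that separates the two groups perfectly.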
Likelihood Ratios
• Defined as the odds that a given level of a
diagnostic test result would be expected in a
patient with the disease, as opposed to a patient
without: true positive rate / false positive rate.
• Advantages:
– Express sensitivity and specificity in one number
– Can be calculated for many levels of the test
– Can be turned into predictive values
• LR for positive test = Sensitivity / (1-Specificity)
• LR for negative test = (1-Sensitivity) / Specificity
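The two formulas translate directly into code. This is a sketch only; the function name `likelihood_ratios` and the example sensitivity and specificity are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a positive test, LR for a negative test)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test: sensitivity 90%, specificity 80%
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```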
Calibration with a Nomogram
1) You need the LR.
2) Select pretest probability (prevalence) on left axis
3) Select likelihood ratio on center axis
4) Draw line through to right axis to indicate post-test probability of disease
Example: Prevalence = 30%; LR+ = 20; Post-test probability = about 90%
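The nomogram is a graphical shortcut for a short calculation: convert the pretest probability to odds, multiply by the LR, and convert back to a probability. A sketch, with the hypothetical function name `post_test_probability`:

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Pretest probability -> odds -> multiply by LR -> post-test probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# The slide's example: prevalence 30%, LR+ = 20
probability = post_test_probability(0.30, 20)
print(f"post-test probability = {probability:.1%}")
```

For the slide's example this gives roughly 90%, close to the value read off the nomogram.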
Chaining LRs Together
• Example: 45 year-old woman with 1-month
history of intermittent chest pain.
– Pretest probability about 1% for CAD
– History suggestive of angina (substernal pain;
radiating down arm; induced by effort; relieved
by rest…).
• LR of this history for angina is about 100
The previous example:
1. From the history: she's young; pretest probability about 1%
[Figure: nomogram] Post-test probability rises to 50% based on history
Chaining LRs Together
• 45 year-old woman with 1-month history of
intermittent chest pain…
After the history, post-test probability is now about
50%. What will you do?
Record an ECG
– Results = 2.2 mm ST-segment depression.
LR for ECG 2.2 mm = 10.
– Overall post-test probability is now >90% for
coronary artery disease (see next slide)
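The chaining can be sketched as repeated odds multiplication, where the post-test probability from one finding becomes the pretest probability for the next. The function name `chain_likelihood_ratios` is hypothetical, and the calculation assumes the findings are independent of each other given disease status, which may not hold exactly in practice.

```python
def chain_likelihood_ratios(pretest_probability, likelihood_ratios):
    """Apply each LR in turn; return the probability after every step."""
    probabilities = []
    odds = pretest_probability / (1 - pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr                              # update the odds with this finding
        probabilities.append(odds / (1 + odds)) # convert back to a probability
    return probabilities


# The slide's example: pretest 1%, history LR about 100, ECG LR = 10
after_history, after_ecg = chain_likelihood_ratios(0.01, [100, 10])
print(f"after history: {after_history:.0%}")
print(f"after ECG:     {after_ecg:.0%}")
```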
The previous example: ECG Results
[Figure: nomogram] Now start pretest probability (i.e. prior to ECG) at 50%, based on history.
Post-test probability now rises to 90%