TESTING A TEST

Interpreting Diagnostic Tests
Ian McDowell
Department of Epidemiology &
Community Medicine
January 2010
Note to users: you may find the additional notes
& explanations in the ppt notes panel helpful.
Objectives
• To understand sources of error in typical
measurements
• To understand sensitivity and specificity
• To explain the implications of false positives and false negatives
• To understand predictive values and likelihood ratios
Road map to date
This session considers the interpretation of diagnostic tests, a daily issue in clinical practice. It builds on some of the ideas introduced last term:
• Applying conclusions from a study sample to an individual patient
• Evidence-based practice
• Measurements: validity, bias, determinants of bias
• Contrasts between research on hospital patients and community practice
The Challenge of Clinical Measurement
• Diagnoses are based on information from formal measurements and/or from your clinical judgment.
• This information is seldom perfectly accurate:
– Random errors can occur (machine not working?)
– Biases in judgment or measurement can occur (“this kid
doesn’t look sick”)
– Due to biological variability, this patient may not fit the
general rule
– Diagnosis (e.g., hypertension) involves a categorical
judgment; this often requires dividing a continuous score
(blood pressure) into categories. Choosing the cutting-point
is challenging.
Therefore…
• You need to be aware:
– That diagnostic judgments are based on probabilities;
– That using a quantitative approach is better than just guessing!
– That you will gradually become familiar with the typical accuracy of measurements in your chosen clinical field;
– That the principles apply to both diagnostic and screening tests;
– Of some of the ways to describe the accuracy of a measurement.
Why choose one test and not another?
• Reliability: consistency or reproducibility; this considers chance or random errors (which sometimes increase and sometimes decrease scores).
“Is it measuring something?”
• Validity: “Is it measuring what it is supposed to
measure?” By extension, “what diagnostic
conclusion can I draw from a particular score on this
test?”
Validity may be affected by bias, which refers to systematic errors (errors that consistently push scores in one direction).
• Safety, Acceptability, Cost, etc.
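Reliability and Validity
[Figure: target diagrams crossing low/high reliability with low/high validity]
• Low reliability, low validity: results scattered and off target
• High reliability, low validity: results tightly clustered but off target: “Biased result!”
• Low reliability, high validity: results scattered around the target. The average of these inaccurate results is not bad. This is probably how screening questionnaires (e.g., for depression) work
• High reliability, high validity: results tightly clustered on target ☺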
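A small simulation can make the target picture concrete. This is a sketch with invented numbers: the true value of 120, the error sizes and the +8 bias are arbitrary assumptions, not data from these notes. Random error widens the spread but averages out; systematic error (bias) shifts every reading the same way, so averaging cannot remove it.

```python
import random
import statistics

# Assumed "true" value of the quantity being measured (illustrative only).
TRUE_VALUE = 120.0

def measure(n, random_sd, bias, seed):
    """Simulate n readings: true value + systematic error (bias) + random error."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0.0, random_sd) for _ in range(n)]

# Low reliability, high validity: large random scatter, no systematic error.
scattered = measure(10_000, random_sd=10.0, bias=0.0, seed=1)
# High reliability, low validity: small scatter, but every reading shifted by +8.
biased = measure(10_000, random_sd=1.0, bias=8.0, seed=2)

for label, readings in (("scattered", scattered), ("biased", biased)):
    print(f"{label:>9}: mean {statistics.mean(readings):6.1f}, "
          f"spread (SD) {statistics.stdev(readings):4.1f}")
```

The scattered readings average out close to 120 despite their wide spread, while the biased readings stay near 128 however many are averaged.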
Ways of Assessing Validity
• Content or “Face” validity: does it make
clinical or biological sense? Does it include
the relevant symptoms?
• Criterion validity: comparison to a definitive “gold standard” measure (e.g., biopsy, autopsy)
– Expressed as sensitivity and specificity
• Construct validity (used with abstract concepts, such as “quality of life”, for which there is no definitive standard)
Criterion validation: “Gold Standard”
The criterion that your clinical observation or simple
test is judged against:
– more definitive (but expensive or invasive) tests, such
as a complete work-up, or
– the clinical outcome (for screening tests, when workup
of well patients is unethical).
Sensitivity and specificity are calculated
from a research study comparing the
test to a gold standard.
“2 x 2” table for validating a test
                      Gold standard
Test score            Disease present    Disease absent
Test positive         a (TP)             b (FP)
Test negative         c (FN)             d (TN)
Validity:             Sensitivity        Specificity
                      = a/(a+c)          = d/(b+d)
                      = TP/Diseased      = TN/Healthy
TP = true positive; FP = false positive…
Golden Rule: always calculate based on the gold standard (i.e., down the columns)
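The column calculations can be written in a few lines of Python. This is a minimal sketch: the class and method names are hypothetical, and the counts (a = 50, b = 10, c = 5, d = 100) are borrowed from the specialist-referral example in these notes. It also shows that the false negative and false positive rates are simply 1 – sensitivity and 1 – specificity.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Counts from a validation study; the columns come from the gold standard."""
    a: int  # TP: test positive, disease present
    b: int  # FP: test positive, disease absent
    c: int  # FN: test negative, disease present
    d: int  # TN: test negative, disease absent

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)   # down the "disease present" column

    def specificity(self) -> float:
        return self.d / (self.b + self.d)   # down the "disease absent" column

    def false_negative_rate(self) -> float:
        return self.c / (self.a + self.c)   # equals 1 - sensitivity

    def false_positive_rate(self) -> float:
        return self.b / (self.b + self.d)   # equals 1 - specificity

# Counts from the specialist-referral example.
table = TwoByTwo(a=50, b=10, c=5, d=100)
print(f"Sensitivity = {table.sensitivity():.0%}")   # 50/55
print(f"Specificity = {table.specificity():.0%}")   # 100/110
```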
A Bit More on Sensitivity
= Test’s ability to detect disease when it is present
a/(a+c) = TP/(TP+FN) = TP/Diseased
Mnemonics:
- a sensitive person is one who is aware of your feelings
- (1 – seNsitivity) = false Negative rate
= the proportion of true cases missed by the test
…and More on Specificity
Precision of the test
– a specific test would identify only that type of
disease. “Nothing else looks like this”
– a highly specific test generates few false
positives. So,
– If the result is positive, you can be confident the
patient has this diagnosis.
• Mnemonic: (1 – sPecificity) = false Positive rate (what proportion of healthy people are falsely classified as having the disease?)
Problems Resulting from Test Errors
• False positives can arise from other factors (such as taking other medications, diet, etc.). They entail the cost and danger of further investigations, labeling, and worry for the patient.
– This is similar to Type I or alpha error in a test of
statistical significance (the possibility of falsely
concluding that there is an effect of an intervention).
• False negatives mean missed cases, and so potentially bad outcomes if left untreated
– Cf. Type II or beta error: the chance of missing a true
difference
Most Tests Provide a Continuous Score: Selecting a Cutting Point
[Figure: overlapping distributions of test scores for a healthy population (“healthy scores”) and a sick population (“pathological scores”), with a possible cut-point between them]
• Move the cut-point toward the healthy scores to increase sensitivity (include more of the sick group)
• Move the cut-point toward the pathological scores to increase specificity (exclude healthy people)
Crucial issue: changing the cut-point can improve sensitivity or specificity, but never both at once
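A short sketch of the cut-point trade-off, using ten invented scores per group (hypothetical data; higher scores are assumed to be more abnormal, and a score at or above the cut-point counts as positive). Raising the cut-point never increases sensitivity and never decreases specificity.

```python
# Hypothetical scores (higher = more abnormal); a score >= cut-point is "test positive".
healthy_scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
sick_scores = [5, 6, 7, 7, 8, 8, 9, 9, 10, 11]

def sens_spec_at(cut):
    """Sensitivity and specificity when scores at or above `cut` count as positive."""
    true_pos = sum(score >= cut for score in sick_scores)
    true_neg = sum(score < cut for score in healthy_scores)
    return true_pos / len(sick_scores), true_neg / len(healthy_scores)

for cut in range(4, 11):
    sens, spec = sens_spec_at(cut)
    print(f"cut-point {cut:2d}: sensitivity {sens:4.0%}, specificity {spec:4.0%}")
```

With these data a cut-point of 7 gives 80% for both; moving it either way buys one measure at the expense of the other.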
Clinical applications
• A specific test can be useful to rule in a disease. Why?
        D+    D–
  T+    a     b
  T–    c     d
– Very specific tests give few false positives. So, if the result is positive, you can be sure the patient has the condition (‘nothing else would give this result’): “SpPin” (Specific test, Positive result, rules in)
• A sensitive test can be useful for ruling a disease out:
– A negative result on a very sensitive test (which detects nearly all true cases) reassures you that the patient does not have the disease: “SnNout” (Sensitive test, Negative result, rules out)
Your Patient’s Question:
“Doctor, how likely am I to have this disease?”
This introduces Predictive Values
• Sensitivity & specificity don’t answer this,
because they work from the gold standard.
• Now you need to work from the test result,
but you won’t know whether this person is a
true positive or a false positive (or a true or
false negative). Hmmm…
How accurately does a positive (or negative)
result predict disease (or health)?
Start from Prevalence
• Before you do any test, the best guide you
have to a diagnosis is based on prevalence:
– Common conditions (in this population) are the more likely diagnoses
• Prevalence indicates the ‘pre-test
probability of disease’
2 x 2 table: Prevalence
                 Disease present    Disease absent    Total
Test positive    a                  b                 a+b
Test negative    c                  d                 c+d
Total            a+c                b+d               N
Prevalence = (a+c) / N
Positive and Negative Predictive Values
• Based on rows, not columns
• Positive Predictive Value (PPV) = a/(a+b)
= Probability that a positive score is a true positive
• NPV = d/(c+d); same for a negative test result
        D+    D–
  T+    a     b
  T–    c     d
• BUT… there’s a big catch:
• We are now working across the rows, which span both columns, so PPV & NPV depend on how many cases of disease there are (prevalence).
• As prevalence goes down, PPV goes down (with fewer true cases, false positives from the larger healthy group make up a bigger share of positive results) and NPV rises.
• So, PPV and NPV must be determined for each clinical setting.
• But they are immediately useful to the clinician: they reflect this population, so they tell us about this patient.
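The dependence on prevalence shows up when the same test is applied in populations with different prevalence. In this sketch the sensitivity and specificity of 0.9 are assumed values; each cell of the 2 x 2 table is expressed as a fraction of everyone tested, which is Bayes' theorem in table form.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sens * prevalence               # true positives, as a fraction of all tested
    fn = (1 - sens) * prevalence         # false negatives
    tn = spec * (1 - prevalence)         # true negatives
    fp = (1 - spec) * (1 - prevalence)   # false positives
    return tp / (tp + fp), tn / (tn + fn)

# Assumed test: sensitivity = specificity = 0.9, applied in four settings.
for prev in (0.50, 0.20, 0.05, 0.01):
    ppv, npv = predictive_values(0.9, 0.9, prev)
    print(f"prevalence {prev:4.0%}: PPV {ppv:4.0%}, NPV {npv:.1%}")
```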
Prevalence and Predictive Values
A. Specialist referral hospital
        D+     D–
  T+    50     10
  T–     5    100
Sensitivity = 50/55 = 91%
Specificity = 100/110 = 91%
Prevalence = 55/165 = 33%
PPV = 50/60 = 83%
NPV = 100/105 = 95%

B. Primary care
        D+     D–
  T+    50    100
  T–     5   1000
Sensitivity = 50/55 = 91%
Specificity = 1000/1100 = 91%
Prevalence = 55/1155 = 5%
PPV = 50/150 = 33%
NPV = 1000/1005 = 99.5%
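The two settings can be recomputed directly from their cell counts. A sketch; the function name is hypothetical. The exact primary-care prevalence works out to 55/1155, about 4.8%.

```python
def summarize(a, b, c, d):
    """Summary statistics for a 2 x 2 table (a=TP, b=FP, c=FN, d=TN)."""
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "prevalence": (a + c) / n,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
    }

settings = {
    "A. Specialist referral hospital": (50, 10, 5, 100),
    "B. Primary care": (50, 100, 5, 1000),
}
for name, cells in settings.items():
    stats = summarize(*cells)
    print(name + ": " + ", ".join(f"{k} {v:.1%}" for k, v in stats.items()))
```

Sensitivity and specificity are identical in both settings; only the prevalence, and with it PPV and NPV, changes.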
Predictive Values
• High specificity = few FPs: Sp = TN/(TN+FP); FPs also drive PPV: PPV = TP/(TP+FP). So, the clinician is more certain that a patient with a positive test has the disease (it rules in the disease).
• The higher the sensitivity, the higher the NPV: Sn = TP/(TP+FN); NPV = TN/(TN+FN). With few false negatives, the clinician can be more confident that a patient with a negative score does not have the diagnosis. So, a high NPV can rule out a disease.
From the literature you can get sensitivity & specificity.
To work out PPV and NPV for your practice, you need to guess the prevalence, then work backwards.
Fill the cells in the following order:

                  “Truth”
                  Disease present      Disease absent       Total    Predictive values
                  (from sensitivity)   (from specificity)
Test positive     4th                  7th                  8th      10th
Test negative     5th                  6th                  9th      11th
Total             2nd                  3rd                  1st
                  (from estimated prevalence)
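The fill order can be sketched as a function. It follows the order as we read it from the table (total first, then the column totals, then each column from sensitivity or specificity, then the row totals and predictive values); the total of 10,000 and the example inputs are arbitrary assumptions.

```python
def fill_table(prevalence, sens, spec, n=10_000):
    """Fill a 2 x 2 table from estimated prevalence, sensitivity and specificity.

    1st: n is an arbitrary chosen total; the predictive values do not depend on it.
    """
    diseased = prevalence * n            # 2nd: a + c, from estimated prevalence
    healthy = n - diseased               # 3rd: b + d
    a = sens * diseased                  # 4th: true positives, from sensitivity
    c = diseased - a                     # 5th: false negatives
    d = spec * healthy                   # 6th: true negatives, from specificity
    b = healthy - d                      # 7th: false positives
    test_pos, test_neg = a + b, c + d    # 8th and 9th: row totals
    return {"a": a, "b": b, "c": c, "d": d,
            "PPV": a / test_pos,         # 10th
            "NPV": d / test_neg}         # 11th

# Hypothetical practice: prevalence 5%, sensitivity 90%, specificity 90%.
example = fill_table(prevalence=0.05, sens=0.9, spec=0.9)
print(f"PPV = {example['PPV']:.1%}, NPV = {example['NPV']:.1%}")
```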
Gasp…!
Isn’t there an easier way to do all this…?
Yes (good!)
But first, you need a couple more concepts (less good…)
• We said that before you apply a test, prevalence gives your best guess about the chances that this patient has the disease.
• This is known as the “Pretest Probability of Disease”: (a+c) / N in the 2 x 2 table:
        D+    D–
  T+    a     b
  T–    c     d
                    N
• It can also be expressed as the odds of disease: (a+c) / (b+d). When the disease is rare, these odds are close to the probability.
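Converting between probability and odds takes two one-line functions (the names are illustrative). With the primary-care counts from the predictive-value example, the odds (a+c)/(b+d) = 55/1100 = 0.05 sit close to the probability 55/1155 ≈ 0.048, because the disease is fairly rare there.

```python
def prob_to_odds(p):
    """Odds of disease from a probability, e.g. pretest probability (prevalence)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Probability of disease from odds."""
    return odds / (1 + odds)

# Primary-care counts from the predictive-value example: a + c = 55, b + d = 1100.
prevalence = 55 / 1155   # (a + c) / N
odds = 55 / 1100         # (a + c) / (b + d)
print(f"probability {prevalence:.3f}, odds {odds:.3f}")

# The gap between odds and probability grows as the condition becomes common.
for p in (0.01, 0.05, 0.30, 0.50):
    print(f"probability {p:.2f} -> odds {prob_to_odds(p):.3f}")
```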
This Leads to … Likelihood Ratios
• Defined as the probability of a given level of a diagnostic test result in a patient with the disease, divided by the probability of that result in a patient without the disease; for a positive result: true positive rate / false positive rate
• Advantages:
– Combines sensitivity and specificity into one number
– Can be calculated for many levels of the test
– Can be turned into predictive values
• LR for positive test = Sensitivity / (1-Specificity)
• LR for negative test = (1-Sensitivity) / Specificity
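Both likelihood ratios follow directly from sensitivity and specificity. A minimal sketch with a hypothetical function name; using the specialist-referral figures (sensitivity 50/55, specificity 100/110) it gives LR+ = 10 and LR– = 0.1.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-). Specificity must be below 1 for LR+ to be finite."""
    lr_positive = sensitivity / (1 - specificity)   # TP rate / FP rate
    lr_negative = (1 - sensitivity) / specificity   # FN rate / TN rate
    return lr_positive, lr_negative

lr_pos, lr_neg = likelihood_ratios(50 / 55, 100 / 110)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")   # LR+ = 10.0, LR- = 0.10
```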
Practical application: a Nomogram
1) You need the LR for this test
2) Locate the likelihood ratio on the center axis (e.g., LR+ = 20)
3) Select the pretest probability (prevalence) on the left axis (e.g., prevalence = 30%)
4) Draw a line through these points to the right axis to read off the post-test probability of disease
Example: post-test probability = 91%
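The nomogram is a graphical shortcut for Bayes' theorem in odds form: post-test odds = pretest odds × LR. A sketch of the arithmetic (the function name is ours): for the example of a 30% pretest probability and LR+ = 20 it gives 60/67 ≈ 89.6%, close to the value read off the nomogram, which can only be read approximately.

```python
def post_test_probability(pretest_prob, lr):
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

print(f"post-test probability: {post_test_probability(0.30, 20):.1%}")  # 89.6%
```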
There is another way to combine sensitivity and specificity:
Meet Receiver Operating Characteristic (ROC) curves
Work out sensitivity and specificity for every possible cut-point, then plot these.
The area under the curve indicates the information provided by the test.
[Figure: ROC curve drawn as a blue line; vertical axis: sensitivity (0 to 1); horizontal axis: 1 – specificity (= false positive rate) (0 to 1)]
In an ideal test, the blue line would reach the top left corner. For a useless test it would lie along the diagonal: no better than guessing.
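An ROC curve and its area can be computed by sweeping the cut-point over every observed score. This sketch uses invented scores (hypothetical data; a score at or above the cut-point counts as positive) and estimates the area with the trapezoidal rule. For these data the area is 0.875, against 1.0 for an ideal test and 0.5 for the diagonal.

```python
# Hypothetical scores (higher = more abnormal); a score >= cut-point is "test positive".
healthy_scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
sick_scores = [5, 6, 7, 7, 8, 8, 9, 9, 10, 11]

def roc_points(sick, healthy):
    """(1 - specificity, sensitivity) pairs for every cut-point, strictest first."""
    points = [(0.0, 0.0)]  # cut-point above every score: nobody tests positive
    for cut in sorted(set(sick) | set(healthy), reverse=True):
        sensitivity = sum(s >= cut for s in sick) / len(sick)
        false_pos_rate = sum(h >= cut for h in healthy) / len(healthy)
        points.append((false_pos_rate, sensitivity))
    return points  # the lowest cut-point calls everyone positive: (1.0, 1.0)

def area_under_curve(points):
    """Trapezoidal-rule area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

points = roc_points(sick_scores, healthy_scores)
print(f"AUC = {area_under_curve(points):.3f}")  # AUC = 0.875
```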
Chaining LRs Together (1)
• Example: 45-year-old woman presents with “chest pain”
– Based on her age, the pretest probability that a vague chest pain indicates coronary artery disease (CAD) is about 1%
• Take a fuller history. She reports a 1-month
history of intermittent chest pain, suggesting
angina (substernal pain; radiating down arm;
induced by effort; relieved by rest…)
– LR of this history for angina is about 100
The previous example (nomogram):
1. From the history: she’s young, so the pretest probability is about 1%. With an LR of 100, the probability rises to about 50% based on the history.
Chaining LRs Together (2)
45-year-old woman with a 1-month history of intermittent chest pain…
After the history, the post-test probability is now about 50%. What will you do?
A more precise (but also more costly) test:
• Record an ECG
– Result: 2.2 mm ST-segment depression. LR for an ECG result of 2.2 mm = 10.
– This raises the post-test probability of coronary artery disease to > 90% (see next slide)
The previous example: ECG Results
Now start from the pretest probability (i.e., 50%, prior to the ECG, based on the history). The post-test probability now rises to 90%.
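Chaining is repeated use of the odds form: each post-test probability becomes the next pretest probability. A sketch with the figures from this example (the function name is ours). Multiplying LRs in sequence assumes the history and the ECG give independent information once disease status is known, an assumption the slides do not state.

```python
def update(pretest_prob, lr):
    """Post-test probability from a pretest probability and a likelihood ratio."""
    post_odds = pretest_prob / (1 - pretest_prob) * lr
    return post_odds / (1 + post_odds)

pretest = 0.01                            # from age alone, about 1%
after_history = update(pretest, 100)      # typical angina history, LR about 100
after_ecg = update(after_history, 10)     # 2.2 mm ST depression, LR = 10
print(f"after history: {after_history:.0%}")   # 50%
print(f"after ECG:     {after_ecg:.0%}")       # 91%
```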