Diagnostic Testing

Transcript: Diagnostic Testing

October 29, 2008
Session 4:
Assessing a Document on Diagnosis
Peter Tarczy-Hornoch MD
Head and Professor, Division of BHI
Professor, Division of Neonatology
Adjunct Professor, Computer Science and Engineering
faculty.washington.edu/pth
Using Questionmark Software
 See e-mail “[MIDM] Important - testing your Questionmark login id/web browser before MIDM final exam” for more details
 First: get your login id/password from MyGrade
 Second: test your login id/password and your computer’s/browser’s ability to save and retrieve your exam:
https://primula.dme.washington.edu/q4/perception.dll
 A 2-question “Test” exam is available up until 5 PM Monday 11/3
Assessing a Document on Diagnosis
 Context for Assessing a Diagnosis Document
 Diagnosis Statistics
 Applying to a Scenario
Diagnostic vs. Therapeutic Studies
[Diagram: General Information & Knowledge and Patient Data & Information both feed case-specific decision making, via two kinds of questions: Diagnostic Testing – “What is it?” (Session 4) – and Therapy/Treatment – “What do I do for it?” (Session 3)]
Steps to Finding & Assessing Information
1. Translate your clinical situation into a formal framework for a searchable question (Session 1)
2. Choose source(s) to search (Session 2)
3. Search your source(s) (Session 2)
4. Assess the resulting articles (documents)
- Therapy documents (Session 3)
- Diagnosis documents (Session 4)
- Systematic reviews/comparing documents (Session 5)
5. Decide if you have enough information to make a decision; repeat 1-4 as needed (ICM, clinical rotations, internship, residency)
PubMed – Finding a Diagnostic Article
Assessing a Document
(Search → Result → Comparison)
 Problem at hand → Problem studied → Are they really the same?
 Patient characteristics → Population characteristics → Is patient similar enough to population studied?
 Intervention most relevant to patient/provider → Intervention studied (primary one) → Are they the same?
 Comparison – other alternatives considered → Comparison – alternatives studied → Are alternatives studied those of interest to you?
 Outcomes – those important to pat/prov → Outcomes – those looked at by study → Are outcomes studied those of interest to you?
 (n/a) → Number of subjects → Does study have enough subjects to trust results?
 Study design hoped for → Statistics – study design and statistical results → Is study design good? What do results mean?
 (n/a) → Sponsor – who paid for study → Is there potential bias?
Assessing a Document
on Diagnosis
 Context for Assessing a Diagnosis Document
 Diagnosis Statistics
 Applying to a Scenario
Many Different Kinds of Tests
 Tests predict the presence of disease
 Types of tests
- Screening test: look for disease before symptoms appear
  Example: screening mammograms
- Diagnostic test: given symptoms/suggestion of a disease, help rule in (confirm) or rule out (reject) a diagnosis
  Example: ultrasound of the appendix in the face of abdominal pain
- Gold Standard: a “perfect” test that “definitively” categorizes a patient as having one disease
  Example: surgery to remove the appendix followed by pathologic exam
 Can’t always use the Gold Standard => use diagnostic tests
- E.g. high risk/cost, only rules in/out one disease vs. multiple, etc.
The 2x2 Table: Diagnostic Test vs. Gold Standard

                           “Gold Standard Test”
“Diagnostic Test”      Disease Present (+)     Disease Absent (-)
Test Positive (+)      TP: True Positive       FP: False Positive
Test Negative (-)      FN: False Negative      TN: True Negative
 Non-intuitive labels:
- Disease Present = Disease “Positive” (+) = Dz(+)
- Test Positive = Test predicting disease present
- From the patient/provider point of view, neither Disease Positive nor Test Positive (+) is a good thing!
Sensitivity (Sn)
[2x2 table repeated from “The 2x2 Table” slide above]
 Sensitivity is the proportion of all people with disease who have a positive test
 Sensitivity = TP/(TP+FN)
 SnNOut: a sensitive test, if negative, rules out disease
 Sensitivity is useful for picking a test – sensitivity is key for a screening test
Specificity (Sp)
[2x2 table repeated from “The 2x2 Table” slide above]
 Specificity is the proportion of all people without disease who have a negative test
 Specificity = TN/(FP+TN)
 SpPIn: a specific test, if positive, rules in disease
 Specificity is useful for picking a test – specificity is key for a diagnostic test (a sketch of Sn/Sp follows below)
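
These two definitions are easy to mix up; here is a minimal Python sketch of both, using the 2x2 counts from the prevalence example later in this deck:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of all people WITH disease who test positive: TP/(TP+FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Proportion of all people WITHOUT disease who test negative: TN/(FP+TN)."""
    return tn / (fp + tn)

# Counts from the 50%-prevalence example later in the deck:
print(f"Sn = {sensitivity(tp=450, fn=50):.0%}")  # 90%
print(f"Sp = {specificity(tn=480, fp=20):.0%}")  # 96%
```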
“Cut Off Values” Impact Sn/Sp
Example: blood sugar to predict diabetes

Blood Sugar (mg/dL)   Sensitivity   Specificity
70                    98.6%          8.8%
100                   88.6%         69.8%
130                   64.3%         96.9%
160                   47.1%         99.8%
200                   27.1%         100%

Sensitivity key for a screening test; specificity key for a diagnostic test (a toy illustration of the trade-off follows below).
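
The slide’s table comes from real patient data; purely as an illustration of the mechanism, here is a sketch that sweeps the same cut-offs over two invented blood-sugar distributions (the distributions are assumptions for illustration, NOT the data behind the table):

```python
import random

random.seed(0)
# Hypothetical blood-sugar values (mg/dL); invented for illustration only
diabetic     = [random.gauss(mu=170, sigma=45) for _ in range(10_000)]
non_diabetic = [random.gauss(mu=100, sigma=20) for _ in range(10_000)]

for cutoff in (70, 100, 130, 160, 200):
    # Call the test "positive" when blood sugar >= cutoff
    sn = sum(x >= cutoff for x in diabetic) / len(diabetic)
    sp = sum(x < cutoff for x in non_diabetic) / len(non_diabetic)
    print(f"cutoff {cutoff:3d}: Sn = {sn:6.1%}, Sp = {sp:6.1%}")
# As the cut-off rises, sensitivity falls and specificity rises
```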
Positive Predictive Value (PPV)
[2x2 table repeated from “The 2x2 Table” slide above]
 PPV is the proportion of all people with a positive test who have the disease
 PPV = TP/(TP+FP)
 PPV is useful for using a test: given a positive result for your patient, what % of people with positive results actually have the disease?
Negative Predictive Value (NPV)
[2x2 table repeated from “The 2x2 Table” slide above]
 NPV is the proportion of all people with a negative test who don’t have the disease
 NPV = TN/(FN+TN)
 NPV is useful for using a test: given a negative result for your patient, what % of people with negative results actually don’t have the disease? (a sketch of PPV/NPV follows below)
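
As with Sn/Sp, a minimal sketch of the two predictive values, again using the counts from the prevalence example on the next slide (the NPV figure here is derived from those counts, not stated on the slide):

```python
def ppv(tp: int, fp: int) -> float:
    """Proportion of all positive tests that are true positives: TP/(TP+FP)."""
    return tp / (tp + fp)

def npv(tn: int, fn: int) -> float:
    """Proportion of all negative tests that are true negatives: TN/(FN+TN)."""
    return tn / (fn + tn)

# Counts from the 50%-prevalence example on the next slide:
print(f"PPV = {ppv(tp=450, fp=20):.1%}")  # 95.7%
print(f"NPV = {npv(tn=480, fn=50):.1%}")  # 90.6%
```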
Prevalence, pre-test & post-test probabilities
 Prevalence:
- total cases of disease in the population at a given time
- 2x2 table: [disease (+)] / [disease (+) + disease (-)]
 Pre-test probability:
- Estimate of the probability/likelihood your patient has a disease before you order your test
- Often an estimate based on experience or prevalence
- Screening test: pre-test probability = prevalence
 Post-test probability:
- The probability/likelihood that your patient has the disease after you get the results of the test back
PPV/NPV Dependence on Disease Prevalence
PPV Example

Prevalence = 50%:
            Dz (+)    Dz (-)
Test (+)    450 TP     20 FP
Test (-)     50 FN    480 TN
Total       500       500

Sn = TP/(TP+FN) = 450/(450+50) = 90%
Sp = TN/(FP+TN) = 480/(20+480) = 96%
PPV = TP/(TP+FP) = 450/(450+20) = 95.7% of those with T(+) have Dz(+)

Prevalence = 5%:
            Dz (+)    Dz (-)
Test (+)     45 TP     38 FP
Test (-)      5 FN    912 TN
Total        50       950

Sn = TP/(TP+FN) = 45/(45+5) = 90%
Sp = TN/(FP+TN) = 912/(912+38) = 96%
PPV = TP/(TP+FP) = 45/(45+38) = 54.2% of those with T(+) have Dz(+)
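
A short script re-deriving both scenarios above (counts taken straight from the slide) makes the point concrete: Sn and Sp are unchanged, while PPV falls with prevalence.

```python
# 2x2 counts (tp, fn, fp, tn) for the two prevalence scenarios on this slide
scenarios = {
    "prevalence 50%": (450, 50, 20, 480),
    "prevalence 5%":  (45, 5, 38, 912),
}

for label, (tp, fn, fp, tn) in scenarios.items():
    sn = tp / (tp + fn)    # sensitivity: TP/(TP+FN)
    sp = tn / (fp + tn)    # specificity: TN/(FP+TN)
    ppv = tp / (tp + fp)   # positive predictive value: TP/(TP+FP)
    print(f"{label}: Sn = {sn:.0%}, Sp = {sp:.0%}, PPV = {ppv:.1%}")

# prevalence 50%: Sn = 90%, Sp = 96%, PPV = 95.7%
# prevalence 5%:  Sn = 90%, Sp = 96%, PPV = 54.2%
```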
Pros/Cons Sn/Sp/PPV/NPV
 Relative Pros:
- Note: Sn/Sp/PPV/NPV are on the boards
- PPV/NPV useful for diagnosis – probability of disease after a (+) or (–) test
- Sn/Sp useful for choosing a test (screening/diagnosis)
 Relative Cons:
- PPV/NPV vary with the prevalence of disease
- Prevalence of disease in the general population may not be the same as that of patients you see in clinic/ER
- Your estimate of the probability of disease (pre-test probability) may not match prevalence in a population
 Current tendency therefore => use likelihood ratios
Bayes Theorem
 Note: this slide is here for completeness; likelihood ratios are better, so this slide is not on the exam
 Bayes Theorem
- How to update or revise beliefs in light of new evidence
- http://plato.stanford.edu/entries/bayes-theorem/
 Related to Bayes is an alternate form of PPV/NPV as f(Sn, Sp, pre-test) that “pulls out” pre-test probability or prevalence (a sketch follows below):
- P(Dz) = probability of disease (e.g. prevalence, pre-test)
- PPV = Sn*P(Dz) / [Sn*P(Dz) + (1-Sp)*(1-P(Dz))]
- NPV = Sp*(1-P(Dz)) / [Sp*(1-P(Dz)) + (1-Sn)*P(Dz)]
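
A sketch of this alternate form (formulas exactly as above; the example values reproduce the earlier prevalence slide):

```python
def ppv_from(sn: float, sp: float, p_dz: float) -> float:
    """PPV as f(Sn, Sp, P(Dz)): Sn*P / (Sn*P + (1-Sp)*(1-P))."""
    return sn * p_dz / (sn * p_dz + (1 - sp) * (1 - p_dz))

def npv_from(sn: float, sp: float, p_dz: float) -> float:
    """NPV as f(Sn, Sp, P(Dz)): Sp*(1-P) / (Sp*(1-P) + (1-Sn)*P)."""
    return sp * (1 - p_dz) / (sp * (1 - p_dz) + (1 - sn) * p_dz)

# Same test (Sn=90%, Sp=96%), two different pre-test probabilities:
print(f"PPV at P(Dz)=50%: {ppv_from(0.90, 0.96, 0.50):.1%}")  # 95.7%
print(f"PPV at P(Dz)=5%:  {ppv_from(0.90, 0.96, 0.05):.1%}")  # 54.2%
```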
Likelihood Ratios
 Likelihood Ratios do NOT vary with prevalence
 Likelihood Ratio (LR)
- LR+ = Sn/(1-Sp) – likelihood ratio for a positive test
- LR- = (1-Sn)/Sp – likelihood ratio for a negative test
 Applying an LR given a pre-test disease probability (see the sketch below):
- Pre = pre-test probability (can be prevalence)
- Post = post-test probability
- Post = Pre/(Pre + (1-Pre)/LR)
- Same as Bayes & PPV/NPV, but cleanly separates test characteristics (LR) from disease prevalence/pre-test probabilities
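
A minimal sketch of the three formulas on this slide:

```python
def lr_positive(sn: float, sp: float) -> float:
    """Likelihood ratio for a positive test: Sn / (1 - Sp)."""
    return sn / (1 - sp)

def lr_negative(sn: float, sp: float) -> float:
    """Likelihood ratio for a negative test: (1 - Sn) / Sp."""
    return (1 - sn) / sp

def post_test(pre: float, lr: float) -> float:
    """Post-test probability: Pre / (Pre + (1 - Pre)/LR)."""
    return pre / (pre + (1 - pre) / lr)

# The LR depends only on the test; prevalence enters only through `pre`:
lr = lr_positive(sn=0.90, sp=0.96)          # 22.5
print(f"{post_test(pre=0.05, lr=lr):.1%}")  # 54.2%
```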
Interpreting Likelihood Ratios (I)
 LR = 1.0
- Post-test probability = pre-test probability (useless)
 LR > 1.0
- Post-test probability > pre-test probability (helps rule in)
- Test result increases the probability of having the disorder
 LR < 1.0
- Post-test probability < pre-test probability (helps rule out)
- Test result decreases the probability of having the disorder
 LR+ (Likelihood Ratio for a Positive Test) vs. LR- (Likelihood Ratio for a Negative Test) => see the Appendicitis slide
Interpreting Likelihood Ratios (II)
 Likelihood ratios >10 or <0.1
- Test generates large changes from pre- to post-test probability
- Test provides strong evidence to rule in/rule out a diagnosis
 Likelihood ratios of 5-10 and 0.1-0.2
- Test generates moderate changes from pre- to post-test probability
- Test provides moderate evidence to rule in/rule out a diagnosis
 Likelihood ratios of 2-5 and 0.2-0.5
- Test generates small changes from pre- to post-test probability
- Test provides minimal evidence to rule in/rule out a diagnosis
 Likelihood ratios of 0.5-2
- Test generates almost no change from pre- to post-test probability
- Test provides almost no evidence to rule in/rule out a diagnosis
Interpreting Likelihood Ratios (III)
 From the slide on the impact of prevalence:
- Sn = 90%, Sp = 96%
- LR+ = 0.90/(1-0.96) = 22.5
- Post = Pre/(Pre + (1-Pre)/LR)
 If prevalence (pre-test) is 50% => post-test 95.7%
 If prevalence (pre-test) is 5% => post-test 54.2%
[LR nomogram figure; a computed stand-in follows below]
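
The nomogram graphic did not survive transcription; as a stand-in, a short sketch (same Post formula as above) that tabulates post-test probability across a range of pre-test probabilities for this test:

```python
sn, sp = 0.90, 0.96
lr_pos = sn / (1 - sp)  # 22.5, as on this slide

for pre in (0.01, 0.05, 0.10, 0.25, 0.50, 0.75):
    post = pre / (pre + (1 - pre) / lr_pos)
    print(f"pre-test {pre:4.0%} -> post-test {post:5.1%}")
# e.g. pre-test 5% -> post-test 54.2%; pre-test 50% -> post-test 95.7%
```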
Likelihood Ratios for Physical Exam for Appendicitis
[Figure: table of LR+ and LR- values for physical exam findings. Annotations: one finding, when present, is “moderate evidence” for appendicitis; another, when present, is “moderate evidence” BUT the 95% CI of its LR includes <2 and thus includes “minimal evidence”; another, when present, is “almost no evidence” for appendicitis]
Assessing a Document on Diagnosis
 Context for Assessing a Diagnosis Document
 Diagnosis Statistics
 Applying to a Scenario
Learning to Diagnose Pneumonia
 Medical School
- Preclinical: anatomy, histology, pathology, microbiology, pharmacology, physiology, …
- Clinical: medicine, pediatrics, family medicine, surgery, …
 Residency: outpatient, inpatient, specialty rotations, general rotations, emergency room, …
 Fellowship: more of the same
 Result: a number of items on history, physical exam, and laboratory studies that suggest pneumonia, with chest X-ray as the gold standard
Literature on Diagnosis of Pneumonia
 Clinical query for “pneumonia” “diagnosis” (1478 results)
 Change to “community acquired pneumonia” (181)
 Add in “likelihood ratio” (27)
 Find “Derivation of a triage algorithm for chest radiography of community-acquired pneumonia patients in the emergency department.” Acad Emerg Med. 2008 Jan;15(1):40-4.
Paper: Background/Objectives
 BACKGROUND: Community-acquired
pneumonia (CAP) accounts for 1.5 million
emergency department (ED) patient visits in the
United States each year.
 OBJECTIVES: To derive an algorithm for the ED
triage setting that facilitates rapid and accurate
ordering of chest radiography (CXR) for CAP.
Paper: Methods
 METHODS: The authors conducted an ED-based
retrospective matched case-control study using 100
radiographic confirmed CAP cases and 100
radiographic confirmed influenzalike illness (ILI)
controls. Sensitivities and specificities of
characteristics assessed in the triage setting were
measured to discriminate CAP from ILI. The
authors then used classification tree analysis to
derive an algorithm that maximizes sensitivity and
specificity for detecting patients with CAP in the
ED triage setting.
Paper: Results (I)
 RESULTS: Temperature greater than 100.4
degrees F (likelihood ratio = 4.39, 95% confidence
interval [CI] = 2.04 to 9.45), heart rate greater than
110 beats/minute (likelihood ratio = 3.59, 95% CI
= 1.82 to 7.10), and pulse oximetry less than 96%
(likelihood ratio = 2.36, 95% CI = 1.32 to 4.20)
were the strongest predictors of CAP. However, no
single characteristic was adequately sensitive and
specific to accurately discriminate CAP from ILI.
 Evidence scale (from the LR interpretation slides):
- LR > 10: strong; LR 5-10: moderate
- LR 2-5: minimal; LR 1-2: scant evidence
Paper: Results (II)
 RESULTS (continued): A three-step algorithm
(using optimum cut points for elevated
temperature, tachycardia, and hypoxemia on room
air pulse oximetry) was derived that is 70.8%
sensitive (95% CI = 60.7% to 79.7%) and 79.1%
specific (95% CI = 69.3% to 86.9%).
 LR+ = Sn/(1-Sp) = 0.708/(1-0.791) = 3.39 (minimal)
 LR- = (1-Sn)/Sp = (1-0.708)/0.791 = 0.37 (minimal)
 Post = Pre/(Pre + (1-Pre)/LR)
 Post if Pre = 1/10: 0.1/(0.1 + (1-0.1)/3.39) = 0.27
 Post if Pre = 1/2: 0.5/(0.5 + (1-0.5)/3.39) = 0.77
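
A quick check of the slide’s arithmetic (Sn/Sp taken from the paper’s abstract; formulas as earlier in the deck):

```python
sn, sp = 0.708, 0.791    # the derived algorithm's Sn/Sp from the abstract

lr_pos = sn / (1 - sp)   # 3.39 -> "minimal" evidence
lr_neg = (1 - sn) / sp   # 0.37 -> "minimal" evidence
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

for pre in (0.10, 0.50):
    post = pre / (pre + (1 - pre) / lr_pos)
    print(f"pre-test {pre:.0%} -> post-test {post:.2f}")
# pre-test 10% -> post-test 0.27
# pre-test 50% -> post-test 0.77
```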
Paper: Conclusions
 CONCLUSIONS: No single characteristic
adequately discriminates CAP from ILI, but a
derived clinical algorithm may detect most
radiographic confirmed CAP patients in the triage
setting. Prospective assessment of this algorithm
will be needed to determine its effects on the care
of ED patients with suspected pneumonia.
 Note: all of these characteristics are among the tried-and-true findings taught in medical school, residency, and fellowship, but they are typically not taught quantitatively
Small Group Monday November 3rd
 Students to complete the assignment for Small Group Session #5 by Mon 11/3, 2-2:50
 Small group leads to give examples of recent clinical situations where they had to evaluate one or more documents related to making a diagnosis
 Group to review and discuss short examples from the assignment related to diagnosis, focusing on:
- Sensitivity, Specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV)
- Likelihood Ratios (Positive/Negative): LR+, LR-
 AND/OR: group to search for treatment article(s) on a topic of interest and assess the results
QUESTIONS?
 Context for Assessing a Diagnosis Document
 Diagnosis Statistics
 Applying to a Scenario
 Small Group Portion