Using Patient Data to Retrieve Health Knowledge
James J. Cimino, Mark Meyer,
Nam-Ju Lee, Suzanne Bakken
Columbia University
AMIA Fall Symposium
October 25, 2005
Automated Retrieval with Clinical Data
[Diagram: seven-step infobutton workflow, illustrated with the example term MRSA: (1) get information from EMR, (2) understand information needs, (3) resource selection, (4) resource terminology, (5) automated translation, (6) querying, (7) presentation]
What’s Hardest about Infobuttons?
• It’s not knowing the questions
• It’s not integrating clinical info systems
• It’s not linking to resources
• It’s translating source data to target terms
Types of Source Terminologies
• Uncoded (narrative):
– Radiology reports (?)
"…infiltrate is seen in the left
upper lobe."
• Coded
– Lab tests (6,133)
AMIKACIN, PEAK LEVEL
– Sensitivity tests (476)
AMI 6 MCG/ML
– Microbiology results (2,173)
ESCHERICHIA COLI
– Medications (15,311)
UD AMIKACIN 1 GM VIAL
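The translation problem is easy to see in medication strings like the last example. Purely as an illustration (the study itself used NLP via MedLEE and manual methods, not a regex like this), a naive normalizer might strip dose forms and units to recover an ingredient name:

```python
import re

# Hypothetical dose-form and unit tokens to strip; illustrative only,
# not the mapping method the authors actually used.
NOISE = re.compile(
    r"\b(UD|TAB(LET)?S?|CAP(SULE)?S?|VIAL|INJ|SOLN|\d+(\.\d+)?\s*(GM|G|MG|MCG|ML))\b"
)

def ingredient_guess(med_term: str) -> str:
    """Strip dispensing noise from a raw medication string."""
    cleaned = NOISE.sub(" ", med_term.upper())
    return " ".join(cleaned.split())

print(ingredient_guess("UD AMIKACIN 1 GM VIAL"))  # -> "AMIKACIN"
```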
Types of Target Terminologies
• Narrative search:
– PubMed
– RxList
– UpToDate
– Micromedex
– Lab Tests Online
– OneLook
– National Guideline Clearinghouse
• Coded resource:
– Lexicomp
– CPMC Lab Manual
• Coded search
– PubMed
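To make the narrative-versus-coded distinction concrete, here is a minimal sketch of both search styles against PubMed using NCBI E-utilities. The slides do not specify the query mechanism, so the endpoint and field tag here are our illustration, not the study's implementation:

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term: str) -> int:
    """Return the citation count for a PubMed query via NCBI E-utilities."""
    url = ESEARCH + "?" + urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json"}
    )
    with urllib.request.urlopen(url) as resp:
        return int(json.load(resp)["esearchresult"]["count"])

# Narrative search: the raw source term as free text.
print(pubmed_count("amikacin"))
# Coded search: a translated concept restricted to the MeSH index.
print(pubmed_count("amikacin[MeSH Terms]"))
```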
The Experiments
• Identify sources of patient data
• Get random sample of terms for each source
• Translate terms if needed (multiple methods)
• Perform automated retrieval with terms
Term Samples
• 100 terms from radiology reports using MedLEE
• 100 medication ingredients
• 100 lab test analytes
• 100 microbiology results
• 94 sensitivity test reagents
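The experimental procedure above (sample, translate, retrieve) lends itself to a simple driver loop. The following toy sketch only mirrors the slides' procedure; the terms and the translate and retrieve steps are stand-ins, not the study's code:

```python
import random

# Toy end-to-end sketch of the experimental procedure; all data and
# helpers are placeholders.
SOURCE_TERMS = {"medications": ["AMIKACIN", "GENTAMICIN", "VANCOMYCIN"]}

def translate(term: str) -> str:
    # Placeholder for the mapping methods described later in the talk.
    return term.title()

def retrieve(query: str) -> int:
    # Placeholder: would issue the query to a target resource.
    return random.randint(0, 5)  # pretend hit count

for source, terms in SOURCE_TERMS.items():
    sample = random.sample(terms, k=2)        # random sample of terms
    for term in sample:
        hits = retrieve(translate(term))      # translate, then retrieve
        print(source, term, "->", hits, "results")
```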
Searches Performed
• Radiology reports (uncoded narrative terms):
– Narrative resource search: PubMed, NGC, OneLook, UpToDate
• Medications (coded):
– Narrative resource search: RxList, Micromedex
– Coded concept resource: Lexicomp
• Lab tests (coded):
– Narrative resource search: Lab Tests Online, PubMed
– Coded concept resource: CPMC Lab Manual
– Coded concept search: PubMed
• Sensitivity tests (coded):
– Narrative resource search: RxList, Micromedex
• Microbiology results (coded):
– Narrative resource search: UpToDate, PubMed
– Coded concept search: PubMed
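One way to read this slide is as a routing table from each patient-data source to its target resources. A hypothetical encoding (our naming, not the Infobutton Manager's actual configuration):

```python
# Hypothetical routing table mirroring the slide.
SEARCHES = {
    "radiology": {"narrative": ["PubMed", "NGC", "OneLook", "UpToDate"]},
    "medications": {
        "narrative": ["RxList", "Micromedex"],
        "coded_resource": ["Lexicomp"],
    },
    "lab_tests": {
        "narrative": ["Lab Tests Online", "PubMed"],
        "coded_resource": ["CPMC Lab Manual"],
        "coded_search": ["PubMed"],
    },
    "sensitivity_tests": {"narrative": ["RxList", "Micromedex"]},
    "microbiology_results": {
        "narrative": ["UpToDate", "PubMed"],
        "coded_search": ["PubMed"],
    },
}

print(SEARCHES["lab_tests"]["coded_search"])  # ['PubMed']
```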
Mapping Methods
• Microbiology results to MeSH:
– Semi-automated
• Lab tests to MeSH analytes:
– Automated, using UMLS
• Medications to Lexicomp:
– Natural language processing
• Lab tests to CPMC Lab Manual:
– Manual matching
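For the UMLS-based lab-test mapping, a modern equivalent would query the UMLS Terminology Services REST API (which postdates this 2005 work; shown purely as an illustration of the approach, not the interface the authors used):

```python
import requests  # pip install requests

# Illustrative: look up a lab analyte term in the UMLS, restricted to MeSH.
UTS_SEARCH = "https://uts-ws.nlm.nih.gov/rest/search/current"

def map_to_mesh(term: str, api_key: str) -> list[tuple[str, str]]:
    """Return (MeSH code, name) candidates for a term via UTS."""
    resp = requests.get(UTS_SEARCH, params={
        "string": term,
        "sabs": "MSH",           # restrict matches to the MeSH vocabulary
        "returnIdType": "code",  # return MeSH codes rather than CUIs
        "apiKey": api_key,
    })
    resp.raise_for_status()
    results = resp.json()["result"]["results"]
    return [(r["ui"], r["name"]) for r in results]

# e.g. map_to_mesh("AMIKACIN, PEAK LEVEL", "YOUR_UTS_API_KEY")
```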
Results: Multiple Documents
• 100 findings and diagnoses from 20 radiology reports:
– 100 PubMed: 100% (92,440)
– 100 UpToDate: 82% (28.6)
– 100 NGC: 95% (119)
– 100 OneLook: 81% (25.8)
• 100 microbiology result terms:
– 100 UpToDate: 94% (1.4)
– 100 PubMed: 100% (3,328)
– 100 PubMed (using MeSH translation): 100% (18,036)
• 100 lab test terms (using analyte names):
– 100 Lab Tests Online: 73% (133)
– 100 PubMed: 99% (84,633)
– 100 PubMed (using MeSH translation): 100% (90,656)
Retrieval success is the percent of terms that retrieved any results; numbers in parentheses give the average number of results (citations, documents, topics, definitions, etc., depending on the target resource) for those searches that retrieved at least one result.
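The success metric in the note above is easy to pin down in code. A small sketch, assuming we have the per-term hit counts for one search (the numbers below are made up, not from the study):

```python
def retrieval_success(hit_counts: list[int]) -> tuple[float, float]:
    """Percent of terms retrieving anything, and mean hits among those."""
    hits = [n for n in hit_counts if n > 0]
    pct = 100.0 * len(hits) / len(hit_counts)
    mean = sum(hits) / len(hits) if hits else 0.0
    return pct, mean

# Made-up counts: 4 of 5 terms retrieved something, averaging 1.75 results.
print(retrieval_success([3, 1, 0, 2, 1]))  # (80.0, 1.75)
```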
Uncoded versus Coded Searches
• 1,028/2,173 (47.3%) of microbiology test terms mapped to MeSH

  Result Type    Number    Ratio
  Identical      33        1.00
  Slight Diff    7         1.44
  Large Diff     60        29.92

• 940/1,041 (90.3%) of lab analytes mapped to LOINC
• 485/940 (51.6%) of LOINC analytes mapped to MeSH

  Result Type    Number    Ratio
  Identical      72        1.00
  Slight Diff    16        1.05
  Large Diff     12        3.28
Results: Single Document
• 100 medication terms:
– 100 Lexicomp (using document identifiers): 96% (1)
• 100 laboratory test terms:
– 100 Lab Manual (using document identifiers): 94% (1)
Retrieval success and the parenthesized averages are defined as on the previous slide.
Results: Page of Links
• 100 medication terms (using ingredient names):
– 100 RxList: 95% [.88/.04]
– 100 Micromedex: 100% [.89/.06]
• 94 sensitivity test terms (using antibiotic names):
– 94 RxList: 85% [.79/.06]
– 94 Micromedex: 97% [.96/.01]
Results for RxList and Micromedex are difficult to quantify because they provide heterogeneous lists of links; rather than report link counts, we assessed the true positive and false negative rates, shown in brackets.
Micromedex versus RxList
[Venn diagram: of 194 terms, 158 were found by both resources; 22 were found by Micromedex but missed by RxList (Micromedex total: 180); 5 were found by RxList but missed by Micromedex (RxList total: 163); 9 were missed by both.]
See For Yourself!
www.dbmi.columbia.edu/cimino/2005amia-data.html
Discussion
• 7 sources, 894 terms, 11 resources, 1,592 searches
• Automated retrieval is technically possible
– Found something 73-100% of the time
– 12/16 experiments “succeeded” 94-100%
• Translation often unsuccessful
• Automated indexing works
• Usefulness of translation to MeSH is marginal
• Good quality when retrieving pages of links
(Micromedex and RxList)
• Good quality with concept-indexed resources
• Recall/precision of document retrievals unknown
– Need to define the question
– Additional evaluation needed
Next Steps
• Creation of terminology management
and indexing suite
• Formal analysis of qualities of answers
Acknowledgments
This work is supported in part by NLM
grants R01LM07593 and R01LM07659
and NLM Training Grants LM07079-11
and P20NR007799.