Cornerstone I: Representing Knowledge

From Data to Knowledge Through Concept-Oriented Terminologies
James J. Cimino
The first step on the path to knowledge
is getting things by their right names.
-Chinese saying
Overview
• What is “data to knowledge”?
• Knowledge representation choices
• Knowledge-based terminology efforts
• Medical Entities Dictionary
• Proof of concepts
What is “data to knowledge”?
• Start with patient data in the medical record
• Enhance knowledge by:
– gaining a better understanding of the patient
– learning relevant knowledge
– bringing smart systems to bear to apply knowledge
– discovering new knowledge from health data
Knowledge Representation
• Terminology for representing symbols
• Format for arranging the symbols
Knowledge Representation Choices
• Guideline implementation
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Program Guideline
National Cholesterol Education Program Guideline
Measure Cholesterol & Assess Risk Factors
– Cholesterol <200
– Cholesterol 200 to 239
  – HDL >35, <2 Risks: Provide dietary information; Reevaluate in 2 years
  – HDL <35 or 2 Risks
– Cholesterol >239
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Program Guideline
• Three representations:
– PROLOG (first-order logic)
NCEP Guideline in PROLOG
% Box J: HDL >= 35, fewer than 2 risk factors, total cholesterol 200-239
rule_j(PID) :-
    check_lab(PID, hdl, HDL, _), !,
    HDL >= 35,
    total_risk(PID, Risk), !,
    Risk < 2,
    check_lab(PID, cholesterol, C, _),
    C >= 200,
    C =< 239,
    print_rule_j.
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Program Guideline
• Three representations:
– PROLOG (first-order logic)
– CLASSIC (frames)
NCEP Guideline in CLASSIC
(CL-DEFINE-CONCEPT 'C-PATIENT
  '(AND
     (ALL CHOL
       (AND INTEGER
            (MIN 200) (MAX 239)))))
(CL-DEFINE-CONCEPT 'G-PATIENT
  '(AND C-PATIENT LOW-RISK-PATIENT
        (ALL HDL (AND INTEGER (MIN 35)))))
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Program Guideline
• Three representations:
– PROLOG (first-order logic)
– CLASSIC (frames)
– CLIPS (production rules)
NCEP Guideline in CLIPS
(defrule C2G2J "Rules to reach box J"
  ?f1 <- (calculated-patient (state c)
           (done no) (hdl ?hdl) (name ?name))
  (test (>= ?hdl 35))
  =>
  (printout t "Patient " ?name " needs treatment" crlf))
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Program Guideline
• Three representations:
– PROLOG (first-order logic)
– CLASSIC (frames)
– CLIPS (production rules)
• “All three representations proved adequate for
encoding the guideline”
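For comparison, the box J logic that the three encodings above express can be sketched as an ordinary Python predicate. This is a minimal sketch, not the Starren and Xie implementation: the Patient record and its field names are hypothetical, and the thresholds are simply the ones written in the PROLOG rule.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; these field names are not from the guideline or the paper.
    name: str
    total_cholesterol: int  # mg/dl
    hdl: int                # mg/dl
    risk_factors: int       # number of coronary risk factors


def reaches_box_j(p: Patient) -> bool:
    """Box J conditions as written in the PROLOG rule_j:
    HDL >= 35, fewer than 2 risk factors, total cholesterol 200-239 mg/dl."""
    return p.hdl >= 35 and p.risk_factors < 2 and 200 <= p.total_cholesterol <= 239


if __name__ == "__main__":
    for p in [Patient("A", 220, 50, 0), Patient("B", 220, 30, 0), Patient("C", 250, 50, 1)]:
        print(p.name, "reaches box J" if reaches_box_j(p) else "goes elsewhere")
```

Unlike the CLASSIC version, which classifies a patient description into a concept, this is a plain procedural test; it gives the same answer but none of the reusable structure.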
Knowledge Representation Choices
• Guideline implementation
• Terminologic knowledge
Terminology Representation Choices
• Frame-based
Frame-Based Representation
Serum Glucose Test
  is-a: Lab Test
  Measures: Glucose
  Specimen: Serum
  Units: "mg/dl"
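A frame's slots can be looked up locally and, failing that, inherited along its is-a link. The minimal Python sketch below assumes single inheritance; the "category" slot on Lab Test is hypothetical, added only to show a value being inherited.

```python
from typing import Dict, Optional


class Frame:
    """A minimal frame: a name, an optional is-a parent, and slot/value pairs."""

    def __init__(self, name: str, is_a: Optional["Frame"] = None, **slots: str) -> None:
        self.name = name
        self.is_a = is_a
        self.slots: Dict[str, str] = dict(slots)

    def get(self, slot: str) -> Optional[str]:
        """Return the local slot value, else the nearest value inherited via is-a."""
        frame: Optional[Frame] = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.is_a
        return None


# Hypothetical: "category" exists only to show a value inherited from Lab Test.
lab_test = Frame("Lab Test", category="laboratory")
serum_glucose = Frame("Serum Glucose Test", is_a=lab_test,
                      measures="Glucose", specimen="Serum", units="mg/dl")
```

Here `serum_glucose.get("units")` answers from the frame itself, while `serum_glucose.get("category")` falls through to Lab Test.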
Terminology Representation Choices
• Frame-based
• Semantic network
Semantic Network Representation
(Figure: nodes Serum Glucose Test, Lab Test, Glucose, Chemical, Serum and Body Substance; Serum Glucose Test is-a Lab Test, Glucose is-a Chemical, Serum is-a Body Substance)
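Read as a semantic network, the same knowledge becomes labelled edges between nodes, and subsumption questions become graph traversals. A minimal sketch, assuming (source, relation, target) triples: the is-a edges follow the figure, while the "measures" and "specimen" edges are assumptions mirroring the slots of the frame representation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# (source, relation, target) triples. The is-a edges follow the figure;
# the "measures" and "specimen" edges are assumptions mirroring the frame slots.
EDGES: List[Tuple[str, str, str]] = [
    ("Serum Glucose Test", "is-a", "Lab Test"),
    ("Glucose", "is-a", "Chemical"),
    ("Serum", "is-a", "Body Substance"),
    ("Serum Glucose Test", "measures", "Glucose"),
    ("Serum Glucose Test", "specimen", "Serum"),
]

GRAPH: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
for src, rel, dst in EDGES:
    GRAPH[src].append((rel, dst))


def related(node: str, relation: str) -> List[str]:
    """Targets of the outgoing edges of `node` that carry `relation`."""
    return [dst for rel, dst in GRAPH.get(node, []) if rel == relation]


def ancestors(node: str) -> Set[str]:
    """Every node reachable from `node` by following is-a edges."""
    found: Set[str] = set()
    stack = [node]
    while stack:
        for parent in related(stack.pop(), "is-a"):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# "Which chemicals does the Serum Glucose Test measure?"
chemicals = [c for c in related("Serum Glucose Test", "measures") if "Chemical" in ancestors(c)]
```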
Terminology Representation Choices
• Frame-based
• Semantic network
• Conceptual graphs
Conceptual Graph Representation
[Serum Glucose Test] (is-a) -> [Lab Test]
(measures) -> [Glucose]
(specimen) -> [Serum]
Knowledge Representation
• Terminology for representing symbols
• Format for arranging the symbols
• Terminology and format for representing
terminologic knowledge
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
Jochen Bernauer, SCAMC, 1991
• Conceptual graphs to model findings
[increased_uptake] (during) -> [bone_phase]
                   (site) -> [femur] (site_attr) -> [right]
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
Rector, Nolan and Glowinski, SCAMC, 1993
• GALEN project
– conditions grammatically haveLocation bodyparts
– fractures sensibly haveLocation bones
– femurs sensiblyAndNecessarily haveDivision neck
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
Campbell and Musen, SCAMC, 1993
• Conceptual graphs and SNOMED
• Pain + Chest + Radiation to + Left + Arm
[Pain] (located in) -> [Chest]
       (radiating to) -> [Arm] (with laterality) -> [Left]
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
Lindberg, Humphreys, McCray, Methods 1993
• Unified Medical Language System
Concept
– Lexical group
  – String
  – String
– Lexical group
  – String
  – String
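The Concept / Lexical group / String layering can be sketched as nested records, plus an index so that any string resolves to its concept. In the UMLS Metathesaurus these levels carry concept (CUI), term (LUI) and string (SUI) identifiers; this sketch omits them, and its concept identifier and strings are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexicalGroup:
    """Strings that are lexical variants of each other (case, inflection, word order)."""
    strings: List[str] = field(default_factory=list)


@dataclass
class Concept:
    """One meaning; every lexical group under it is a synonym of the others."""
    concept_id: str
    lexical_groups: List[LexicalGroup] = field(default_factory=list)


def build_string_index(concepts: List[Concept]) -> Dict[str, str]:
    """Map every string to the concept it names, so any synonym resolves to one meaning."""
    index: Dict[str, str] = {}
    for concept in concepts:
        for group in concept.lexical_groups:
            for s in group.strings:
                index[s] = concept.concept_id
    return index


# Hypothetical example; "CONCEPT-1" is a placeholder, not a real UMLS identifier.
mi = Concept("CONCEPT-1", [
    LexicalGroup(["Myocardial Infarction", "Infarction, Myocardial"]),
    LexicalGroup(["Heart Attack", "heart attack"]),
])
index = build_string_index([mi])
```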
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
• Rocha, Huff, et al., CBM, 1994
Rocha, Huff, et al., CBM, 1994
• VOSER
• A server architecture for managing terminologic knowledge
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
• Rocha, Huff, et al., CBM, 1994
• Campbell, Cohn, Chute, et al., SCAMC 1996
Campbell, Cohn, Chute, et al., SCAMC 1996
• Convergent Medical Terminology
• SNOMED/Kaiser/Mayo
• Galapagos
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
• Rocha, Huff, et al., CBM, 1994
• Campbell, Cohn, Chute, et al., SCAMC 1996
• Brown, O’Neil and Price, Methods, 1997
Brown, O’Neil and Price, Methods, 1997
• Read Codes
• Representation with GALEN model
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
• Rocha, Huff, et al., CBM, 1994
• Campbell, Cohn, Chute, et al., SCAMC 1996
• Brown, O’Neil and Price, Methods, 1997
• Spackman, Campbell, and Côté, SCAMC 1997
Spackman, Campbell, and Côté, SCAMC 1997
• SNOMED RT (Reference Terminology)
• Convergent Medical Terminology
• Description Logic Format
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
• Rocha, Huff, et al., CBM, 1994
• Campbell, Cohn, Chute, et al., SCAMC 1996
• Brown, O’Neil and Price, Methods, 1997
• Spackman, Campbell, and Côté, SCAMC 1997
• Huff, Rocha, McDonald, et al., JAMIA 1998
Huff, Rocha, McDonald, et al., JAMIA 1998
• Logical Observation Identifiers Names and Codes (LOINC)
4764-5 | GLUCOSE^3H POST 100 G GLUCOSE PO | SCNC | PT | SER/PLAS | QN |
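Each field of the LOINC record is one axis of the name. A small parser, assuming the field order shown (code, component, property, timing, system, scale; the optional method axis is absent in this example), might look like this sketch. The class and function names are illustrative.

```python
from typing import NamedTuple


class LoincRecord(NamedTuple):
    code: str
    component: str  # analyte; "^" separates sub-parts such as a challenge
    prop: str       # property, e.g. SCNC = substance concentration
    time: str       # timing, e.g. PT = point in time
    system: str     # specimen, e.g. SER/PLAS = serum or plasma
    scale: str      # scale, e.g. QN = quantitative


def parse_loinc(line: str) -> LoincRecord:
    """Parse one '|'-delimited line in the six-field layout of the example (no method field)."""
    fields = [f.strip() for f in line.strip().rstrip("|").split("|")]
    if len(fields) != len(LoincRecord._fields):
        raise ValueError(f"expected {len(LoincRecord._fields)} fields, got {len(fields)}")
    return LoincRecord(*fields)


rec = parse_loinc("4764-5 | GLUCOSE^3H POST 100 G GLUCOSE PO | SCNC | PT | SER/PLAS | QN |")
analyte, challenge = rec.component.split("^", 1)
```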
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
• Rocha, Huff, et al., CBM, 1994
• Campbell, Cohn, Chute, et al., SCAMC 1996
• Brown, O’Neil and Price, Methods, 1997
• Spackman, Campbell, and Côté, SCAMC 1997
• Huff, Rocha, McDonald, et al., JAMIA 1998
• Pharmacy system knowledge base vendors
Pharmacy System Knowledge Base Vendors
(Figure: drug model linked by is-a relations among Drug Class, Ingredient Class, Ingredient, Not-Fully-Specified Drug, Clinical Drug, Composite Clinical Drug, Trademark Drug, Composite Trademark Drug, Manufactured Components, Country-Specific Packaged Product and International Package Identifiers)
Medical Entities Dictionary (MED)
• New York Presbyterian Hospital
• 60,000 concepts (procedures, results, drugs, problems)
• 208,242 synonyms
• 84,677 hierarchical links
• 113,906 semantic links
• 238,040 other attributes
• 66,404 translations (ICD-9-CM, LOINC, MeSH, UMLS)
Central Controlled Terminology
MED Data Structures
• Semantic network
MED Semantic Network
(Figure: MED semantic network fragment with nodes Medical Entity, Substance, Chemical, Carbohydrate, Glucose, Bioactive Substance, Anatomic Substance, Plasma, Laboratory Specimen, Plasma Specimen, Event, Diagnostic Procedure, Laboratory Procedure, Laboratory Test, Plasma Glucose and CHEM-7, including a "Part of" link)
MED Data Structures
• Semantic network
• MUMPS global
MED MUMPS Global
^med(1600)       <SERUM GLUCOSE MEASUREMENT>
^med(1600,1)     <C0202041>
^med(1600,4)     <32703,50000>
^med(1600,5)     <>
^med(1600,6)     <Serum Glucose Measurement>
^med(1600,7)     <>
^med(1600,8)     <1724>
^med(1600,12)    <GLUC>
^med(1600,14)    <169>
^med(1600,16)    <31987>
^med(1600,17)    <mg/dl>
^med(1600,20)    <C000006>
^med(1600,23)    <1178>
^med(1600,50)    <Serum Glucose>
^med(1600,138)   <40444,40445,40446,59165>
^med(1600,156)   <MCNC>
^med(1600,161)   <QN>
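The MUMPS global is a sparse, ordered tree of subscripts. This sketch mimics it with a Python dict keyed by subscript tuples and walks the slots of entity 1600 the way an M $ORDER loop would. Only some of the slots are included, the slide does not say what each slot number means, and `next_slot` and `slots` are hypothetical helper names.

```python
from typing import Dict, Iterator, Optional, Tuple

# Part of the ^med global for entity 1600, keyed by the subscripts on the slide.
# In M this would be stored as ^med(1600,17)="mg/dl" and so on.
MED: Dict[Tuple[int, ...], str] = {
    (1600,): "SERUM GLUCOSE MEASUREMENT",
    (1600, 1): "C0202041",
    (1600, 4): "32703,50000",
    (1600, 12): "GLUC",
    (1600, 17): "mg/dl",
    (1600, 161): "QN",
}


def next_slot(g: Dict[Tuple[int, ...], str], entity: int,
              after: Optional[int] = None) -> Optional[int]:
    """Rough analogue of M's $ORDER on ^med(entity, slot): next slot number, or None."""
    later = [k[1] for k in g
             if len(k) == 2 and k[0] == entity and (after is None or k[1] > after)]
    return min(later) if later else None


def slots(g: Dict[Tuple[int, ...], str], entity: int) -> Iterator[Tuple[int, str]]:
    """Walk one entity's slots in subscript order, as an M $ORDER loop would."""
    slot = next_slot(g, entity)
    while slot is not None:
        yield slot, g[(entity, slot)]
        slot = next_slot(g, entity, slot)
```

A multi-valued slot such as 4 is stored as one comma-separated string here, so callers split it themselves.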
MED Data Structures
• Semantic network
• MUMPS global
• DB2
MED DB2 Tables
(Figure: tables Entities, Slots, Entity-Slots, Entity/Slot/Values and Ancestry; sample slot names include Name, UMLS, Part-of and Specimen, and sample values include "Entity", "C0001", "1234" and "mg/dl")
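The DB2 layout appears to be an entity-attribute-value design: entities, slot definitions, and rows that attach a value to an (entity, slot) pair. A minimal sketch using Python's built-in sqlite3 module as a stand-in for DB2; the table and column names, slot numbers and sample rows are hypothetical, and the Ancestry table is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in for DB2
conn.executescript("""
CREATE TABLE entities (entity_id INTEGER PRIMARY KEY);
CREATE TABLE slots (slot_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE entity_slot_values (
    entity_id INTEGER NOT NULL REFERENCES entities(entity_id),
    slot_id   INTEGER NOT NULL REFERENCES slots(slot_id),
    value     TEXT
);
""")

# Hypothetical slot numbers and rows, for illustration only.
conn.executemany("INSERT INTO entities VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO slots VALUES (?, ?)",
                 [(10, "Name"), (20, "UMLS"), (30, "Part-of"), (40, "Specimen"), (50, "Units")])
conn.executemany("INSERT INTO entity_slot_values VALUES (?, ?, ?)",
                 [(1, 10, "Entity"), (2, 10, "Serum Glucose Measurement"), (2, 50, "mg/dl")])

# Reassemble one entity's slots and values from its rows.
rows = conn.execute("""
    SELECT s.name, v.value
    FROM entity_slot_values AS v JOIN slots AS s ON s.slot_id = v.slot_id
    WHERE v.entity_id = ?
    ORDER BY s.slot_id
""", (2,)).fetchall()
```

Adding a new kind of attribute is then a new row in `slots` rather than a schema change, which is the usual reason for choosing this design.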
MED Data Structures
• Semantic network
• MUMPS global
• DB2
• UNIX
MED UNIX Data Structure
1600|SERUM GLUCOSE MEASUREMENT|1|C0202041|4|32703|4|50000|12|GLUC|17|mg/dl|........
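The flat UNIX record is an entity number and name followed by alternating slot numbers and values, with a repeated slot number (4 here) carrying a second value. A parsing sketch, assuming that layout and using only the fields shown before the elision (with the identifier written C0202041, as in the MUMPS global); `parse_med_line` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_med_line(line: str) -> Tuple[int, str, Dict[int, List[str]]]:
    """Split 'id|name|slot|value|slot|value|...' into id, name and slot -> values."""
    entity_id, name, *rest = line.rstrip("\n").split("|")
    if len(rest) % 2:
        raise ValueError("slot numbers and values must come in pairs")
    values: Dict[int, List[str]] = defaultdict(list)
    for slot, value in zip(rest[0::2], rest[1::2]):
        values[int(slot)].append(value)  # a repeated slot number adds another value
    return int(entity_id), name, dict(values)


entity_id, name, values = parse_med_line(
    "1600|SERUM GLUCOSE MEASUREMENT|1|C0202041|4|32703|4|50000|12|GLUC|17|mg/dl")
```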
Proof of Concepts
• Merging data and application knowledge
Merging Data and Application Knowledge
• Class-based, reusable lab summaries
(Figure: Lab Display classes Chem20 Display, DOP Summary and WebCIS Summary, and Lab Test classes Intravascular Glucose Test, Fingerstick Glucose Test, Serum Glucose Test and Plasma Glucose Test)
Merging Data and Application Knowledge
• Class-based, reusable lab summaries
(Figure: Lab Display class Chem20 Display, and Lab Test classes Intravascular Glucose Test, Fingerstick Glucose Test, Serum Glucose Test and Plasma Glucose Test)
• Expert system for application maintenance
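The point of class-based summaries is that a display row names a class rather than a list of test codes, so any test filed beneath the class is picked up automatically. A minimal sketch; the is-a links, results and function names are hypothetical, loosely modeled on the glucose tests in the figure.

```python
from typing import Dict, List, Tuple

# Hypothetical is-a links (child -> parents), not the actual MED content.
IS_A: Dict[str, List[str]] = {
    "Intravascular Glucose Test": ["Lab Test"],
    "Fingerstick Glucose Test": ["Intravascular Glucose Test"],
    "Serum Glucose Test": ["Intravascular Glucose Test"],
    "Plasma Glucose Test": ["Intravascular Glucose Test"],
    "Serum Sodium Test": ["Lab Test"],
}


def is_a(concept: str, ancestor: str) -> bool:
    """True if `concept` is `ancestor` or a descendant of it."""
    if concept == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in IS_A.get(concept, []))


def display_rows(results: List[Tuple[str, float]], row_class: str) -> List[Tuple[str, float]]:
    """A display row defined by a class collects results of every test beneath that class."""
    return [(test, value) for test, value in results if is_a(test, row_class)]


# Hypothetical results for one patient.
results = [("Serum Glucose Test", 105.0), ("Serum Sodium Test", 140.0), ("Plasma Glucose Test", 98.0)]
glucose_row = display_rows(results, "Intravascular Glucose Test")
```

Adding, say, a new glucose test under Intravascular Glucose Test would put it in every display built on that class without touching display code.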
Proof of Concepts
• Merging data and application knowledge
• Smarter retrievals from the record
Smarter Retrievals from the Record
• Repository stores events and results
• Clinical problems at a different level of
granularity
• Re-use knowledge to map from
problems to clinical data
• Produce problem-specific views of the
medical record
(Figure: concept-oriented view for (Heart): Heart Disease, Congestive Heart Failure, Cardiac Enzyme, Creatine Kinase, Intravascular CK Test, Chest, Chest X ray and Angina, shown against a record containing Admission 3/14/96 Stroke; Lab 12/28/96 Sickle Cell Test; Admission 2/14/98 Angina; Lab 1/1/99 Blood Type Test; Discharge 1/15/99 CHF; Lab 1/1/99 Cardiac Enzyme Test; and radiology reports (Chest X ray 2 View, Head CT, Knee X Ray, Chest X Ray) dated 2/28/96, 2/1/97 and 2/23/99)
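A problem-specific view can be produced by following links from the problem concept to the concepts it involves and keeping only the matching events. This sketch uses the admission, discharge and lab events from the figure ("CHF" written out in full); the RELEVANT links and the function name are hypothetical, and a fuller version would also follow is-a links through a hierarchy like the (Heart) one.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Event = Tuple[str, date, str]  # (source, date, concept)

# Hypothetical links from a problem to the clinical concepts relevant to it.
RELEVANT: Dict[str, Set[str]] = {
    "Heart Disease": {"Angina", "Congestive Heart Failure", "Cardiac Enzyme Test"},
}

RECORD: List[Event] = [
    ("Admission", date(1996, 3, 14), "Stroke"),
    ("Lab", date(1996, 12, 28), "Sickle Cell Test"),
    ("Admission", date(1998, 2, 14), "Angina"),
    ("Lab", date(1999, 1, 1), "Blood Type Test"),
    ("Discharge", date(1999, 1, 15), "Congestive Heart Failure"),
    ("Lab", date(1999, 1, 1), "Cardiac Enzyme Test"),
]


def problem_view(problem: str, record: List[Event]) -> List[Event]:
    """Keep only events whose concept is relevant to the problem, oldest first."""
    wanted = RELEVANT.get(problem, set())
    return sorted((e for e in record if e[2] in wanted), key=lambda e: e[1])
```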
Proof of Concepts
• Merging data and application knowledge
• Smarter retrievals from the record
• “Just-in-Time” education
“Just-in-time” Education
• Medline button
• Infobuttons
“Just-in-time” Education
• Medline button
• Infobuttons
• Text-to-Web
“Just-in-time” Education
• Medline button
• Infobuttons
• Text-to-Web
(Figure: Clinical Info System data (Laboratory Test Results, Medication Orders, X-ray Reports) linked to resources: DXplain, Cholesterol Guideline, PDR, Dietary Interactions, Micromedex, Medline, Radiol Museum of South Bank, ICD9, Webpath, CHORUS)
Proof of Concepts
• Merging data and application knowledge
• Smarter retrievals from the record
• “Just-in-Time” education
• Expert systems
Expert Systems
• Hripcsak, et al., Ann. Int. Med., 1995
Hripcsak, et al., Ann. Int. Med., 1995
• Identify chest x-ray reports suspicious for 6
clinical conditions to trigger alerts
Method | Sensitivity | Specificity
Laypersons | 22-47% | 97-99%
Radiologists | 73-98% | 96-99%
Internists | 68-98% | 97-99%
Keyword | 51-79% | 79-92%
NLP/MED/Rule-based | 81% | 98%
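Sensitivity and specificity in tables like this one are ratios over a confusion matrix. A short Python sketch of the two formulas; the counts in the example are made up to land on 81% and 98% and are not the study's data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of reports that truly show the condition which the method flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of reports without the condition which the method leaves unflagged: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts, not the study's data.
print(f"{sensitivity(81, 19):.0%} {specificity(98, 2):.0%}")  # 81% 98%
```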
Expert Systems
• Hripcsak, et al., Ann. Int. Med., 1995
• Clinical decision support system
Clinical Decision Support System
• Data monitor runs rules against
incoming reports
• Tuberculosis cultures come back 4-8
weeks later
• One day, hundreds of TB alerts came in
What Happened to the Tuberculosis Alert?
(Figure: result values "No Growth" and "No Growth to Date"; Medical Logic Module)
How We Outsmarted the Lab
(Figure: a "No Growth" Results class containing No Growth, No Growth to Date, No Growth after 24 Hours, No Growth after 48 Hours, No Growth after 72 Hours and No Growth after ..., used by the Medical Logic Module)
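The failure and the fix can be contrasted in a few lines. This sketch assumes the alert treats any culture result other than "No Growth" as positive; the function names and the set standing in for the MED "No Growth" class are hypothetical (in the MED the members would be found through is-a links).

```python
# Hypothetical stand-in for the MED "No Growth" results class; in the MED,
# membership would come from is-a links rather than a hard-coded set.
NO_GROWTH_RESULTS = {
    "No Growth",
    "No Growth to Date",
    "No Growth after 24 Hours",
    "No Growth after 48 Hours",
    "No Growth after 72 Hours",
}


def naive_alert(result: str) -> bool:
    """Fires unless the culture result is literally the string "No Growth"."""
    return result != "No Growth"


def class_based_alert(result: str) -> bool:
    """Fires unless the result is any member of the "No Growth" class."""
    return result not in NO_GROWTH_RESULTS
```

A new wording such as "No Growth to Date" trips `naive_alert` but not `class_based_alert`, and the next new wording only needs to be filed under the class.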
Expert Systems
• Hripcsak, et al., Ann. Int. Med., 1995
• Clinical decision support system
• DXplain Button
DXplain Button
• Elhanan, et al., SCAMC 1997
• Conversion of test results to clinical findings
(Figure: Serum Cholesterol Test, Serum Specimen, Serum, Cholesterol, Abnormalities of Serum Cholesterol and Hypercholesterolemia)
• Pass findings to DXplain
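Converting a result into a finding amounts to mapping a test concept plus a value onto a finding concept. A minimal sketch; the rule table and function name are hypothetical, the >239 mg/dl cutoff is borrowed from the NCEP flowchart rather than from Elhanan et al., and the hand-off to DXplain is not shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rules from a test concept to the finding concepts its values imply.
# The >239 mg/dl cutoff comes from the NCEP flowchart, not from Elhanan et al.
FINDING_RULES: Dict[str, List[Tuple[Callable[[float], bool], str]]] = {
    "Serum Cholesterol Test": [(lambda mg_dl: mg_dl > 239, "Hypercholesterolemia")],
}


def findings_for(test: str, value: float) -> List[str]:
    """Clinical findings implied by one test result; unknown tests yield none."""
    return [finding for applies, finding in FINDING_RULES.get(test, []) if applies(value)]
```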
Proof of Concepts
• Merging data and application knowledge
• Smarter retrievals from the record
• “Just-in-Time” education
• Expert systems
• Data mining
Data Mining
• Wilcox and Hripcsak, SCAMC 1997
Wilcox and Hripcsak, SCAMC 1997
Data Mining
• Wilcox and Hripcsak, SCAMC 1997
• Wilcox and Hripcsak, SCAMC 1998
Wilcox and Hripcsak, SCAMC 1998
• Compare traditional coding methods with
NLP to identify conditions in a set of patient
records (x-ray reports)
Method | Sensitivity | Specificity
Laypersons | 36% | 86%
Expert-coded cases | 27-37% | 95-98%
ICD-9-coded cases | 12-29% | 86-90%
Physicians | 85% | 98%
NLP/MED/Rule-based | 81% | 98%
Proof of Concepts
• Merging data and application knowledge
• Smarter retrievals from the record
• “Just-in-Time” education
• Expert systems
• Data mining
• Database maintenance and use
Database Maintenance and Use
• Tables, columns, events all modeled in
the MED
• Allows linkage of data model to
controlled terminology
• Terminologies can be reused
• Impact of terminology changes on data
model can be tracked
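Once columns are linked to terminology classes, the impact of a terminology change can be found by walking up the hierarchy from the changed concept. A sketch under that assumption; the column names, classes, is-a links and function names are all hypothetical.

```python
from typing import Dict, Iterator, List

# Hypothetical model: each database column is linked to the terminology class its values come from.
COLUMN_VALUE_CLASS: Dict[str, str] = {
    "lab_results.test_code": "Laboratory Test",
    "orders.drug_code": "Drug",
}
# Hypothetical single-parent is-a links.
PARENT: Dict[str, str] = {
    "Serum Glucose Test": "Laboratory Test",
    "Plasma Glucose Test": "Laboratory Test",
    "Aspirin": "Drug",
}


def ancestors(concept: str) -> Iterator[str]:
    """Yield each is-a ancestor of `concept`, nearest first."""
    while concept in PARENT:
        concept = PARENT[concept]
        yield concept


def affected_columns(changed: str) -> List[str]:
    """Columns whose permitted values include the changed concept."""
    classes = {changed, *ancestors(changed)}
    return sorted(col for col, cls in COLUMN_VALUE_CLASS.items() if cls in classes)
```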
Proof of Concepts
• Merging data and application knowledge
• Smarter retrievals from the record
• “Just-in-Time” education
• Expert systems
• Data mining
• Database maintenance and use
• Terminology maintenance and use
Terminology Maintenance and Use
• Integrating terminologies from merging
hospitals
• Automated update of medication
terminology
• Detection of errors and inconsistencies
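One kind of inconsistency an automated check can catch is a cycle in the is-a hierarchy, where a concept ends up as its own ancestor. A depth-first-search sketch; the hierarchy passed in is hypothetical, and `find_isa_cycle` is an illustrative name.

```python
from typing import Dict, List, Optional


def find_isa_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one is-a cycle as a list of concepts, or None if the hierarchy is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for p in parents.get(node, []):
            state = color.get(p, WHITE)
            if state == GREY:  # p is already on the path: we have looped back
                return path[path.index(p):] + [p]
            if state == WHITE:
                cycle = visit(p)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(parents):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```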
Is it Worth the Trouble?
Meed:
• noun
• 1 archaic : an earned reward or wage
• 2 : a fitting return or recompense
• Date: before 12th century
• Etymology: from Old English:
MED
Summary
• Putting knowledge in your terminology gets you:
– Better ways to get knowledge out of your EMR
– Better ways to get knowledge out of resources
– Better ways to use other knowledge bases
– Better ways to use terminology
– Better ways to manage applications
– Better ways to manage data and terminology
• Representation scheme is less important
• Desiderata for controlled terminology
Desiderata
• Desirable qualities for terminology
Desiderata
• Desirable qualities for terminology
“Go placidly amid the noise and haste,
and remember what peace there may
be in silence.”
“I’d rather be sailing”