Transcript Doyle
From free text to clinical data
Language and Computing
Davide Zaccagnini, MD
Karen Doyle, RN
October 23, 2007
Outline
• Reality of Applying NLP to AHLTA documents
• Use Cases
• Ontology-Based NLP
Use Cases
• PRIMARY Use Case for Health Care Documentation compared with documentation produced for Biomedical Research
– Collect information to determine diagnosis(es), execute a plan of treatment, and communicate with the healthcare team.
• By-products of Electronic Documentation
– Coding for Billing
– Problem Lists
– Past Medical History
– Social History: 14 elements, including tobacco use, ETOH, toxin exposure, marital status
– Family History
– Medications
– Allergies
– Bio-surveillance
– Quality Metrics: Pay for Performance, Joint Commission, HEDIS
– Research
AHLTA offers Structured Documentation Tool
Medcin Terms in Blue
Structured and Unstructured Text
DoD HA Policy Guidance
Ref ASAD Health Affairs August 7, 2007
Blue is the original code, calculated from the structured documentation. Pink shows how the doctor can change the subscores, but the document does not change.
Background of TATRC HPI
Free Text DUMMY
• Lost Data in S/O sections: What is the value?
• Patient History
– Patient’s “story”, reflects signs and symptoms
– History of Present Illness
– Review of Systems
– Past Family, Social and Medical History
– Used to calculate Evaluation and Management (E&M) Billing Codes
• HPI: History of Present Illness
– Definition: A chronological description of the present
illness from the first sign or symptom, or from last
encounter
– Comprises 8 elements used in the calculation of the E&M code: location, quality, severity, duration, timing, context, modifying factors, associated signs and symptoms
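As a minimal sketch of how the eight HPI elements could feed a coding rule, the Python below counts which elements were documented. The function names are hypothetical, and the brief/extended cut-off of four elements is an assumption for illustration; the slides do not state it.

```python
# The eight HPI elements listed on the slide, in slide order.
HPI_ELEMENTS = (
    "location", "quality", "severity", "duration",
    "timing", "context", "modifying factors",
    "associated signs and symptoms",
)

def count_hpi_elements(documented):
    """Return the HPI elements (in slide order) present in `documented`."""
    found = {name.strip().lower() for name in documented}
    return [element for element in HPI_ELEMENTS if element in found]

def hpi_level(documented, extended_threshold=4):
    """Classify the HPI as 'none', 'brief' or 'extended'.

    Assumption: the threshold of 4 elements is illustrative only.
    """
    n = len(count_hpi_elements(documented))
    if n == 0:
        return "none"
    return "extended" if n >= extended_threshold else "brief"

example = ["Severity", "duration", "location"]
print(count_hpi_elements(example))
print(hpi_level(example))
```

In practice the hard part is the step before this one: deciding from free text which elements the clinician actually documented.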
(HPI Dummy # 1) Free text Section Extracted manually for Analysis
100 Texts for Processing
Free Text to Data: What is desirable?
• HPI 1 45yo G4P4,
POD14 s/p TAH, doing
well. Denies f/c. Denies
any pain. Not taking any
pain meds. Staples
removed on 9May.
Appetite good. No N/V.
Normal bowel/bladder
function. She is very
happy with the outcome
of surgery. Only concern
is incision -very small
area that has not healed
completely. has been
keeping the incision clean
and dry.
• Expand Abbreviations
• Codify Terms to Vocabularies: ICD-9, SNOMED, MEDCIN
• Negation
• Modality
• Applying Rules
– Financial Billing
– Obtain: age, height, weight, blood pressure, dates
– Quality Metrics
– Surveillance
– History, Family, Past
Medical, Current
Problems?
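The first and third steps (expanding abbreviations, evaluating negation) can be sketched as follows. This is a deliberately naive illustration, not the presenters' system: the abbreviation table, trigger words and concept list are hypothetical, and negation is decided by a trigger word appearing earlier in the same sentence.

```python
import re

# Hypothetical mini-lexicon; a real system would use a curated one.
ABBREVIATIONS = {
    "f/c": "fever/chills",
    "n/v": "nausea/vomiting",
    "s/p": "status post",
    "tah": "total abdominal hysterectomy",
}
NEGATION_TRIGGERS = ("denies", "no", "not")
CONCEPTS = ("fever", "chills", "pain", "nausea", "vomiting")

def expand_abbreviations(text):
    """Replace known abbreviations token by token (case-insensitive)."""
    def repl(match):
        token = match.group(0)
        return ABBREVIATIONS.get(token.lower(), token)
    return re.sub(r"[A-Za-z]+(?:/[A-Za-z]+)?", repl, text)

def tag_negation(text):
    """Map each known concept to True if a trigger precedes it in its sentence."""
    result = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(words):
            if word in CONCEPTS:
                negated = any(w in NEGATION_TRIGGERS for w in words[:i])
                result[word] = result.get(word, False) or negated
    return result

hpi = "Denies f/c. Denies any pain. Not taking any pain meds. No N/V."
expanded = expand_abbreviations(hpi)
print(expanded)
print(tag_negation(expanded))
```

Note that the sketch treats "pain" in "pain meds" as the symptom; telling those apart is the kind of problem the later slides on syntax and disambiguation address.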
Free Text Example
Expand Abbreviations
Code to Vocabularies
Evaluate for Negation
Apply Rules
[Figure: the HPI example annotated with the terms negation, appetite good, good, very, f/c, n/v, TAH, pain, happy, taking pain meds]
Ontology-based NLP
Natural Language Processing and Understanding
“…..natural language understanding systems
convert samples of human language into
more formal representations that are
easier for computer programs to
manipulate.”
Wikipedia
Representations (formal or otherwise)
DATA MODELS (AGREED UPON TERMS):
• PREDEF. USE
• DATA DRIVEN
• PREDEF. CONTEXT
• SPECIALIZED MODEL
ONTOLOGY (FORMALLY DEFINED CONCEPTS):
• NO PREDEF. USE
• REALITY DRIVEN
• NO PREDEF. CONTEXT
• INFERRED MODEL
What is fever?
All definitions are accurate within their model,
but what is fever?
does the patient have fever?
Formal representations
The world according to a database
Patients {ID#, ZIP code, BP}
ID#      ZIP code   BP
001123   02139      80/120
001223   24425      65/130
The world according to an ontology
patient has (identifier (is_a (ID#))) ∩ lives_in (geographic_area) ∩ has (blood_pressure (is_measured_by (blood pressure measurement(…))))
[Figure: ontology graph with nodes patient, identifier, ID#, geographical area, ZIP code, blood pressure, blood pressure measurement and value (80/120, 65/130), linked by the relations is_a, is_identified_by, has, lives in, is_measured_by and generates]
Ontologies:
the meaning of data
An ontology:
• Explicitly specifies meaning
• Represents reality, not data
• Is a formal schema
• Its consistency can be automatically
enforced and checked
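To illustrate the last point, here is a small sketch of an ontology fragment stored as triples, with a schema that lets a program check consistency. The relation names follow the slides; the schema, the instance names and the checking function are assumptions made for this example.

```python
# Relation schema: relation -> (class of subject, class of object).
# Illustrative assumption; relation names are taken from the slides.
SCHEMA = {
    "is_identified_by": ("patient", "identifier"),
    "lives_in": ("patient", "geographical_area"),
    "has": ("patient", "blood_pressure"),
    "is_measured_by": ("blood_pressure", "blood_pressure_measurement"),
}

# Instance -> class assertions (is_a). Instance names are hypothetical.
IS_A = {
    "patient_001123": "patient",
    "ID#001123": "identifier",
    "area_02139": "geographical_area",
    "bp_001123": "blood_pressure",
    "measurement_1": "blood_pressure_measurement",
}

TRIPLES = [
    ("patient_001123", "is_identified_by", "ID#001123"),
    ("patient_001123", "lives_in", "area_02139"),
    ("patient_001123", "has", "bp_001123"),
    ("bp_001123", "is_measured_by", "measurement_1"),
]

def check_consistency(triples, schema=SCHEMA, is_a=IS_A):
    """Return the triples whose relation is unknown or mis-typed."""
    violations = []
    for subj, rel, obj in triples:
        if rel not in schema:
            violations.append((subj, rel, obj))
            continue
        domain, range_ = schema[rel]
        if is_a.get(subj) != domain or is_a.get(obj) != range_:
            violations.append((subj, rel, obj))
    return violations

print(check_consistency(TRIPLES))
# A blood pressure cannot "live in" an area, so this triple is flagged.
bad = TRIPLES + [("bp_001123", "lives_in", "area_02139")]
print(check_consistency(bad))
```

A database row would accept any value in any column; here the meaning of each relation is explicit, so a nonsensical statement can be rejected automatically.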
NLP Workflow
• Example Pipeline
Input handler
-> Fetches the document and passes it to the first processing component
Paragrapher
-> Paragraph and title detection
Segmenter
-> Maps tokens and multi-words to ontology. Rewriting to enhance mapping
Section labeler
-> Assigns section labels to paragraphs
Syntactic parser
-> Performs syntactic parsing validating against grammar
Fragment labeler
-> Assigns fragment labels to pieces of text within sections
Lexeme filter
-> Filters out function words (e.g. determiners) to reduce false-positive mappings
Negation/modality
-> Identifies negation, modality and future
Semantic tagger
-> Further deduces concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor
-> Extracts vital signs
Labs extractor
-> Extracts lab results
FreePharma
-> Extracts medications
Disambiguator
-> Disambiguates concepts
Coder
-> Codes to standard classification systems like SNOMED-CT, ICD-9,…
Concept filters
-> Marks concepts that belong to different filters (e.g. diagnoses, procedures)
Relevance ranker
-> Calculates relevance of concepts
Output handler
-> Creates XML/HTML/… output
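The pipeline idea itself (each component receives a document, enriches it, and hands it on) can be sketched in a few lines. The three components below are toy stand-ins with hypothetical names and behavior; they only show the chaining pattern, not how the real components work.

```python
import re

FUNCTION_WORDS = {"a", "an", "the", "of", "in"}  # illustrative list

def paragrapher(doc):
    """Split the raw text into paragraphs on blank lines."""
    doc["paragraphs"] = [p.strip() for p in doc["text"].split("\n\n") if p.strip()]
    return doc

def segmenter(doc):
    """Tokenize each paragraph (no ontology mapping in this sketch)."""
    doc["tokens"] = [re.findall(r"[a-z/]+", p.lower()) for p in doc["paragraphs"]]
    return doc

def lexeme_filter(doc):
    """Drop function words so they cannot produce spurious mappings."""
    doc["tokens"] = [[t for t in para if t not in FUNCTION_WORDS]
                     for para in doc["tokens"]]
    return doc

def run_pipeline(text, components):
    """Pass one document dictionary through the components in order."""
    doc = {"text": text}
    for component in components:
        doc = component(doc)
    return doc

PIPELINE = [paragrapher, segmenter, lexeme_filter]
result = run_pipeline("Polyps in the antrum.\n\nThe lungs are clear.", PIPELINE)
print(result["tokens"])
```

Because every component shares the same interface, different applications can be assembled by choosing a different list of components.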
Semantic Tagging
Sample: “Demonstrated benign small polyps in the antrum”
Morphological Variations: polyp < polyps; antrum > antral
Word Clustering: polyp antral; antral polyp
Known Synonyms: maxillary sinus polyp, antral polyp
Concept: SNOMED CT : 29074008 : POLYP OF ANTRUM (DISORDER)
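A rough sketch of these three steps, using the sample sentence and the SNOMED CT code from the slide: tokens are normalized for morphological variation, compared as an unordered cluster of words, and looked up against known synonyms. The lookup tables are hypothetical and cover only this example.

```python
import re

# Hypothetical tables covering only the slide's example.
VARIANTS = {"polyps": "polyp", "antrum": "antral"}
SYNONYMS = {
    frozenset({"antral", "polyp"}): "antral polyp",
    frozenset({"maxillary", "sinus", "polyp"}): "antral polyp",
}
CONCEPTS = {
    "antral polyp": ("SNOMED CT", "29074008", "POLYP OF ANTRUM (DISORDER)"),
}

def tag(sentence):
    """Return the concepts whose word cluster occurs in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    # Morphological normalization, then order-independent word clustering.
    normalized = {VARIANTS.get(token, token) for token in tokens}
    hits = []
    for words, preferred in SYNONYMS.items():
        concept = CONCEPTS[preferred]
        if words <= normalized and concept not in hits:
            hits.append(concept)
    return hits

print(tag("Demonstrated benign small polyps in the antrum"))
```

Using a set of normalized words means "polyps in the antrum" and "antral polyp" reach the same concept even though they share no surface string.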
Types of Disambiguation
by STRING: lexical match between a term (or its inflections) and a concept in the ontology.
Ex.: “Patient presents fever”
[Figure: ontology fragment with nodes symptom, fever, cough; the term matches the concept fever]
Types of Disambiguation
by DEFINITION: match between terms and concepts in the ontology, where these concepts meet necessary and sufficient conditions (logic-based reasoning)
Ex.: “Patient underwent a liver biopsy”
has_location (liver) ∧ is_a (biopsy) = liver biopsy
[Figure: ontology fragment with nodes organ, liver, procedure, biopsy, liver biopsy; both conditions evaluate to true]
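Matching by definition can be sketched as a subset test: a fully defined concept is recognized when all of its defining conditions are satisfied by the terms found in the text. The ontology fragment and the rule that turns terms into conditions are assumptions made for this example.

```python
# Hypothetical ontology fragment.
IS_A = {"liver": "organ", "biopsy": "procedure"}

# Fully defined concepts: name -> set of (relation, filler) conditions.
DEFINITIONS = {
    "liver biopsy": {("is_a", "biopsy"), ("has_location", "liver")},
}

def conditions_from_terms(terms):
    """Naive role assignment: organs become locations, procedures parents."""
    conditions = set()
    for term in terms:
        if IS_A.get(term) == "organ":
            conditions.add(("has_location", term))
        elif IS_A.get(term) == "procedure":
            conditions.add(("is_a", term))
    return conditions

def match_by_definition(conditions):
    """Return defined concepts whose conditions are all satisfied."""
    return sorted(name for name, needed in DEFINITIONS.items()
                  if needed <= conditions)

terms = ["patient", "liver", "biopsy"]
print(match_by_definition(conditions_from_terms(terms)))
```

No string "liver biopsy" needs to be listed as a synonym; the concept follows from the two conditions being true at once.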
Types of Disambiguation
by RELATIONSHIPS: match between SOME of the term(s), assigned to different concepts in the ontology, where these concepts compose the full definition of the concept using a ‘suggested parent’.
Ex.: “CT of thyroid”
CT of Neck = is_a (CT scan) ∧ has_location (neck)
is_a (CT scan) ∧ has_location (thyroid) = ?
[Figure: ontology fragment with nodes CT, thyroid, neck and relations is_a, has_location; the conditions are marked true]
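One way to read the "suggested parent" idea: when no concept matches the conditions exactly, generalize a filler along a known relationship and try again. The sketch below assumes the ontology records that the thyroid is located in the neck; that link, the table names and the function are hypothetical.

```python
# Hypothetical anatomy fragment: structure -> containing region.
LOCATED_IN = {"thyroid": "neck"}

DEFINITIONS = {
    "CT of Neck": {("is_a", "CT scan"), ("has_location", "neck")},
}

def suggest_parent(conditions):
    """Return (concept, kind): an exact match, else a suggested parent
    found by generalizing each location one step."""
    for name, needed in DEFINITIONS.items():
        if needed == conditions:
            return name, "exact"
    generalized = {
        (rel, LOCATED_IN.get(filler, filler)) if rel == "has_location"
        else (rel, filler)
        for rel, filler in conditions
    }
    for name, needed in DEFINITIONS.items():
        if needed == generalized:
            return name, "suggested parent"
    return None, "no match"

print(suggest_parent({("is_a", "CT scan"), ("has_location", "thyroid")}))
```

Under these assumptions "CT of thyroid" is not dropped as unknown; it is attached under "CT of Neck" as its closest defined parent.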
Examples of disambiguation
Ontology and NLP and data integration
[Figure: the LinKBase® Medical Ontology. Natural language processing (Lexicon, Grammar): concepts are mapped to terms in multiple languages (English, Spanish). Terminologies: cross-mapped to multiple coding systems (Proprietary, ICD-9, MEDCIN, SNOMED CT, CPT, Radlex (partial)).]
Conclusion
• Ontologies are powerful NLP tools for:
– Segmentation
– Disambiguation
– Higher level inference
– Interoperability of extracted data
• They require human resources for maintenance, but reduce the need for annotated data
• They are “white boxes”
– Models that can be expanded and changed
• Combined with stochastic algorithms, they provide both formality and scalability
Thank you
NLP/U, formal representations
“Patients in the North East have higher blood pressure than the average population”
[Figure: ontology graph with nodes patient, identifier, ID#, geographical area, ZIP code, blood pressure, blood pressure measurement and value (80/120, 65/130), linked by the relations is_a, is_identified_by, has, lives in, is_measured_by and generates]
Disambiguation
• Words in document are mapped to
concepts in the ontology
• When more than one candidate exists in the ontology, the system builds a graph of concept relations using:
1. Nearness in sentence
2. IS_A Relationships
3. Horizontal relationships
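The three signals can be combined in a simple scoring sketch. It uses the deck's own "depressed" example; the candidate lists, the relation tables and the 1/distance weighting are assumptions for illustration, not the scoring the actual system uses.

```python
# Hypothetical candidates and relations for the "depressed" example.
CANDIDATES = {
    "depressed": ["depressed mood", "depressed fracture"],
    "patient": ["patient"],
    "fracture": ["fracture"],
}
IS_A = {("depressed fracture", "fracture")}
HORIZONTAL = {("depressed mood", "patient")}  # a non-hierarchical link

def related(a, b):
    """True if the two concepts are linked by IS_A or a horizontal relation."""
    pairs = IS_A | HORIZONTAL
    return (a, b) in pairs or (b, a) in pairs

def disambiguate(tokens, position):
    """Pick the candidate for tokens[position] best supported by context.

    Each related context concept adds 1/distance, so nearer words in the
    sentence count more.
    """
    scores = {}
    for candidate in CANDIDATES[tokens[position]]:
        score = 0.0
        for j, other in enumerate(tokens):
            if j == position or other not in CANDIDATES:
                continue
            for concept in CANDIDATES[other]:
                if related(candidate, concept):
                    score += 1.0 / abs(j - position)
        scores[candidate] = score
    return max(scores, key=scores.get), scores

tokens = ["depressed", "patient", "with", "depressed", "fracture"]
print(disambiguate(tokens, 0)[0])
print(disambiguate(tokens, 3)[0])
```

The same surface word receives a different concept at each position because its nearest related neighbor differs.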
Syntactic Parsing
«A very young patient was given a
double dose by his mother.»
Note passive construction
Negation via Syntax
Modality via Syntax
Reference Resolution
“TeSSI” understands indirect reference to patient
Disambiguation
The system is able to disambiguate between two different meanings of “depressed” in one and the same sentence. While it defines the “depressed” in “depressed patient” as a state of mind, it recognizes “depressed” as a part of “depressed fracture” and tags this noun phrase with the corresponding SNOMED code.
Fragment Labeling
• Sentences and phrases are labeled
• History, exam, impression, etc.
• Independent of superficial formatting
• One label – one type of information
Fragment Labeling
“HPI: The patient whose mother had breast cancer presents with loss of hearing”
• Family History: “whose mother had breast cancer”
• Chief Complaint: “presents with loss of hearing”
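A toy version of fragment labeling for this sentence is sketched below. It relies on two hand-written cue patterns, which are assumptions for illustration; the system described in the slides labels fragments after syntactic parsing rather than with regular expressions.

```python
import re

# Hypothetical cue patterns, sufficient only for sentences of this shape.
FAMILY_MEMBERS = r"(?:mother|father|sister|brother)"
FAMILY_CLAUSE = re.compile(
    rf"\b(?:whose|his|her)\s+{FAMILY_MEMBERS}\s+had\s+[a-z ]+?"
    r"(?=\s+presents\b|[.,]|$)",
    re.IGNORECASE,
)
COMPLAINT_CLAUSE = re.compile(r"\bpresents with\s+[^.,]+", re.IGNORECASE)

def label_fragments(sentence):
    """Return (label, fragment) pairs, ignoring the section heading."""
    labels = []
    patterns = (("Family History", FAMILY_CLAUSE),
                ("Chief Complaint", COMPLAINT_CLAUSE))
    for label, pattern in patterns:
        for match in pattern.finditer(sentence):
            labels.append((label, match.group(0)))
    return labels

sentence = ("HPI: The patient whose mother had breast cancer "
            "presents with loss of hearing")
for label, fragment in label_fragments(sentence):
    print(label, "->", fragment)
```

The point carried over from the slide is that the "HPI:" heading does not decide the label: part of the sentence is family history even though it sits in the HPI section.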
FreePharma
• Medication Extraction
• Example
Semantic Indexing
TeSSI: Terminology Supported Semantic Indexing
Input handler
-> Fetch document and pass to first processing component
Paragrapher
-> Paragraph and title detection
Segmenter
-> Map tokens and multi-words to ontology
Disambiguator
-> Disambiguate concepts
Relevance ranker
-> Calculate relevance of concepts
Indexer
-> Write information to index for quick access.
Information Extraction
Input handler
-> Fetch document and pass to first processing component
Paragrapher
-> Paragraph and title detection
Segmenter
-> Map tokens and multi-words to ontology
Section labeler
-> Assign section labels to paragraphs
Syntactic parser
-> Perform syntactic parsing validating against grammar
Fragment labeler
-> Assign fragment labels to pieces of text within sections
Negation/modality
-> Identify negation, modality and future
Semantic tagger
-> Further deduce concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor
-> Extract vital signs
Labs extractor
-> Extract lab results
FreePharma
-> Extract medications
Output handler
-> Create XML/HTML/… output
Knowledge Discovery
Input handler
-> Fetch document and pass to first processing component
Paragrapher
-> Paragraph and title detection
Segmenter
-> Map tokens and multi-words to ontology
Section labeler
-> Assign section labels to paragraphs
Syntactic parser
-> Perform syntactic parsing validating against grammar
Fragment labeler
-> Assign fragment labels to pieces of text within sections
Negation/modality
-> Identify negation, modality and future
Semantic tagger
-> Further deduce concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor
-> Extract vital signs
Labs extractor
-> Extract lab results
FreePharma
-> Extract medications
Rules Engine
-> XML-structured rules for interpreting syntactic structure and forming semantic representations
Ontology writer
-> Add discovered knowledge to ontology
Automatic coding
Input handler
-> Fetch document and pass to first processing component
Paragrapher
-> Paragraph and title detection
Segmenter
-> Map tokens and multi-words to ontology
Section labeler
-> Assign section labels to paragraphs
Syntactic parser
-> Perform syntactic parsing validating against grammar
Fragment labeler
-> Assign fragment labels to pieces of text within sections
Negation/modality
-> Identify negation, modality and future
Semantic tagger
-> Further deduce concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor
-> Extract vital signs
Labs extractor
-> Extract lab results
FreePharma
-> Extract medications
Rules Engine
-> XML-structured rules for interpreting syntactic structure and forming semantic representations
Code Calculator
-> Code calculator: E&M, ICD-9, CPT
Output handler
-> Create XML/HTML/… output
NLP-based applications and products
Quality
Projects:
CPR Technologies
JCAHO
Eclipsys
• Extraction of CMS Core Measures
• National Patient Safety Network
• Datawarehousing
Coding
Projects:
Kaiser Permanente
Convergent Solutions
• E&M Coding
• SNOMED Coding
• ICD-9 Coding
• CPT in development
Medication Extraction
Projects:
The Marshfield Clinic
Medquist
UAB
• Medication Reconciliation
• Personalized Medication Project
• Validation of therapies from literature
Interoperability
Projects:
Integic/DoD
Revolution Health
• Semantic Integration of the military
health systems
• Tie together free text content and
portal applications
Web Search and Retrieval
Projects:
Revolution Health
Merck
• Ontology-enhanced search
• Concept-based indexing
Radiology
Projects:
FUJIFILM MEDICAL SYSTEMS
• Findings and pertinent negatives
extracted from radiology reports
Radiology
• Observation Types
– Findings
– Pertinent Negatives
– Quality Assurance
– Unclassified
• Observation Components
– Fundamentals
– Modifiers
– Qualifiers
• Observation Status
– (Present) / Historical
– Changed/Not Changed/(not stated)
Observation Types
• Findings
• E.g. “bilateral infiltrates”
• Pertinent Negatives
• E.g. “the lungs are clear”
• Quality Assurance
• E.g. “poor inspiration”
• Unclassified
• E.g. “the lungs are unchanged”
Observation Components
• Fundamentals
– Pathologic Entities
– Physiologic Entities
– Devices
– Procedure
• Modifiers
• Location
• Qualitative
• Quantitative
• Uncertainty (modal)
• Negation
Observation Status
• Historical
• (non-Historical)
• Change Stated
• No Change Stated
• (Change not stated)
• Grouped
• Contains Uncertain (modal) Element
Example PN and F (Modal)
Example Hx and Grouped
Example CS and NCS
Example Quality Assurance
Findings
Finding of PE in historical context
Finding of devices
Modifier in long distance dependency
Pertinent Negatives
A knowledge that lungs should be clear
negation of abnormalities
statement of normality