Natural Language Processing for Biosurveillance

Development of ConText Tools in Python
Brian E. Chapman, PhD, Glenn Dayton, Wendy W. Chapman, PhD
Division of Biomedical Informatics
Caveats and Apologies
• I’m not a linguist, computational or otherwise,
nor am I a grammarian, or multilingual, or
particularly well spoken in my native English
• In fact I’m a medical physicist who has drifted
into imaging informatics with an emphasis on
the image part of imaging informatics
• Which is a long way of saying that I got into
this field because of a specific problem
Motivation
• Received NIH funding for computer-aided
detection of pulmonary embolism in CT
pulmonary angiography (CTPA)
• How to identify appropriate cases from
clinical PACS?
Case Identification Approach #1
• Talk to an honest broker
– Who was obviously overworked
– Who used procedure codes from RIS to
identify potential cases
– Who then read the dictated report
– Who then classified the case
– Who nearly fainted when I told her I needed
hundreds of positive cases
– Who then quickly asked, “Do you have a lot of
money?”
Case Identification Approach #2
• Honest broker’s task is perfect for NegEx
– Use procedure codes to identify reports in
MARS repository at University of Pittsburgh
– Use NegEx to classify reports as +/- for PE
– Within minutes find hundreds of cases
– Very happy honest broker
What if you wanted to answer more questions?
• Disease uncertainty
• Disease temporality
• Image quality
• Can we a priori specify all of these?
peFinder
• Application to characterize CTPA reports
– Presence or absence of PE
– Temporal state of positive PE
– Uncertainty of disease state
– Technical quality of the exam
For Review: NegEx
• Example: “Patient denies cough but complains of headache.”
– Clinical condition: Cough
– Negation: Negated
• Example: “No change in the patient’s chest pain.”
• Figure labels: trigger term, pseudo-trigger term, termination term, scope
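A minimal sketch of the NegEx idea reviewed above, not the actual NegEx implementation: a trigger term opens a negation scope that runs forward until a termination term, and pseudo-triggers are masked so they cannot fire. The term lists and the function name `is_negated` are assumptions chosen only to cover the two example sentences.

```python
import re

# Tiny illustrative term lists (assumed; real NegEx ships far longer lists).
PSEUDO_TRIGGERS = ["no change"]
TRIGGERS = ["denies", "no"]
TERMINATIONS = ["but"]


def is_negated(sentence, condition):
    """Return True if `condition` falls in the forward scope of a trigger."""
    text = sentence.lower()
    # Mask pseudo-triggers first so the "no" inside "no change" cannot fire.
    for pseudo in PSEUDO_TRIGGERS:
        text = text.replace(pseudo, "_" * len(pseudo))
    target = text.find(condition.lower())
    if target < 0:
        return False
    for trigger in TRIGGERS:
        for match in re.finditer(r"\b" + re.escape(trigger) + r"\b", text):
            if match.end() > target:
                continue  # the trigger must precede the condition
            # The scope ends at the first termination term after the trigger.
            scope_end = len(text)
            for term in TERMINATIONS:
                stop = re.search(r"\b" + re.escape(term) + r"\b", text[match.end():])
                if stop:
                    scope_end = min(scope_end, match.end() + stop.start())
            if target < scope_end:
                return True
    return False
```

With these assumed lists, "cough" in the first sentence is reported as negated, "headache" is not (it lies past the termination term "but"), and "chest pain" in the second sentence is not (the "no" belongs to the pseudo-trigger "no change").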
Python Implementations
• What Drove My Organic Design
– What existed in NegEx
• GUI program written in Tcl/Tk
• Lots of enumerated trigger terms
– What I wanted
• I wanted a package that could be used to build a
variety of accurate applications
• I wanted it to be easy for others to use
• I am an engineer and so lazy
– Generalize relationships
– Replace exhaustive enumeration of trigger terms with
regular expressions
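As a hedged illustration of that last point, the sketch below shows one regular expression standing in for a hand-enumerated list of trigger phrases. The phrase list and the pattern are illustrative assumptions, not terms taken from NegEx or pyConText.

```python
import re

# Enumerated style: every surface form is listed by hand (assumed examples).
ENUMERATED = [
    "no evidence of",
    "no evidence for",
    "no definite evidence of",
    "no definite evidence for",
    "no convincing evidence of",
    "no convincing evidence for",
]

# Regular-expression style: one pattern covers all six forms above.
NO_EVIDENCE = re.compile(
    r"\bno (?:(?:definite|convincing) )?evidence (?:of|for)\b"
)

# Every enumerated phrase is matched in full by the single pattern.
covered = [phrase for phrase in ENUMERATED if NO_EVIDENCE.fullmatch(phrase)]
```

The trade-off is the one raised later in the talk: the pattern is shorter to maintain, but someone has to be comfortable writing it.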
pyConText: Basic Framework
• Item Objects: 4-tuple containing Lexical
and Domain Knowledge
– Literal (label): “pulmonary embolism”
– Category/Concept
– Regular expression
• r'''(pulmonary )(artery )?(embol[a-z]+)'''
– Rule
• Directional influence of item in sentence
• Category interaction?
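A minimal sketch of an item as a 4-tuple, assuming a `namedtuple` with the hypothetical field names `literal`, `category`, `regex`, and `rule`. The category and rule strings are placeholders, and the pyConText package may represent items differently.

```python
import re
from collections import namedtuple

# Hypothetical field names for the four parts listed above.
Item = namedtuple("Item", ["literal", "category", "regex", "rule"])

pe = Item(
    literal="pulmonary embolism",
    category="PULMONARY_EMBOLISM",  # placeholder category label
    regex=r"(pulmonary )(artery )?(embol[a-z]+)",
    rule="",  # a target item exerts no directional influence in this sketch
)
negation = Item(
    literal="no evidence of",
    category="NEGATED_EXISTENCE",  # placeholder category label
    regex=r"no evidence (of|for)",
    rule="forward",  # modifies items to its right in the sentence
)


def find_spans(item, sentence):
    """Return (start, end, matched_text) for every match of the item's regex."""
    pattern = re.compile(item.regex, re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(sentence)]
```

Here the literal is only a human-readable label; the regular expression does the matching, so "pulmonary emboli" and "pulmonary artery embolus" are both found by the same item.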
pyConText: Basic Framework
• Item Objects parse sentence to create Tag
Objects within sentences
– Tag Objects interact/modify each other
• Targets
• Modifiers
• Conjunctions
• Prune to eliminate subset tag objects
• Directional Graph represents relationships
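The steps on this slide can be sketched as follows, under stated assumptions: items create tags, tags whose span sits inside a longer tag are pruned, and the surviving modifiers are linked to targets by directed edges. All names (`Tag`, `tag_sentence`, `prune`, `build_graph`) are hypothetical, the graph is a plain edge list, and conjunctions are left out for brevity.

```python
import re
from collections import namedtuple

# Hypothetical structures; pyConText's own classes may differ.
Item = namedtuple("Item", ["literal", "category", "regex", "rule"])
Tag = namedtuple("Tag", ["start", "end", "text", "item"])

TARGETS = [Item("pulmonary embolism", "PE", r"pulmonary (artery )?embol[a-z]+", "")]
MODIFIERS = [
    Item("no", "NEGATED", r"\bno\b", "forward"),
    Item("no evidence of", "NEGATED", r"\bno evidence of\b", "forward"),
]


def tag_sentence(sentence, items):
    """Apply each item's regex to the sentence and return the resulting tags."""
    tags = []
    for item in items:
        for m in re.finditer(item.regex, sentence, re.IGNORECASE):
            tags.append(Tag(m.start(), m.end(), m.group(0), item))
    return tags


def prune(tags):
    """Drop any tag whose span lies strictly inside another tag's span."""
    def inside(a, b):
        same_span = (a.start, a.end) == (b.start, b.end)
        return b.start <= a.start and a.end <= b.end and not same_span
    return [t for t in tags if not any(inside(t, other) for other in tags)]


def build_graph(modifiers, targets):
    """Return directed edges (modifier text, target text) per modifier rule."""
    edges = []
    for mod in modifiers:
        for tgt in targets:
            forward = mod.item.rule == "forward" and mod.end <= tgt.start
            backward = mod.item.rule == "backward" and tgt.end <= mod.start
            if forward or backward:
                edges.append((mod.text, tgt.text))
    return edges


sentence = "No evidence of pulmonary embolism."
modifier_tags = prune(tag_sentence(sentence, MODIFIERS))
target_tags = tag_sentence(sentence, TARGETS)
edges = build_graph(modifier_tags, target_tags)
```

In this example the short "No" tag is pruned because it is a subset of "No evidence of", leaving a single edge from that modifier to the target.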
Did I Meet My Objectives?
• Accurate
– Yes: JBI 2011
• Modular
– Yes: package on PyPI
• Easy for others to use
– Depends on your definition of others
– Wilson, et al. Journal of Pathology Informatics
– Gentili and Chapman RSNA
Did I Meet My Objectives?
• Easy for others to use (continued)
– Can any application relying on user to provide
regular expressions be defined as easy?
Current and Future Work
• Web and GUI applications
– Django
– Django with Twisted for desktop port
Current and Future Work
• Improved Knowledge Representation
– Separating linguistic and domain knowledge
– Integration with external knowledge bases
• Use graphs to further reduce enumeration
of items
– No/definite/evidence of/pulmonary embolism
• Thanks for the invitation
• Looking forward to
– Learning and
– Working and
– Skateboarding
• For the next three weeks