Natural Language Processing for Biosurveillance

Improving Access to Clinical Data
Locked in Narrative Reports:
An Informatics Approach
Wendy W. Chapman, PhD
Division of Biomedical Informatics
University of California, San Diego
Overview
• The promise of natural language processing (NLP)
• Challenges of developing NLP in the clinical domain
• Challenges in applying NLP in the clinical domain
• Improving access to text through NLP resources
The promise of NLP
• Vast & growing amounts of clinical text
• Rich in information
– Patient care
– Evaluation/QC
– Comparative effectiveness research
– Epidemiology
• Locked in free text
• Natural language processing (NLP) can help unlock that information
• Encouraging NLP success stories
The promise of NLP
Murff (2011)
JAMA
NLP captures:
• Renal failure
• Pulmonary embolism
• Deep vein thrombosis
• Sepsis
• Pneumonia
• Myocardial infarction
Results: “... higher sensitivity and lower
specificity compared with patient safety
indicators based on discharge coding.”
“The promise of natural language
processing ... may be closer than ever.”
Other promising NLP accomplishments ...
• Smoking status (Savova, Hazlehurst)
• Peripheral arterial disease (Pathak)
• Medication extraction (Uzuner)
• Pneumonia (Chapman)
• Colonoscopy quality metrics (Harkema)
• Breast cancer recurrence (Carrell)
• Colorectal cancer screening behavior (Denny)
• Rheumatoid arthritis (Zeng)
Challenges of developing NLP in the clinical domain
NLP Success
“Fresh off its butt-kicking performance on Jeopardy!, IBM’s supercomputer ‘Watson’ has enrolled in medical school at Columbia University.” (New York Daily News, February 18, 2011)
Clinical NLP since the 1960s
Why has clinical NLP had little impact on clinical care?
Barriers to Development
• Sharing clinical data is difficult
– Have not had shared datasets for development and evaluation
– Modules trained on general English are not sufficient
• Insufficient common conventions and standards for annotations
– Data sets are unique to a lab
– Not easily interchangeable
• Limited collaboration
– Clinical NLP applications are silos and black boxes
– Have not had open-source applications
• Reproducibility is a formidable challenge
– Open-source release is not always sufficient
– Software engineering quality is not always great
– Mechanisms for reproducing results are sparse
Challenges in applying NLP in the clinical domain
Security & Privacy Concerns
Institutions are reluctant to share data
• Clinical texts contain many patient identifiers
– 18 HIPAA identifiers, e.g., names and addresses
– Items not regulated by HIPAA, e.g., “tight end for the Steelers”
• Unique cases
– A woman in her 50s who is pregnant
• Sensitive information
– HIV status
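Why identifiers block data sharing can be made concrete with a deliberately naive sketch of rule-based redaction. The `PATTERNS` table and `redact` helper are hypothetical illustrations, not a HIPAA-compliant de-identifier: they catch a few formatted identifiers (dates, phone numbers, medical record numbers) but pass a quasi-identifier such as “tight end for the Steelers” straight through.

```python
import re

# Hypothetical patterns for a few formatted identifiers. Real de-identification
# covers all 18 HIPAA categories and relies on far richer models than regexes.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN:?\s*\d+\b"),
}


def redact(text):
    """Replace every match of each pattern with a [TYPE] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen 03/14/2011, MRN 445566, call 412-555-0199. Pt is tight end for the Steelers."
print(redact(note))
```

The Steelers reference survives redaction untouched; it is exactly the kind of item the slide lists as not regulated by HIPAA.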
Lack of user-centered development and scalability
– Perceived cost of applying NLP outweighs the perceived benefit (Len D’Avolio)
Improving access to text through NLP resources
Access to Resources for Developing NLP Algorithms
NLP Experts
Clinicians & Researchers
Informaticists
Patients
Resources for NLP Developers
• Knowledge bases
– Schema ontology: linguistic representation of clinical elements
– Modifier ontology: modifiers of clinical elements
– Domain ontology
• Clinical data
• Annotations
• Annotation environment
• Evaluation
Example: “Patient denies a family history of colon cancer”
– Disease: colon cancer
– Experiencer: family
– Negation: no
– Historical: yes
Melissa Tharp
Schema Ontology: Elements
Schema Ontology: Relationships
Modifier Ontology
• Modifiers are important for interpreting text
– Chest radiograph confirms pneumonia
– Family history of pneumonia
– No evidence of pneumonia
• Allowable modifiers for each clinical element
– Affirmation/negation
– Uncertainty
– Experiencer
– Historical/recent
– Severity
Modifier Ontology
• Types of modifiers
• Linguistic expressions
• Actions
• Translations
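How linguistic expressions and actions interact can be sketched in the spirit of the ConText algorithm: each trigger phrase carries a modifier type, and its action says whether it modifies a concept that follows it ("forward") or precedes it ("backward") in the sentence. The `TRIGGERS` table and `find_modifiers` function below are hypothetical and far smaller than a real modifier ontology; this is not the pyConText API.

```python
import re

# Hypothetical mini modifier ontology: trigger phrase -> (modifier type, value, action).
# "forward" triggers modify a target that follows them in the sentence;
# "backward" triggers modify a target that precedes them.
TRIGGERS = {
    "no evidence of": ("negation", "negated", "forward"),
    "denies": ("negation", "negated", "forward"),
    "is ruled out": ("negation", "negated", "backward"),
    "family history of": ("experiencer", "family", "forward"),
    "history of": ("historical", "historical", "forward"),
    "possible": ("uncertainty", "uncertain", "forward"),
}

# Values assigned when no trigger applies to the target.
DEFAULTS = {"negation": "affirmed", "experiencer": "patient",
            "historical": "recent", "uncertainty": "certain"}


def find_modifiers(sentence, target):
    """Return modifier values for `target`, using the whole sentence as scope."""
    text = sentence.lower()
    t_start = text.find(target.lower())
    if t_start < 0:
        raise ValueError(f"{target!r} not found in sentence")
    t_end = t_start + len(target)
    result = dict(DEFAULTS)
    for phrase, (mod_type, value, action) in TRIGGERS.items():
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            if action == "forward" and m.end() <= t_start:
                result[mod_type] = value
            elif action == "backward" and m.start() >= t_end:
                result[mod_type] = value
    return result


print(find_modifiers("Family history of pneumonia", "pneumonia"))
```

Using the whole sentence as scope is a simplification; ConText itself also ends a trigger's scope at termination terms such as "but".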
Schema Ontology Imports Modifier Ontology
• Medications
– Type
– Dose
– Frequency
– Route
• Diagnosis
– Negation
– Uncertainty
– Severity
– History
– Experiencer
• Consistent with other models: Clinical element models, cTAKES type system, Common model
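One way to picture a schema ontology that imports a modifier ontology is as typed records: each clinical element type declares its own attributes, and diagnoses reuse a shared set of modifier attributes. The class and field names below are hypothetical and only mirror the Medications and Diagnosis examples above; they are not the cTAKES type system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Modifiers:
    """Shared modifier attributes, 'imported' from the modifier ontology."""
    negation: bool = False
    uncertainty: bool = False
    severity: Optional[str] = None
    historical: bool = False
    experiencer: str = "patient"


@dataclass
class Medication:
    med_type: str                  # the slide's "Type" attribute
    dose: Optional[str] = None
    frequency: Optional[str] = None
    route: Optional[str] = None


@dataclass
class Diagnosis:
    name: str
    # default_factory gives each diagnosis its own Modifiers instance.
    modifiers: Modifiers = field(default_factory=Modifiers)


dx = Diagnosis("colon cancer", Modifiers(historical=True, experiencer="family"))
med = Medication("ibuprofen", route="p.o.")
```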
Domain Ontology for NLP
• Instance of schema ontology
• Clinical elements from a particular domain
– Synonyms
– Misspellings
– Regular expressions
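A domain ontology entry fills the schema with the words one domain uses for a concept. A minimal sketch, assuming a hypothetical entry format with synonyms, common misspellings, and an optional regular expression, compiles each entry into a single case-insensitive pattern and reports which concept each match belongs to. The entries are made up for illustration.

```python
import re

# Hypothetical domain ontology entries (not from any published ontology).
DOMAIN = {
    "pneumothorax": {
        "synonyms": ["air in the pleural space", "collapsed lung"],
        "misspellings": ["pnuemothorax", "pneumothorx"],
        "regex": r"\bptx\b",
    },
    "fever": {
        "synonyms": ["febrile", "pyrexia"],
        "misspellings": ["fevr"],
        "regex": None,
    },
}


def compile_entry(concept, entry):
    """Combine the concept name, synonyms, misspellings, and regex into one pattern."""
    terms = [concept, *entry["synonyms"], *entry["misspellings"]]
    alternatives = [r"\b" + re.escape(t) + r"\b" for t in terms]
    if entry["regex"]:
        alternatives.append(entry["regex"])
    return re.compile("|".join(alternatives), re.IGNORECASE)


PATTERNS = {c: compile_entry(c, e) for c, e in DOMAIN.items()}


def find_concepts(text):
    """Return (concept, matched text) pairs in order of appearance."""
    hits = []
    for concept, pattern in PATTERNS.items():
        hits.extend((m.start(), concept, m.group()) for m in pattern.finditer(text))
    return [(concept, surface) for _, concept, surface in sorted(hits)]


print(find_concepts("Small PTX noted; pt febrile. Pnuemothorax resolving."))
```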
Resources for NLP Experts
Lack of shareable data is a barrier
• University of Pittsburgh Repository
– 111,045 reports of 9 types
– 600 users
– No longer available
• MT Samples
– 2,300 reports from MTSamples.com
– De-identified
Resources for NLP Experts
AMIA NLP Working Group
ShARe: Sharing Annotated Resources (5R01GM090187: Chapman, Savova, Elhadad)
• 600 clinical notes from the MIMIC II repository
• Annotate disorders and modifiers
– Anatomic location
• Map to SNOMED codes
• CLEF Shared Task 2013 and 2014
– https://sites.google.com/site/shareclefehealth/
B South, D Mowery, S Velupillai, L Christensen, S Meystre
Resources for NLP Experts
Distributed annotation in a secure environment
• Annotator Registry (web application)
• Annotation Admin (iDASH cloud)
• eHOST (client app)
VA, SHARP, and NIGMS: S Duvall, B South, B Adams, G Savova, N Elhadad, H Hochheiser
Annotator Registry
Annotators
• Enlist for annotation
• Certify for annotation tasks
– Personal health information
– Part-of-speech tagging
– UMLS mapping
• Set pay rate
NLP Admins
• Search for annotators
http://nlp-ecosystem.ucsd.edu/annotators
1. Assign annotators to a task
2. Create a Schema
3. Assign users and set time expectations
4. Keep track of progress
Resources for NLP Experts
• Compare output of NLP annotators
– NLP system vs. human annotation
• View annotations
• Calculate outcome measures
• Drill down to all levels of annotation
• Perform error analysis
Screenshot panels: document and annotations; report list; outcome measures for selected annotations; attributes for selected annotation; relationships for selected annotation; select classifications to view
VA and ONC SHARP: Christensen, Murphy, Frabetti, Rodriguez, Savova
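The outcome measures for comparing an NLP system against human annotation can be illustrated with a minimal sketch that treats the human annotations as the reference standard and counts exact span-and-label matches. The `(start, end, label)` tuple format, exact matching, and the example labels are assumptions for illustration; evaluation tools commonly also support partial (overlap) matching.

```python
def span_scores(reference, system):
    """Precision, recall, and F1 for exact (start, end, label) matches."""
    tp = len(reference & system)   # spans both agree on
    fp = len(system - reference)   # system spans the humans did not mark
    fn = len(reference - system)   # human spans the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


human = {(0, 9, "Disorder"), (25, 34, "Disorder"), (40, 46, "Anatomy")}
nlp = {(0, 9, "Disorder"), (25, 34, "Finding"), (40, 46, "Anatomy"), (50, 55, "Disorder")}
scores = span_scores(human, nlp)
```

For error analysis, the sets `nlp - human` (false positives) and `human - nlp` (false negatives) are the annotations worth drilling into.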
Access to Information in Text
NLP Experts
Clinicians & Researchers
Informaticists
Patients
Controlled Vocabs
• Dry cough, Productive cough, Cough, Hacking cough, Bloody cough
User’s Concepts
• Cough, Dyspnea, Infiltrate on CXR, Wheezing, Fever, Cervical lymphadenopathy
Attribute-values
• Temp 38.0C, Low-grade temperature
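Mapping the many ways text expresses a user's concept can be sketched with two kinds of rules: controlled-vocabulary terms that roll up to a concept (the cough variants to Cough), and attribute-value rules that infer a concept from a measurement (a temperature reading to Fever). The term table, the regular expression, and the 38.0 °C fever cutoff are illustrative assumptions, not values taken from the talk.

```python
import re

# Hypothetical roll-up from controlled-vocabulary terms to the user's concepts.
VOCAB_TO_CONCEPT = {
    "dry cough": "Cough",
    "productive cough": "Cough",
    "hacking cough": "Cough",
    "bloody cough": "Cough",
    "cough": "Cough",
    "low-grade temperature": "Fever",
}

# Assumed cutoff for illustration: a temperature >= 38.0 C counts as fever.
FEVER_CUTOFF_C = 38.0
TEMP_PATTERN = re.compile(r"\btemp\s*(\d+(?:\.\d+)?)\s*c\b", re.IGNORECASE)


def user_concepts(text):
    """Return the set of user concepts found via vocabulary terms or attribute values."""
    lowered = text.lower()
    concepts = {c for term, c in VOCAB_TO_CONCEPT.items() if term in lowered}
    for match in TEMP_PATTERN.finditer(text):
        if float(match.group(1)) >= FEVER_CUTOFF_C:
            concepts.add("Fever")
    return concepts
```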
Efficient Access to Information in the Patient Chart
Components shown: Knowledge Author, Schema Builder, Chart Review Interface
Figure: the user concept “Family history of colon cancer” is represented as an NLP schema (Disease: colon cancer; Experiencer: family; Negation: no; Historical: yes) and as a domain ontology entry. Example entry xRayPneumothorax: preferred label "x-ray pneumothorax"@en; alternative labels "air in the pleural space on x-ray"@en and "xray pneumothorax"@en; broader: respiratorySyndrome; data categories "symptom" and "chest_radiography"; isAssociatedWithDisease: pneumothoraxDX; definition "Air between the lung and the chest wall seen on chest roentgenogram"; modified 2011-03-31.
Knowledge Author
• Front end interface for users
• Back end
– Schema ontology
– Modifier ontology
• Output
– Domain ontology
– Schema for NLP system
B Scuba, F Fana, Liqin Wang, Mingyuan Zhang, Y Liu, M Kong, F Drews
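The xRayPneumothorax example in the Knowledge Author slide uses preferred labels, alternative labels, a broader concept, and a definition, which map naturally onto W3C SKOS properties. A minimal stdlib sketch that serializes such an entry as Turtle is below; the `ex:` namespace URI, the `to_turtle` helper, and the reading of "broader" as pointing to respiratorySyndrome are assumptions, and whether Knowledge Author itself emits SKOS is not stated in the talk.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/domain#"  # hypothetical namespace for the domain ontology


def literal(text, lang="en"):
    """Quote a string as a Turtle literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(concept, pref, alts, broader, definition, disease):
    """Serialize one concept as SKOS in Turtle syntax."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{concept} a skos:Concept ;",
        f"    skos:prefLabel {literal(pref)} ;",
    ]
    lines += [f"    skos:altLabel {literal(alt)} ;" for alt in alts]
    lines += [
        f"    skos:broader ex:{broader} ;",
        f"    skos:definition {literal(definition)} ;",
        # Not a SKOS property: a hypothetical domain-specific relation.
        f"    ex:isAssociatedWithDisease ex:{disease} .",
    ]
    return "\n".join(lines)


ttl = to_turtle(
    "xRayPneumothorax",
    "x-ray pneumothorax",
    ["air in the pleural space on x-ray", "xray pneumothorax"],
    "respiratorySyndrome",
    "Air between the lung and the chest wall seen on chest roentgenogram",
    "pneumothoraxDX",
)
print(ttl)
```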
Knowledge Author examples
• User concepts: African American Adult; Ibuprofen; Ibuprofen p.o.; No family history of colon cancer
• Linguistic modifiers
• Calls Voogo synonym tool
Access to Information in the Patient Chart
Knowledge Author
Chart Review Interfaces
• Navigate patient data more efficiently
• Point chart reviewer to ambiguous and contradictory information
– Reduce bias
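Pointing a reviewer to contradictory information can be approximated by grouping NLP mentions by patient and concept, then flagging any concept whose mentions disagree on assertion (affirmed in one note, negated in another). The `Mention` record and its fields are hypothetical; a real interface would also weigh dates and document types before calling two mentions contradictory.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    patient: str
    document: str
    concept: str
    negated: bool


def contradictions(mentions):
    """Return {(patient, concept): [documents]} where assertions disagree."""
    values = defaultdict(set)   # (patient, concept) -> set of negation values seen
    docs = defaultdict(list)    # (patient, concept) -> documents mentioning it
    for m in mentions:
        key = (m.patient, m.concept)
        values[key].add(m.negated)
        docs[key].append(m.document)
    return {key: docs[key] for key, vals in values.items() if len(vals) > 1}


mentions = [
    Mention("p1", "cxr_0412", "pneumonia", False),
    Mention("p1", "dc_summary", "pneumonia", True),
    Mention("p1", "dc_summary", "sepsis", True),
    Mention("p2", "ed_note", "pneumonia", False),
]
flags = contradictions(mentions)
```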
Access to Information in the Patient Chart
Knowledge Author
Chart Review Interfaces
EMR → NLP (subjects, diagnoses, findings, anatomical locations) → Viz
Feedback to improve models
Levels: population, patient, document, expression
User identifies patients meeting criteria
Interactive Search and Review of Clinical Records with Multi-layered Semantic Annotation
NLM 1R01LM010964-01. Chapman, Wiebe, Hwa.
Population View
Patient View
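Identifying patients who meet criteria amounts to rolling expression-level annotations up through the document and patient levels. A minimal sketch, assuming each annotation is a hypothetical `(patient, document, concept, negated)` tuple and that a patient qualifies when every required concept is affirmed in at least one document:

```python
from collections import defaultdict


def patients_meeting(annotations, required):
    """Return sorted patient IDs with every required concept affirmed somewhere."""
    affirmed = defaultdict(set)  # patient -> concepts affirmed in any document
    for patient, _document, concept, negated in annotations:
        if not negated:
            affirmed[patient].add(concept)
    return sorted(p for p, concepts in affirmed.items() if required <= concepts)


annotations = [
    ("p1", "d1", "fever", False),
    ("p1", "d2", "infiltrate on cxr", False),
    ("p2", "d3", "fever", False),
    ("p2", "d3", "infiltrate on cxr", True),
    ("p3", "d4", "infiltrate on cxr", False),
]
cohort = patients_meeting(annotations, {"fever", "infiltrate on cxr"})
```

Keeping the document ID in each tuple is what lets a reviewer drill from the population view down to the supporting notes.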
Access to NLP Tools and Interfaces
NLP Experts
Clinicians & Researchers
Informaticists
Patients
Access to NLP Tools
• v3NLP (Zeng, Divita)
• pyConText (Chapman)
• RapTat (Matheny, Gobbell)
NLP Platform: NLP Workbench, Classifier Workbench, Visualization Workbench
• Annotations and KB: mix & match
User
• Interact
• Customize
TextVect
User selects NLP features (X = selected):
X N-grams
X UMLS Concepts
Part-of-speech tags
X Negation
User selects representation:
Binary
X Count
tf-idf
Components: NLP Workbench, Classifier Workbench, Visualization Workbench; NLP tools; feature selection algorithms
Training set (document label | feature vector):
Yes | 1 0 0 | 0 1 1 1
No | 0 0 1 | 1 0 0 0
No | 0 0 0 | 1 0 1 0
A Kumar, C Elkan, S Abdelrahman
https://github.com/abhishek-kumar/TextVect
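The feature-selection step turns each labeled document into a vector over a shared vocabulary, with the representation choice (binary, count, or tf-idf) deciding the cell values. This stdlib sketch uses only word unigrams and bigrams; it is not TextVect's code, UMLS-concept and negation features would need an NLP tool to produce them, and the tf-idf variant uses tf × log(N / df), one common formulation among several.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercased word n-grams of length 1..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def vectorize(docs, representation="count"):
    """Return (vocabulary, matrix) with one row per document."""
    counts = [Counter(ngrams(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    df = Counter(term for c in counts for term in c)  # documents containing each term
    n_docs = len(docs)
    matrix = []
    for c in counts:
        row = []
        for term in vocab:
            tf = c[term]
            if representation == "binary":
                row.append(1 if tf else 0)
            elif representation == "count":
                row.append(tf)
            elif representation == "tfidf":
                row.append(tf * math.log(n_docs / df[term]))
            else:
                raise ValueError(f"unknown representation: {representation}")
        matrix.append(row)
    return vocab, matrix


vocab, matrix = vectorize(["no cough", "cough cough", "fever"], "binary")
```

Pairing each row with its document label (Yes/No) yields a training set in the shape shown above.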
Evaluation of TextVect (micro-averaged F-measure)
Dataset | Baseline | Average | Best | TextVect
CMC | 0.71 | 0.77 | 0.89 | 0.82
i2b2 | not reported | 0.91 | 0.97 | 0.95
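Micro-averaged F-measure, the metric reported for TextVect, pools true positives, false positives, and false negatives across all classes before computing precision and recall, so frequent classes weigh more than rare ones. A minimal sketch for single-label classification, with made-up Yes/No labels and macro-F1 computed alongside for contrast:

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts, defining 0/0 as 0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def micro_macro_f1(gold, predicted):
    """Return (micro-F1, macro-F1) for parallel lists of class labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # wrong prediction counts against the predicted class
            fn[g] += 1  # and as a miss for the gold class
    classes = set(gold) | set(predicted)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


micro, macro = micro_macro_f1(["Yes", "No", "No", "No", "No"], ["No"] * 5)
```

Here a classifier that always answers "No" gets a micro-F1 of 0.8 but a macro-F1 of about 0.44, because the rare "Yes" class it never finds counts fully in the macro average.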
Access to Visualizations of NLP Output
NLP Workbench, Classifier Workbench, Visualization Workbench
NLP system → annotations → Visualization Workbench
Timeline View
Jianlin Shi, T Wang, E Shenvi, R El-Kareh, M Tharp, R Reeves
Access to Understanding
NLP Experts
Clinicians & Researchers
Informaticists
Patients
Access to Understanding
Clinical Notes
Chief Complaint:
Hypoxic respiratory failure
Major Surgical or Invasive Procedure:
Intubation.
History of Present Illness:
81 yo man w/ho CAD, COP, PVD, AAA
xfered from OSH for mngmt resp failure. Pt
was found @ home by EMS followign c/o
[**05-29**] "crushing", nonradiating SSCP.
Pt diaphoretic during transport. Sat 84->94% on NRB. Given ASA, NT, nebs en
route to OSH where started on BIPAP and
eventually intubated. BP on arrival 240/140
so started on NTG drip titrated up until BP
fell to 90/58 resulting in IVF, dopamine.
Given 80 IV lasix. First set enzymes
negative and BNP 1700. Pt xferred for
further management.
• Definitions
• Medical terms
• Acronyms/abbreviations
• Pictures
• Internet sites
• Biomedical literature
• Normal range checking
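Two of the aids listed, acronym/abbreviation expansion and normal-range checking, are easy to sketch against the note above. The abbreviation lexicon is a small hypothetical table, and the blood-pressure cutoff of 180/120 mmHg is an assumption used only for illustration, not a clinical rule from the talk.

```python
import re

# Hypothetical abbreviation lexicon covering a few terms from the note above.
ABBREVIATIONS = {
    "CAD": "coronary artery disease",
    "PVD": "peripheral vascular disease",
    "AAA": "abdominal aortic aneurysm",
    "OSH": "outside hospital",
    "NRB": "non-rebreather mask",
    "ASA": "aspirin",
    "NTG": "nitroglycerin",
    "IVF": "intravenous fluids",
}
ABBREV_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b")

# "BP", then any non-digits, then systolic/diastolic values.
BP_PATTERN = re.compile(r"\bBP\b[^0-9]*(\d{2,3})/(\d{2,3})")


def expand(text):
    """Append the expansion in brackets after each known abbreviation."""
    return ABBREV_PATTERN.sub(lambda m: f"{m.group(1)} [{ABBREVIATIONS[m.group(1)]}]", text)


def flag_bp(text, systolic_max=180, diastolic_max=120):
    """Return blood-pressure readings at or above the assumed cutoffs."""
    flags = []
    for m in BP_PATTERN.finditer(text):
        systolic, diastolic = int(m.group(1)), int(m.group(2))
        if systolic >= systolic_max or diastolic >= diastolic_max:
            flags.append(f"{systolic}/{diastolic}")
    return flags
```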
Conclusion
• Collaborations for NLP improve the ability to
– Create potentially useful resources and tools
• Provide access to
– Resources for NLP development
– Information in reports
– NLP and visualization tools
• The major challenge is applying NLP
• Future needs
– More integration with other tools
– More coordination
Acknowledgments
BLU Lab: Lee Christensen, Melissa Tharp, Mike Conway, Danielle Mowery, Bill Scuba, Milan Kovacevich, Dieter Hillert, Samir Abdelrahman, Leah Willis, Bob Angell
Collaborators: Harry Hochheiser, Jan Wiebe, Rebecca Hwa, Guergana Savova, Noemie Elhadad, Michael Matheny, Rob El-Kareh, Ruth Reeves, Qing Zeng, Guy Divita, Frank Drews, Sumithra Velupillai, Maria Kvist, Maria Skeppstedt, Aron Henriksson, Brian Chapman, David Carrell, Sascha Dublin, Zia Agha, Stephane Meystre, Scott DuVall, Jianlin Shi
Questions | Discussion