Transcript Slide 1
Extraction of Adverse Drug
Effects from Clinical Records
Our material is Discharge
Summary
E. ARAMAKI* Ph.D.,
Y. MIURA **,
M. TONOIKE ** Ph.D.,
T. OHKUMA ** Ph.D.,
H. MASHUICHI ** Ph.D.,
K.WAKI * Ph.D. M.D.,
K.OHE * Ph.D. M.D.,
* University of Tokyo, Japan
** Fuji Xerox, Japan
Background
• The use of Electronic Health Records (EHR) in
hospitals is increasing rapidly everywhere
• They contain much clinical information about
a patient’s health
BUT
Many Natural
Language texts !
Extracting clinical information
from the reports is difficult
because they are written in
natural language
NLP based Adverse Effect Detecting System
• We are developing an NLP system that extracts
medical information, especially Adverse Effects,
from natural language parts
• INPUT
– a medical text (discharge summary)
• OUTPUT
– Date Time
– Medication Event
– Adverse Effect Event
≒ i2b2 Medication Challenge
But our target focuses only on adverse effects
Adverse Effect Relation (AER)
Why Adverse Effect Relations?
• Clinical trials usually target only a single drug.
• BUT: real patients sometimes take multiple
medications, leading to a gap separating the
clinical trials and the actual use of drugs
• To ensure patient safety, it is extremely
important to capture new/unknown AEs at
an early stage.
DEMO is available on
http://mednlp.jp
System
Demo
Estimation of adverse effect relations
Medication
Cc
Relation
has no complications at the time of diagnosis
6/23-25 FOLFOX6 2nd.
6/24, 25: moderate fever (38℃) again. a fever reducer….
Adverse Effect
The Point of This Study
• (1) Preliminary Investigation: How much information
actually exists?
– We annotated adverse effect information in
discharge summaries
• (2) NLP Challenge: Could current NLP retrieve
it?
– We investigated the accuracy with which the
current technique could extract adverse effect
information
Outline
• Introduction
• Preliminary Investigation
– How much information actually exists in discharge
summaries?
• NLP Challenge
• Conclusions
Material & Method
• Material: 3,012 Japanese Discharge Summaries
• 3 human annotators marked possible adverse effects in
the following 2 steps
Step 1: Event Annotation
<D>Lasix</D> for <S>hypertension</S> is stopped due to <S>his headache</S>.
XML tag = Event
Step 2: Relation Annotation
<D rel="1">Lasix</D> for <S>hypertension</S> is stopped due to <S rel="1">his headache</S>.
XML attribute = Relation
Annotation Policy & Process
• We regard only MedDRA/J (adverse effect terminology) terms as the events.
• We regarded even a suspicion of an adverse effect as
positive data.
• Entire data annotation is time-consuming
→ We split data into 2 sets
SET-A (Event Rich parts): contains keywords such
as Stop, Change, Adverse effect, Side effect
Fully annotated
SET-B: The rest
Randomly sampled & annotated
SET-A
SET-B
14.5%×53.5% + 85.5%×11.3% = 17.4%
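The estimate above is a weighted average over the two sets. A minimal sketch of the arithmetic, assuming the figure means that 14.5% of the summaries fall in SET-A and 53.5% of those contain adverse effect information, while 85.5% fall in SET-B and 11.3% of those do. The dictionary layout and key names are illustrative only.

```python
# Share of all summaries in each set, and (assumed reading of the figure)
# the fraction of summaries in that set containing adverse effect information.
sets = {
    "SET-A": {"share": 0.145, "ae_rate": 0.535},
    "SET-B": {"share": 0.855, "ae_rate": 0.113},
}

# Weighted average of the per-set rates, weighted by each set's share.
overall = sum(s["share"] * s["ae_rate"] for s in sets.values())
print(f"{overall:.1%}")  # prints 17.4%
```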
Results of Preliminary Investigation
• About 17% of discharge summaries contain
adverse effect information.
– Even considering that the result includes just a
suspicion of effects, the summaries are a valuable
resource on AE information.
• We can say that discharge summaries are
suitable resources for our purpose.
Outline
• Introduction
• Preliminary Investigation
• NLP Challenge
– Could the current NLP technique retrieve the AEs?
• Conclusions
Combination of 2 NLP Steps
• 2 NLP steps directly correspond to each
annotation step
Lasix for hyperpiesia is stopped due to the pain in the head.
Event Annotation (Medication, symptom, symptom)
≒ Named Entity Recognition Task
Relation Annotation (Adverse Effect Relation)
= Relation Extraction Task, which is one of the
hottest NLP research topics.
Step1: Event Identification
• Machine Learning Method
– CRF (Conditional Random Field) based Named
Entity Recognition (state-of-the-art method at
i2b2 de-identification task)
• Feature
– Lexicon (Stemming), POS, Dictionary based
feature (MedDRA), window size=5 (Standard Feature Set)
• Material
– SET-A Corpus with Event Annotations
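The feature set above can be sketched as a per-token feature function. This is only an illustration under stated assumptions: "window size=5" is read as the current token plus two tokens on each side, lowercasing stands in for stemming, the POS tags are supplied by hand, and `meddra_like` is a hypothetical one-entry dictionary rather than real MedDRA content. The English sentence is used for readability; the actual corpus is Japanese.

```python
def token_features(tokens, pos_tags, i, dictionary, half_window=2):
    """Features for token i over a window of 2*half_window + 1 tokens:
    surface form, POS tag, and dictionary membership at each offset."""
    feats = {}
    for offset in range(-half_window, half_window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j].lower()  # crude stand-in for stemming (assumption)
            feats[f"w[{offset}]"] = word
            feats[f"pos[{offset}]"] = pos_tags[j]
            feats[f"dict[{offset}]"] = word in dictionary
        else:
            feats[f"w[{offset}]"] = "<PAD>"  # position outside the sentence
    return feats

tokens = ["Lasix", "is", "stopped", "due", "to", "his", "headache"]
pos_tags = ["NNP", "VBZ", "VBN", "JJ", "TO", "PRP$", "NN"]  # hand-assigned
meddra_like = {"headache"}  # hypothetical dictionary, not real MedDRA data

features = [token_features(tokens, pos_tags, i, meddra_like)
            for i in range(len(tokens))]
print(features[6])
```

A list of such feature dictionaries, one per token, is the usual input shape for CRF sequence labelers.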
Step1: Result of Event Identification
• Result Summary
Cat. of Event      Precision (%)   Recall (%)   F-measure
Medication Event   86.99           81.34        0.84
AE Event           85.56           80.24        0.82
• All accuracies (P, R) > 80%, F > 0.80,
demonstrating the feasibility of our approach
• Considering that the corpus size is small (435
summaries), we can say that event
detection is a relatively easy task
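For reference, the F-measure is the harmonic mean of precision and recall. The sketch below recomputes it, assuming the result values pair up as Medication Event (P=86.99, R=81.34) and AE Event (P=85.56, R=80.24); that pairing is a reading of the flattened table, not something the slide states explicitly.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Assumed pairing of the reported values (see lead-in).
f_medication = f_measure(0.8699, 0.8134)
f_ae = f_measure(0.8556, 0.8024)
print(f"{f_medication:.4f} {f_ae:.4f}")
```

Both values come out at roughly 0.84 and 0.83, in line with the reported 0.84 and 0.82 up to rounding.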
Step2: Relation Extraction Method
• Basic Approach ≒Protein-Protein Interaction (PPI)
task [BioNLP2009-shared Task]
For each m (Medications)
For each a (Adverse Effects)
judge_it_has_AER (m, a)
• Example
Lasix for hypertension is stopped due to his headache
(1) judge_it_has_AER (Lasix, hypertension)
(2) judge_it_has_AER (Lasix, headache)
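The nested loop above enumerates every medication and symptom pair as a candidate relation. A minimal sketch of that enumeration follows; `candidate_pairs` is an illustrative helper name, and the judgment function itself is left out here.

```python
from itertools import product

def candidate_pairs(medications, adverse_effects):
    """Every (medication, adverse effect) pair to be judged."""
    return list(product(medications, adverse_effects))

medications = ["Lasix"]
symptoms = ["hypertension", "headache"]

pairs = candidate_pairs(medications, symptoms)
for m, a in pairs:
    print(f"judge_it_has_AER({m}, {a})")
```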
Two judgment methods
• (1) PTN-BASED: heuristic rules using a set of keywords & word distance
..is on ACTOS but stopped for relief of the edema .
<medication> = ACTOS (n=1), keyword = stopped, <adverse effect> = edema (n=4)
judge_it_has_AER (m, a, keyword=stopped, windowsize=5)
• (2) SVM-BASED: Machine learning approach
– Feature: distance & words between two events (
medication & adverse effect)
See the proceedings for details
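The PTN-BASED rule can be sketched as follows. This is a guess at the heuristic, not the authors' rule set: it assumes n counts the words lying between the keyword and an event (ACTOS: n=1, edema: n=4 in the example), that a relation is accepted when both events are within the window of some keyword, and that each event is a single token. The keyword set here is hypothetical.

```python
def words_between(i, j):
    """Number of tokens strictly between positions i and j."""
    return abs(i - j) - 1

def judge_it_has_aer(tokens, m_idx, a_idx, keywords, window_size=5):
    """True if some keyword has both the medication and the
    adverse effect within window_size intervening words."""
    for k, tok in enumerate(tokens):
        if k in (m_idx, a_idx) or tok.lower() not in keywords:
            continue
        if (words_between(k, m_idx) <= window_size
                and words_between(k, a_idx) <= window_size):
            return True
    return False

tokens = "is on ACTOS but stopped for relief of the edema .".split()
m_idx = tokens.index("ACTOS")
a_idx = tokens.index("edema")
keywords = {"stopped"}  # hypothetical keyword set

has_relation = judge_it_has_aer(tokens, m_idx, a_idx, keywords, window_size=5)
narrow = judge_it_has_aer(tokens, m_idx, a_idx, keywords, window_size=3)
print(has_relation, narrow)  # prints True False
```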
Step2: Result of Relation Extraction
            PTN-BASED   SVM-BASED
Precision   41.1%       57.6%
Recall      91.7%       62.3%
F-measure   0.650       0.598
• Both PTN & SVM accuracies are low (F≤0.65)
→ the relation extraction task is difficult!
• SVM accuracy is significantly (p=0.05) lower
than that of PTN
(1) Corpus size is small
(2) positive data << negative data
Machine learning suffers from such small imbalanced data
Outline
• Introduction
• Preliminary Investigation
• NLP Challenge
• Discussions
– (1) Overall Accuracy
– (2) Controllable Performance
– (3) Event Distribution
• Conclusions
Discussion (1/3)
Overall Accuracy
• The overall accuracy is estimated by the
combined accuracies of step1 & step2
Overall (= step1 × step2)
Precision: 0.289 (=0.855 × 0.869 × 0.390)
Recall: 0.597 (=0.802 × 0.813 × 0.917)
• Neither NLP step is perfect, so
combining their imperfect results leads to
low overall accuracy (especially many false
positives, i.e. low precision)
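The overall figures are simple products of the per-step values. A minimal sketch of that arithmetic, using the factors shown on the slide; it treats the steps as independent, which is what the multiplication implies.

```python
from math import prod

# Factors as given on the slide: two step-1 values and one step-2 value.
overall_precision = prod([0.855, 0.869, 0.390])
overall_recall = prod([0.802, 0.813, 0.917])

print(f"{overall_precision:.4f} {overall_recall:.4f}")
```

The products are about 0.290 and 0.598, matching the reported 0.289 and 0.597 up to rounding.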
Discussion (2/3)
Performance is Controllable
• The performance
balance between
recall & precision
could be controlled
High precision setting
High recall setting
That is a strong
advantage of NLP
Precision & Recall curve in SVM
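One common way to move along such a precision and recall curve is to shift the decision threshold applied to the classifier's scores. The sketch below shows the idea with made-up scores and labels; it does not reproduce the authors' SVM or their data.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical decision scores and gold labels, for illustration only.
scores = [1.2, 0.7, 0.3, -0.1, -0.4, -0.9]
labels = [True, True, False, True, False, False]

high_precision = precision_recall_at(scores, labels, threshold=0.5)
high_recall = precision_recall_at(scores, labels, threshold=-0.5)
print(high_precision, high_recall)
```

A strict threshold keeps only confident predictions (higher precision, lower recall); a loose one accepts more candidates (higher recall, lower precision).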
Discussion (3/3)
Event Distribution
• We investigated the entire AE frequency for
each medication category.
AE freq. distribution of Drug #1
distribution acquired from
annotated real data
distribution acquired from our
system results
Discussion (3/3)
AER Distribution
• Then, we performed a goodness-of-fit test,
which measures the similarity between two
distributions
Medication   P-value
Med. 1       0.023
Med. 2       0.013
Med. 3       0.010
Med. 4       0.006
Med. 5       0.005
Total        0.011
• A high p-value (p=0.011 > 0.01) indicates that
the two distributions are similar.
Outline
• Introduction
• Preliminary Investigation
• NLP Challenge
• Discussions
• Conclusions
Conclusions (1/2)
• Preliminary Investigation:
– About 17% of discharge summaries contain adverse
effect information.
– We can say that discharge summaries are suitable
resources for AERs
• NLP Challenge:
– Could NLP retrieve the AE information?
– Difficult! Overall accuracy is low
Conclusions (2/2)
• BUT: 2 positive findings:
(1) We can control the performance balance
(2) Even though the accuracy is low, the aggregation of the
results is similar to the real distribution
• IN THE FUTURE:
– A practical system using the above advantages
– A more accurate method for relation extraction
Thank you
Contact Info
– Eiji ARAMAKI Ph.D.
– University of Tokyo
– [email protected]
– http://mednlp.jp