
Medication Extraction from Clinical
Data Using Frame Semantics
DIMITRIOS KOKKINAKIS
Centre for Language Technology
University of Gothenburg
www.svenska.gu.se www.clt.gu.se
spraakbanken.gu.se
OVERVIEW
• Motivation
• Semantic Annotation of Corpora and Event-Based Information Extraction
   • e.g. i2b2 Medication Challenge
• Frame Semantics
• Medical Frames
• Pilot: Administration_of_Medication
• Design and Resources (so far…)
• Conclusion and Future Work
MOTIVATION (EXTRACTION OF FACTS AND EVENTS)
Semantic annotation of corpora for mining complex relations and events has been attracting considerable attention in the medical domain.
Goal (work in progress): to develop an appropriate infrastructure for automatic event labeling in the clinical domain using hybrid techniques (e.g. supervised machine learning, rules, lexicons).
Event extraction can be modeled as a sequential tagging problem. Training and test data sets are (and will be) taken from Swedish medical corpora, while the Swedish FrameNet++ provides the basis for describing the events.
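Treating event extraction as sequential tagging usually means encoding the annotated frame elements as one tag per token. The sketch below assumes the common BIO scheme (the slides do not say which encoding is used) and hypothetical frame-element labels (Drug, Dosage, Route, Frequency); it converts annotated spans into the per-token tags a sequence labeler would be trained on:

```python
from typing import List, Tuple

# A labeled span: (start token, end token exclusive, frame-element label)
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping frame-element spans as one B-/I-/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens inside the span
    return tags


# Hypothetical sentence: "The patient received Waran 5 mg p.o. once daily"
tokens = ["Patienten", "fick", "Waran", "5", "mg", "p.o.", "1", "gång", "dagligen"]
spans = [(2, 3, "Drug"), (3, 5, "Dosage"), (5, 6, "Route"), (6, 9, "Frequency")]

for token, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(f"{token}\t{tag}")
```

Tokens outside any frame element get O; a tagger trained on such sequences then predicts the tags, and the spans are recovered by reading B-/I- runs back off.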
EVENT-BASED INFORMATION EXTRACTION
Information extraction (IE) corresponds directly to the frame-like structures of FrameNet, since IE templates are themselves frame-like structures whose slots represent event information. Most event-based IE approaches are designed to identify role fillers that appear as arguments of event verbs or nouns, either explicitly via syntactic relations or implicitly via proximity.
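The "implicitly via proximity" strategy can be sketched as a template whose slots are filled with the nearest compatible mention around the event trigger. This is a minimal illustration, not the method used in this work; the Mention type, the slot names and the 8-token window are all assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mention:
    text: str
    label: str      # entity type from an earlier tagging step (hypothetical labels)
    position: int   # token index of the mention


def fill_by_proximity(trigger_pos: int, mentions: List[Mention],
                      slot_names: List[str], window: int = 8) -> Dict[str, Optional[str]]:
    """Fill each template slot with the nearest mention of that type lying within
    `window` tokens of the event trigger; unfilled slots stay None."""
    slots: Dict[str, Optional[str]] = {name: None for name in slot_names}
    best: Dict[str, int] = {}
    for m in mentions:
        if m.label not in slots:
            continue
        dist = abs(m.position - trigger_pos)
        # Keep the mention only if it is inside the window and closer than any earlier one
        if dist <= window and dist < best.get(m.label, window + 1):
            best[m.label] = dist
            slots[m.label] = m.text
    return slots


# Hypothetical mentions around the trigger "fick" ("received", token 1)
mentions = [Mention("Waran", "Drug", 2), Mention("5 mg", "Dosage", 3),
            Mention("p.o.", "Route", 5), Mention("Alvedon", "Drug", 20)]
print(fill_by_proximity(1, mentions, ["Drug", "Dosage", "Route", "Frequency"]))
```

The distant second drug mention falls outside the window and is ignored. The explicit, syntactic variant would replace the token distance with a path through a dependency parse.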
The “Medication Challenge” i2b2… (2009)
The Third i2b2 Workshop on NLP Challenges for Clinical Records (designed as an information extraction task) focused on the extraction of medications and medication-related information from discharge summaries.
ADVANTAGES OF STRUCTURED DATA…
• get an overview of the prescribed medication along different dimensions
• help organize and improve the presentation of the EHR, e.g. through advanced graphical presentation of EHR data
• create the basis for data mining and evidence-based medicine, e.g. for the epidemiological analysis of adverse events
• allow the automatic transmission of data to various registries
• aggregate data from many patients in repositories, facilitating e.g. open comparisons
• enable more reliable quality comparisons between different parts of the country or the world
• create a database directly accessible to researchers
• allow the generation of new hypotheses and new (semantic) relationships
• improve patient safety, pharmacovigilance, …
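Once medication events are structured, aggregating them across patients reduces to simple grouping. A minimal sketch, assuming hypothetical record fields (patient, drug, route) and made-up rows rather than any real repository:

```python
from collections import Counter
from typing import Dict, List

# Hypothetical structured medication events, one row per extracted event
records: List[Dict[str, str]] = [
    {"patient": "p1", "drug": "warfarin", "route": "po"},
    {"patient": "p2", "drug": "warfarin", "route": "po"},
    {"patient": "p2", "drug": "morfin", "route": "iv"},
    {"patient": "p3", "drug": "morfin", "route": "sc"},
    {"patient": "p3", "drug": "morfin", "route": "sc"},
]


def patients_per_drug(rows: List[Dict[str, str]]) -> Counter:
    """Number of distinct patients recorded with each drug."""
    pairs = {(r["patient"], r["drug"]) for r in rows}
    return Counter(drug for _, drug in pairs)


def route_distribution(rows: List[Dict[str, str]], drug: str) -> Counter:
    """How often each administration route is recorded for one drug."""
    return Counter(r["route"] for r in rows if r["drug"] == drug)


print(patients_per_drug(records))
print(route_distribution(records, "morfin"))
```

Counting distinct (patient, drug) pairs rather than raw rows keeps repeated events for the same patient from inflating the per-drug figures.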
FRAME SEMANTICS…
The FrameNet approach is based on the linguistic theory of frame semantics, supported by corpus evidence. A semantic frame is a script-like structure of concepts that are linked to the meanings of linguistic units and associated with a specific event or state.
Each frame identifies a set of frame elements, i.e. frame-specific semantic roles. These comprise core roles (arguments), which are tightly coupled with the particular meaning of the frame, and more generic non-core roles (adjuncts or modifiers), which to a large extent are event-independent.
When computers are used to extract semantic information for NLP tasks, FrameNet's semantic mapping provides a means of extracting meaning from a string of words.
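The split between core and non-core frame elements maps naturally onto a small data structure. The sketch below is one possible representation of a frame and an annotated instance of it; the frame-element names and Swedish lexical units given for Administration_of_medication are hypothetical placeholders, not the SweFN++ definition:

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Frame:
    name: str
    core: FrozenSet[str]           # roles tightly coupled with the frame's meaning
    non_core: FrozenSet[str]       # generic, largely event-independent roles
    lexical_units: FrozenSet[str]  # words that evoke the frame


@dataclass
class FrameInstance:
    frame: Frame
    target: str                                  # evoking word found in the text
    fillers: Dict[str, str] = field(default_factory=dict)

    def missing_core(self) -> List[str]:
        """Core frame elements that have no filler in this instance."""
        return sorted(self.frame.core - set(self.fillers))


# Hypothetical frame definition; the element names are illustrative only
medication = Frame(
    name="Administration_of_medication",
    core=frozenset({"Drug", "Patient"}),
    non_core=frozenset({"Dosage", "Route", "Frequency", "Time"}),
    lexical_units=frozenset({"ge", "administrera"}),
)
instance = FrameInstance(medication, "gavs", {"Drug": "Waran", "Dosage": "5 mg"})
print(instance.missing_core())
```

Flagging unfilled core elements is useful during annotation, since a missing core role often signals an implicit participant (here, the patient) rather than an annotation error.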
FRAME SEMANTICS…
Thus, a word activates, or evokes, a frame of semantic knowledge relating to the specific concept it refers to. A semantic frame is a collection of facts that specify "characteristic features, attributes, and functions of a denotatum, and its characteristic interactions with things necessarily or typically associated with it". A semantic frame can also be defined as a coherent structure of concepts that are related in such a way that, without knowledge of all of them, one does not have complete knowledge of any one of them.
For example, one would not be able to understand the word sell without knowing anything about the situation of commercial transfer, which also involves a seller, a buyer, goods, money, the relation between the money and the goods, and so on.
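The idea that a single word evokes a whole frame can be made concrete with a toy lookup: knowing only that 'sell' occurred, a system already knows which other participants the situation involves and which of them the sentence leaves implicit. The lexicon and participant list below are illustrative assumptions; the frame name follows Berkeley FrameNet's Commerce_sell, but its official frame-element inventory is not reproduced here:

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon mapping a lexical unit to the frame it evokes (illustrative only)
EVOKES: Dict[str, str] = {"sell": "Commerce_sell"}

# Participants of the commercial-transfer situation, as listed in the prose
PARTICIPANTS: Dict[str, List[str]] = {
    "Commerce_sell": ["Buyer", "Goods", "Money", "Seller"],
}


def evoke(word: str, known: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return the frame a word evokes and the participants not yet accounted for."""
    frame = EVOKES.get(word.lower())
    if frame is None:
        return None, []
    return frame, [p for p in PARTICIPANTS[frame] if p not in known]


print(evoke("sell", {"Seller": "Anna", "Goods": "a car"}))
```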
RELEVANT APPLICATIONS…
In 2011, FrameNet (FN) began collaborations with two industrial partners. One is with a defense contractor, to develop frames and annotation for reports written by U.S. soldiers after patrols in Afghanistan and Iraq. The other is a partnership with Siemens Research U.S. to develop frames and annotation for medical texts, such as medical textbooks and guidelines for the treatment of diseases.
http://www.icsi.berkeley.edu/pubs/icsi/2011AnnualReport.pdf
A slide from an LREC 2012 presentation (closing session)
MEDICALLY ORIENTED FRAMES
https://framenet.icsi.berkeley.edu/fndrupal/index.php?q=frame_report&name=Medical_intervention
Swedish MEDICALLY ORIENTED FRAMES
Administration_of_medication
Addiction
Birth
Death
Experience_bodily_harm
Falling_ill
Health_response
Institutionalization
Medical_disorders
Medical_instruments
Medical_interaction_scenario
Medical_professionals
Medical_specialties
Medical_treatment
Observable_bodyparts
People_by_disease
Recovery
…
http://spraakbanken.gu.se/eng/research/swefn/development-version
Example Frame: CURE
http://spraakbanken.gu.se/eng/research/swefn/development-version
Example
http://spraakbanken.gu.se/eng/research/swefn/development-version
Frame: Administration_of_Medication
[Figure: the frame's core and non-core frame elements]
Design so far… Resources in Use
1. FASS, the Swedish national formulary, which contains a list of medicines approved for prescription throughout Sweden
2. Swedish SNOMED CT’s Substance hierarchy, which contains “concepts that can be used for recording active chemical constituents of drug products, food and chemical allergens, adverse reactions, toxicity or poisoning information, and physicians and nursing orders”
<http://www.ihtsdo.org/snomed-ct/snomed-ct0/snomed-ct-hierarchies/substance/>
3. Swedish MeSH’s category D, Chemicals and Drugs (5,886)
4. Drug lexicon extensions (e.g. generic expressions for drugs, detection of misspellings)
5. List of relevant abbreviations and their variants: iv, i.v., im, i.m., sc, s.c., po, p.o., vb, v.b., V b, T, inj., tbl, …
6. …
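Resources 4 and 5 amount to two lookup steps: mapping the many spellings of an abbreviation to one canonical form, and tolerating misspelled drug names. A minimal sketch, assuming Python's standard difflib for fuzzy matching and a tiny hypothetical drug list in place of the FASS/SNOMED CT/MeSH lexicon; the English glosses are assumptions, and ambiguous entries such as T are deliberately left out:

```python
import difflib
import re
from typing import List, Optional

# Canonical glosses for some of the listed abbreviations (assumed, not from the source)
ABBREVIATIONS = {
    "iv": "intravenous",
    "im": "intramuscular",
    "sc": "subcutaneous",
    "po": "oral (per os)",
    "vb": "as needed (vid behov)",
    "inj": "injection",
    "tbl": "tablet",
}


def normalize_abbreviation(token: str) -> Optional[str]:
    """Map variants such as 'i.v.', 'IV' or 'V b' to one canonical gloss."""
    key = re.sub(r"[.\s]", "", token.lower())  # drop dots and spaces, ignore case
    return ABBREVIATIONS.get(key)


# Hypothetical stand-in for a drug lexicon built from FASS, SNOMED CT and MeSH
DRUG_LEXICON: List[str] = ["paracetamol", "warfarin", "metformin", "morfin"]


def lookup_drug(token: str, cutoff: float = 0.85) -> Optional[str]:
    """Exact match first, then the closest lexicon entry to catch misspellings."""
    word = token.lower()
    if word in DRUG_LEXICON:
        return word
    matches = difflib.get_close_matches(word, DRUG_LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_abbreviation("i.v."), normalize_abbreviation("V b"))
print(lookup_drug("paracetamoll"))
```

The cutoff trades recall for precision: lowering it catches more misspellings but also maps unrelated words onto drug names, so in practice it would be tuned on annotated data.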
Design so far… Resources in Use
1. Named Entity Recognition for the relevant entities:
   1. Drug names
   2. Time
   3. Frequency
2. Terminology Recognition:
   1. MeSH
   2. SNOMED CT
3. (Ongoing) Manual annotation with the frame elements
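Time and frequency entities are often recognized with patterns rather than lexicons. A sketch under stated assumptions: the two regular expressions below cover only a few hypothetical Swedish frequency formats ("1x3", "3 ggr/dag", "2 gånger per dag") and are not the recognizer used in this work:

```python
import re
from typing import List, Tuple

# Hypothetical patterns for a few Swedish frequency formats
FREQUENCY_PATTERNS = [
    r"\b\d+\s*x\s*\d+\b",                                              # "1x3", "2 x 2"
    r"\b\d+\s*(?:ggr|gånger)\s*(?:/|per\s+)?(?:dag|dygn|dagligen)\b",  # "3 ggr/dag"
]
FREQUENCY_RE = re.compile("|".join(f"(?:{p})" for p in FREQUENCY_PATTERNS),
                          re.IGNORECASE)


def tag_frequency(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, surface string) character spans of frequency mentions."""
    return [(m.start(), m.end(), m.group()) for m in FREQUENCY_RE.finditer(text)]


print(tag_frequency("Waran 5 mg 1x1, Alvedon 1 g 3 ggr/dag"))
```

The first pattern would also fire on dimensions such as "2 x 3 cm", so a real system would need context checks or a corpus-derived pattern inventory.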
Richard Johansson, Karin Friberg Heppin, Dimitrios Kokkinakis. Semantic Role Labeling
with the Swedish FrameNet. Proceedings of the 8th International Conf on Language
Resources and Evaluation (LREC'12), pp. 3697–3700. Istanbul, Turkey, 2012.
CONCLUSIONS
The driving force for the experiments is frame semantics, which allows us to work with a more holistic and detailed semantic event description than is possible with, for instance, most traditional approaches based on binary relation extraction.
Event extraction is more complicated and challenging than relation extraction, since events usually have an internal structure involving several entities as participants; this allows a detailed representation of more complex statements.
Preliminary results suggest that SweFN++ is a good starting point for annotating corpora. The role set described is general enough to capture a wide range of phenomena that characterize the majority of semantic arguments of general medical events.
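The difference in expressiveness can be shown with a small example. Assuming a hypothetical tapering schedule in which the same drug is given at two dosages for two durations, projecting the n-ary events onto binary (drug, role, value) relations loses the information about which dosage goes with which duration:

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Event = Dict[str, str]


def to_binary(events: List[Event]) -> Set[Tuple[str, str, str]]:
    """Project n-ary events onto binary (drug, role, value) relations."""
    return {(e["Drug"], role, value)
            for e in events for role, value in e.items() if role != "Drug"}


def as_events(events: List[Event]) -> Set[FrozenSet[Tuple[str, str]]]:
    """Order-independent view that keeps each event's participants together."""
    return {frozenset(e.items()) for e in events}


# Hypothetical tapering schedule: two events for the same drug
plan_a = [{"Drug": "prednisolon", "Dosage": "40 mg", "Duration": "5 dagar"},
          {"Drug": "prednisolon", "Dosage": "20 mg", "Duration": "3 dagar"}]
# The same values, paired differently
plan_b = [{"Drug": "prednisolon", "Dosage": "40 mg", "Duration": "3 dagar"},
          {"Drug": "prednisolon", "Dosage": "20 mg", "Duration": "5 dagar"}]

print(to_binary(plan_a) == to_binary(plan_b))  # binary view cannot tell them apart
print(as_events(plan_a) == as_events(plan_b))  # event view can
```

The two plans are clinically different, yet they yield identical sets of binary relations; only a representation that keeps all participants of one event together distinguishes them.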
FUTURE WORK
Larger annotated corpora are needed for larger-scale experiments (which are planned…).
We are currently working on:
• extending/refining/encoding new frames according to the Berkeley FrameNet (BFN) descriptions
• manually annotating larger corpora
• investigating how well existing frame descriptions actually capture semantics
• continuing with more experiments (methods, software, larger data sets) for learning to annotate the arguments
• using a richer set of features, in particular syntactic information and the distance between the arguments
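As an illustration of the kind of features mentioned in the last item, the sketch below computes simple lexical features plus the token distance between a candidate argument and an anchor (the event trigger or another argument, such as the drug mention). The feature names are hypothetical, and syntactic features are omitted because they would require a parser:

```python
from typing import Dict, List, Union

Feature = Union[str, int, bool]


def argument_features(tokens: List[str], cand: int, anchor: int) -> Dict[str, Feature]:
    """Features for classifying tokens[cand] as an argument, relative to tokens[anchor]."""
    word = tokens[cand]
    return {
        "word": word.lower(),
        "anchor_word": tokens[anchor].lower(),
        "is_number": word.replace(",", "").replace(".", "").isdigit(),
        "distance": abs(cand - anchor),          # token distance to the anchor
        "before_anchor": cand < anchor,          # direction relative to the anchor
        "prev_word": tokens[cand - 1].lower() if cand > 0 else "<s>",
        "next_word": tokens[cand + 1].lower() if cand + 1 < len(tokens) else "</s>",
    }


tokens = ["Patienten", "fick", "Waran", "5", "mg", "p.o."]
print(argument_features(tokens, 3, 1))
```

Feature dictionaries of this shape can be vectorized and passed to any standard classifier or sequence labeler; a dependency-path feature would slot in as one more key once parses are available.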