
Processing Semantic Relations
Across Textual Genres
Bryan Rink
University of Texas at Dallas
December 13, 2013
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Motivation
• We think about our world in terms of:
– Concepts (e.g., bank, afternoon, decision, nose)
– Relations (e.g. IS-A, PART-WHOLE, CAUSE-EFFECT)
• Powerful mental constructions for:
– Representing knowledge about the world
– Reasoning over that knowledge:
• From PART-WHOLE(brain, Human) and IS-A(Socrates, Human)
• We can reason that PART-WHOLE(brain, Socrates)
Representation and Reasoning
• Large general knowledge bases exist:
– WordNet, Wikipedia/DBpedia/Yago, ConceptNet, OpenCyc
• Some domain specific knowledge bases exist:
– Biomedical (UMLS)
– Music (Musicbrainz)
– Books (RAMEAU)
• All of these are available in the standard RDF/OWL data
model
• Powerful reasoners exist for making inferences over
data stored in RDF/OWL
• Knowledge acquisition is still the most time-consuming and difficult of these steps
Relation Extraction from Text
• Relations between concepts are encoded
explicitly or implicitly in many textual
resources:
– Encyclopedias, news articles, emails, medical
records, academic articles, web pages
• For example:
– “The report found Firestone made mistakes in the
production of the tires.”
→ PRODUCT-PRODUCER(tires, Firestone)
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Supervised Relation Identification
• SemEval-2010 Task 8 – “Multi-Way Classification of
Semantic Relations Between Pairs of Nominals”
– Given a sentence and two marked nominals
– Determine the semantic relation and directionality of that
relation between the nominals.
• Example: A small piece of rock landed into the trunk
• This contains an ENTITY-DESTINATION(piece, trunk)
relation:
– The situation described in the sentence entails the fact
that trunk is the destination of piece in the sense of piece
moving (in a physical or abstract sense) toward trunk.
Semantic Relations
Relation            | Definition
CAUSE-EFFECT        | X causes Y
INSTRUMENT-AGENCY   | Y uses X; X is the instrument of Y
PRODUCT-PRODUCER    | Y produces X; X is the product of Y
CONTENT-CONTAINER   | X is or was stored or carried inside Y
ENTITY-ORIGIN       | Y is the origin of an entity X (X coming/derived from Y)
ENTITY-DESTINATION  | X moves toward Y
COMPONENT-WHOLE     | X is a component of Y and has a functional relation
MEMBER-COLLECTION   | X is a member of Y
MESSAGE-TOPIC       | X is a message containing information about Y
OTHER               | none of the nine relations appears to be suitable
Observations
• Three types of evidence useful for classifying relations:
1. Lexical/Contextual cues
   – “The seniors poured flour into wax paper and threw the items as projectiles on freshmen during a morning pep rally”
2. Knowledge of the typical role of one nominal
   – “The rootball was in a crate the size of a refrigerator, and some of the arms were over 12 feet tall.”
3. Knowledge of a pre-existing relation between the nominals
   – “The Ca content in the corn flour has also a strong dependence on the pericarp thickness.”
Approach
• Use an SVM classifier to first determine the relation type
– Each relation type then has its own SVM classifier to determine
direction of the relation
• All SVMs share the same set of 45 feature types, which fall into the following 8 categories (a sketch of the two-stage setup follows below):
  – Lexical/Contextual
  – Hypernyms from WordNet
  – Dependency parse
  – PropBank parse
  – FrameNet parse
  – Nominalization
  – Nominal similarity derived from Google N-Grams
  – TextRunner predicates
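A minimal sketch of the two-stage classification just described, assuming feature dictionaries for the 8 categories and using scikit-learn's LinearSVC in place of the system's own SVM setup (names and data layout are illustrative, not the thesis code):

from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def train(instances):
    """instances: list of (feature_dict, relation_type, direction) triples."""
    vec = DictVectorizer()
    X = vec.fit_transform([f for f, _, _ in instances])
    # Stage 1: one SVM over all instances picks the relation type.
    type_clf = LinearSVC().fit(X, [t for _, t, _ in instances])
    # Stage 2: one SVM per relation type picks the argument direction,
    # assuming both directions of each type occur in the training data.
    dir_clfs = {}
    for rel in set(t for _, t, _ in instances):
        subset = [(f, d) for f, t, d in instances if t == rel]
        Xr = vec.transform([f for f, _ in subset])
        dir_clfs[rel] = LinearSVC().fit(Xr, [d for _, d in subset])
    return vec, type_clf, dir_clfs

def predict(vec, type_clf, dir_clfs, feature_dict):
    x = vec.transform([feature_dict])
    rel = type_clf.predict(x)[0]
    return rel, dir_clfs[rel].predict(x)[0]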
System
Lexical/Contextual Features
• Words between the nominals are very important:
cause → Cause-Effect
used → Instrument-Agency
makes → Product-Producer
contained → Content-Container
from → Entity-Origin
into → Entity-Destination
on → Component-Whole
of → Member-Collection
about → Message-Topic
• Number of tokens between the nominals is also helpful:
– Product-Producer, Entity-Origin often have zero: “organ builder”,
“Coconut oil”
• Additional features for:
– E1/E2 words, E1/E2 part of speech, Words before/after the nominals,
Prefixes of words between
– Sequence of word classes between the nominals:
• Verb_Determiner, Preposition_Determiner, Preposition_Adjective_Adjective,
etc.
Example Feature Values
• Sentence: Forward [motion]E1 of the vehicle through
the air caused a [suction]E2 on the road draft tube.
• Feature values:
– e1Word=motion, e2Word=suction
– e1OrE2Word={motion, suction}
– between={of, the, vehicle, through, the, air, caused, a}
– posE1=NN, posE2=NN
– posE1orE2=NN
– posBetween=I_D_N_I_D_N_V_D
– distance=8
– wordsOutside={Forward, on}
– prefix5Between={air, cause, a, of, the, vehic, throu, the}
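For illustration only, a few of these lexical feature values can be computed from a tokenized, POS-tagged sentence roughly as follows (token and tag lists plus the nominal indices are assumed inputs; the exact extraction in the system may differ):

# Illustrative extraction of a few lexical/contextual features for a sentence
# with marked nominals at token indices i1 and i2 (i1 < i2).
def lexical_features(tokens, pos_tags, i1, i2):
    between = tokens[i1 + 1:i2]
    feats = {
        "e1Word": tokens[i1],
        "e2Word": tokens[i2],
        "posE1": pos_tags[i1],
        "posE2": pos_tags[i2],
        "distance": len(between),
        # Rough word-class sequence, e.g. I_D_N_... from Penn tag initials.
        "posBetween": "_".join(p[0] for p in pos_tags[i1 + 1:i2]),
    }
    for w in between:
        feats["between=" + w] = 1
        feats["prefix5Between=" + w[:5]] = 1
    # Words immediately outside the nominals, when they exist.
    if i1 > 0:
        feats["before=" + tokens[i1 - 1]] = 1
    if i2 + 1 < len(tokens):
        feats["after=" + tokens[i2 + 1]] = 1
    return feats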
Parsing Features
• Dependency Parse (Stanford parser)
– Paths of length 1 from each nominal
– Paths of length 2 between E1 and E2
• PropBank SRL Parse (ASSERT)
– Predicate associated with both nominals
• Number of tokens in the predicate
• Hypernyms of predicate
– Argument types of nominals
• FrameNet SRL Parse (LTH)
– Lemmas of frame trigger words, with and without part of
speech
• Also make use of VerbNet to generalize verbs from
dependency and PropBank parses
Example Feature Values
• Sentence: Forward [motion]E1 of the vehicle
through the air caused a [suction]E2 on the road
draft tube.
• Dependency
– <E1>nsubjcauseddobj<E2>
– <E1>nsubjvn:27dobj<E2>
• VerbNet/Levin class 27 is the class of engender verbs such
as: cause, spawn, generate, etc.
• This feature value indicates that E1 is the subject of an
engender verb, and the direct object is E2
• PropBank
– Hypernyms of the predicate: cause#v#1, create#v#1
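One way to read such path features off a dependency parse, sketched here over a plain edge list of (head index, relation, dependent index) triples rather than the Stanford parser's own data structures (representation assumed for illustration):

# Sketch: dependency paths of length 1 from each nominal, and length-2 paths
# between E1 and E2 through a shared governing word.
def dep_path_features(edges, e1, e2, tokens):
    feats = set()
    for head, rel, dep in edges:
        # Length-1 paths touching either nominal.
        if dep == e1 or head == e1:
            feats.add("E1_%s_%s" % (rel, tokens[head if dep == e1 else dep]))
        if dep == e2 or head == e2:
            feats.add("E2_%s_%s" % (rel, tokens[head if dep == e2 else dep]))
    # Length-2 path: both nominals depend on the same word, e.g.
    # <E1>-nsubj-caused-dobj-<E2> for the sentence above.
    gov1 = {(head, rel) for head, rel, dep in edges if dep == e1}
    gov2 = {(head, rel) for head, rel, dep in edges if dep == e2}
    for (h1, r1) in gov1:
        for (h2, r2) in gov2:
            if h1 == h2:
                feats.add("<E1>-%s-%s-%s-<E2>" % (r1, tokens[h1], r2))
    return feats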
Nominal Role Affiliation Features
• Sometimes context is not enough and we must use
background knowledge about the nominals
• Consider the nominal: writer
– Knowing that a writer is a person increases the likelihood
that the nominal will act as a Producer or an Agency
• Use WordNet hypernyms for the nominal’s sense determined by
SenseLearner
– Additionally, writer nominalizes the verb write, which is
classified by Levin as a “Creation and Transformation”
verb.
• Most likely to act as a Producer
• Use NomLex-Plus to determine the verb being nominalized and
retrieve the Levin class from VerbNet
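For the WordNet part of this, the hypernym features can be gathered with NLTK roughly as below; the first noun synset stands in here for the SenseLearner-chosen sense, so this is only an approximation of the setup described above:

# Sketch: collect hypernyms of a nominal's WordNet sense as features.
# The first noun synset is used in place of the SenseLearner-selected sense.
from nltk.corpus import wordnet as wn

def hypernym_features(nominal):
    synsets = wn.synsets(nominal, pos=wn.NOUN)
    if not synsets:
        return set()
    feats = set()
    frontier = [synsets[0]]
    while frontier:
        syn = frontier.pop()
        for hyper in syn.hypernyms():
            feats.add("hypernym=" + hyper.name())
            frontier.append(hyper)
    return feats

# hypernym_features("writer") includes hypernym=person.n.01 (via
# communicator.n.01), so a classifier can learn that person-like nominals
# often act as a Producer or an Agency.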
Google N-Grams for Nominal Role
Affiliation
• Semantically-similar nominals should participate in the
same roles
– They should also occur in similar contexts in a large corpus
• Using Google 5-grams, the 1,000 most frequent words
appearing in the context of a nominal are collected
• Using Jaccard similarity on those context words, the 4
nearest neighbor nominals are determined, and used
as a feature
– Also, determine the role most frequently associated with
those neighbors
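A small sketch of the neighbor computation, assuming the per-nominal sets of top context words have already been collected from the Google 5-gram counts and that role counts from the training data are available (both data structures are hypothetical stand-ins):

# Sketch: 4 nearest neighbors of a nominal by Jaccard similarity of its
# most frequent context words, plus the role most often seen with them.
from collections import Counter

def jaccard(a, b):
    return len(a & b) / float(len(a | b)) if a or b else 0.0

def nearest_neighbors(target, context_sets, k=4):
    """context_sets: dict mapping nominal -> set of its top context words."""
    scores = [(jaccard(context_sets[target], ctx), nom)
              for nom, ctx in context_sets.items() if nom != target]
    return [nom for _, nom in sorted(scores, reverse=True)[:k]]

def most_frequent_role(neighbors, role_counts):
    """role_counts: dict mapping nominal -> Counter of roles from training data."""
    total = Counter()
    for nom in neighbors:
        total += role_counts.get(nom, Counter())
    return total.most_common(1)[0][0] if total else None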
Example Values for Google N-Grams
Feature
• Sentence 4739: As part of his wicked plan, Pete
promotes Mickey and his pals into the [legion]E1
of [musketeers]E2 and assigns them to guard
Minnie.
– MEMBER-COLLECTION(E2, E1)
• E1 nearest neighbors: legion, army, heroes,
soldiers, world
– Most frequent role: COLLECTION
• E2 nearest neighbors: musketeers, admirals, sentries, swordsmen, larks
– Most frequent role: MEMBER
Pre-existing Relation Features
• Sometimes the context gives few clues about
the relation
– Can use knowledge about a context-independent
relation between the nominals
• TextRunner
– A queryable database of NOUN-VERB-NOUN triples
from a large corpus of web text
– Plug in E1 and E2 as the nouns and query for
predicates that occur between them
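These features amount to a lookup of the predicates linking the two nominals; sketched below against a locally indexed table of (arg1, predicate, arg2) triples standing in for an actual TextRunner query:

# Sketch: predicates linking E1 and E2 in a pre-extracted table of
# NOUN-VERB-NOUN triples, queried in both argument orders.
from collections import defaultdict

def index_triples(triples):
    index = defaultdict(set)
    for arg1, predicate, arg2 in triples:
        index[(arg1.lower(), arg2.lower())].add(predicate)
    return index

def textrunner_features(index, e1, e2):
    feats = set()
    for pred in index.get((e1.lower(), e2.lower()), ()):
        feats.add("E1_pred_E2=" + pred)   # e.g. motion "causes" suction
    for pred in index.get((e2.lower(), e1.lower()), ()):
        feats.add("E2_pred_E1=" + pred)   # e.g. suction "will cause" motion
    return feats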
Example Feature Values for
TextRunner Features
• Sentence: Forward [motion]E1 of the vehicle
through the air caused a [suction]E2 on the
road draft tube.
• E1 ____ E2 : may result from, to contact,
created, moves, applies, causes, fall below,
corresponds to which
• E2 ____ E1 : including, are moved under, will
cause, according to, are effected by, repeats,
can match
Results
Relation           | Precision | Recall | F1
Cause-Effect       | 89.63     | 89.63  | 89.63
Component-Whole    | 74.34     | 81.73  | 77.86
Content-Container  | 84.62     | 85.94  | 85.27
Entity-Destination | 88.22     | 89.73  | 88.96
Entity-Origin      | 83.87     | 80.62  | 82.21
Instrument-Agency  | 71.83     | 65.38  | 68.46
Member-Collection  | 84.30     | 87.55  | 85.89
Message-Topic      | 81.02     | 85.06  | 82.99
Product-Producer   | 82.38     | 74.89  | 78.46
Other              | 52.97     | 51.10  | 52.02
Overall            | 82.25     | 82.28  | 82.19
Learning Curve
F1 by training size: 73.08 at 1,000 training examples, 77.02 at 2,000, 79.93 at 4,000, and 82.19 at 8,000.
Ablation Tests
• All 255 (= 2^8 - 1) combinations of the 8 feature sets were evaluated by 10-fold cross validation

# of feature sets | Optimal feature sets            | F1
1                 | Lexical                         | 73.8
2                 | +Hypernym                       | 77.8
3                 | +FrameNet                       | 78.9
4                 | +Ngrams                         | 79.7
5                 | -FrameNet +PropBank +TextRunner | 80.5
6                 | +FrameNet                       | 81.1
7                 | +Dependency                     | 81.3
8                 | +NomLex-Plus                    | 81.3
Lexical is the single best feature set, Lexical+Hypernym is the best 2-feature set
combination, etc.
Other Supervised Tasks
Causal relations between events – FLAIRS 2010
Causal Relations Between Events
• Discovered graph patterns that were then
used as features in a supervised classifier
• Example pattern:
– “Under the agreement”, “In the affidavits”, etc.
Detecting Indications of Appendicitis in
Radiology Reports
• Submitted to AMIA TBI 2013
Resolving Coreference in Medical
Records
• i2b2 2011 and JAMIA 2012
• Approach
– Based on Stanford Multi-Pass Sieve method
– Added supervised learning by introducing features
to each pass
– Showed that creating a first pass which identifies
all the mentions of the patient provides a
competitive baseline
Extracting Relations Between Concepts
in Medical Records
• i2b2 2010 Shared Task and JAMIA 2011
Supervised Relations Conclusion
• Identifying semantic relations requires going
beyond contextual and lexical features
• Use the fact that arguments sometimes have a
high affinity for one of the semantic roles
• Knowledge of pre-existing relations can aid
classification when context is not enough
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Relations in Electronic Medical Records
• Medical records contain natural language
narrative with very valuable information
– Often in the form of a relation between medical
treatments, tests, and problems
• Example:
– … with the [transfusion] and [IV Lasix] she did not
go into [flash pulmonary edema]
– TREATMENT-IMPROVES-PROBLEM relations:
• (transfusion, flash pulmonary edema)
• (IV Lasix, flash pulmonary edema)
Relations in Electronic Medical Records
• Additional examples:
– [Anemia] secondary to [blood loss].
• A causal relationship between problems
– On [exam] , the patient looks well and lying down
flat in her bed with no [acute distress] .
• Relationship between a medical test (“exam”) and what
it revealed (“acute distress”).
• We consider both positive and negative findings.
Relations in Electronic Medical Records
• Utility
– Detected relations can aid information retrieval
– Automated systems which review patient records
for unusual circumstances
• Drugs prescribed despite previous allergy
• Tests and treatments never performed despite
recommendation
Relations in Electronic Medical Records
• Unsupervised detection of relations
– No need for large annotation efforts
– Easily adaptable to new hospitals, doctors,
medical domains
– Does not require a pre-defined set of relation
types
• Discover relations actually present in the data, not what
the annotator thinks is present
– Relations can be informed by very large corpora
Unsupervised Relation Discovery
• Assumptions:
– Relations exist between entities in text
– Those relations are often triggered by contextual
words: trigger words
• Secondary to, improved, revealed, caused
– Entities in relations belong to a small set of semantic
classes
• Anemia, heart failure, edema: problems
• Exam, CT scan, blood pressure: tests
– Entities near each other in text are more likely to have
a relation
Unsupervised Relation Discovery
• Latent Dirichlet Allocation baseline
– Assume entities have already been identified
– Form pseudo-documents for every consecutive pair of
entities:
• Words from first entity
• Words between the entities
• Words from second entity
• Example:
– If she has evidence of [neuropathy] then we would
consider a [nerve biopsy]
– Pseudo-document: {neuropathy, then, we, would,
consider, a, nerve, biopsy}
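The pseudo-document construction is simple; a sketch follows, with entities given as sorted (start, end) token spans (names illustrative), along with a note on feeding the result to an off-the-shelf LDA implementation:

# Sketch: one pseudo-document per consecutive pair of entities, containing
# the first entity's words, the words between, and the second entity's words.
def pseudo_documents(tokens, entity_spans):
    """entity_spans: sorted list of (start, end) token index pairs, end exclusive."""
    docs = []
    for (s1, e1), (s2, e2) in zip(entity_spans, entity_spans[1:]):
        first_entity = tokens[s1:e1]
        between = tokens[e1:s2]
        second_entity = tokens[s2:e2]
        docs.append(first_entity + between + second_entity)
    return docs

# For the slide's example, the spans for [neuropathy] and [nerve biopsy] yield
# ['neuropathy', 'then', 'we', 'would', 'consider', 'a', 'nerve', 'biopsy'].
# The baseline then runs ordinary LDA over these pseudo-documents, e.g. with gensim:
#   from gensim import corpora, models
#   dictionary = corpora.Dictionary(docs)
#   corpus = [dictionary.doc2bow(d) for d in docs]
#   lda = models.LdaModel(corpus, num_topics=9, id2word=dictionary)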
Unsupervised Relation Discovery
• These pseudo-documents lead LDA to form
clusters such as:
“causal”  | “stopwords” | “reveal problem” | “prescription”
to        | and         | was              | (
due       | ,           | on               | mg
secondary | is          | and              | )
was       | she         | ,                | needed
be        | had         | which            | as
,         | has         | showed           | PO
likely    | this        | he               | PRN
have      | are         | done             | :
found     | that        | showing          | for
thought   | after       | demonstrated     | every
Unsupervised Relation Discovery
• Clusters formed by LDA
– Some good trigger words
– Many stop words as well
– No differentiation between:
• Words in first argument
• Words between the arguments
• Words in second argument
• Can do a better job
– By better modeling the linguistic phenomenon
Relation Discovery Model (RDM)
• Three observable variables:
– w1: Tokens from the first argument
– wc: Context words (between the arguments)
– w2: Tokens from the second argument
• Example:
– Recent [chest x-ray] shows [resolving right lower lobe
pneumonia] .
– w1: {chest, x-ray}
– wc: {shows}
– w2: {resolving, right, lower, lobe, pneumonia}
Relation Discovery Model (RDM)
• In RDM:
– A relation type (tr) is generated
– Context words (wc) are generated from:
• Relation type-specific word distribution (showed, secondary, etc.);
or
• General word distribution (she, patient, hospital)
– Relation type-specific semantic classes for the arguments
are generated
• e.g. a problem-causes-problem relation would be unlikely to
generate a test or a treatment class
– Argument words (w1, w2) are generated from argument
class-specific word distributions
• “pneumonia”, “anemia”, “neuropathy” from a problem class
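A simplified, runnable paraphrase of this generative story is sketched below (the actual RDM places Dirichlet priors over these distributions and is fit with approximate inference such as Gibbs sampling; parameter names and shapes here are illustrative):

# Sketch: one draw from an RDM-style generative story.
import numpy as np

rng = np.random.default_rng(0)

def generate_instance(theta, phi_rel, phi_gen, switch, psi1, psi2, lam,
                      n_context, n_arg1, n_arg2):
    """theta: relation-type distribution; phi_rel[t]/phi_gen: context word
    distributions; psi1[t]/psi2[t]: argument-class distributions per relation
    type; lam[c]: word distribution per argument class."""
    t = rng.choice(len(theta), p=theta)                 # relation type
    context = []
    for _ in range(n_context):
        if rng.random() < switch:                       # relation-specific word
            context.append(rng.choice(len(phi_rel[t]), p=phi_rel[t]))
        else:                                           # general background word
            context.append(rng.choice(len(phi_gen), p=phi_gen))
    c1 = rng.choice(len(psi1[t]), p=psi1[t])            # class of first argument
    c2 = rng.choice(len(psi2[t]), p=psi2[t])            # class of second argument
    w1 = [rng.choice(len(lam[c1]), p=lam[c1]) for _ in range(n_arg1)]
    w2 = [rng.choice(len(lam[c2]), p=lam[c2]) for _ in range(n_arg2)]
    return t, context, (c1, c2), (w1, w2)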
Relation Discovery Model (RDM)
• Graphical model:
Experimental Setup
• Dataset
– 349 medical records from 4 hospitals
– Annotated with:
• Entities: problems, treatments, tests
• Relations: Used to evaluate our unsupervised approach
– Treatment-Addresses-Problem
– Treatment-Causes-Problem
– Treatment-Improves-Problem
– Treatment-Worsens-Problem
– Treatment-Not-Administered-Due-To-Problem
– Test-Reveals-Problem
– Test-Conducted-For-Problem
– Problem-Indicates-Problem
Results
• Trigger word clusters formed by the RDM:
“connected problems” | “test showed” | “prescription” | “prescription 2”
due                  | showed        | mg             | (
consistent           | no            | p.r.n.         | )
not                  | revealed      | p.o.           | Working
likely               | evidence      | hours          | ICD9
secondary            | done          | pm             | Problem
patient              | 2007          | q              | Diagnosis
(                    | performed     | needed         | 30
started              | demonstrated  | day            | cont
most                 | without       | q.             | ):
s/p                  | normal        | 4              | closed
Results
• Instances of “connected problems”
First Argument       | Context          | Second Argument
ESRD                 | secondary to her | DM
slightly lightheaded | and with         | increased HR
Echogenic kidneys    | consistent with  | renal parenchymal disease
A 40% RCA            | , which was      | Hazy
Librium              | for              | Alcohol withdrawal
The last example is actually a Treatment-Administered-For-Problem relation
Results
• Instances of “Test showed”
First Argument                          | Context                                                             | Second Argument
V-P lung scan                           | Was performed on May 24 2007, showed                                | low probability of PE
A bedside transthoracic echocardiogram  | done in the Cardiac Catheterization laboratory without evidence of | an effusion
Exploration of the abdomen              | revealed                                                            | significant nodularity of the liver
echocardiogram                          | showed                                                              | moderate dilated left atrium
An MRI of the right leg                 | was done which was equivocal for                                    | osteomyelitis
Results
• Instances of “prescription”
First Argument            | Context                              | Second Argument
Haldol                    | 0.5-1 milligrams p.o. q.6-8h. P.r.n. | agitation
Plavix                    | every day to prevent                 | failure of these stents
KBL mouthwash             | , 15 cc p.o. q.d. prn                | mouth discomfort
Miconazole nitrate powder | tid prn for                          | groin rash
AmBisome                  | 300 mg IV q.d. for treatment of      | her hepatic candidiasis
Results
• Instances of “prescription 2”
First Argument            | Context                                                    | Second Argument
MAGNESIUM HYDROXIDE SUSP  | 30 ML ) , 30 mL , Susp , By Mouth , At Bedtime , PRN , For | constipation
Depression, major NOS     | ( ICD9 296.00 , Working , Problem ) cont                   | home meds
Diabetes mellitus type II | ( ICD9 250.00 , Working , Problem ) cont                   | home meds
ASCITES                   | ( ICD9 789.6 , Working , Diagnosis ) on                    | spironalactone
Dilutional hyponatremia   | ( SNMCT **ID-NUM , Working , Diagnosis ) improved with     | fluid restriction
Results
• Discovered Argument Classes
“problems” | “treatments/tests” | “tests”
pain       | Percocet           | CT
disease    | Hgb                | scan
right      | Hct                | chest
left       | Anion              | x-ray
renal      | Gap                | examination
patient    | Vicodin            | Chest
artery     | RDW                | EKG
-          | Bili               | MRI
symptoms   | RBC                | culture
mild       | Ca                 | head
Evaluation
• Two versions of the data:
– DS1: Consecutive pairs of entities which have a
manually identified relation between them
– DS2: All consecutive pairs of entities
• Train/Test sets:
– Train: 349 records, with 5,264 manually annotated
relations
– Test: 477 records, with 9,069 manually annotated
relations
Evaluation
• Evaluation metrics
– NMI: Normalized Mutual Information
• An information-theoretic measure of how well two
clusterings match
– F measure:
• Computed based on the cluster precision and cluster
recall
• Each cluster is paired with the cluster which maximizes
the score
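Both metrics can be computed from the induced cluster labels and the gold relation labels; below is a sketch using scikit-learn for NMI and one common cluster-pairing F measure, which may differ in detail from the evaluation actually used:

# Sketch: NMI via scikit-learn, plus an F measure where each induced cluster
# is paired with the gold cluster that maximizes its F1, weighted by size.
from collections import Counter
from sklearn.metrics import normalized_mutual_info_score

def cluster_f(gold, predicted):
    gold_sizes = Counter(gold)
    pred_sizes = Counter(predicted)
    overlap = Counter(zip(predicted, gold))
    total_f, n = 0.0, len(gold)
    for p, p_size in pred_sizes.items():
        best = 0.0
        for g, g_size in gold_sizes.items():
            inter = overlap[(p, g)]
            if inter == 0:
                continue
            prec, rec = inter / p_size, inter / g_size
            best = max(best, 2 * prec * rec / (prec + rec))
        total_f += p_size / n * best
    return total_f

def evaluate(gold, predicted):
    return normalized_mutual_info_score(gold, predicted), cluster_f(gold, predicted)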
Evaluation
Set       | Method        | DS1 NMI | DS1 F | DS2 NMI | DS2 F
Train Set | Complete-link | 4.2     | 37.8  | N/A     | N/A
Train Set | K-means       | 8.25    | 38.0  | 5.4     | 38.1
Train Set | LDA baseline  | 12.8    | 23.1  | 15.6    | 26.2
Train Set | RDM           | 18.2    | 39.1  | 18.1    | 37.4
Test Set  | LDA baseline  | 10.0    | 26.1  | 11.5    | 26.3
Test Set  | RDM           | 11.8    | 37.7  | 14.0    | 36.4

Results with 9 relation types, 15 general word classes, and 15 argument classes for RDM.
Unsupervised Relations Conclusion
• Trigger words and argument classes are jointly
modeled
• RDM uses only entities and tokens
• Relations are local to the context, rather than
global
• RDM outperforms several baselines
• Discovered relations match well with manually
chosen relations
• Presented at EMNLP 2011
Additional Relation Tasks
• Relational Similarity – SemEval 2012 Task 2
– Define a relation through prototypes:
• water:drop, time:moment, pie:slice
– Decide which is most similar:
• feet:inches or country:city
• Used a probabilistic approach to detect high
precision patterns for the relations
• Pattern precision was then used to rank word
pairs occurring with that pattern
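A rough sketch of that idea: estimate a precision for each pattern from how often it occurs with prototype pairs of the relation, then score candidate pairs by the best precision among their patterns (the probabilistic model in the actual system is more involved; the data structures here are hypothetical):

# Sketch: rank candidate word pairs by the estimated precision of the
# patterns they share with the prototype pairs of the target relation.
from collections import defaultdict

def pattern_precision(pair_patterns, prototype_pairs):
    """pair_patterns: dict (w1, w2) -> set of patterns observed with that pair."""
    protos = set(prototype_pairs)
    with_pattern = defaultdict(int)
    with_pattern_and_proto = defaultdict(int)
    for pair, patterns in pair_patterns.items():
        for pat in patterns:
            with_pattern[pat] += 1
            if pair in protos:
                with_pattern_and_proto[pat] += 1
    return {pat: with_pattern_and_proto[pat] / with_pattern[pat]
            for pat in with_pattern}

def rank_pairs(pair_patterns, precision):
    scored = [(max((precision.get(p, 0.0) for p in pats), default=0.0), pair)
              for pair, pats in pair_patterns.items()]
    return [pair for _, pair in sorted(scored, reverse=True)]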
Relational Selectional Preferences
• Submitted to IWCS 2013
• Use LDA to induce latent semantic classes
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Proposed work
• Supervised vector representations
– Initially: word representations
• Most existing approaches create unsupervised
word representations
– Latent Semantic Analysis (Deerwester et al., 1990)
– Latent Dirichlet Allocation (Blei et al., 2003)
– Integrated Components Analysis (Scholkopf, 1998)
• More recent approaches allow for supervision
Existing supervised approaches
• HDLR
– “Structured Metric Learning for High Dimensional
Problems”
– Davis and Dhillon (KDD 2008)
• S2Net
– “Learning Discriminative Projections for Text Similarity
Measures”
– Yih, Toutanova, Platt, and Meek (CoNLL 2011)
– Learns lower-dimensional representations of documents
– Optimizes a cosine similarity metric in the lower-dimensional space for similar document retrieval
Supervised word representations
• Relational selectional preferences:
– Classify words according to their admissibility for
filling the role of a relation:
– report, article, thesis, poem are admissible for the
MESSAGE role of a MESSAGE-TOPIC relation
– Assume a (possibly very small) training set
Supervised word representations
– Each word is represented by a high-dimensional
context vector v over a large corpus
• e.g., documents the word occurs in, other words it co-occurs with, or grammatical links
– Learn a transformation matrix T which transforms
v into a much lower dimensional vector w
• subject to an objective function that is maximized when words from the target set have high cosine similarity
• Learning can be performed using L-BFGS optimization of this objective because the cosine similarity function is twice differentiable (see the sketch below)
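A minimal sketch of this proposal, assuming a toy objective that rewards high cosine similarity among target-set words and low similarity to a contrast set, optimized with SciPy's L-BFGS-B (the actual loss and optimization details may differ):

# Sketch: learn a projection T mapping high-dimensional context vectors V
# (one row per word) to low-dimensional vectors so that target-set words
# end up with high pairwise cosine similarity. Loss and setup illustrative.
import numpy as np
from scipy.optimize import minimize

def learn_projection(V, target_idx, other_idx, dim=50, seed=0):
    n_words, n_ctx = V.shape
    rng = np.random.default_rng(seed)
    T0 = rng.normal(scale=0.01, size=(n_ctx, dim))

    def loss(t_flat):
        T = t_flat.reshape(n_ctx, dim)
        W = V @ T
        W = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-8)
        sims = W @ W.T
        pos = sims[np.ix_(target_idx, target_idx)].mean()   # want high
        neg = sims[np.ix_(target_idx, other_idx)].mean()    # want low
        return neg - pos

    # Gradient-free call for brevity; L-BFGS-B then uses numerical gradients.
    res = minimize(loss, T0.ravel(), method="L-BFGS-B")
    return res.x.reshape(n_ctx, dim)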
Proposed application
• Supervised word representations can be used for many
supervised tasks which use words as features
– Relation arguments
– Contextual words
• Not limited to words
– arbitrary n-grams
– syntactic features
• We believe this approach could be useful for any high-dimensional linguistic features (sparse features)
– Benefit comes from both a larger corpus and the
supervised learning of the representation
Additional evaluations
• ACE 2004/2005 relation data
– Relations between entities in newswire
• e.g., MEMBER-OF-GROUP – “an activist for Peace Now”
• BioInfer 2007
– Relations between biomedical concepts
• e.g., locations, causality, part-whole, regulation
• SemEval 2013 Task 4 and SemEval 2010 Task 9
– Paraphrases for noun compounds
– e.g., “flu virus” → “cause”, “spread”, “give”
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Conclusions
• State of the art supervised relation extraction
methods in both general domain and medical
texts
• Identifying relations in text relies on more than
just context
– Semantic and background knowledge of arguments
– Background knowledge about relations themselves
• An unsupervised relation discovery model
Thank you!
Questions??