
CS 4705
Relationships among Words, Semantic Roles, and Word-Sense Disambiguation
Today
• Lexical Relations
– Wordnet
• Semantic Role
– Review: Semantic Roles
– Selectional Restrictions
– Selectional Association
• Word-Sense Disambiguation
– Supervised
– Unsupervised
– Evaluation
Lexical Relations
• Semantic Networks: Used to represent lexical
relationships
– e.g. WordNet (George Miller et al)
– Most widely used hierarchically organized lexical database for
English
– Synset: set of synonyms, a dictionary-style definition (or gloss),
and some examples of uses --> a concept
– Databases for nouns, verbs, and modifiers
• Applications can traverse network to find synonyms,
antonyms, hyper- and hyponyms…
– Available for download or online use
– http://www.cogsci.princeton.edu/~wn
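A minimal sketch (not from the slides) of traversing WordNet with NLTK, assuming the nltk package and its WordNet data are installed (nltk.download('wordnet')):

from nltk.corpus import wordnet as wn

# List every synset (concept) for "bank" with its gloss
for synset in wn.synsets('bank'):
    print(synset.name(), '-', synset.definition())

# Navigate the hierarchy from one sense of "dog"
dog = wn.synset('dog.n.01')
print(dog.lemma_names())        # synonyms grouped in this synset
print(dog.hypernyms())          # more general classes, e.g. canine
print(dog.hyponyms()[:5])       # more specific classes, e.g. poodle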
Homonymy
• Homonyms: Words with same form – orthography
and pronunciation -- but different, unrelated
meanings, or senses
– A bank1 holds investments in a custodial account in the
client’s name.
– As agriculture is burgeoning on the east bank2, the river
will shrink even more
http://www.etymonline.com/
• bank1 "financial institution," 1474, from either O.It. banca or M.Fr.
banque (itself from the O.It. term), both meaning "table" (the
notion is of the moneylender's exchange table), from a Gmc.
source (cf. O.H.G. bank "bench"); see bank (2). The verb meaning
"to put confidence in" (U.S. colloquial) is attested from 1884. Bank
holiday is from 1871, though the tradition is as old as the Bank of
England. Bankroll (v.) "to finance" is 1920s. To cry all the way to
the bank was coined 1956 by flamboyant pianist Liberace, after a
Madison Square Garden concert that was packed with patrons but
panned by critics.
• bank2 "earthen incline, edge of a river," c.1200, probably in O.E.,
from O.N. banki, from P.Gmc. *bangkon "slope," cognate with
P.Gmc. *bankiz "shelf."
Related Phenomena
• Homophones (same pron/different orth)
Read/red
• Homographs (same orth/different pron)
Bass/bass
Polysemy
• Words with multiple but related meanings
– They rarely serve red meat.
– He served as U.S. ambassador.
– He might have served his time in prison.
– idea bank, sperm bank, blood bank, bank bank
Can the two candidate senses be conjoined?
?He served his time and as ambassador to Norway.
– Same etymology
– Often a domain-dependent specialization
Synonymy
• Substitutability: different words, same meaning
– Old/aged, pretty/attractive, food/sustenance, money
How big is that plane? How large is that plane?
How big are you? How large are you?
• What makes words substitutable – and not?
– Polysemy (large vs. old sense)
– register: He’s really cheap/?parsimonious.
– collocational constraints:
roast beef, ?baked beef
economy fare ?economy price
How could we find Synonyms and Collocations
automatically?
• Synonyms: Identify words appearing frequently in
similar contexts
Blast victims were helped by civic-minded passersby.
Public-spirited passersby came to the aid of this bombing
victim.
• Collocations: Identify synonyms or closely related
words that do and don’t appear in similar contexts
Flu victims, flu sufferers vs. ?Cold victims, cold
sufferers…
Roast turkey vs. Baked turkey
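One illustrative way (a toy sketch, not from the slides) to operationalize "similar contexts": build a context-count vector for each word and compare candidates with cosine similarity; the mini-corpus below is hypothetical.

from collections import Counter
from math import sqrt

corpus = [
    "flu victims were helped by local doctors".split(),
    "flu sufferers were helped by local doctors".split(),
    "roast turkey was served with baked potato".split(),
]

def context_vector(word, sentences, window=2):
    """Count the words appearing within `window` tokens of `word`."""
    vec = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == word:
                vec.update(sent[max(0, i - window):i] + sent[i + 1:i + 1 + window])
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine(context_vector("victims", corpus), context_vector("sufferers", corpus)))  # high
print(cosine(context_vector("victims", corpus), context_vector("turkey", corpus)))     # low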
Hyponymy
• General: hypernym (super…ordinate)
– dog is a hypernym of poodle
– Test: ‘That is a poodle’ implies ‘that is a dog’
• Specific: hyponym (under..neath)
– poodle is a hyponym of dog
– Test: ‘That is a poodle’ implies ‘that is a dog’
• Ontology: set of domain objects
• Taxonomy: Specification of relations between those
objects
• Object hierarchy: Structured hierarchy that supports
feature inheritance (e.g. poodle inherits some properties of
dog)
Tropes, or Figures of Speech
• Metaphor: one entity is given the attributes of another
(tenor/vehicle/ground)
– Life is a bowl of cherries. Don’t take it serious….
– We are the eyelids of defeated caves. ??
– GM killed the Fiero. (conventional metaphor: corp. as person)
• Metonymy: one entity used to stand for another (replacive)
– GM killed the Fiero.
– The ham sandwich wants his check. (deferred reference)
• Both extend existing sense to new meaning
– Metaphor: completely different concept
– Metonymy: related concepts
Sum
• Many definable word relations useful to NLP in
different ways
– Homonymy, polysemy, synonymy, hypernymy
– Homography, homophony
– Metaphor, metonymy
– Collocations
• Resources available to aid in processing
– WordNet, FrameNet, online dictionaries,….
• A Huge Problem for NLP?
Ambiguity and Word Sense Disambiguation
• Recall: For semantic attachment approaches: what
happens when a given lexeme has multiple ‘meanings’?
Flies [V] vs. Flies [N]
He robbed the bank. He sat on the bank.
• How do we determine the correct sense of the word?
• Machine Learning
– Supervised methods
– Lightly supervised and Unsupervised Methods
• Bootstrapping
• Dictionary-based techniques
• Selectional Association
Supervised WSD
• Approaches:
– Tag a corpus with correct senses of particular words
(lexical sample) or all words (all-words task)
• E.g. SENSEVAL corpora
– Lexical sample:
• Extract features which might predict word sense
– POS? Word identity? Punctuation after? Previous word?
Its POS?
• Use Machine Learning algorithm to produce a
classifier which can predict the senses of one word
or many
– All-words
• Use semantic concordance: each open class word
labeled with sense from dictionary or thesaurus
– E.g. SemCor (Brown Corpus), tagged with WordNet
senses
What Features Are Useful?
• “Words are known by the company they keep”
– How much ‘company’ do we need to look at?
– What do we need to know about the ‘friends’?
• POS, lemmas/stems/syntactic categories,…
• Collocations: words that frequently appear with the
target, identified from large corpora
federal government, honor code, baked potato
– Position is key
• Bag-of-words: words that appear somewhere in a
context window
I want to play a musical instrument so I chose the bass.
– Ordering/proximity not critical
• Punctuation, capitalization, formatting
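A small sketch (illustrative, not the SENSEVAL feature set) of the two feature types above, extracted around a target word:

# Collocational features keep the position of context words; bag-of-words features do not.
def extract_features(tokens, target_index, window=2):
    feats = {}
    # Collocational features: identity of the word at each fixed offset from the target
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target_index + offset
        feats[f"w[{offset}]"] = tokens[i] if 0 <= i < len(tokens) else "<PAD>"
    # Bag-of-words features: which words occur anywhere in the window
    for w in tokens[max(0, target_index - window):target_index + window + 1]:
        if w != tokens[target_index]:
            feats[f"bow({w})"] = 1
    return feats

tokens = "I want to play a musical instrument so I chose the bass".split()
print(extract_features(tokens, tokens.index("bass")))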
Rule Induction Learners and WSD
• Given a feature vector of values for independent variables
associated with observations of values for the training set
• Top-down greedy search driven by information gain: how
will entropy of (remaining) data be reduced if we split on
this feature?
• Produce a set of rules that perform best on the training
data, e.g.
– bank2 if w-1==‘river’ & pos==NP & src==‘Fishing News’…
– …
• Easy to understand result but many passes to achieve each
decision, susceptible to over-fitting
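A hedged sketch of the information-gain computation that drives the greedy split; the feature names and toy data are hypothetical:

from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, labels, feature):
    """examples: list of feature dicts; labels: parallel list of senses."""
    base = entropy(labels)
    by_value = {}
    for ex, y in zip(examples, labels):
        by_value.setdefault(ex.get(feature), []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in by_value.values())
    return base - remainder

examples = [{"w-1": "river"}, {"w-1": "savings"}, {"w-1": "river"}]
labels = ["bank2", "bank1", "bank2"]
print(information_gain(examples, labels, "w-1"))   # ~0.918: splitting on w-1 separates the senses perfectly here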
Naïve Bayes
• ŝ = argmax_{s ∈ S} p(s|V), or equivalently
  ŝ = argmax_{s ∈ S} p(V|s) p(s) / p(V)
• Where s is one of the senses S possible for a word w and V the input vector of feature values for w
• Assume features are independent, so the probability of V is the product of the probabilities of each feature, given s:
  p(V|s) ≈ ∏_{j=1..n} p(vj|s)
• p(V) is the same for every candidate ŝ, so it can be dropped
• Then
  ŝ = argmax_{s ∈ S} p(s) ∏_{j=1..n} p(vj|s)
• How do we estimate p(s) and p(vj|s)?
– p(si) is max. likelihood estimate from a sense-tagged
corpus (count(si,wj)/count(wj)) – how likely is bank to
mean ‘financial institution’ over all instances of bank?
– p(vj|s) is max. likelihood of each feature given a
candidate sense (count(vj,s)/count(s)) – how likely is
the previous word to be ‘river’ when the sense of bank
is ‘financial institution’?
• Calculate ŝ = argmax_{s ∈ S} p(s) ∏_{j=1..n} p(vj|s) for each possible
sense and take the highest scoring sense as the most likely choice
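A minimal sketch of this classifier, assuming a hypothetical sense-tagged training set of (feature-dict, sense) pairs and simple add-one smoothing:

from collections import Counter, defaultdict
from math import log

def train(tagged_examples):
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)   # sense -> Counter over (feature, value) pairs
    for feats, sense in tagged_examples:
        sense_counts[sense] += 1
        for fv in feats.items():
            feat_counts[sense][fv] += 1
    return sense_counts, feat_counts

def classify(feats, sense_counts, feat_counts):
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        # log p(s) + sum_j log p(vj|s); add-one smoothing keeps unseen feature
        # values from zeroing out a sense (a rough normalizer for this sketch)
        score = log(n / total)
        for fv in feats.items():
            score += log((feat_counts[sense][fv] + 1) / (n + len(feats)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

training = [
    ({"w-1": "river", "topic": "nature"}, "bank2"),
    ({"w-1": "savings", "topic": "finance"}, "bank1"),
    ({"w-1": "central", "topic": "finance"}, "bank1"),
]
model = train(training)
print(classify({"w-1": "river", "topic": "nature"}, *model))   # expected: bank2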
Decision List Classifiers
• Transparent
• Like case statements applying tests to input in turn
fish within window --> bass1
striped bass --> bass1
guitar within window --> bass2
bass player --> bass2
– Yarowsky ‘96’s approach orders tests by individual
accuracy on the entire training set, based on the log-likelihood
ratio:
Abs( Log( P(Sense1 | fi = vj) / P(Sense2 | fi = vj) ) )
Lightly Supervised Methods: Bootstrapping
• Bootstrapping I
– Start with a few labeled instances of target item as
seeds to train initial classifier, C
– Use high confidence classifications of C on unlabeled
data as training data
– Iterate
• Bootstrapping II
– Start with sentences containing words strongly
associated with each sense (e.g. sea and music for
bass), either intuitively or from corpus or from
dictionary entries, and label those automatically
– One Sense per Discourse hypothesis
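A hedged sketch of the Bootstrapping I loop: train on seeds, label the unlabeled pool, keep only high-confidence predictions as new training data, and repeat. The train_classifier interface, its predict_with_confidence method, and the 0.9 threshold are all hypothetical.

def bootstrap(seed_examples, unlabeled, train_classifier, threshold=0.9, rounds=5):
    labeled = list(seed_examples)              # (features, sense) pairs
    pool = list(unlabeled)                     # feature dicts only
    for _ in range(rounds):
        clf = train_classifier(labeled)        # e.g. the Naive Bayes sketch above
        confident, remaining = [], []
        for feats in pool:
            sense, confidence = clf.predict_with_confidence(feats)   # hypothetical interface
            if confidence >= threshold:
                confident.append((feats, sense))
            else:
                remaining.append(feats)
        if not confident:                      # nothing new to learn from
            break
        labeled.extend(confident)
        pool = remaining
    return train_classifier(labeled)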
Dictionary Approaches
• Problem of scale for all ML approaches
– Building a classifier for each word with multiple senses
• Machine-Readable dictionaries with senses
identified and examples
– Simplified Lesk:
• Retrieve all content words occurring in context of
target (e.g. Sailors love to fish for bass.)
– Compute overlap with sense definitions of target entry
» bass1: a musical instrument…
» bass2: a type of fish that lives in the sea…
bass1
/beɪs/ [beys] Music.
–adjective 1. low in pitch; of the lowest pitch or range: a bass voice; a bass
instrument. 2. of or pertaining to the lowest part in harmonic music.
–noun 3. the bass part. 4. a bass voice, singer, or instrument. 5. double bass.
[Origin: 1400–50; late ME, var. of base2 with ss of basso]
bass2
/bæs/ [bas]
–noun, plural (especially collectively) bass, (especially referring to two or
more kinds or species) bass·es. 1. any of numerous edible, spiny-finned,
freshwater or marine fishes of the families Serranidae and Centrarchidae.
2. (originally) the European perch, Perca fluviatilis.
[Origin: 1375–1425; late ME bas, earlier bærs, OE bærs (with loss of r before
s as in ass2, passel, etc.); c. D baars, G Barsch, OSw agh-borre]
– Choose sense with most content-word overlap
– Original Lesk:
• Compare dictionary entries of all content-words in
context with entries for each sense
– But….dictionary entries are short
• Expand with entries of ‘related’ words that appear in
the original entry
• If tagged corpus available, collect all the words
appearing in context of each sense of target word
– e.g. all words appearing in sentences with bass1 added to
signature for bass1
– Weight each by frequency of occurrence of word with that
sense tagged in corpus (e.g. all senses of bass) to capture
how discriminating a word is for the target word’s senses
– Corpus Lesk performs best of all Lesk approaches
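A minimal sketch of Simplified Lesk; the two-sense gloss inventory and the stopword list below are illustrative, not a real dictionary:

# Pick the sense whose gloss shares the most content words with the context.
STOPWORDS = {"a", "the", "to", "for", "of", "in", "that", "is", "and", "or"}

SENSES = {
    "bass1": "low in pitch; a musical instrument or the lowest part in harmonic music",
    "bass2": "a type of spiny-finned fish that lives in the sea or in fresh water",
}

def content_words(text):
    return {w.strip(".,;").lower() for w in text.split()} - STOPWORDS

def simplified_lesk(context_sentence, target, senses=SENSES):
    context = content_words(context_sentence) - {target}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("Sailors love to fish for bass in the sea", "bass"))  # bass2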
Disambiguation via Selectional Restrictions
• “Verbs are known by the company they keep”
– Different verbs select for different thematic roles
wash the dishes (takes washable-thing as patient)
serve delicious dishes (takes food-type as patient)
• Method: another semantic attachment in grammar
– Semantic attachment rules are applied as sentences are
syntactically parsed, e.g.
VP --> V NP
V serve <theme> {theme:food-type}
– Selectional restriction violation: no parse
• But this means we must:
– Write selectional restrictions for each sense of each
predicate – or use FrameNet
• Serve alone has 15 verb senses
– Obtain hierarchical type information about each
argument (using WordNet)
• How many hypernyms does dish have?
• How many words are hyponyms of dish?
• But also:
– Sometimes selectional restrictions don’t restrict enough
(Which dishes do you like?)
– Sometimes they restrict too much (Eat dirt, worm! I’ll
eat my hat!)
• Can we take a statistical approach?
Selectional Association (Resnik ‘97)
•
Selectional Preference Strength: how much does a
predicate tell us about the word class of its argument?
George is a monster, George cooked a steak
– SR(v): How different is p(c), the probability that any direct object
will be a member of some class c, from p(c|v), the probability that
a direct object of a specific verb will fall into that class?
1. Estimate conditional probabilities of word senses from a parsed
corpus, counting how often each predicate occurs with an object
argument
– e.g. How likely is dish to be an object of served?
– Jane served/V the dish/Obj
2. Then estimate the strength of association between each predicate
and the super-class (hypernym) of the argument in WordNet
– E.g. For each object x of serve (e.g. ragout, Mary, dish)
• Look up all x’s hypernym classes in WordNet (e.g
dish isa piece of crockery, dish isa food item, ragout
isa food item, Mary isa person…)
• Distribute “credit” for each of x’s senses occurring
with serve among all hypernym classes (≈sense) to
which x belongs (1/n for n classes)
– Pr(c|v) is estimated at count(c,v)/count(v)
– Why does this work?
• Ambiguous words have many superordinate classes
John served food/the dish/tuna/curry
• The most common sense across all objects of the
verb should eventually dominate the likelihood
score
– How can we use this in wsd?
• Choose the class (sense) of the direct object with the
highest probability, given the verb
Mary served the dish proudly.
• Results:
– Baselines:
• random choice of word sense is 26.8%
• choose most frequent sense (NB: requires sense-labeled training corpus) is 58.2%
– Resnik’s: 44% correct, from a corpus with only pred/arg
relations labeled
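A sketch of the credit-distribution step only (estimating Pr(c|v) from verb–object pairs via WordNet hypernyms), not Resnik’s full selectional preference strength; it assumes NLTK’s WordNet data and a hypothetical list of parsed verb–object pairs:

from collections import Counter, defaultdict
from nltk.corpus import wordnet as wn

def class_credit(noun):
    """Give each sense of the noun 1/n credit, spread over all its hypernym classes."""
    credit = Counter()
    synsets = wn.synsets(noun, pos=wn.NOUN)
    if not synsets:
        return credit
    share = 1.0 / len(synsets)
    for syn in synsets:
        ancestors = {a.name() for path in syn.hypernym_paths() for a in path}
        for name in ancestors:
            credit[name] += share
    return credit

def estimate_p_class_given_verb(verb_object_pairs):
    counts = defaultdict(Counter)    # verb -> class -> accumulated credit
    totals = Counter()               # verb -> number of objects observed
    for verb, obj in verb_object_pairs:
        totals[verb] += 1
        counts[verb].update(class_credit(obj))
    return {v: {c: n / totals[v] for c, n in cs.items()} for v, cs in counts.items()}

# Hypothetical parsed-corpus sample of (verb, direct object) pairs
pairs = [("serve", "dish"), ("serve", "ragout"), ("serve", "curry")]
p = estimate_p_class_given_verb(pairs)
# Classes shared by many objects of "serve" (food-related ones as well as
# generic ones like entity.n.01) accumulate the most credit
print(sorted(p["serve"].items(), key=lambda kv: -kv[1])[:5])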
Evaluating WSD
• In vivo/end-to-end/task-based/extrinsic vs. in vitro/standalone/intrinsic: evaluation in some task (parsing? q/a? IVR
system?) vs. application independent
– In vitro metrics: classification accuracy on held-out test set or
precision/recall/f-measure if not all instances must be labeled
• Baseline:
– Most frequent sense?
– Lesk algorithms
• Ceiling: human annotator agreement
Summing Up
• Word relations: how can we identify different
types?
• Disambiguating among word senses
• Next time: Ch 17: 3-5