
Chapter 20
Computational Lexical Semantics
Supervised Word-Sense Disambiguation
(WSD)
• Methods that learn a classifier from manually
sense-tagged text using machine learning
techniques.
– Classifier: machine learning model for classifying
instances into one of a fixed set of classes
• Treats WSD as a classification problem, where
a target word is assigned the most likely sense
(from a given sense inventory), based on the
context in which the word appears.
Supervised Learning for WSD
• Assume the POS of the target word is already
determined.
• Encode context using a set of features to be used
for disambiguation.
• Given labeled training data, encode it using these
features, and train a machine learning algorithm.
The result is a classifier.
• Use the trained classifier to disambiguate new instances of the target word (test data), represented with the same contextual features.
Sense Tagged Text
Bonnie and Clyde are two really famous criminals, I think they were
bank/1 robbers
My bank/1 charges too much for an overdraft.
I went to the bank/1 to deposit my check and get a new ATM card.
The University of Minnesota has an East and a West Bank/2 campus right
on the Mississippi River.
My grandfather planted his pole in the bank/2 and got a great big catfish!
The bank/2 is pretty muddy, I can’t walk there.
Feature Engineering
• The success of machine learning requires
instances to be represented using an effective set
of features that are correlated with the categories
of interest.
• Feature engineering can be a laborious process
that requires substantial human expertise and
knowledge of the domain.
• In NLP it is common to extract many (even
thousands of) potential features and use a learning
algorithm that works well with many relevant and
irrelevant features.
Contextual Features
• Surrounding bag of words
• POS of neighboring words
• Local collocations
• Syntactic relations
Experimental evaluations indicate that all of these features are useful, and the best results come from integrating all of these cues in the disambiguation process.
Surrounding Bag of Words
• Unordered individual words near the ambiguous
word (their exact positions are ignored)
• To create the features:
– Let BOW be an empty hash table
– For each sentence in the training data:
• For each word W within ±N words of the target word:
– If W is not in BOW, then BOW[W] = 0
– BOW[W] += 1
– Let Fs be a list of the K most frequent words in BOW, excluding “stop words”
• “Stop words”: pronouns, numbers, conjunctions, and other “function” words. Standard lists of stop words are available.
– Define K features for each sentence, one for each of the K words:
• Feature i is the number of times Fs[i] appears within ±N words of the target word (see the sketch below)
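A minimal Python sketch of this procedure (an illustration, not code from the slides): the function names build_bow_vocabulary and bow_features, the assumption that sentences arrive pre-tokenized, and the default values of N and K are all illustrative choices.

from collections import Counter

def build_bow_vocabulary(training_sentences, target, N=10, K=12, stop_words=frozenset()):
    # Fs: the K most frequent words appearing within +/-N words of the target, minus stop words
    bow = Counter()
    for tokens in training_sentences:            # each sentence is a list of word tokens
        for i, w in enumerate(tokens):
            if w.lower() != target:
                continue
            window = tokens[max(0, i - N):i] + tokens[i + 1:i + N + 1]
            bow.update(v.lower() for v in window if v.lower() not in stop_words)
    return [word for word, _ in bow.most_common(K)]

def bow_features(tokens, target, vocabulary, N=10):
    # Feature i = number of times vocabulary[i] occurs within +/-N words of the target
    counts = Counter()
    for i, w in enumerate(tokens):
        if w.lower() != target:
            continue
        counts.update(v.lower() for v in tokens[max(0, i - N):i] + tokens[i + 1:i + N + 1])
    return [counts[v] for v in vocabulary]

With the bass.n vocabulary on the next slide, calling bow_features on the example sentence would produce the 12-element count vector shown there.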
Surrounding Bag of Words Features:
Example
• Example: disambiguating bass.n
• The 12 most frequent content words from a collection of bass.n sentences from the WSJ (J&M p. 641):
– [fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band]
• “An electric guitar and bass player stand off to one
side, not really part of the scene, just as a sort of nod
to gringo expectations perhaps.”
• Features for that sentence: [0,0,0,1,0,0,0,0,0,0,1,0]
– In an arff file, these would be the values in 12 of the
feature (attribute) columns
Surrounding Bag of Words
• What is the idea behind these features? They serve as general topical cues about the context (“global” features).
POS of Neighboring Words
• Use part-of-speech of immediately
neighboring words.
• Provides evidence of local syntactic context.
• P-i is the POS of the word i positions to the
left of the target word.
• Pi is the POS of the word i positions to the
right of the target word.
• Typical to include features for:
P-3, P-2, P-1, P1, P2, P3
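A small Python sketch of extracting these features (illustrative, not from the slides); it assumes NLTK and its part-of-speech tagger data are installed, and the function name pos_features is a made-up label.

import nltk   # assumes nltk plus its averaged_perceptron_tagger data are installed

def pos_features(tokens, target_index, positions=(-3, -2, -1, 1, 2, 3)):
    # P_i = POS tag of the word i positions from the target word; "NONE" if out of range
    tagged = nltk.pos_tag(tokens)                # [(word, tag), ...]
    feats = {}
    for i in positions:
        j = target_index + i
        feats["P%d" % i] = tagged[j][1] if 0 <= j < len(tagged) else "NONE"
    return feats

For the bass sentence on the next slide, this yields tags along the lines of {'P-3': 'JJ', 'P-2': 'NN', 'P-1': 'CC', 'P1': 'NN', 'P2': 'VB', 'P3': 'IN'}; the exact tags depend on the tagger used.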
POS of Neighboring Words
• “An electric guitar and bass player stand off
to one side, not really part of the scene, just
as a sort of nod to gringo expectations
perhaps.”
• Features for the sentence:
– [JJ,NN,CC,NN,VB,IN]
– 6 more feature/attribute columns in the arff file
Local Collocations
• Specific lexical context immediately adjacent to the word.
• For example, to determine if “interest” as a noun refers to
“readiness to give attention” or “money paid for the use of
money”, the following collocations are useful:
– “in the interest of”
– “an interest in”
– “interest rate”
– “accrued interest”
• Ci,j is a feature whose value is the sequence of words from position i to position j relative to the target word (the target word itself is omitted).
– C-2,1 for “in the interest of” is “in the of”
• Typical to include:
– Single word context: C-1,-1, C1,1, C-2,-2, C2,2
– Two word context: C-2,-1, C-1,1, C1,2
– Three word context: C-3,-1, C-2,1, C-1,2, C1,3 (see the sketch below)
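A possible Python sketch of these collocation features (illustrative, not the course's code). Following the “in the of” example above, the target word itself is omitted from each sequence; using a special NONE value for out-of-range positions is an added assumption.

def collocation_features(tokens, target_index,
                         spans=((-1, -1), (1, 1), (-2, -2), (2, 2),
                                (-2, -1), (-1, 1), (1, 2),
                                (-3, -1), (-2, 1), (-1, 2), (1, 3))):
    # C_{i,j}: the words at offsets i..j around the target word, with the target itself omitted
    feats = {}
    for i, j in spans:
        words = []
        for k in range(i, j + 1):
            if k == 0:
                continue                          # skip the target word (as in "in the of")
            pos = target_index + k
            words.append(tokens[pos] if 0 <= pos < len(tokens) else "NONE")
        feats["C%d,%d" % (i, j)] = " ".join(words)
    return feats

For the bass sentence on the next slide this produces values such as C-2,-1 = “guitar and” and C-2,1 = “guitar and player”, matching the feature list shown there.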
Local Collocations
• Typical to include:
– Single word context: C-1,-1, C1,1, C-2,-2, C2,2
– Two word context: C-2,-1, C-1,1, C1,2
– Three word context: C-3,-1, C-2,1, C-1,2, C1,3
• “An electric guitar and bass player stand off to one side, not
really part of the scene, just as a sort of nod to gringo
expectations perhaps.”
• Features for this sentence:
• [and, player, guitar, stand, guitar and, and player, player stand, electric guitar and, guitar and player, and player stand, player stand off] (11 more columns in arff)
• What is the difference from the bag-of-words features?
• These features reflect position and are n-grams (fixed sequences), so they capture the local context of the target word more richly. Bag-of-words features, in contrast, are more general cues about the topic.
Syntactic Relations
(Ambiguous Verbs)
• For an ambiguous verb, it is very useful to know
its direct object.
1. “played the game”
2. “played the guitar”
3. “played the risky and long-lasting card game”
4. “played the beautiful and expensive guitar”
5. “played the big brass tuba at the football game”
6. “played the game listening to the drums and the tubas”
• May also be useful to know its subject:
1. “The game was played while the band played.”
2. “The game that included a drum and a tuba was
played on Friday.”
Syntactic Relations
(Ambiguous Nouns)
• For an ambiguous noun, it is useful to know
what verb it is an object of:
– “played the piano and the horn”
– “poached the rhinoceros’ horn”
• May also be useful to know what verb it is
the subject of:
– “the bank near the river loaned him $100”
– “the bank is eroding and the bank has given the
city the money to repair it”
Syntactic Relations
(Ambiguous Adjectives)
• For an ambiguous adjective, it is useful to know the noun it is modifying.
1. “a brilliant young man”
2. “a brilliant yellow light”
3. “a wooden writing desk”
4. “a wooden acting performance”
Using Syntax in WSD
(per-word classifiers)
• Produce a parse tree for a sentence using a syntactic
parser.
– Example parse for “John played the piano”:
(S (NP (ProperN John)) (VP (V played) (NP (DET the) (N piano))))
• For ambiguous verbs, use the head word of its direct
object and of its subject as features.
• For ambiguous nouns, use verbs for which it is the
object and the subject as features.
• For ambiguous adjectives, use the head word (noun)
of its NP as a feature.
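One practical way to obtain these head-word features is with a dependency parser rather than the constituency tree shown above. The sketch below uses spaCy (assuming spaCy and its en_core_web_sm model are installed); the function name and the exact dependency labels are illustrative assumptions, since labels vary by model.

import spacy   # assumes spaCy and the en_core_web_sm model are installed

nlp = spacy.load("en_core_web_sm")

def syntactic_features(sentence, target):
    # Head-word features for the first occurrence of the ambiguous target word
    doc = nlp(sentence)
    feats = {"dobj_head": "NONE", "subj_head": "NONE",
             "verb_of_obj": "NONE", "verb_of_subj": "NONE", "modified_noun": "NONE"}
    for tok in doc:
        if tok.text.lower() != target:
            continue
        if tok.pos_ == "VERB":
            # ambiguous verb: heads of its direct object and its subject
            for child in tok.children:
                if child.dep_ in ("dobj", "obj"):
                    feats["dobj_head"] = child.lemma_
                elif child.dep_ in ("nsubj", "nsubjpass"):
                    feats["subj_head"] = child.lemma_
        elif tok.pos_ == "NOUN":
            # ambiguous noun: the verb it is the object or the subject of
            if tok.dep_ in ("dobj", "obj"):
                feats["verb_of_obj"] = tok.head.lemma_
            elif tok.dep_ in ("nsubj", "nsubjpass"):
                feats["verb_of_subj"] = tok.head.lemma_
        elif tok.pos_ == "ADJ":
            # ambiguous adjective: the noun it modifies
            if tok.dep_ == "amod":
                feats["modified_noun"] = tok.head.lemma_
        break      # only the first occurrence is handled in this sketch
    return feats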
Syntactic Relations
(Ambiguous Verbs)
• Feature: head of direct object (special value null if none)
1. “played the game” → game
2. “played the guitar” → guitar
3. “played the risky and long-lasting card game” → game
4. “played the beautiful and expensive guitar” → guitar
5. “played the big brass tuba at the football game” → tuba
6. “played the game listening to the drums and the tubas” → game
• Feature: head of subject (special value null if none)
1. “The game was played while the band played.” → game, band (two instances of “played” in one sentence)
2. “The game that included a drum and a tuba was played on Friday.” → game
Syntactic Relations
(Ambiguous Nouns)
• Feature: Head verb that the target is the
object of
– “played the piano and the horn” → played
– “poached the rhinoceros’ horn” → poached
• Feature: Head verb that the target is the
subject of
– “the bank near the river loaned him $100” → loaned
– “the bank is eroding and the bank has given the city the money to repair it” → eroding, given (two instances of “bank”)
Syntactic Relations
(Ambiguous Adjectives)
• Feature: Noun the adjective modifies
1. “a brilliant young man” → man
2. “a brilliant yellow light” → light
3. “a wooden writing desk” → desk
4. “a wooden acting performance” → performance
Summary: Supervised Methodology
• Create a sample of training data where a given target word is
manually annotated with a sense from a predetermined set of
possibilities.
– One tagged word per instance
• Select a set of features with which to represent context.
– co-occurrences, collocations, POS tags, verb-obj relations, etc...
• Convert sense-tagged training instances to feature vectors.
• Apply a machine learning algorithm to induce a classifier.
– Form – structure or relation among features
– Parameters – strength of feature interactions
• Convert a held out sample of test data into feature vectors.
– “correct” sense tags are known but not used
• Apply classifier to test instances to assign a sense tag.
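A compact sketch of this whole methodology with scikit-learn (an illustration, not the course's code): feature dictionaries such as those returned by the earlier sketches are vectorized with DictVectorizer and fed to a Naïve Bayes classifier; the (tokens, target_index, sense) data layout is an assumption.

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

def train_and_evaluate(train, test, extract):
    # train/test: lists of (tokens, target_index, sense); extract returns a feature dict,
    # e.g. a merge of the pos_features and collocation_features sketches above
    vectorizer = DictVectorizer()
    X_train = vectorizer.fit_transform([extract(toks, idx) for toks, idx, _ in train])
    y_train = [sense for _, _, sense in train]
    classifier = MultinomialNB().fit(X_train, y_train)   # any supervised learner could be used here

    X_test = vectorizer.transform([extract(toks, idx) for toks, idx, _ in test])
    y_test = [sense for _, _, sense in test]
    return accuracy_score(y_test, classifier.predict(X_test))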
Supervised Learning Algorithms
• Once data is converted to feature vector form, any
supervised learning algorithm can be used. Many
have been applied to WSD with good results:
– Support Vector Machines
– Nearest Neighbor Classifiers
– Decision Trees
– Decision Lists
– Naïve Bayesian Classifiers
– Perceptrons
– Neural Networks
– Graphical Models
– Log Linear Models
Summary: Supervised WSD with
Individual Classifiers
• Many supervised machine learning algorithms have been applied to Word Sense Disambiguation, and most work reasonably well.
– (Witten and Frank, 2000) is a great intro. to supervised
learning.
• The choice of features tends to differentiate among methods more than the choice of learning algorithm.
• Good sets of features tend to include:
– Co-occurrences or keywords
– Collocations
– Bigrams and Trigrams
– Part of speech
– Syntactic features
Convergence of Results
• Accuracy of different systems applied to the same data tends to converge on a particular value, with no one system shockingly better than another.
– Senseval-1: a number of systems in the range of 74-78% accuracy for the English Lexical Sample task (a small number of words, so it is feasible to develop one classifier per word)
– Senseval-2: a number of systems in the range of 61-64% accuracy for the English Lexical Sample task.
– Senseval-3: a number of systems in the range of 70-73% accuracy for the English Lexical Sample task…
Evaluation of WSD
• “In vitro”:
– Corpus developed in which one or more ambiguous words
are labeled with explicit sense tags according to some sense
inventory.
– Corpus used for training and testing WSD and evaluated
using accuracy (percentage of labeled words correctly
disambiguated).
• Use most common sense selection as a baseline.
• “In vivo”:
– Incorporate WSD system into some larger application
system, such as machine translation, information retrieval, or
question answering.
– Evaluate the relative contribution of different WSD methods by measuring their impact on the overall system’s performance on the final task (accuracy of MT, IR, or QA results).
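A minimal sketch of the in vitro accuracy computation and the most-frequent-sense baseline mentioned above (function names are illustrative):

from collections import Counter

def wsd_accuracy(predicted_senses, gold_senses):
    # percentage of labeled target words that were disambiguated correctly
    correct = sum(p == g for p, g in zip(predicted_senses, gold_senses))
    return correct / len(gold_senses)

def most_frequent_sense_baseline(train_senses, test_senses):
    # always predict the sense that was most common in the training data
    mfs = Counter(train_senses).most_common(1)[0][0]
    return wsd_accuracy([mfs] * len(test_senses), test_senses)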