
A Hybrid Relational Approach for Word
Sense Disambiguation in Machine
Translation
Lucia Specia
Mark Stevenson
Maria G. V. Nunes
WSD in Machine Translation (MT)

Lexical choice in the case of semantic ambiguity.

Example (English-Portuguese):

take = tomar (carry out), levar (lead, direct, conduct, guide), aceitar (accept), pegar (choose, pick out), etc.
WSD in Machine Translation (cont.)

One of the main challenges in MT.

Conflicting results on the usefulness of WSD for (statistical) MT:
- (Vickrey et al., 2005);
- (Carpuat and Wu, 2005).

Particularly for English-Portuguese, studies have shown that the lack of WSD modules is one of the main reasons for the unsatisfactory results of existing MT systems.

We have suggested that an effective WSD module, specifically designed for MT, would improve MT performance.
Approaches to WSD
 Knowledge-based: linguistic knowledge
codified or extracted from lexical resources
manually
Accurate, but suffer from the knowledge acquisition bottelneck.
 Corpus-based: knowledge automatically acquired from
text using machine learning
Wide coverage, but need consistent and significant sample corpus.
 Hybrid: merge
approaches
characteristics
of
the
two
other
Explore advantages and minimize limitations of other approaches
→ wide coverage and accurate results.
Approaches to multilingual WSD

Approaches to WSD as an application-independent task date back to the 1960s.

Most are monolingual, aimed at English disambiguation:

- WSD is application-dependent (Wilks and Stevenson, 1998; Kilgarriff, 1997; Resnik and Yarowsky, 1997).

- WSD for MT differs from monolingual WSD (Hutchins and Somers, 1992), particularly with respect to the sense repository (Specia et al., 2006).
Approaches to multilingual WSD (cont.)

Corpus-based and hybrid approaches use propositional formalisms (attribute-value vectors):

Limited expressiveness; data sparseness:

Ex1) John gave Mary a big cake.
Ex2) Give me something.

      verb1-subj1   verb1-obj1       mod1-obj1   verb1-obj2   …
Ex1)  give-john     give-cake        big-cake    give-mary    …
Ex2)  -             give-something   -           give-me      …

Consequences: impractical to represent substantial knowledge and to use it during the learning process.

Hybrid approaches use knowledge in pre-processing steps, before applying machine learning algorithms.
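For contrast with the attribute-value table above, a sketch (ours, not from the original slides) of the same two sentences in the relational representation used in this work: each example carries only the facts that actually hold for it, so sparse empty attributes never need to be represented.

    % Relational facts, following the has_rel(sent_id, subject_word,
    % object_word) convention introduced with KS3 later in these slides.
    has_rel(ex1, john, cake).      % John gave Mary a big cake.
    has_rel(ex2, nil, something). % Give me something. (no subject)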
Proposal – a novel approach

LeAR (Lexical Ambiguity Resolution):

- Specific for MT: senses, knowledge, techniques.

- Hybrid – corpus- and knowledge-based:
  - Several knowledge sources (KSs) automatically acquired from corpus and lexical resources;
  - Evidence provided by examples of disambiguation extracted from automatically created sense-tagged corpora.

- Relational formalism:
  - Highly expressive, avoiding data sparseness: each example is represented independently.

- Inductive Logic Programming (ILP): a relational, symbolic, supervised learning approach.
Inductive Logic Programming

[Diagram: Examples (1st-order clauses) and Background Knowledge (1st-order clauses) are the input to ILP (the intersection of Machine Learning and Logic Programming; here, Aleph), which outputs a Theory (1st-order clauses).]

ILP allows the efficient representation of substantial knowledge about the problem, and allows this knowledge to be used during the learning process (Muggleton, 1991).
Inductive Logic Programming (cont.)

Given:

- a set of positive and negative examples E = E+ ∪ E-;
- a predicate p specifying the target relation to be learned;
- knowledge K of a certain domain, which specifies which predicates qi can be part of the definition of p.

The goal is to induce a hypothesis (or theory) h for p, with respect to E and K, which covers most of E+ without covering E-.

Additionally: the clauses representing K, E, and h must satisfy a set of syntactic restrictions S (language bias).

h can be used to classify new cases of disambiguation.
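As an illustration, here is a toy instance of this setting (our sketch, using the predicates defined later in this talk):

    % Target predicate p = sense/2; E+ (positive examples):
    sense(sent1, voltar).
    sense(sent2, ir).
    % Knowledge K, given through background predicates qi such as has_bag/2:
    has_bag(sent1, back).
    has_bag(sent2, school).
    % A hypothesis h that could be induced from E and K:
    % sense(A, voltar) :- has_bag(A, back).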
Inductive Logic Programming (cont.)

Aleph (Srinivasan, 2000):

- Provides a complete relational learning inference engine.
- Provides various customization options: induction methods; search strategies; evaluation functions; etc.

We are using:

- bottom-up search (generalisation);
- non-incremental learning (batch learning);
- non-interactive learning (without user intervention);
- learning based on positive examples only.
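In Aleph these choices are expressed as set/2 directives; a plausible configuration for the settings above (a sketch, not the exact parameter file used in the experiments):

    :- set(search, heuristic).   % search strategy (bf, df, heuristic, ...)
    :- set(evalfn, posonly).     % evaluation from positive examples only
    % Batch, non-interactive induction is then started with a single goal:
    % ?- induce.                 % or induce_max; see the Evaluation slides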
Inductive Logic Programming (cont.)

The default inference engine induces a theory iteratively by means of the following steps:

1. One example is selected to be generalized.
   Ex.: sense(sent1, voltar).
2. A more specific clause (the bottom clause), which explains the selected example, is built. It usually consists of the representation of all knowledge about that example.
3. A clause that is more generic than the bottom clause is searched for, by means of different search, evaluation, and generalization strategies.
4. The best clause found is added to the theory, and the examples covered by that clause are removed from the example set. If there are more examples, return to step 1.
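To make steps 2 and 3 concrete, here is a hypothetical bottom clause for the selected example, built from the KS predicates introduced on the next slides (the facts are illustrative):

    sense(A, voltar) :-
        has_bag(A, reincarnation), has_bag(A, squirrel),   % KS1
        has_pos(A, word_left_1, nn),                       % KS2
        has_rel(A, i, nil),                                % KS3
        has_collocation(A, word_right_1, back).            % KS4

Generalization then searches for shorter clauses subsuming this one, e.g. sense(A, voltar) :- has_collocation(A, word_right_1, back).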
[Slide: architecture diagram. The examples and the seven knowledge sources, all represented as first-order clauses, feed the ILP inference engine (Aleph) together with the mode, type and general settings; the engine induces the rule-based disambiguation model, with rules on how to use each KS. The KSs and their sources: KS1, bag-of-words (10); KS2, POS of the narrow context (10), from a POS tagger; KS3, subject-object syntactic relations, from a parser; KS4, 11 collocations; KS5, verbs' selectional restrictions and nouns' semantic features from LDOCE, with the feature-types hierarchy and hierarchical relations from WordNet; KS6, phrasal verbs and idioms, from LDOCE + Password; KS7, definitions overlapping counting, over verb definitions and examples from bilingual MRDs and a bag-of-words (200).]

Experiments with:

English-Portuguese MT


10 highly frequent and ambiguous verbs


Relevant and difficult cases for English-Portuguese MT
(Specia, 2005a).
Knowledge from syntactic, semantic and pragmatic
sources


No studies have examined English-Portuguese.
Working on knowledge which is specific for translation.
Although especially designed for MT of verbs, the
approach can be adapted for WSD of any words and
languages.
Sample data

Verb      # Translations   # Examples   Accuracy of most frequent translation
ask             5              209                      0.77
come           11              183                      0.5
get            17              157                      0.21
give            5              180                      0.89
go             11              197                      0.69
live            8              209                      0.69
look            7              191                      0.5
make           11              170                      0.7
take           13              142                      0.29
tell           17              209                      0.35
Average        10.5                                     0.56
Corpus: fiction books, automatically tagged with the verb translation and manually reviewed (Specia et al., 2005).
Knowledge sources

Example: sent1, verb "to come":
"If there is such a thing as reincarnation, I would not mind coming back as a squirrel".

KS1: Bag-of-words – ±5 words (lemmas) surrounding the verb, for every sentence (sent_id):
bag(sent_id, list_of_words).
Ex.: bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel]).

KS2: Part-of-speech (POS) tags of content words in a ±5 word window surrounding the verb:
has_pos(sent_id, word_position, pos).
Ex.: has_pos(sent1, word_left_1, nn).
     has_pos(sent1, word_left_2, vbp).
     …
Knowledge sources (cont.)

KS3: Subject and object syntactic relations with respect to the verb:
has_rel(sent_id, subject_word, object_word).
Ex.: has_rel(sent1, i, nil).

KS4: Context words represented by 11 collocations with respect to the verb: 1st preposition to the right; 1st and 2nd words to the left and right; 1st noun, 1st adjective, and 1st verb to the left and right:
has_collocation(sent_id, collocation_type, collocation).
Ex.: has_collocation(sent1, word_right_1, back).
     has_collocation(sent1, word_left_1, mind).
     …
Knowledge sources (cont.)

KS5: Selectional restrictions of verbs and semantic features of their arguments, from LDOCE:
rest(verb, subj_restriction, obj_restriction, translation).
Ex.: rest(come, [], nil, voltar).
     rest(come, [animal,human], nil, vir).
     rest(come, [], nil, aparecer).
     ...
feature(noun, sense_id, features).
Ex.: feature(reincarnation, 0_1, [abstract]).
     feature(reincarnation, 0_2, [animate]).
     feature(squirrel, 0_0, [animal]).
     …
Knowledge sources (cont.)

KS5 (cont.):

Hierarchy for LDOCE feature types (Bruce and Guthrie, 1992):
relation(feature1, feature2).
Ex.: sub(human, animate).
     …

Ontological relations from WordNet:
relation(word1, sense_id1, word2, sense_id2).
Ex.: hyper(reincarnation, 1, avatar, 1).
     hyper(reincarnation, 3, religious_doctrine, 2).
     synon(rebirth, 2, reincarnation, -1).
     …
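The rules shown later use a satisfy_restriction/3 background predicate over these facts. A simplified sketch of how it could be defined (our reconstruction; it checks only the subject restriction):

    % The subject of the verb in Sent must carry, for one of its LDOCE
    % senses, a feature equal or subordinate to a required feature.
    satisfy_restriction(Sent, SubjRestr, _ObjRestr) :-
        has_rel(Sent, Subj, _Obj),
        feature(Subj, _SenseId, Feats),
        member(F, Feats),
        covers(F, SubjRestr).

    % covers/2 walks up the feature-types hierarchy given by sub/2 facts.
    covers(F, Restr) :- member(F, Restr).
    covers(F, Restr) :- sub(F, Parent), covers(Parent, Restr).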
Knowledge sources (cont.)

KS6: Idioms and phrasal verbs:
exp(verbal_expression, translation).
Ex.: exp('come about', acontecer).
     exp('come about', chegar).
     exp('come to fruition', amadurecer).
     …

KS7: A count of the overlapping words in the dictionary definitions for the possible translations of the verb and the words surrounding it in the sentence:
highest_overlap(sent_id, translation, overlapping).
Ex.: highest_overlap(sent1, voltar, 0.222222).
     highest_overlap(sent1, chegar, 0.0857143).
     …
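A sketch of the score behind these highest_overlap/3 facts, as we read the KS7 description (names are illustrative):

    % Proportion of the words in a translation's dictionary definition
    % that also occur among the words surrounding the verb.
    overlap(DefWords, ContextWords, Score) :-
        intersection(DefWords, ContextWords, Shared),
        length(Shared, N),
        length(DefWords, D), D > 0,
        Score is N / D.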
Additional predicates

Examples:
sense(sent_id, translation).
Ex.: sense(sent1, voltar).
     sense(sent2, ir).
     …

Mode definitions:
Ex.: :- modeh(1, sense(sent, translation)).
     :- modeb(11, has_collocation(sent, colloc_id, colloc)).
     :- modeb(10, has_bag(sent, word)).
     …

Auxiliary predicates:
Ex.: has_bag(Sent, Word) :-
         bag(Sent, List), member(Word, List).
     …
Recall: bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel]).
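An illustrative query checking the auxiliary predicate against the KS1 fact above:

    ?- has_bag(sent1, reincarnation).
    true.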
Example of rules produced

Verb "to come":

1. sense(A, sair) :-
       has_collocation(A, preposition_right_1, out).
2. sense(A, chegar) :-
       satisfy_restriction(A, [animal,human], [concrete]),
       has_expression(A, 'come at').
3. sense(A, vir) :-
       satisfy_restriction(A, [human], [abstract]);
       has_collocation(A, word_right_1, from);
       (has_rel(A, subj, B), (has_pos(B, nn); has_pos(B, pron))).
4. sense(A, passar) :-
       (has_bag(A, to), has_bag(A, propernoun));
       highest_overlapping(A, passar).

Note: in order to classify new cases, the rules must be applied in the order in which they were produced.
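Since the rules must be tried in order, classifying a new case can be sketched as taking the first rule whose body succeeds, i.e. Prolog's usual top-to-bottom clause selection, committed with a cut (our sketch, not part of the induced theory):

    classify(Sent, Translation) :-
        sense(Sent, Translation), !.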
Evaluation

Induction methods:

- induce: builds one clause at a time, removing the examples covered by that clause. The theory produced depends on the order of the examples;
- induce_max: builds one clause at a time, without removing the examples covered by the clause. Builds a bottom clause for every example, not only for the first one.

Search strategies:

- bf: enumerates shorter clauses before longer ones;
- df: enumerates longer clauses before shorter ones;
- heuristic: enumerates clauses in a best-first manner.
Evaluation (cont.)

Generalization strategy:

- Relative least general generalisation (rlgg): the lgg of two clauses c1 and c2 is their least upper bound in the lattice introduced by θ-subsumption; the rlgg is the lgg relative to the background knowledge.

Evaluation function:

- Positive examples only (Bayesian score).

Knowledge sources:

- All 7;
- All minus 1 (6 KSs at a time);
- 1 KS at a time.
Comparison: propositional approaches

Algorithms: C4.5, Naive Bayes, Memory-based learning, SVM.

Knowledge sources:

- Narrow context: 5 surrounding words and/or POS tags;
- Broad context: 1-100 surrounding words and/or POS tags;
- 11 collocations: 1st preposition to the right; 1st and 2nd words to the left and right; 1st noun, 1st adjective, and 1st verb to the left and right;
- Subject-object syntactic relations.

Best combination of these features, along with the use of filters and the optimization of parameters.

Best algorithm: SVM (Specia, 2005b).
Results

Verb      # Rules    Accuracy   Most frequent   C4.5   SVM
          in Aleph   Aleph*     sense
ask           5        0.94         0.77        0.76   0.93
come         11        0.91         0.5         0.68   0.8
get          17        0.67         0.21        0.53   0.66
give          6        0.96         0.89        0.9    0.96
go           12        0.88         0.69        0.81   0.85
live          8        0.88         0.69        0.61   0.81
look          6        0.83         0.5         0.83   0.87
make         11        0.83         0.7         0.77   0.87
take         11        0.81         0.29        0.45   0.6
tell          8        0.94         0.35        0.66   0.7
Average                0.87         0.56        0.7    0.8

* 10-fold cross-validation, best experimental setting: induce_max, all KSs, heuristic search, but without filters or other optimizations.
Results - KSs

Accuracy using all KSs, and using all KSs except the one named in the column:

Verb    All KSs   Bag-of-   Colloca-   Expres-   Definition    POS    Selectional    Syntactic
                  words     tions      sions     overlapping          restrictions   relations
ask      0.94      0.64      0.9        0.64      0.64         0.76      0.64          0.68
come     0.91      0.8       0.82       0.78      0.8          0.82      0.87          0.78
get      0.67      0.56      0.46       0.56      0.56         0.54      0.62          0.56
give     0.96      0.93      0.8        0.93      0.93         0.91      0.93          0.84
go       0.88      0.76      0.76       0.84      0.84         0.84      0.76          0.84
live     0.88      0.81      0.67       0.81      0.81         0.77      0.81          0.81
look     0.83      0.77      0.62       0.77      0.77         0.77      0.68          0.79
make     0.83      0.71      0.83       0.71      0.71         0.88      0.76          0.74
take     0.81      0.41      0.47       0.41      0.41         0.44      0.41          0.47
tell     0.94      0.87      0.75       0.85      0.87         0.85      0.85          0.87
Aver.    0.87      0.73      0.71       0.73      0.73         0.76      0.73          0.74
- All KSs together yield better results than subsets of KSs.
- Some KSs seem to be more relevant than others for certain verbs.
- One KS at a time yields very low-quality results (both accuracy and rules).
Next steps

- Try different corpora: larger and of different domains/genres (although the verbs are not domain-specific).

- Other ILP options: induction methods; manual pruning; manual constraints; search strategies; etc.

- Optimization (time).

- Use of the translation context → a KS specific to MT.

- Extrinsic evaluation: transfer rule-based MT system.
Final remarks

- Results are promising: the hybrid relational approach outperforms propositional approaches, yielding a small set of symbolic rules which are easy to understand and adapt, if necessary.

- All KSs seem to play an important role.

- In general, the approach proved feasible, and we expect the resulting system will be able to improve the quality of English-Portuguese MT systems.
“Which of you shall we say doth love us most?
That we our largest bounty may extend
Where nature doth with merit challenge.”
Lucia Specia
[email protected]
References

Bruce, R. and Guthrie, L. (1992). Genus disambiguation: A study in weighted preference. In Proceedings of the 14th COLING, Nantes, pp. 1187-1191.

Carpuat, M. and Wu, D. (2005). Word sense disambiguation vs. statistical machine translation. In Proceedings of the 43rd ACL Meeting, Ann Arbor, pp. 387-394.

Hutchins, W.J. and Somers, H.L. (1992). An Introduction to Machine Translation. Academic Press, Great Britain.

Kilgarriff, A. (1997). I Don't Believe in Word Senses. Computers and the Humanities, 31(2), pp. 91-113.

Muggleton, S. (1991). Inductive Logic Programming. New Generation Computing, 8(4):295-318.

Resnik, P. and Yarowsky, D. (1997). A Perspective on Word Sense Disambiguation Methods and their Evaluation. In Proceedings of the ACL-SIGLEX Workshop Tagging Text with Lexical Semantics: Why, What and How?, Washington.

Specia, L. (2005a). A Hybrid Model for Word Sense Disambiguation in English-Portuguese Machine Translation. In Proceedings of the 8th CLUK, Manchester, pp. 71-78.

Specia, L. (2005b). Knowledge sources for disambiguating highly ambiguous verbs in machine translation. In Proceedings of the Student Session of the 17th ESSLLI, Edinburgh.
References (cont.)

Specia, L., Nunes, M.G.V. and Stevenson, M. (2005). Exploiting Parallel Texts to Produce a Multilingual Sense-tagged Corpus for Word Sense Disambiguation. In Proceedings of RANLP-05, Borovets, pp. 525-531.

Specia, L., Nunes, M.G.V. and Stevenson, M. (2006, to appear). Multilingual versus Monolingual WSD. In Proceedings of the EACL Workshop Making Sense of Sense, April 4th, Trento.

Srinivasan, A. (2000). The Aleph Manual. Technical Report, Computing Laboratory, Oxford University. http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/

Vickrey, D., Biewald, L., Teyssier, M. and Koller, D. (2005). Word-Sense Disambiguation for Machine Translation. In Proceedings of HLT/EMNLP, Vancouver.

Wilks, Y. and Stevenson, M. (1998). The Grammar of Sense: Using Part-of-speech Tags as a First Step in Semantic Disambiguation. Journal of Natural Language Engineering, 4(1):1-9.