
Machine Translation
Course 8
Diana Trandabăț
Academic year: 2016-2017
Reordering
• Different languages put words in different order
• Computational Problem
– during translation we have to compose an output sentence
– when composing a sentence of n words, there are n!
possible orderings
– such factorial search spaces quickly become intractable
to search exhaustively (see the sketch below)
• Linguistic Problem
– how do different languages define the word order?
– how can we model reordering transformations during
translation?
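A quick back-of-the-envelope check in Python makes the growth of n! concrete:

```python
import math

# Number of possible orderings of an n-word sentence grows as n!
for n in [5, 10, 15, 20]:
    print(f"{n:2d} words -> {math.factorial(n):,} orderings")

# 20 words -> 2,432,902,008,176,640,000 orderings: far too many to enumerate
```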
Reordering
• Reordering remains a hard problem for SMT and motivates new
models
• Reordering is the most important feature for predicting
translation performance (for Indo-European languages)
Bag of words
the nobody security Swiss account the of will bank trust
nobody will trust the security of the Swiss bank account
• From New York Times
• Can you put the words in the correct order?
– Does a good language model suffice?
The Language Model/Translation Model Debate
• Language model argument
– Humans can make sense of scrambled sentences,
machines should too
– Build really big language models to determine what makes
sense in the target language
• Translation model argument
– structure in source sentence defines relationship between
words
– we can learn how to map this structure into a target-language structure
Example
German:
Ich halte eine Vorlesung über die Reihenfolge von Wörten
English:
I teach a class about the order of words
Monotone reordering (no reordering)
Example
French:
Barack Obama vit dans la Maison Blanche
English:
Barack Obama lives in the White House
Adjective/noun swapping
Example
German:
Ich muss eine Vorlesung über die Reihenfolge von Wörten halten
English:
I must teach a class about the order of words
German main verb movement
Example
English:
we have taken these considerations very much into account when we designed the budget
French:
lorsque nous avons préparé le budget , nous avons pris cela en considération
(lit.: when we prepared the budget , we took that into consideration)
Clause reordering, zero alignment of the comma
“Free” word order
• The following German sentences all mean the same ("The man gives the woman the book"):
– Der Mann gibt der Frau das Buch.
– Das Buch gibt der Mann der Frau.
– Der Frau gibt der Mann das Buch.
– Der Mann gibt das Buch der Frau.
– Das Buch gibt der Frau der Mann.
– Der Frau gibt das Buch der Mann.
Amount of reordering (in a couple of European languages)
[Chart from (Birch et al., EMNLP 2008) comparing the amount of reordering for Italian (it), Swedish (sv), English (en), Greek (el), Portuguese (pt), Danish (da), Spanish (es), French (fr), Dutch (nl)*, Finnish (fi)*, and German (de)*]
* hard languages for reordering
Reordering models
• Distortion
• Distance-based models
• Lexicalised models
• Pre-ordering
• Clause reordering
Reordering models
• Distortion
• Distance-based models
• Lexicalised models
• Pre-ordering
• Clause reordering
Distortion
• Corresponding words do not appear in the same order in
source and target languages
• Assign an offset to move words from their original
position to their final position
• In IBM model 3, the distortion factor determines how
likely it is that an English word in position i aligns to a
foreign word in position j, given the lengths of both
sentences:
d(j | i, l, m)
• Note: positions are absolute positions
Distortion (Reordering)
• Making the offset dependent on all the words would be too
costly… so in IBM model 3, the offset only depends on:
• the position of the word within the sentence!!!
• the length of the sentences in both languages
• P(offset = o | Position = p, SourceLength = m, TargetLength = n)
– ex: chien brun --> brown dog
• brown --> offset -1, dog --> offset +1
– offset of brown = P(offset | 1, 2, 2)
– ex: P(+1 | 1, 2, 2) = 0.3, P(0 | 1, 2, 2) = 0.6, P(-1 | 1, 2, 2) = 0.1
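A minimal sketch of such an offset model as a plain lookup table, using the illustrative probabilities from this slide (the table layout and function name are assumptions, not from any real system):

```python
# Toy offset model: P(offset | position, source_len, target_len),
# stored as a lookup table with the illustrative values from the slide.
offset_table = {
    # (position, source_len, target_len) -> {offset: probability}
    (1, 2, 2): {+1: 0.3, 0: 0.6, -1: 0.1},
}

def p_offset(offset, position, source_len, target_len):
    """Look up P(offset | position, source_len, target_len); 0.0 if unseen."""
    return offset_table.get((position, source_len, target_len), {}).get(offset, 0.0)

# "chien brun" -> "brown dog": how likely is the word in position 1 to move?
print(p_offset(-1, 1, 2, 2))  # P(-1 | 1, 2, 2) = 0.1
print(p_offset(0, 1, 2, 2))   # P( 0 | 1, 2, 2) = 0.6
```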
An Example (en|fr)

e* = argmax_e P(f | e) × P(e)

Given the English:       The brown dog did not go home
Fertility Model:         1   1     1   1   2   1  3
Transformed English:     The brown dog did not not go home home home
Translation Model:       Le  brun  chien est n'  pas allé à la maison
Offset Model:            0   +1    -1    +1  -1  0   0    0 0  0
A possible Translation:  Le chien brun n' est pas allé à la maison
• Then use Language Model P(e) to evaluate fluency of all possible
translations
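The generative story on this slide can be replayed in a few lines of Python; the fertilities, word translations, and offsets are taken from the slide itself, while the data layout is illustrative:

```python
# Reproduce the slide's en->fr pipeline: fertility, then word
# translation, then offset-based reordering. Numbers are from the slide.
english = ["The", "brown", "dog", "did", "not", "go", "home"]
fertility = [1, 1, 1, 1, 2, 1, 3]            # "not" -> 2 words, "home" -> 3
translation = {"The": ["Le"], "brown": ["brun"], "dog": ["chien"],
               "did": ["est"], "not": ["n'", "pas"], "go": ["allé"],
               "home": ["à", "la", "maison"]}
offsets = [0, +1, -1, +1, -1, 0, 0, 0, 0, 0]

# The fertility of each English word matches how many French words it yields
assert all(len(translation[e]) == f for e, f in zip(english, fertility))

# Steps 1+2: expand by fertility and translate word by word
french = [w for e in english for w in translation[e]]
print(french)  # ['Le', 'brun', 'chien', 'est', "n'", 'pas', 'allé', 'à', 'la', 'maison']

# Step 3: move each word by its offset to get the final order
reordered = [None] * len(french)
for i, (word, off) in enumerate(zip(french, offsets)):
    reordered[i + off] = word
print(" ".join(reordered))  # Le chien brun n' est pas allé à la maison
```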
Distortion Probabilities
• The translation model includes probabilities for “distortion”
– Ex.: P(5 | 2, 4, 6)
– the probability that a source word w_s in position 2 (2nd word in the
source language) will produce a target word w_t in position 5 when the
source sentence has 4 words and the target sentence has 6
words.
Distortion Probabilities
• For phrase-based MT: measure the distortion of phrase i as
the distance between the start of the f phrase generated
by ē_i, a_i and the end of the f phrase generated by the
previous phrase ē_{i-1} with its alignments a_{i-1}
• Where:
– f phrase is the phrase in the source language
– ē_i is the phrase in the target language corresponding to f
– a_i is the alignment of phrase i
– ē_{i-1} is the previous phrase in the target language
• Typically assume the probability of a distortion decreases
exponentially with the distance of the movement:
d(i) = c · α^|start_i − end_{i−1} − 1|
– Set 0 < α < 1 based on fit to phrase-aligned training data
– Then set c to normalize the distortion of our phrase i,
d(i), so it sums to 1.
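A sketch of this penalty, assuming the d(i) = c · α^|start_i − end_{i−1} − 1| form above, with the normalization constant c omitted for simplicity:

```python
# Exponential distortion penalty for phrase-based MT (a sketch):
# d(i) = alpha ** |start_i - end_{i-1} - 1|, with alpha in (0, 1).
ALPHA = 0.5  # illustrative value; in practice fit to phrase-aligned data

def distortion(start_i, end_prev):
    """Penalty for a phrase whose source span starts at start_i,
    when the previous phrase's source span ended at end_prev."""
    return ALPHA ** abs(start_i - end_prev - 1)

print(distortion(4, 3))  # monotone: next phrase starts right after -> 1.0
print(distortion(7, 3))  # jump ahead 3 positions -> 0.125
print(distortion(1, 3))  # jump back 3 positions  -> 0.125
```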
Reordering models
• Distortion
• Distance-based models
• Lexicalised models
• Pre-ordering
• Clause reordering
Distance Model
• Simple reordering model
– Cost of a reordering depends only on the distance of the
reordering
– Distribution can be estimated from alignments (see the sketch below)
– Or just a Gaussian with mean 1
– Or log p(a_j | a_{j−1}, I) = −|a_j − a_{j−1}|, i.e. reordering cost
proportional to distance
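The "estimate from alignments" option can be sketched by counting relative jumps in word-aligned data; the toy alignments below are invented for illustration:

```python
from collections import Counter

# Estimate a distance-based reordering distribution p(d) by counting
# jumps d = a_j - a_{j-1} between consecutive source-side alignment
# positions. The alignments below are invented for illustration.
aligned_positions = [
    [0, 1, 2, 3],      # monotone sentence
    [0, 2, 1, 3],      # one local swap
    [0, 1, 4, 2, 3],   # a longer jump
]

jumps = Counter()
for a in aligned_positions:
    for prev, cur in zip(a, a[1:]):
        jumps[cur - prev] += 1

total = sum(jumps.values())
p_jump = {d: n / total for d, n in sorted(jumps.items())}
print(p_jump)  # {-2: 0.1, -1: 0.1, 1: 0.5, 2: 0.2, 3: 0.1}
```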
Distance-based Reordering
• Distance-based reordering models have a simplistic view of
language
– they only consider how many words a phrase moves (e.g. a maximum of 6
words)
• In reality, movement depends on the words
– German: modal verbs require moving the main verb to the end
of the sentence
– French: adjectives require moving the noun before the adjective
• Use the aligned training corpus to estimate a model
• These models do not generalize well, since their parameters
are tied to absolute word positions within sentences, which
tend to differ for the same words across sentences.
Relative distortion model
– defined in terms of the difference between the position of
the current phrase and the position of the previous phrase
in the source sentence.
– Or: assign a probability distribution over possible
relative jumps, conditioned on source words.
– Conditioning on the source words allows for a much more
fine-grained model. For instance, words that tend to act as
modifiers (e.g., adjectives) would have a different
distribution than verbs or nouns => this leads to lexicalized
reordering models
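A sketch of this finer-grained conditioning, extending the jump-counting idea above to condition on the source word (the words, jumps, and counts here are invented for illustration):

```python
from collections import Counter, defaultdict

# Count relative jumps conditioned on the source word, so that e.g.
# adjectives can learn a different jump distribution than nouns.
# Each training item pairs a source word with the jump taken after it.
observations = [
    ("brun", -1), ("brun", -1), ("brun", 1),   # adjective: often jumps back
    ("chien", 1), ("chien", 1), ("chien", 2),  # noun: mostly moves forward
]

counts = defaultdict(Counter)
for word, jump in observations:
    counts[word][jump] += 1

def p_jump(jump, word):
    """Relative-frequency estimate of p(jump | source word)."""
    total = sum(counts[word].values())
    return counts[word][jump] / total if total else 0.0

print(p_jump(-1, "brun"))  # 2/3: "brun" usually reorders backwards
print(p_jump(1, "chien"))  # 2/3: "chien" usually moves forward by one
```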
Reordering models
• Distortion
• Distance-based models
• Lexicalised models
• Pre-ordering
• Clause reordering
Lexicalized Reordering Models
• Also called Block Distortion model
– Used in the Moses SMT system
– Condition on actual words
– Different possibilities:
• Condition on source words vs target words
• Condition on words at start of jump (O - outbound) vs words at
landing point (I - inbound)

p(a_j − a_{j−1} | f_j, f_{j−1}, e_{a_j}, e_{a_{j−1}}, I)

Where:
a_j − a_{j−1} is the offset
f_j is the foreign word at position j
f_{j−1} is the previous foreign word
e_{a_j} is the source word aligned with the
foreign word at position j
e_{a_{j−1}} is the source word aligned with the
previous foreign word
I or O stands for inbound (end of jump)
or outbound (start of jump)
Lexicalized model
• Three types of
reordering:
– monotone (m)
– swap (s)
– discontinuous (d)
Block Distortion Model
• Given the current block, look at the alignment links at its corners
• Top: how did I come from the previous phrase?
• Bottom: how do I continue to the next phrase?
[Diagram (F × E alignment grid): the current block with its four corners (left/right top, left/right bottom), the previous block, and the next block]
Block Distortion Model
• Top-left: prev-to-current = monotone
[Diagram: previous block at the current block's top-left corner]
Block Distortion Model
• Top-right: prev-to-current = swap
[Diagram: previous block at the current block's top-right corner]
Block Distortion Model
• Neither top-left nor top-right: prev-to-current = disjoint
[Diagram: previous block touching neither top corner of the current block]
Block Distortion Model
• Bottom-right: current-to-next = monotone
[Diagram: next block at the current block's bottom-right corner]
Block Distortion Model
• Bottom-left: current-to-next = swap
[Diagram: next block at the current block's bottom-left corner]
Block Distortion Model
• Neither bottom-left nor bottom-right: current-to-next = disjoint
[Diagram: next block touching neither bottom corner of the current block]
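The corner checks above translate almost directly into code. A minimal sketch, assuming an alignment given as a set of (f, e) link coordinates and blocks identified by their source/target spans (the function names and toy alignments are illustrative):

```python
# Classify block orientation from alignment-grid corner links, following
# the corner checks described above. `alignment` is a set of (f, e)
# coordinates; the current block spans source positions f_start..f_end
# and target positions e_start..e_end.

def prev_orientation(alignment, f_start, f_end, e_start):
    """Orientation of the current block relative to the previous block."""
    if (f_start - 1, e_start - 1) in alignment:  # link at top-left corner
        return "monotone"
    if (f_end + 1, e_start - 1) in alignment:    # link at top-right corner
        return "swap"
    return "disjoint"

def next_orientation(alignment, f_start, f_end, e_end):
    """Orientation of the next block relative to the current block."""
    if (f_end + 1, e_end + 1) in alignment:      # link at bottom-right corner
        return "monotone"
    if (f_start - 1, e_end + 1) in alignment:    # link at bottom-left corner
        return "swap"
    return "disjoint"

# Toy alignments: a monotone chain and a two-block swap
mono = {(0, 0), (1, 1), (2, 2)}
swap = {(0, 1), (1, 0)}
print(prev_orientation(mono, 1, 1, 1))  # link at (0, 0) -> monotone
print(prev_orientation(swap, 0, 0, 1))  # link at (1, 0) -> swap
```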
Reordering models
• Distortion
• Distance-based models
• Lexicalised models
• Pre-ordering
• Clause reordering
Extracting reordering rules
• Many types of reorderings are systematic
– move verb group together
– subject - verb - object
– move negation in front of verb
• Write rules by hand or try to derive them from annotated
data
Word reordering
• Trying all possible word reorderings is an NP-complete
problem, which makes searching for the optimal solution
among all possible permutations computationally intractable.
• Therefore, the number of permutations considered is typically
limited for efficiency reasons by placing reordering
restrictions (see the sketch below):
– allow only reordering of at most n words at any given time.
– use contiguity restrictions on the reordering.
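As a toy illustration of such a restriction, the sketch below enumerates only permutations in which no word moves more than a fixed number of positions; the window size and example words are made up:

```python
from itertools import permutations

# Enumerate only the permutations that respect a reordering window:
# no word may move more than `limit` positions from where it started.
def limited_permutations(words, limit):
    n = len(words)
    for perm in permutations(range(n)):
        if all(abs(new - old) <= limit for new, old in enumerate(perm)):
            yield [words[i] for i in perm]

words = ["nobody", "will", "trust", "the", "bank"]
print(len(list(permutations(words))))             # 120 = 5! candidate orderings
print(len(list(limited_permutations(words, 1))))  # 8 once moves are capped at 1
```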
Pre-reordering
• A different approach to allow for limited reordering is to
reorder the input sentence so that the source and the
target sentences have similar word order, and then to
decode the reordered source sentence monotonically.
• Pre-reordering arranges the source words in the order in
which their translations appear in the target language.
– the input and output sentences then have the same word order.
• It is very efficient, since the optimal decoding can be found in
polynomial time.
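A minimal hand-written pre-reordering rule for the German main-verb example seen earlier might look like this; the modal list and the rule itself are deliberately simplistic illustrations, not a real system:

```python
# A toy pre-reordering rule for German: when a modal verb appears, move
# the clause-final main verb right behind it, so the source matches
# English word order before monotone decoding.
MODALS = {"muss", "kann", "will", "soll"}

def preorder_german(tokens):
    """If the sentence has a modal and ends in a verb, move that final
    verb to directly after the modal (a simplistic hand-written rule)."""
    for i, tok in enumerate(tokens):
        if tok in MODALS:
            verb = tokens.pop()         # clause-final main verb
            tokens.insert(i + 1, verb)  # place it after the modal
            break
    return tokens

src = "Ich muss eine Vorlesung über die Reihenfolge von Wörten halten".split()
print(" ".join(preorder_german(src)))
# Ich muss halten eine Vorlesung über die Reihenfolge von Wörten
```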
Rewrite patterns
• A different method is to automatically acquire rewrite
patterns that can be applied to any given input sentence so that the
rewritten source and target sentences have similar word
order.
– rewrite patterns can be automatically extracted by parsing
the source and target sides of the training parallel corpus,
i.e. using syntactic rules.
– Or, rewriting the input sentence can be done using
heuristics.
• Reordering is, however, better handled during the search and
as part of the optimization function, rather than in a separate
pre-processing phase.
Word reordering restrictions
• Allowing a restricted set of word reorderings:
– word reordering is done only for words belonging to the
verb group.
– word reordering based on reordering operations extracted
from syntactic parse trees.
Reordering models
• Distortion
• Distance-based models
• Lexicalised models
• Pre-ordering
• Clause reordering
Clause Level Reordering
• Why clause structure?
– languages differ vastly in their clause structure (English:
SVO, Arabic: VSO, German: fairly free order; a lot of details
differ: position of adverbs, subclauses, etc.)
– large-scale restructuring is a problem for phrase models
• Restructuring
– reordering of constituents (main focus)
– add/drop/change of function words
Clause Structure
[Figure: syntax tree from a German parser; example from (Koehn, 2006)]
Clause Reordering When Translating
• Reordering when translating into English
– tree is flattened
– clause level constituents line up
Clause structure
• Clause-level reordering is a well-defined task
– label German constituents with their English order
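Viewed this way, the task reduces to predicting a target-side position for each constituent and sorting. A sketch using the lecture's earlier German example; the constituent segmentation and order labels are invented for illustration:

```python
# Clause-level reordering as "label constituents with their English
# order, then sort". Segmentation and labels are illustrative.
# German clause: "Ich muss eine Vorlesung über die Reihenfolge von Wörten halten"
# English order: "I must teach a class about the order of words"
constituents = [
    ("Ich", 1),                              # subject
    ("muss", 2),                             # modal verb
    ("eine Vorlesung", 4),                   # object
    ("über die Reihenfolge von Wörten", 5),  # prepositional phrase
    ("halten", 3),                           # main verb moves next to the modal
]

reordered = [text for text, _ in sorted(constituents, key=lambda c: c[1])]
print(" ".join(reordered))
# Ich muss halten eine Vorlesung über die Reihenfolge von Wörten
```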
Examples of translations improved
when using reordering
• we must also this criticism should be taken seriously .
→ we must also take this criticism seriously .
• i am with him that it is necessary , the institutional balance by
means of a political revaluation of both the commission and the
council to maintain .
→ i agree with him in this , that it is necessary to maintain the
institutional balance by means of a political revaluation of both the
commission and the council .
• perhaps it would be a constructive dialog between the government
and opposition parties , social representative a positive impetus in
the right direction .
→ perhaps a constructive dialog between government and opposition
parties and social representative could give a positive impetus in the
right direction .
“I’m trying to translate what my cat says and
put it in a book, but how many homonyms
are there for meow?”
Jarod Kintz
The Days of Yay are Here! Wake Me Up When
They're Over.