Syntax for MT
EECS 767
Feb. 1, 2006
Outline
Motivation
Syntax-based translation model
 Formalization
 Training
Using syntax in MT
 Using multiple features
 Syntax-based features
The IBM Models

Word reordering
 Single words, not groups
 Conditioned on position of words

Null-word insertion
 Uniform across position
The Alignment Template Model

Word Reordering
 Phrases can be reordered in any way, but tend to stay in the same order as the source.
 Reordering within phrases is defined by templates

Word Translations
 Words must match up, so there is no null word
Implied Assumptions

Word Order
 Similar to source sentence

Translation
 Near 1-1 correspondence
What goes wrong?

We see many errors in machine translation when we only look at the word level
 Missing content words
  MT: Condemns US interference in its internal affairs.
  Human: Ukraine condemns US interference in its internal affairs.
 Verb phrase
  MT: Indonesia that oppose the presence of foreign troops.
  Human: Indonesia reiterated its opposition to foreign military presence.
WS 2003 Syntax for Statistical Machine Translation Final Presentation
What goes wrong cont.
 Wrong dependencies
  MT: …, particularly those who cheat the audience the players.
  Human: …, particularly those players who cheat the audience.
 Missing articles
  MT: …, he is fully able to activate team.
  Human: …, he is fully able to activate the team.

WS 2003 Syntax for Statistical Machine Translation Final Presentation
What goes wrong cont.
 Word salad:
the world arena on top of the u . s . sampla
competitors , and since mid – july has not
appeared in sports field , the wounds heal go back
to the situation is very good , less than a half hours
in the same score to eliminate 6:2 in light of the
south african athletes to the second round .
WS 2003 Syntax for Statistical Machine Translation Final Presentation
How can we improve?




Relying on the language model to produce more ‘accurate’ sentences is not enough
Many of the problems can be considered ‘syntactic’
Perhaps MT systems don’t know enough about what is important to people
So, include syntax in MT
 Build a model around syntax
 Include syntax-based features in a model
WS 2003 Syntax for Statistical Machine Translation Final Presentation
A New Translation Story





You have a sentence and its parse tree
The children at each node in the tree are rearranged
New nodes may be inserted before or after a child node
These new nodes are assigned a translation
Each of the leaf lexical nodes is then translated
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
A Syntax-based model
Assume word order is based on a reordering of the source syntax tree.
Assume null-generated words happen at syntactic boundaries.
(For now) Assume a word translates into a single word.

Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Reorder
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Insert
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Translate
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
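The three operations on the preceding slides (reorder, insert, translate) can be sketched on a toy tree. This is only an illustration: the `Node` class, the lookup tables, and the English-to-Japanese example words are assumptions made for this sketch, and the tables are deterministic here, whereas the model itself assigns probabilities to each choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                       # syntactic label, e.g. "VB" or "PRP"
    word: Optional[str] = None       # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def reorder(node: Node, order_for: Dict[Tuple[str, ...], Tuple[int, ...]]) -> Node:
    """Permute the children of every non-leaf node.

    order_for maps a tuple of child labels to a permutation of indices;
    label sequences without an entry keep their original order.
    """
    if node.children:
        labels = tuple(c.label for c in node.children)
        perm = order_for.get(labels, tuple(range(len(labels))))
        node.children = [reorder(node.children[i], order_for) for i in perm]
    return node


def insert_and_translate(node, parent_label, insert_for, lexicon):
    """Return the target words under `node` after insertion and translation.

    insert_for maps (parent label, node label) to ("left" | "right", word).
    lexicon maps a source word to a target word, or to None for NULL.
    """
    if node.children:
        words = [w for c in node.children
                 for w in insert_and_translate(c, node.label, insert_for, lexicon)]
    else:
        target = lexicon.get(node.word, node.word)  # unknown words pass through
        words = [] if target is None else [target]
    side, new_word = insert_for.get((parent_label, node.label), ("none", None))
    if side == "left":
        words = [new_word] + words
    elif side == "right":
        words = words + [new_word]
    return words


# Hypothetical toy input: "he adores listening to music"
tree = Node("VB", children=[
    Node("PRP", "he"),
    Node("VB1", "adores"),
    Node("VB2", children=[
        Node("VB", "listening"),
        Node("TO", children=[Node("TO", "to"), Node("NN", "music")]),
    ]),
])
order_for = {("PRP", "VB1", "VB2"): (0, 2, 1), ("VB", "TO"): (1, 0), ("TO", "NN"): (1, 0)}
insert_for = {("VB", "PRP"): ("right", "ha"), ("VB2", "VB"): ("right", "no"),
              ("VB", "VB2"): ("right", "ga"), ("VB", "VB1"): ("right", "desu")}
lexicon = {"he": "kare", "adores": "daisuki", "listening": "kiku",
           "to": "wo", "music": "ongaku"}

output = insert_and_translate(reorder(tree, order_for), "TOP", insert_for, lexicon)
print(" ".join(output))
```

Keeping the reordering table keyed on the original child label sequence, and the insertion table keyed on the parent and current labels, mirrors the conditioning described on the Parameters slides.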
Parameters

Reorder (R) – child node reordering
 Can take any possible child node reordering
 Defines word order in translation sentence
 Conditioned on original child node order
 Only applies to non-leaf nodes
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Parameters cont.

Insertion (N) – placement and translation
 Left, right, or none
 Defines word to be inserted
 Place conditioned on current and parent labels
 Word choice is unconditioned
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Parameters cont.

Translation (T) – 1 to 1
 Conditioned only on source word
 Can take on null

Translation (T) – N to N
 Consider word fertility (for 1-to-N mapping)
 Consider phrase translation at each node
 Limit size of possible phrases
 Mix phrasal w/ word-to-word translation
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Formalization
Set of nodes in parse tree
Total probability
Assume node independence
Assume the random variables are independent of one another and depend only on certain features
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
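The formulas that accompanied these bullet points are not present in the transcript. One way to write them, in notation in the style of Yamada and Knight (2001), is sketched below. The symbols are assumptions for this sketch rather than a copy of the slide: \varepsilon_i are the nodes of the parse tree \mathcal{E}, \theta_i = \langle \nu_i, \rho_i, \tau_i \rangle are the insertion, reordering, and translation choices at node i, and \mathcal{N}, \mathcal{R}, \mathcal{T} pick out the features each choice is conditioned on.

```latex
\begin{align}
  % Set of nodes in the parse tree and the choices made at each node
  \mathcal{E} &= \{\varepsilon_1, \ldots, \varepsilon_n\}, \qquad
  \theta = \langle \theta_1, \ldots, \theta_n \rangle, \qquad
  \theta_i = \langle \nu_i, \rho_i, \tau_i \rangle \\
  % Total probability: sum over all choices that yield the string f
  P(f \mid \mathcal{E}) &= \sum_{\theta \,:\, \mathrm{Str}(\theta(\mathcal{E})) = f} P(\theta \mid \mathcal{E}) \\
  % Node independence
  P(\theta \mid \mathcal{E}) &= \prod_{i=1}^{n} P(\theta_i \mid \varepsilon_i) \\
  % Independent random variables, each conditioned only on certain features
  P(\theta_i \mid \varepsilon_i) &=
    n(\nu_i \mid \mathcal{N}(\varepsilon_i)) \;
    r(\rho_i \mid \mathcal{R}(\varepsilon_i)) \;
    t(\tau_i \mid \mathcal{T}(\varepsilon_i))
\end{align}
```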
Training (EM)
1. Initialize all probability tables (uniform)
2. Reset all counters
3. For each pair in the training corpus
 A) Try all possible mappings of N, R, and T
 B) Update the counts as seen in the mappings
4. Normalize the probability tables with the new counts
5. Repeat 2-4 several times
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
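The loop above (initialize, reset, accumulate, normalize, repeat) can be sketched in code. The version below is a deliberately simplified stand-in: it trains only a word translation table over flat word alignments, in the style of IBM Model 1, rather than enumerating tree mappings of N, R, and T. The function name `train_t_table` and the toy corpus are assumptions for illustration.

```python
from collections import defaultdict


def train_t_table(corpus, iterations=5):
    """EM for a word translation table t[e][f], IBM Model 1 style.

    corpus is a list of (english_words, foreign_words) pairs.
    """
    e_vocab = {e for es, _ in corpus for e in es}
    f_vocab = {f for _, fs in corpus for f in fs}

    # Step 1: initialize the probability table uniformly.
    t = {e: {f: 1.0 / len(f_vocab) for f in f_vocab} for e in e_vocab}

    for _ in range(iterations):                    # Step 5: repeat steps 2-4
        # Step 2: reset all counters.
        counts = defaultdict(lambda: defaultdict(float))

        # Step 3: for each pair in the training corpus.
        for es, fs in corpus:
            for f in fs:
                # 3A: consider every English word as a possible source of f.
                z = sum(t[e][f] for e in es)
                for e in es:
                    # 3B: add the posterior of this mapping as a fractional count.
                    counts[e][f] += t[e][f] / z

        # Step 4: normalize the table with the new counts.
        for e in e_vocab:
            total = sum(counts[e].values())
            if total > 0:
                t[e] = {f: counts[e][f] / total for f in f_vocab}
    return t


# Hypothetical toy corpus
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = train_t_table(corpus, iterations=10)
print(max(t["the"], key=t["the"].get))
```

In the syntax-based model, step 3A ranges over combinations of reorderings, insertions, and translations for the whole tree, which is why the slides later point to its computational cost.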
Decoding





Modify the original CFG with the new reorderings and their probabilities
Add in VP->VP X and X->word rules from N
Add lexical rules englishWord->foreignWord
Use the noisy-channel approach starting with a translated sentence
Proceed through the parse tree using a bottom-up beam search, keeping an N-best list of good partial translations for each subtree
Yamada&Knight A Decoder for Syntax-based Statistical MT 2002
Decoding cont.
Yamada&Knight A Decoder for Syntax-based Statistical MT 2002
Performance (Alignment)
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Performance (Alignment) cont.


Counting number of individual alignments
Perfect means all alignments in a pair are correct
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
Performance cont.

Chinese-English BLEU scores
Yamada&Knight A Decoder for Syntax-based Statistical MT 2002
Do we need the entire model to be based on syntax?
Good performance increase
Large computational cost
 Many permutations to CFG rules (120K nonlexical)

How about trying something else?
 Add syntax-based features that look for more specific things
Using Syntax in MT

Multiple Features
 Formalization
 Baseline
 Training

Syntax-based Features
 Shallow
 Deep
Multiple Features (log-linear)
Calculate the probability using a variety of features, each parameterized by an associated ‘weight’
Find the translated sentence that maximizes the weighted feature functions given your foreign sentence
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
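The equations for this slide did not survive in the transcript. A standard way to write a log-linear translation model is sketched below. The symbols are assumptions for this sketch: f is the foreign sentence, e a candidate translation, h_m the M feature functions, and \lambda_m their weights.

```latex
\begin{align}
  % Probability from weighted features
  P(e \mid f) &= \frac{\exp\left( \sum_{m=1}^{M} \lambda_m h_m(e, f) \right)}
                      {\sum_{e'} \exp\left( \sum_{m=1}^{M} \lambda_m h_m(e', f) \right)} \\
  % Decision rule: the normalization does not depend on e, so it drops out
  \hat{e} &= \operatorname*{arg\,max}_{e} \; \sum_{m=1}^{M} \lambda_m h_m(e, f)
\end{align}
```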
Baseline System
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Baseline System
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Baseline Features

Alignment template feature
 Uses simple counts
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Baseline Features

Word selection feature
 Uses lexicon probability estimated by relative frequency
 Additional feature capturing word reordering within phrasal alignments
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Baseline Features

Phrase alignment feature

Measure of deviation from monotone alignment
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
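One simple way to measure deviation from a monotone alignment is to sum the jumps between consecutive phrases on the source side. The sketch below makes that assumption; the function name and the span representation are hypothetical, and the exact formula used in the baseline system may differ (for example, in how the end of the sentence is handled).

```python
def monotone_deviation(source_spans):
    """Sum of source-side jumps between consecutive phrases.

    source_spans lists the (start, end) source positions of each phrase,
    0-based and inclusive, in the order the phrases appear in the target.
    A fully monotone alignment scores 0.
    """
    total = 0
    prev_end = -1
    for start, end in source_spans:
        # Distance between where this phrase starts and where a monotone
        # continuation of the previous phrase would have started.
        total += abs(start - (prev_end + 1))
        prev_end = end
    return total


print(monotone_deviation([(0, 1), (2, 2), (3, 5)]))  # monotone order
print(monotone_deviation([(2, 2), (0, 1), (3, 5)]))  # first two phrases swapped
```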
Baseline Features

Language model feature
 Standard backing-off trigram probability

Word/Phrase penalty feature
 Feature counting number of words in translated sentence
 Feature counting number of phrases in translated sentence

Alignment lexicon feature
 Feature counting the number of times an entry from a given alignment lexicon is used
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
A possible training method

Line optimization method
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Use reranking of N-best lists





Feature functions do not need to be integrated in dynamic programming search
A feature function can arbitrarily condition itself on any part of the English/Chinese sentence/parse tree/chunks
Provides a simple software architecture
Using a fixed set of translations allows feature functions to be a vector of numbers
You are limited to improvements you see within the N-best lists
WS 2003 Syntax for Statistical Machine Translation Final Presentation
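Because each candidate in an N-best list reduces to a vector of feature values, reranking amounts to a weighted sum and an argmax. A minimal sketch, assuming a hypothetical `rerank` function and made-up feature values and weights:

```python
def rerank(nbest, weights):
    """Return the candidate with the highest weighted feature sum.

    nbest is a list of (translation, feature_vector) pairs; weights holds
    one weight per feature, in the same order as the feature vectors.
    """
    def score(item):
        _, features = item
        return sum(w * h for w, h in zip(weights, features))

    best_translation, _ = max(nbest, key=score)
    return best_translation


# Hypothetical 2-best list with three features per candidate
nbest = [
    ("he is fully able to activate team", [-12.0, -3.1, 6.0]),
    ("he is fully able to activate the team", [-12.5, -2.0, 7.0]),
]
weights = [1.0, 2.0, 0.1]
print(rerank(nbest, weights))
```

A new feature only needs to append one number to each vector, which is what keeps the software architecture simple. The last bullet still applies: the reranker cannot return a translation that is not already in the list.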
Syntax-based Features

Shallow
 POS and chunk tag counts
 Projected POS language model

Deep
 Tree-to-string
 Tree-to-tree
 Verb arguments
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Shallow Syntax-Based Features

POS and chunk tag count
 Low-level syntactic problems with the baseline system: too many articles, commas, and singular nouns; too few pronouns, past tense verbs, and plural nouns.
 The reranker can learn balanced distributions of tags from various features
 Examples
  Number of NPs in English
  Difference in number of NPs between English and Chinese
  Number of Chinese N tags translated to only non-N tags in English
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
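The three example features can be sketched as simple counts. The function names, the chunk and tag lists, and the convention that noun tags start with "N" are assumptions made for this illustration.

```python
from collections import Counter


def np_count_features(english_chunks, chinese_chunks):
    """Count NP chunks in English and the English-Chinese difference."""
    en = Counter(english_chunks)
    zh = Counter(chinese_chunks)
    return {"en_NP": en["NP"], "NP_diff": en["NP"] - zh["NP"]}


def n_to_non_n_count(chinese_tags, english_tags, alignment):
    """Count Chinese N-tagged words whose aligned English words are all non-N.

    alignment is a set of (chinese_index, english_index) links.
    Unaligned Chinese words are not counted.
    """
    count = 0
    for i, tag in enumerate(chinese_tags):
        if not tag.startswith("N"):
            continue
        linked = [english_tags[j] for (ci, j) in alignment if ci == i]
        if linked and all(not t.startswith("N") for t in linked):
            count += 1
    return count


# Hypothetical toy inputs
features = np_count_features(["NP", "VP", "NP"], ["NP", "VP"])
mismatches = n_to_non_n_count(
    ["NN", "VV", "NN"], ["NN", "VBD", "JJ"], {(0, 0), (1, 1), (2, 2)}
)
print(features, mismatches)
```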
Shallow Syntax-Based Features

Projected POS language model
 Use word-level alignments to project Chinese POS tags onto the English words
  Possibly keeping relative position within Chinese phrase
  Possibly keeping NULLs in POS sequence
  Possibly using lexicalized NULLs from English word
 Use the POS tags to train a language model based on POS N-grams
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
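The two steps (project tags through the alignment, then train an N-gram model on the projected sequences) can be sketched as follows. This is a minimal illustration under stated assumptions: one alignment link per English word, unaligned words kept as "NULL", and an unsmoothed bigram model in place of whatever N-gram order and smoothing the workshop used. All names are hypothetical.

```python
from collections import Counter


def project_tags(english_words, chinese_tags, alignment):
    """Give each English word the POS tag of its aligned Chinese word.

    alignment maps an English index to a Chinese index; English words
    without a link keep the placeholder tag "NULL".
    """
    return [
        chinese_tags[alignment[i]] if i in alignment else "NULL"
        for i in range(len(english_words))
    ]


def train_bigram_lm(tag_sequences):
    """Return a function giving the relative-frequency bigram probability."""
    bigrams, contexts = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1

    def prob(prev, cur):
        # Unsmoothed estimate; unseen contexts get probability 0.
        return bigrams[(prev, cur)] / contexts[prev] if contexts[prev] else 0.0

    return prob


# Hypothetical toy data
projected = project_tags(["the", "players", "cheat"], ["NN", "VV"], {1: 0, 2: 1})
lm = train_bigram_lm([projected, ["NN", "VV"]])
print(projected, lm("NN", "VV"))
```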
Deep Syntax-Based Features

Tree to string
 Uses the syntax-based model we saw previously
 Reduces computational cost by limiting the size of reorderings
 Adds one feature for the probability as defined by the model and one for the probability of the Viterbi alignment defined by the model
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Deep Syntax-Based Features

Tree to Tree
 Uses tree transformation functions similar to those in the tree-to-string model
 The probability of transforming a source tree into a target tree is modeled as a sequence of steps starting from the root of the target tree down.
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Tree to Tree cont.


At each level of the tree:
 1. At most one of the current node’s children is grouped with the current node into a single elementary tree, with its probability conditioned on the current node and its children.
 2. An alignment of the children of the current elementary tree is chosen, with its probability conditioned on the current node and the children of the child in the elementary tree. This is similar to the reorder operation in the tree-to-string model, but allows for node addition and removal.
Leaf-level parameters are ignored when calculating the probability of tree-to-tree.
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
Verb Arguments


Idea: A feature that counts the difference in the number of arguments to the main verb between the Chinese and English sentences
Perform a breadth-first traversal of the dependency trees
 Mark the first verb encountered as the main verb
 The number of arguments is equal to the number of its children
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
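The traversal can be sketched as follows. The tree representation (a dict from node id to child ids plus a dict of POS tags), the function names, and the rule that verb tags start with "V" are assumptions for this sketch. The difference is returned signed (English minus Chinese), which is one possible reading of the feature.

```python
from collections import deque


def main_verb_arg_count(root, children, pos):
    """Breadth-first search for the first verb; return its number of children.

    children maps a node id to a list of child ids; pos maps a node id to
    its POS tag. Returns 0 when the tree contains no verb.
    """
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if pos[node].startswith("V"):          # first verb found: the main verb
            return len(children.get(node, []))
        queue.extend(children.get(node, []))
    return 0


def verb_arg_difference(english_tree, chinese_tree):
    """Each tree is a (root, children, pos) triple."""
    return main_verb_arg_count(*english_tree) - main_verb_arg_count(*chinese_tree)


# Hypothetical toy dependency trees
english_tree = (1, {1: [0, 2]}, {0: "NNP", 1: "VBZ", 2: "NN"})
chinese_tree = (0, {0: [1]}, {0: "VV", 1: "NN"})
print(verb_arg_difference(english_tree, chinese_tree))
```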
Performance
Some things helped, some things didn’t
 Is syntax useful? Necessary?

References






K. Yamada and K. Knight. 2001. A syntax-based statistical translation model. In ACL-01.
K. Yamada. 2002. A Syntax-Based Statistical Translation Model. Ph.D. thesis, University of Southern California.
K. Yamada and K. Knight. 2002. A decoder for syntax-based MT. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA.
Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2004. A smorgasbord of features for statistical machine translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting, pages 161-168. MIT Press.
Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. Final Report of the Johns Hopkins 2003 summer workshop on Syntax for Statistical Machine Translation.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting. MIT Press.