Learning Syntactic Transfer Rules


Machine Translation Overview
Alon Lavie
Language Technologies Institute
Carnegie Mellon University
LTI Immigration Course
August 24, 2006
Machine Translation: History
• MT started in the 1940s, one of the first conceived
applications of computers
• Promising “toy” demonstrations in the 1950s failed
miserably to scale up to “real” systems
• ALPAC Report: MT recognized as an extremely difficult,
“AI-complete” problem in the mid-1960s
• MT revival started in earnest in the 1980s (US, Japan)
• Field dominated by rule-based approaches, requiring
hundreds of person-years of manual development
• Economic incentive for developing MT systems for a small
number of language pairs (mostly European languages)
Machine Translation:
Where are we today?
• Age of Internet and Globalization – great demand for
MT:
– Multiple official languages of UN, EU, Canada, etc.
– Documentation dissemination for large manufacturers
(Microsoft, IBM, Caterpillar)
• Economic incentive is still primarily within a small
number of language pairs
• Some fairly good commercial products in the market for
these language pairs
– Primarily a product of rule-based systems after many
years of development
• Pervasive MT between most language pairs still nonexistent and not on the immediate horizon
Example of Current Best MT
PAHO’s Spanam system:
• Mediante petición recibida por la Comisión Interamericana de
Derechos Humanos (en adelante …) el 6 de octubre de 1997, el señor
Lino César Oviedo (en adelante …) denunció que la República del
Paraguay (en adelante …) violó en su perjuicio los derechos a las
garantías judiciales … en su contra.

• Through petition received by the Inter-American Commission on
Human Rights (hereinafter …) on 6 October 1997, Mr. Linen César
Oviedo (hereinafter “the petitioner”) denounced that the Republic of
Paraguay (hereinafter …) violated to his detriment the rights to the
judicial guarantees, to the political participation, to equal protection
and to the honor and dignity consecrated in articles 8, 23, 24 and 11,
respectively, of the American Convention on Human Rights
(hereinafter …), as a consequence of judgments initiated against it.
Core Challenges of MT
• Ambiguity:
– Human languages are highly ambiguous, and
differently in different languages
– Ambiguity at all “levels”: lexical, syntactic, semantic,
language-specific constructions and idioms
• Amount of required knowledge:
– At least several 100k words, at least as many
phrases, plus syntactic knowledge (i.e. translation
rules). How do you acquire and construct a
knowledge base that big that is (even mostly)
correct and consistent?
How to Tackle the Core Challenges
• Manual Labor: 1000s of person-years of human
experts developing large word and phrase translation
lexicons and translation rules.
Example: Systran’s RBMT systems.
• Lots of Parallel Data: data-driven approaches for
finding word and phrase correspondences automatically
from large amounts of sentence-aligned parallel texts.
Example: Statistical MT systems.
• Learning Approaches: learn translation rules
automatically from small amounts of human translated
and word-aligned data. Example: AVENUE’s XFER
approach.
• Simplify the Problem: build systems that are limited-domain
or constrained in other ways. Examples: CATALYST, NESPOLE!.
State-of-the-Art in MT
• What users want:
– General purpose (any text)
– High quality (human level)
– Fully automatic (no user intervention)
• We can meet any 2 of these 3 goals
today, but not all three at once:
– FA HQ: Knowledge-Based MT (KBMT)
– FA GP: Corpus-Based (Example-Based) MT
– GP HQ: Human-in-the-loop (efficiency tool)
Types of MT Applications:
• Assimilation: multiple source languages,
uncontrolled style/topic. General purpose MT,
no semantic analysis. (GP FA or GP HQ)
• Dissemination: one source language,
controlled style, single topic/domain. Special
purpose MT, full semantic analysis. (FA HQ)
• Communication: Lower quality may be okay,
but system robustness, real-time required.
Approaches to MT: The Vauquois MT Triangle

[Figure: the Vauquois triangle. Along the base, Direct translation maps
“Mi chiamo Alon Lavie” straight to “My name is Alon Lavie”. Analysis climbs
the source side to a syntactic structure such as
[s [vp accusative_pronoun “chiamare” proper_name]]
Transfer maps it across to the target structure
[s [np [possessive_pronoun “name”]] [vp “be” proper_name]]
and Generation descends the target side to the output sentence. At the apex
sits the Interlingua, e.g.
Give-information+personal-data (name=alon_lavie)]
Analysis and Generation
Main Steps
• Analysis:
– Morphological analysis (word-level) and POS tagging
– Syntactic analysis and disambiguation (produce
syntactic parse-tree)
– Semantic analysis and disambiguation (produce
symbolic frames or logical form representation)
– Map to language-independent Interlingua
• Generation:
– Generate semantic representation in TL
– Sentence Planning: generate syntactic structure and
lexical selections for concepts
– Surface-form realization: generate correct forms of
words
Direct Approaches
• No intermediate stage in the translation
• First MT systems developed in the
1950s-60s (assembly-code programs)
– Morphology, bilingual dictionary lookup,
local reordering rules
– “Word-for-word, with some local word-order
adjustments”
• Modern Approaches: EBMT and SMT
Statistical MT (SMT)
• Proposed by IBM in early 1990s: a direct, purely statistical,
model for MT
• Statistical translation models are trained on a sentence-aligned
parallel bilingual corpus
– Train word-level alignment models (see the sketch after this list)
– Extract phrase-to-phrase correspondences
– Apply them at runtime on source input and “decode”
• Attractive: completely automatic, no manual rules, much
reduced manual labor
• Main drawbacks:
– Effective only with large volumes (several mega-words) of parallel
text
– Broad domain, but domain-sensitive
– Still viable only for small number of language pairs!
• Impressive progress in last 5 years
– Large DARPA funding programs (TIDES, GALE)
– Lots of research in this direction
– GIZA++, Pharaoh, CAIRO
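To make the first training step concrete, here is a minimal sketch of IBM Model 1 EM training for word-level alignment. The toy corpus and the fixed number of iterations are illustrative assumptions; real systems train on millions of sentence pairs with tools like GIZA++.

from collections import defaultdict

# Toy sentence-aligned corpus: (source, target) token lists (assumed data).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

t = defaultdict(lambda: 0.25)  # t(e|f) translation table, uniform start

for _ in range(10):  # EM iterations
    count = defaultdict(float)  # expected counts c(e, f)
    total = defaultdict(float)  # normalizer per source word f
    for f_sent, e_sent in corpus:
        for e in e_sent:
            z = sum(t[(e, f)] for f in f_sent)  # E-step: alignment posterior
            for f in f_sent:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    for (e, f), c in count.items():  # M-step: re-estimate t(e|f)
        t[(e, f)] = c / total[f]

print(round(t[("house", "haus")], 3))  # converges toward 1.0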
EBMT Paradigm

New Sentence (Source):
Yesterday, 200 delegates met with President Clinton.

Matches to Source Found:
“Yesterday, 200 delegates met behind closed doors…” / “Gestern trafen sich 200 Abgeordnete hinter verschlossenen…”
“Difficulties with President Clinton…” / “Schwierigkeiten mit Praesident Clinton…”

Alignment (Sub-sentential):
“Yesterday, 200 delegates met behind closed doors…” / “Gestern trafen sich 200 Abgeordnete hinter verschlossenen…”
“Difficulties with President Clinton over…” / “Schwierigkeiten mit Praesident Clinton…”

Translated Sentence (Target):
Gestern trafen sich 200 Abgeordnete mit Praesident Clinton.
Transfer Approaches
• Syntactic Transfer:
– Analyze SL input sentence into its syntactic structure
(parse tree)
– Transfer SL parse-tree to TL parse-tree (various
formalisms for specifying mappings)
– Generate TL sentence from the TL parse-tree
• Semantic Transfer:
– Analyze SL input to a language-specific semantic
representation (i.e., Case Frames, Logical Form)
– Transfer SL semantic representation to TL semantic
representation
– Generate syntactic structure and then surface
sentence in the TL
Transfer Approaches
Main Advantages and Disadvantages:
• Syntactic Transfer:
– No need for semantic analysis and generation
– Syntactic structures are general, not domain specific
→ less domain dependent, can handle open domains
– Requires word translation lexicon
• Semantic Transfer:
– Requires deeper analysis and generation, symbolic
representation of concepts and predicates → difficult to
construct for open or unlimited domains
– Can better handle non-compositional meaning structures
→ can be more accurate
– No word translation lexicon: generate in TL from symbolic
concepts
Knowledge-based
Interlingual MT
• The classic “deep” Artificial Intelligence
approach:
– Analyze the source language into a detailed symbolic
representation of its meaning
– Generate this meaning in the target language
• “Interlingua”: one single meaning
representation for all languages
– Nice in theory, but extremely difficult in practice:
• What kind of representation?
• What is the appropriate level of detail to represent?
• How to ensure that the interlingua is in fact universal?
Interlingua versus Transfer

• With interlingua, need only N parsers/generators instead of on the
order of N² transfer systems (N(N-1) directed language pairs; for the
six languages shown, 30 transfer systems versus 6 analyzers and 6
generators):

[Figure: six languages L1-L6 fully interconnected pairwise under the
transfer approach, versus the same six languages each connected only to a
central interlingua.]
Multi-Engine MT
• Apply several MT engines to
each input in parallel
• Create a combined
translation from the
individual translations
• Goal is to combine
strengths, and avoid
weaknesses.
• Along all dimensions:
domain limits, quality,
development time/cost,
run-time speed, etc.
• Various approaches to the
problem
Speech-to-Speech MT
• Speech just makes MT (much) more difficult:
– Spoken language is messier
• False starts, filled pauses, repetitions, out-of-vocabulary words
• Lack of punctuation and explicit sentence boundaries
– Current Speech technology is far from perfect
• Need for speech recognition and synthesis in
foreign languages
• Robustness: MT quality degradation should be
proportional to SR quality
• Tight Integration: rather than separate
sequential tasks, can SR + MT be integrated in
ways that improve end-to-end performance?
Major Sources of Translation
Problems
• Lexical Differences:
– Multiple possible translations for SL word, or
difficulties expressing SL word meaning in a single TL
word
• Structural Differences:
– Syntax of SL is different than syntax of the TL: word
order, sentence and constituent structure
• Differences in Mappings of Syntax to
Semantics:
– Meaning in TL is conveyed using a different syntactic
structure than in the SL
• Idioms and Constructions
MT at the LTI
• LTI originated as the Center for Machine
Translation (CMT) in 1985
• MT continues to be a prominent sub-discipline
of research within the LTI
– More MT faculty than any of the other areas
– More MT faculty than anywhere else
• Active research on all main approaches to MT:
Interlingua, Transfer, EBMT, SMT
• Leader in the area of speech-to-speech MT
• Multi-Engine MT (MEMT)
• MT Evaluation (METEOR, BLANC)
KBMT: KANT, KANTOO, CATALYST
• Deep knowledge-based framework, with symbolic
interlingua as intermediate representation
– Syntactic and semantic analysis into an unambiguous
detailed symbolic representation of meaning using
unification grammars and transformation mappers
– Generation into the target language using unification
grammars and transformation mappers
• First large-scale multi-lingual interlingua-based MT
system deployed commercially:
– CATALYST at Caterpillar: high quality translation of
documentation manuals for heavy equipment
• Limited domains and controlled English input
• Minor amounts of post-editing
• Active follow-on projects
• Contact Faculty: Eric Nyberg and Teruko Mitamura
EBMT
• Developed originally for the PANGLOSS system
in the early 1990s
– Translation between English and Spanish
• Generalized EBMT under development for the
past several years
• Used in a variety of projects in recent years
– DARPA TIDES and GALE programs
– DIPLOMAT and TONGUES
• Active research work on improving alignment
and indexing, decoding from a lattice
• Contact Faculty: Ralf Brown and Jaime
Carbonell
Statistical MT
• Word-to-word and phrase-to-phrase translation pairs
are acquired automatically from data and assigned
probabilities based on a statistical model
• Extracted and trained from very large amounts of
sentence-aligned parallel text
– Word alignment algorithms
– Phrase detection algorithms
– Translation model probability estimation
• Main approach pursued in CMU systems in the
DARPA/TIDES program and now in GALE
– Chinese-to-English and Arabic-to-English
• Most active work is on phrase detection and on
advanced decoding techniques
• Contact Faculty: Stephan Vogel and Alex Waibel
Speech-to-Speech MT
• Evolution from JANUS/C-STAR systems to
NESPOLE!, LingWear, BABYLON, TC-STAR
– Early 1990s: first prototype system that fully
performed speech-to-speech translation (very limited domains)
– Interlingua-based, but with shallow task-oriented
representations:
“we have single and double rooms available”
[give-information+availability]
(room-type={single, double})
– Semantic Grammars for analysis and generation
– Multiple languages: English, German, French, Italian,
Japanese, Korean, and others
– Stat-MT applied in Speech-to-Speech scenarios
– Most active work on portable speech translation on
small devices: Arabic/English and Thai/English
– Contact Faculty: Alan Black, Stephan Vogel, Tanja
Schultz and Alex Waibel
AVENUE/LETRAS:
Learning-based Transfer MT
• Develop new approaches for automatically acquiring
syntactic MT transfer rules from small amounts of
elicited translated and word-aligned data
– Specifically designed to bootstrap MT for languages for
which only limited amounts of electronic resources are
available (particularly indigenous minority languages)
– Use machine learning techniques to generalize transfer
rules from specific translated examples
– Combine with SMT-inspired decoding techniques for
producing the best translation of new input from a lattice
of translation segments
• Languages: Hebrew, Hindi, Mapudungun, Quechua
• Most active work on designing a typologically
comprehensive elicitation corpus, advanced techniques
for automatic rule learning, improved decoding, and rule
refinement via user interaction
• Contact Faculty: Alon Lavie, Lori Levin, Jaime Carbonell
and Bob Frederking
Multi-Engine MT
• New approach developed over past two years under
DoD and DARPA funding (used in GALE)
• Main ideas:
– Treat original engines as “black boxes”
– Align the word and phrase correspondences between the
translations
– Build a collection of synthetic combinations based on the
aligned words and phrases
– Score the synthetic combinations based on Language
Model and confidence measures
– Select the top-scoring synthetic combination
• Architecture Issues: integrating “workflows” that
produce multiple translations and then combine them
with MEMT
– IBM’s UIMA architecture
• Contact Faculty: Alon Lavie
Synthetic Combination MEMT
Two-Stage Approach:
1. Align: identify common words and phrases across
the translations provided by the engines
2. Decode: search the space of synthetic combinations
of words/phrases and select the highest-scoring
combined translation

Example:
1. announced afghan authorities on saturday reconstituted
four intergovernmental committees
2. The Afghan authorities on Saturday the formation of the
four committees of government

MEMT: the afghan authorities announced on Saturday the
formation of four intergovernmental committees
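A minimal sketch of the scoring-and-selection stage, assuming the synthetic combinations have already been built from the aligned words and phrases. The tiny unigram “LM”, the confidence values, and the scoring scheme are illustrative assumptions, not the actual MEMT implementation.

import math

# Tiny unigram table standing in for a real n-gram language model.
lm = {"the": 0.07, "afghan": 0.002, "authorities": 0.002, "announced": 0.003,
      "on": 0.05, "saturday": 0.002, "formation": 0.001, "of": 0.07,
      "four": 0.004, "intergovernmental": 0.0001, "committees": 0.001,
      "reconstituted": 0.00005}

def lm_score(sentence):
    """Length-normalized log-probability, so longer outputs aren't penalized."""
    words = sentence.lower().split()
    return sum(math.log(lm.get(w, 1e-6)) for w in words) / len(words)

# Synthetic combinations paired with confidence measures (assumed values).
candidates = [
    ("announced afghan authorities on saturday reconstituted four "
     "intergovernmental committees", 0.74),
    ("the afghan authorities announced on saturday the formation of four "
     "intergovernmental committees", 0.80),
]

# Combine LM score and log-confidence, then select the top-scoring candidate.
best = max(candidates, key=lambda c: lm_score(c[0]) + math.log(c[1]))
print(best[0])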
Automatic MT Evaluation
• METEOR: new metric developed at CMU
• Improves upon BLEU metric developed by IBM and used
extensively in recent years
• Main ideas:
– Assess the similarity between a machine-produced
translation and (several) human reference translations
– Similarity is based on word-to-word matching that
matches:
• Identical words
• Morphological variants of same word (stemming)
• synonyms
– Similarity is based on weighted combination of Precision
and Recall
– Address fluency/grammaticality via a direct penalty: how
well-ordered is the matching of the MT output with the
reference?
• Improved levels of correlation with human judgments of
MT Quality
• Contact Faculty: Alon Lavie
The METEOR Metric
• Example:
– Reference: “the Iraqi weapons are to be handed over to the
army within two weeks”
– MT output: “in two weeks Iraq’s weapons will give army”
• Matching: Ref: Iraqi weapons army two weeks
MT: two weeks Iraq’s weapons army
• P = 5/8 = 0.625   R = 5/14 = 0.357
• Fmean = 10*P*R/(9*P+R) = 0.3731
• Fragmentation: 3 frags of 5 matched words → frag = (3-1)/(5-1) = 0.50
• Discounting factor: DF = 0.5 * (frag**3) = 0.0625
• Final score: Fmean * (1 - DF) = 0.3731 * 0.9375 = 0.3498
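The computation above is easy to reproduce in a few lines; a sketch follows, assuming the matching stage (exact, stemmed, synonym) has already produced the match count and fragment count.

def meteor_score(n_matches, mt_len, ref_len, n_frags):
    P = n_matches / mt_len                  # unigram precision
    R = n_matches / ref_len                 # unigram recall
    fmean = 10 * P * R / (9 * P + R)        # recall-weighted harmonic mean
    frag = (n_frags - 1) / (n_matches - 1)  # fragmentation of the matching
    df = 0.5 * frag ** 3                    # discounting factor
    return fmean * (1 - df)

# Values from the example: 5 matches, 8-word MT output, 14-word reference,
# matches grouped into 3 fragments.
print(round(meteor_score(5, 8, 14, 3), 4))  # 0.3498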
Summary
• Main challenges for current state-of-the-art MT
approaches - Coverage and Accuracy:
– Acquiring broad-coverage, high-accuracy translation
lexicons (for words and phrases)
– Learning syntactic mappings between languages from
parallel word-aligned data
– Overcoming syntax-to-semantics differences and
dealing with constructions
– Stronger target language modeling
Questions…
Example
Sys1: feature prominently venezuela ranked fifth in
exporting oil field in the world and eighth in production
Sys2: Venezuela is occupied by the fifth place to export oil
in the world, eighth in production
Sys3: Venezuela the top ranked fifth in the oil export in
the world and the eighth in the production
MEMT Sentence Selected: venezuela is the top ranked fifth in the oil
export in the world to eighth in production.
MEMT Example
IBM:
korea stands ready to allow visits to verify that it
does not manufacture nuclear weapons 0.7407
ISI: North Korea Is Prepared to Allow Washington to
Verify that It Does Not Make Nuclear Weapons
0.8007
CMU: North Korea prepared to allow Washington to the
verification of that is to manufacture nuclear
weapons 0.7668
Selected MEMT Sentence :
north korea is prepared to allow washington to
verify that it does not manufacture nuclear
weapons . 0.8894 (-2.75135)
Example
Sys1: announced afghan authorities on Saturday reconstituted four
intergovernmental committees accelerate the process of disarmament
removal packing between fighters and pictures of war are still have
enjoyed substantial influence
Sys2: The Afghan authorities on Saturday the formation of the four
committees of government to speed up the process of disarmament
demobilization of fighters of the leaders of the war who still have a
significant influence.
Sys3: the authorities announced Saturday Afghan form four committees
government accelerate the process of disarmament and complete
disarmament and demobilization followed the leaders of the war who
continues to enjoy considerable influence
MEMT Sentence Selected: the afghan authorities on Saturday announced the
formation of the four committees of government to speed up the process of
disarmament and demobilization of fighters of the leaders of the war
who still have a significant influence.
MEMT Example
IBM:
the sri lankan prime minister criticizes head of the
country's : 0.8862
ISI:
The President of the Sri Lankan Prime Minister
Criticized the President of the Country : 0.8660
CMU: Lankan Prime Minister criticizes her country: 0.6615
MEMT Sentence Selected: the sri lankan prime minister criticizes president
of the country . 0.9353 -3.27483
Example
Sys1: victims russians are one man and his wife and abusing their eight
year old daughter plus a ( 11 and 7 years ) man and his wife and
driver , egyptian nationality . : 0.6327
Sys2: The victims were Russian man and his wife, daughter of the most
from the age of eight years in addition to the young girls ) 11 7 years
( and a man and his wife and the bus driver Egyptian nationality. :
0.7054
Sys3: the victims Cruz man who wife and daughter both critical of the
eight years old addition to two Orient ( 11 ) 7 years ) woman , wife of
bus drivers Egyptian nationality . : 0.5293
MEMT Sentence Selected: the victims were russian man and his wife and
daughter of the eight years from the age of a 11 and 7 years in addition
to man and his wife and bus drivers egyptian nationality . 0.7647 -3.25376
Oracle: the victims were russian man and wife and his daughter of the
eight years old from the age of a 11 and 7 years in addition to the
man and his wife and bus drivers egyptian nationality young girls .
0.7964 -3.44128
Lexical Differences
• SL word has several different meanings, that
translate differently into TL
– Ex: financial bank vs. river bank
• Lexical Gaps: SL word reflects a unique
meaning that cannot be expressed by a single
word in TL
– Ex: English snub doesn’t have a corresponding verb
in French or German
• TL has finer distinctions than SL → SL word
should be translated differently in different
contexts
– Ex: English wall can be German Wand (internal),
Mauer (external)
Lexical Differences
• Lexical gaps:
– Examples: these have no direct equivalent in
English:
gratiner
(v., French, “to cook with a cheese coating”)
ōtosanrin
(n., Japanese, “three-wheeled truck or van”)
Lexical Differences
[Figure from Hutchins & Somers]
MT Handling of Lexical Differences
• Direct MT and Syntactic Transfer:
– Lexical Transfer stage uses bilingual lexicon
– SL word can have multiple translation entries,
possibly augmented with disambiguation features or
probabilities
– Lexical Transfer can involve use of limited context
(on SL side, TL side, or both)
– Lexical Gaps can partly be addressed via phrasal
lexicons
• Semantic Transfer:
– Ambiguity of SL word must be resolved during
analysis → correct symbolic representation at
semantic level
– TL Generation must select appropriate word or
structure for correctly conveying the concept in TL
Structural Differences
• Syntax of SL is different than syntax of the
TL:
– Word order within constituents:
• English NPs: art adj n (the big boy)
• Hebrew NPs: art n art adj (ha-yeled ha-gadol)
– Constituent structure:
• English is SVO: Subj Verb Obj (I saw the man)
• Modern Arabic is VSO: Verb Subj Obj
– Different verb syntax:
• Verb complexes in English vs. in German
(I can eat the apple / Ich kann den Apfel essen)
– Case marking and free constituent order:
• German and other languages that mark case:
den Apfel esse ich (the(acc) apple eat I(nom))
MT Handling of Structural Differences
• Direct MT Approaches:
– No explicit treatment: Phrasal Lexicons and sentence
level matches or templates
• Syntactic Transfer:
– Structural Transfer Grammars
• Trigger rule by matching against syntactic structure on
SL side
• Rule specifies how to reorder and re-structure the
syntactic constituents to reflect syntax of TL side
• Semantic Transfer:
– SL Semantic Representation abstracts away from SL
syntax to functional roles → done during analysis
– TL Generation maps semantic structures to correct
TL syntax
Syntax-to-Semantics Differences
• Meaning in TL is conveyed using a different
syntactic structure than in the SL
– Changes in verb and its arguments
– Passive constructions
– Motion verbs and state verbs
– Case creation and case absorption
• Main Distinction from Structural Differences:
– Structural differences are mostly independent of
lexical choices and their semantic meaning →
addressed by transfer rules that are syntactic in
nature
– Syntax-to-semantics mapping differences are
meaning-specific: they require the presence of specific
words (and meanings) in the SL
Syntax-to-Semantics Differences
• Structure-change example:
I like swimming
“Ich schwimme gern”
(lit: I swim gladly)
Syntax-to-Semantics Differences
• Verb-argument example:
Jones likes the film.
“Le film plait à Jones.”
(lit: “the film pleases to Jones”)
• Use of case roles can eliminate the need
for this type of transfer
– Jones = Experiencer
– film = Theme
Syntax-to-Semantics Differences
• Passive Constructions
• Example: French reflexive passives:
Ces livres se lisent facilement
*”These books read themselves easily”
These books are easily read
Same intention, different syntax
• rigly bitiwgacny → “my leg hurts”
• candy wagac fE rigly → “I have pain in my leg”
• rigly bitiClimny → “my leg hurts”
• fE wagac fE rigly → “there is pain in my leg”
• rigly bitinqaH calya → “my leg bothers on me”
Romanization of Arabic from CallHome Egypt.
MT Handling of Syntax-to-Semantics
Differences
• Direct MT Approaches:
– No Explicit treatment: Phrasal Lexicons and sentence
level matches or templates
• Syntactic Transfer:
– “Lexicalized” Structural Transfer Grammars
• Trigger rule by matching against “lexicalized” syntactic
structure on SL side: lexical and functional features
• Rule specifies how to reorder and re-structure the
syntactic constituents to reflect syntax of TL side
• Semantic Transfer:
– SL Semantic Representation abstracts away from SL
syntax to functional roles → done during analysis
– TL Generation maps semantic structures to correct
TL syntax
Example of Structural Transfer Rule
(verb-argument) [From Hutchins & Somers]
March 24, 2006
LTI IC 2006
51
Semantic Transfer: Theta Structure (case roles)
[Figure from Hutchins & Somers]
• Abstracts away from grammatical functions
• Looks more like a “semantic f-structure”
• The basis for “semantic transfer”
Idioms and Constructions
• Main Distinction: meaning of whole is
not directly compositional from meaning
of its sub-parts → no compositional
translation
• Examples:
– George is a bull in a china shop
– He kicked the bucket
– Can you please open the window?
Formulaic Utterances
• Good night.
• tisbaH cala xEr (lit: “waking up on good”)
• Romanization of Arabic from CallHome Egypt
Constructions
• Identifying speaker intention rather
than literal meaning for formulaic and
task-oriented sentences:
How about … → suggestion
Why don’t you… → suggestion
Could you tell me… → request info.
I was wondering… → request info.
MT Handling of Constructions and
Idioms
• Direct MT Approaches:
– No Explicit treatment: Phrasal Lexicons and sentence level
matches or templates
• Syntactic Transfer:
– No effective treatment
– “Highly Lexicalized” Structural Transfer rules can handle
some constructions
• Trigger rule by matching against entire construction, including
structure on SL side
• Rule specifies how to generate the correct construction on the
TL side
• Semantic Transfer:
– Analysis must capture non-compositional representation of
the idiom or construction → specialized rules
– TL Generation maps construction semantic structures to
correct TL syntax and lexical words
Transfer with Strong Decoding

[Architecture figure: word-aligned elicited data feeds a Learning Module,
which produces Transfer Rules such as:

{PP,4894}
;;Score:0.0470
PP::PP [NP POSTP] -> [PREP NP]
((X2::Y1)
 (X1::Y2))

The Run-Time Transfer System applies these rules and a Translation Lexicon
to produce a Lattice of translation fragments; a Decoder then selects the
final translation using an English Language Model and Word-to-Word
Translation Probabilities.]
MT for Minority and Indigenous
Languages: Challenges
• Minimal amount of parallel text
• Possibly competing standards for
orthography/spelling
• Often relatively few trained linguists
• Access to native informants possible
• Need to minimize development time
and cost
Learning Transfer-Rules for
Languages with Limited Resources
• Rationale:
– Large bilingual corpora not available
– Bilingual native informant(s) can translate and align a
small pre-designed elicitation corpus, using elicitation tool
– Elicitation corpus designed to be typologically
comprehensive and compositional
– Transfer-rule engine and new learning approach support
acquisition of generalized transfer-rules from the data
English-Hindi Example
GEBMT vs. Statistical MT
Generalized-EBMT (GEBMT) uses examples at run time,
rather than training a parameterized model. Thus:
– GEBMT can work with a smaller parallel corpus than Stat
MT
– Large target language corpus still useful for generating
target language model
– Much faster to “train” (index examples) than Stat MT; until
recently was much faster at run time as well
– Generalizes in a different way than Stat MT (whether this
is better or worse depends on match between Statistical
model and reality):
• Stat MT can fail on a training sentence, while GEBMT never
will
• GEBMT generalizations based on linguistic knowledge, rather
than statistical model design
MEMT chart example

[Chart: a translation lattice over the Spanish input “lideres politicos
rusos firman pacto de paz civil”. Arcs from different engines cover
overlapping spans, each with a score, e.g. KBMT “Russian leaders signed”
(0.8); EBMT “political leaders” (0.9), “compact of peace” (0.65),
“compact of” (0.7), “of peace” (1.0), “civil peace” (0.9); GLOSS “pact”,
“sign”, “of” (1.0); and single-word DICT arcs such as “Russian”/“Russians”,
“leaders”, “political”/“politic”, “sign”/“subscribe”,
“pact”/“compact”/“bargain”/“expedients”, “of”/“for”, “peace”/“quiet”,
“civil”/“civilian”/“tactful” (all 1.0).]
Why Machine Translation
for Minority and Indigenous Languages?
• Commercial MT economically feasible for only
a handful of major languages with large
resources (corpora, human developers)
• Is there hope for MT for languages with
limited resources?
• Benefits include:
– Better government access to indigenous
communities (Epidemics, crop failures, etc.)
– Better participation by indigenous communities in
information-rich activities (health care, education,
government) without giving up their languages.
– Language preservation
– Civilian and military applications (disaster relief)
English-Chinese Example
Spanish-Mapudungun
Example
English-Arabic Example
The Elicitation Corpus
• Translated, aligned by bilingual informant
• Corpus consists of linguistically diverse
constructions
• Based on elicitation and documentation work
of field linguists (e.g. Comrie 1977, Bouquiaux
1992)
• Organized compositionally: elicit simple
structures first, then use them as building
blocks
• Goal: minimize size, maximize linguistic
coverage
Transfer Rule Formalism
;SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]   ; type and constituent information
(
 (X1::Y1)                               ; alignments
 (X1::Y3)
 (X2::Y4)
 (X3::Y2)
 ((X1 AGR) = *3-SING)                   ; x-side constraints
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)                      ; y-side constraints
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))
)

xy-constraints link the two sides, e.g. ((Y1 AGR) = (X1 AGR))
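To see how the pieces of such a rule interact, here is a minimal sketch of a transfer-rule data structure and of how the alignments drive reordering. The class and method names are illustrative assumptions, not the actual transfer engine, and constraint checking is omitted.

from dataclasses import dataclass, field

@dataclass
class TransferRule:
    rule_type: str       # e.g. "NP::NP"
    x_side: list         # SL constituent sequence
    y_side: list         # TL constituent sequence
    alignments: list     # 1-based (x_index, y_index) pairs
    constraints: list = field(default_factory=list)  # feature constraints

    def reorder(self, sl_words):
        """Place SL words into TL positions according to the alignments."""
        tl = [None] * len(self.y_side)
        for xi, yi in self.alignments:
            tl[yi - 1] = sl_words[xi - 1]
        return tl

rule = TransferRule("NP::NP",
                    ["DET", "ADJ", "N"],
                    ["DET", "N", "DET", "ADJ"],
                    [(1, 1), (1, 3), (2, 4), (3, 2)])
print(rule.reorder(["the", "old", "man"]))
# ['the', 'man', 'the', 'old']; lexical transfer then yields "ha-ish ha-zaqen"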
Transfer Rule Formalism (II)
The same rule, distinguishing the two kinds of feature constraints:

;SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
 (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)                   ; value constraints
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))            ; agreement constraint
)
The Transfer Engine
• Analysis: source text is parsed into its grammatical structure, which
also determines transfer application ordering.
Example: 他 看 书。(he read book) is parsed into
[S [NP [N 他]] [VP [V 看] [NP 书]]].
• Transfer: a target-language tree is created by reordering, insertion,
and deletion: [S [NP [N he]] [VP [V read] [NP [DET a] [N book]]]].
The article “a” is inserted into the object NP; source words are
translated with the transfer lexicon.
• Generation: target-language constraints are checked and the final
translation is produced, e.g. “reads” is chosen over “read” to agree
with “he”.
Final translation: “He reads a book”
Rule Learning - Overview
• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the source
side (grammatical structure)
• Three steps:
1. Flat Seed Generation: first guesses at
transfer rules; flat syntactic structure
2. Compositionality: use previously learned
rules to add hierarchical structure
3. Seeded Version Space Learning: refine
rules by learning appropriate feature
constraints
Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol

Generated Seed Rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1)
 (X1::Y3)
 (X2::Y4)
 (X3::Y2))
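A minimal sketch of this generation step, assuming the POS sequences and the informant's word alignments are already available; the function and its output format are illustrative, not the AVENUE code.

def flat_seed_rule(cat, sl_pos, tl_pos, alignments):
    """Build a flat, POS-level transfer rule string from one example.

    alignments: 1-based (sl_index, tl_index) word-alignment pairs.
    """
    head = f"{cat}::{cat} [{' '.join(sl_pos)}] -> [{' '.join(tl_pos)}]"
    aligns = " ".join(f"(X{i}::Y{j})" for i, j in alignments)
    return head + "\n(" + aligns + ")"

# Eng "the big apple" / Heb "ha-tapuax ha-gadol", as in the example above:
print(flat_seed_rule("NP",
                     ["ART", "ADJ", "N"],
                     ["ART", "N", "ART", "ADJ"],
                     [(1, 1), (1, 3), (2, 4), (3, 2)]))
# NP::NP [ART ADJ N] -> [ART N ART ADJ]
# ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))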
Flat Seed Generation
Create a transfer rule that is specific to the sentence pair,
but abstracted to the POS level. No syntactic structure.

Element             Source
SL POS sequence     f-structure
TL POS sequence     TL dictionary, aligned SL words
Type information    corpus, same on SL and TL
Alignments          informant
x-side constraints  f-structure
y-side constraints  TL dictionary, aligned SL words (list of projecting features)
Compositionality
Initial Flat Rules:

S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))

NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))

Generated Compositional Rule:

S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
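The substitution at the heart of this step can be sketched as follows: wherever a lower-level rule's SL sequence occurs inside a flat rule, that span is replaced by the constituent label (the corresponding alignment adjustment is omitted here). A simplified illustration under those assumptions:

def compose(flat_seq, sub_seq, label):
    """Replace the first occurrence of sub_seq in flat_seq with label."""
    n = len(sub_seq)
    for i in range(len(flat_seq) - n + 1):
        if flat_seq[i:i + n] == sub_seq:
            return flat_seq[:i] + [label] + flat_seq[i + n:]
    return flat_seq

s_side = ["ART", "ADJ", "N", "V", "ART", "N"]
s_side = compose(s_side, ["ART", "ADJ", "N"], "NP")  # apply first NP rule
s_side = compose(s_side, ["ART", "N"], "NP")         # apply second NP rule
print(s_side)  # ['NP', 'V', 'NP'] -- the SL side of the compositional rule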
Compositionality - Overview
• Traverse the c-structure of the English
sentence, add compositional structure
for translatable chunks
• Adjust constituent sequences,
alignments
• Remove unnecessary constraints, i.e.
those that are contained in the lower-level rule
Seeded Version Space Learning
Input: Rules and their Example Sets

S::S [NP V NP] -> [NP V P NP]           {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))

NP::NP [ART ADJ N] -> [ART N ART ADJ]   {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

NP::NP [ART N] -> [ART N]               {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))

Output: Rules with Feature Constraints:

S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
 (X1 NUM = X2 NUM)
 (Y1 NUM = Y2 NUM)
 (X1 NUM = Y1 NUM))
Seeded Version Space Learning:
Overview
• Goal: add appropriate feature constraints to the acquired rules
• Methodology:
– Preserve general structural transfer
– Learn specific feature constraints from example set
• Seed rules are grouped into clusters of similar transfer
structure (type, constituent sequences, alignments)
• Each cluster forms a version space: a partially ordered
hypothesis space with a specific and a general boundary
• The seed rules in a group form the specific boundary of a
version space
• The general boundary is the (implicit) transfer rule with the
same type, constituent sequences, and alignments, but no
feature constraints
Seeded Version Space Learning:
Generalization
• The partial order of the version space:
Definition: a transfer rule tr1 is strictly more
general than another transfer rule tr2 if all
f-structures that are satisfied by tr2 are also
satisfied by tr1.
• Generalize rules by merging them:
– Deletion of constraint
– Raising two value constraints to an agreement
constraint, e.g.
((x1 num) = *pl), ((x3 num) = *pl) →
((x1 num) = (x3 num))
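A small sketch of these two merge operators, modeling each constraint as a (feature-path, value) pair; this encoding is an illustrative assumption, not the actual learner's representation.

def delete_constraint(constraints, c):
    """Generalize by dropping one constraint."""
    return [k for k in constraints if k != c]

def raise_to_agreement(constraints, p1, p2):
    """Replace (p1 = v) and (p2 = v) with the agreement constraint (p1 = p2)."""
    vals = dict(constraints)
    if p1 in vals and p2 in vals and vals[p1] == vals[p2]:
        kept = [(p, v) for p, v in constraints if p not in (p1, p2)]
        return kept + [(p1, p2)]
    return constraints

cs = [(("x1", "num"), "*pl"), (("x3", "num"), "*pl")]
print(raise_to_agreement(cs, ("x1", "num"), ("x3", "num")))
# [(('x1', 'num'), ('x3', 'num'))], i.e. ((x1 num) = (x3 num))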
Seeded Version Space Learning
1. Group seed rules into version spaces as above.
2. Make use of the partial order of rules in the version space. The partial
order is defined via the f-structures satisfying the constraints.
3. Generalize in the space by repeated merging of rules:
– Deletion of constraint
– Moving value constraints to agreement constraints, e.g.
((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
4. Check translation power of generalized rules against sentence pairs
Seeded Version Space Learning:
The Search
• The Seeded Version Space algorithm itself is
the repeated generalization of rules by
merging
• A merge is successful if the set of sentences
that can correctly be translated with the
merged rule is a superset of the union of sets
that can be translated with the unmerged
rules, i.e. check power of rule
• Merge until no more successful merges
Seeded VSL: Some Open Issues
• Three types of constraints:
– X-side: constrain applicability of the rule
– Y-side: assist in generation
– X-Y: transfer features from SL to TL
• Which of the three types improves translation
performance?
– Use rules without features to populate lattice, decoder will select
the best translation…
– Learn only X-Y constraints, based on list of universal projecting
features
• Other notions of version-spaces of feature constraints:
– Current feature learning is specific to rules that have identical
transfer components
– Important issue during transfer is to disambiguate among rules
that have same SL side but different TL side – can we learn
effective constraints for this?
Examples of Learned Rules
(Hindi-to-English)
{NP,14244}
;;Score:0.0429
NP::NP [N] -> [DET N]
(
 (X1::Y2)
)

{NP,14434}
;;Score:0.0040
NP::NP [ADJ CONJ ADJ N] -> [ADJ CONJ ADJ N]
(
 (X1::Y1) (X2::Y2)
 (X3::Y3) (X4::Y4)
)

{PP,4894}
;;Score:0.0470
PP::PP [NP POSTP] -> [PREP NP]
(
 (X2::Y1)
 (X1::Y2)
)
Manual Transfer Rules: Hindi Example
;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
;; passive of 43 (7b)
{VP,28}
VP::VP : [V V V] -> [Aux V]
(
(X1::Y2)
((x1 form) = root)
((x2 type) =c light)
((x2 form) = part)
((x2 aspect) = perf)
((x3 lexwx) = 'jAnA')
((x3 form) = part)
((x3 aspect) = perf)
(x0 = x1)
((y1 lex) = be)
((y1 tense) = past)
((y1 agr num) = (x3 agr num))
((y1 agr pers) = (x3 agr pers))
((y2 form) = part)
)
Manual Transfer Rules: Example
; NP1 ke NP2 -> NP2 of NP1
; Ex: jIvana ke eka aXyAya
;     life of (one) chapter
; ==> a chapter of life
;
{NP,12}
NP::NP : [PP NP1] -> [NP1 PP]
(
 (X1::Y2)
 (X2::Y1)
 ; ((x2 lexwx) = 'kA')
)

{NP,13}
NP::NP : [NP1] -> [NP1]
(
 (X1::Y1)
)

{PP,12}
PP::PP : [NP Postp] -> [Prep NP]
(
 (X1::Y2)
 (X2::Y1)
)

[Figure: the source NP tree [NP [PP [NP [N jIvana]] [P ke]]
[NP1 [Adj eka] [N aXyAya]]] ("N1 ke eka aXyAya") is mapped to the target
tree [NP [NP1 [Adj one] [N chapter]] [PP [P of] [NP [N life]]]]
("one chapter of N1").]
A Limited Data Scenario for
Hindi-to-English
• Conducted during a DARPA “Surprise
Language Exercise” (SLE) in June 2003
• Put together a scenario with “miserly” data
resources:
– Elicited Data corpus: 17589 phrases
– Cleaned portion (top 12%) of LDC dictionary: ~2725
Hindi words (23612 translation pairs)
– Manually acquired resources during the SLE:
• 500 manual bigram translations
• 72 manually written phrase transfer rules
• 105 manually written postposition rules
• 48 manually written time expression rules
• No additional parallel text!!
Manual Grammar Development
• Covers mostly NPs, PPs and VPs (verb
complexes)
• ~70 grammar rules, covering basic and
recursive NPs and PPs, verb complexes
of main tenses in Hindi (developed in
two weeks)
Adding a “Strong” Decoder
• XFER system produces a full lattice of
translation fragments, ranging from single
words to long phrases or sentences
• Edges are scored using word-to-word
translation probabilities, trained from the
limited bilingual data
• Decoder uses an English LM (70m words)
• Decoder can also reorder words or phrases (up
to 4 positions ahead)
• For XFER (strong), ONLY edges from the basic XFER
system are used!
Testing Conditions
• Tested on section of JHU provided data: 258
sentences with four reference translations
– SMT system (stand-alone)
– EBMT system (stand-alone)
– XFER system (naïve decoding)
– XFER system with “strong” decoder:
• No grammar rules (baseline)
• Manually developed grammar rules
• Automatically learned grammar rules
– XFER+SMT with strong decoder (MEMT)
Automatic MT Evaluation Metrics
• Intended to replace or complement human
assessment of the quality of MT-produced translations
• Principal idea: compare how similar the MT-produced
translation is to human translation(s) of the same input
• Main metric in use today: IBM’s BLEU
– Count n-gram (unigrams, bigrams, trigrams, etc)
overlap between the MT output and several reference
translations
– Calculate a combined n-gram precision score
• NIST variant of BLEU used for official DARPA
evaluations
Results on JHU Test Set
System                          BLEU   M-BLEU  NIST
EBMT                            0.058  0.165   4.22
SMT                             0.093  0.191   4.64
XFER (naïve), man grammar       0.055  0.177   4.46
XFER (strong), no grammar       0.109  0.224   5.29
XFER (strong), learned grammar  0.116  0.231   5.37
XFER (strong), man grammar      0.135  0.243   5.59
XFER+SMT (strong)               0.136  0.243   5.65
Effect of Reordering in the Decoder

[Chart: NIST score as a function of the decoder reordering window (0 to 4)
for four configurations: no grammar, learned grammar, manual grammar, and
MEMT (XFER+SMT). NIST scores on the vertical axis range from 4.8 to 5.7,
and all configurations improve as the reordering window grows.]
Observations and Lessons (I)
• XFER with strong decoder outperformed SMT
even without any grammar rules in the miserly
data scenario
– SMT Trained on elicited phrases that are very short
– SMT has insufficient data to train more discriminative
translation probabilities
– XFER takes advantage of Morphology
• Token coverage without morphology: 0.6989
• Token coverage with morphology: 0.7892
• Manual grammar currently somewhat better
than automatically learned grammar
– Learned rules did not yet use version-space learning
– Large room for improvement on learning rules
– Importance of effective well-founded scoring of
learned rules
Observations and Lessons (II)
• MEMT (XFER and SMT) based on strong decoder
produced best results in the miserly scenario.
• Reordering within the decoder provided very
significant score improvements
– Much room for more sophisticated grammar rules
– Strong decoder can carry some of the reordering
“burden”
XFER MT for Hebrew-to-English
• Two-month intensive effort to apply our XFER
approach to the development of a Hebrew-to-English MT system
• Challenges:
– No large parallel corpus
– Only limited coverage translation lexicon
– Morphology: incomplete analyzer available
• Plan:
– Collect available resources, establish methodology
for processing Hebrew input
– Translate and align Elicitation Corpus
– Learn XFER rules
– Develop (small) manual XFER grammar as a point of
comparison
– Evaluate performance on unseen test data using
automatic evaluation metrics
Hebrew-to-English XFER System
• First end-to-end integration of system
completed yesterday (March 2nd)
• No transfer rules yet, just word-to-word
Hebrew-to-English translation
• No strong decoding yet
• Amusing Example:
office brains the government crack H$BW& in committee
the elections the central et the possibility conduct poll
crowd about TWKNIT the NSIGH from goat
Conclusions
• Transfer rules (both manual and learned) offer
significant contributions that can complement
existing data-driven approaches
– Also in medium and large data settings?
• Initial steps to development of a statistically
grounded transfer-based MT system with:
– Rules that are scored based on a well-founded
probability model
– Strong and effective decoding that incorporates the
most advanced techniques used in SMT decoding
• Working from the “opposite” end of research
on incorporating models of syntax into
“standard” SMT systems [Knight et al]
• Our direction makes sense in the limited data
scenario
Future Directions
• Continued work on automatic rule learning
(especially Seeded Version Space Learning)
• Improved leveraging from manual grammar
resources, interaction with bilingual speakers
• Developing a well-founded model for assigning
scores (probabilities) to transfer rules
• Improving the strong decoder to better fit the
specific characteristics of the XFER model
• MEMT with improved:
– combination of output from different translation
engines with different scorings
– strong decoding capabilities
Language Modeling for MT
• Technique stolen from Speech
Recognition
• Try to match the statistics of English
• Trigram example: “George W. …”
• Combine quality score with trigram
score, to factor in “English-like-ness”
• Problem: this gives billions of possible
overall translations
• Solution: “beam search”. At each step,
throw out all but the “best” possibilities
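A minimal sketch of the beam-search idea just described: each partial hypothesis carries a quality score, a toy trigram LM score is added, and only the best few hypotheses survive each step. The expansion vocabulary and all scores are illustrative assumptions, not an actual MT decoder.

def beam_search(initial, expand, score, beam_width=3, steps=3):
    """Keep only the beam_width best partial hypotheses at each step."""
    beam = [initial]
    for _ in range(steps):
        candidates = [h for hyp in beam for h in expand(hyp)]
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)

# Hypotheses are (words, quality) pairs; expansion appends one word.
def expand(hyp):
    words, q = hyp
    return [(words + [w], q + 0.1) for w in ("george", "w.", "bush", "the")]

def lm_logprob(words):
    """Toy trigram LM that rewards the trigram 'george w. bush'."""
    bonus = {("george", "w.", "bush"): 2.0}
    return sum(bonus.get(tuple(words[i:i + 3]), -1.0)
               for i in range(max(len(words) - 2, 0)))

best = beam_search(([], 0.0), expand, lambda h: h[1] + lm_logprob(h[0]))
print(best[0])  # ['george', 'w.', 'bush']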
NESPOLE!
• Speech-to-speech translation for eCommerce
– CMU, Karlsruhe, IRST, CLIPS, 2 commercial partners
• Improved limited-domain speech translation
• Experiment with multimodality and with MEMT
• EU-side has strict scheduling and deliverables
– First test domain: Italian travel agency
– Second “showcase”: international Help desk
• Tied in to CSTAR-III