Learning Syntactic Transfer Rules

Machine Translation
Challenges and Language Divergences
Alon Lavie
Language Technologies Institute
Carnegie Mellon University
11-731: Machine Translation
January 12, 2011
Major Sources of Translation Problems
• Lexical Differences:
– Multiple possible translations for a source-language (SL) word, or
difficulty expressing the SL word’s meaning in a single
target-language (TL) word
• Structural Differences:
– Syntax of SL is different than syntax of the TL: word
order, sentence and constituent structure
• Differences in Mappings of Syntax to
Semantics:
– Meaning in TL is conveyed using a different syntactic
structure than in the SL
• Idioms and Constructions
Lexical Differences
• SL word has several different meanings that
translate differently into TL
– Ex: financial bank vs. river bank
• Lexical Gaps: SL word reflects a unique
meaning that cannot be expressed by a single
word in TL
– Ex: English snub doesn’t have a corresponding verb
in French or German
• TL has finer distinctions than SL → SL word
should be translated differently in different
contexts
– Ex: English wall can be German Wand (internal) or
Mauer (external)
Google at Work…
Lexical Differences
• Lexical gaps:
– Examples: these have no direct equivalent in English:
• gratiner (v., French, “to cook with a cheese coating”)
• ōtosanrin (n., Japanese, “three-wheeled truck or van”)
Lexical Differences
[From Hutchins & Somers]
MT Handling of Lexical Differences
• Direct MT and Syntactic Transfer:
– Lexical Transfer stage uses bilingual lexicon
– SL word can have multiple translation entries,
possibly augmented with disambiguation features or
probabilities
– Lexical Transfer can involve use of limited context
(on SL side, TL side, or both)
– Lexical Gaps can partly be addressed via phrasal
lexicons
• Semantic Transfer:
– Ambiguity of SL word must be resolved during
analysis → correct symbolic representation at
semantic level
– TL Generation must select appropriate word or
structure for correctly conveying the concept in TL
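
The lexical transfer stage described above can be sketched roughly as follows. This is a minimal illustration, not the method of any particular system: the lexicon entries, probabilities, cue words, and the function names transfer_word and transfer_sentence are all assumptions made up for the example (only Wand/Mauer come from the earlier slide).

```python
# Hypothetical toy bilingual lexicon (English -> German), for illustration only.
# Each SL word maps to candidate TL words with an assumed prior probability
# and a set of SL context cue words that favor that candidate.
LEXICON = {
    "wall": [
        {"tl": "Wand", "prob": 0.5, "cues": {"room", "kitchen", "picture"}},
        {"tl": "Mauer", "prob": 0.5, "cues": {"city", "garden", "brick"}},
    ],
    "bank": [
        {"tl": "Bank", "prob": 0.7, "cues": {"money", "account", "loan"}},
        {"tl": "Ufer", "prob": 0.3, "cues": {"river", "water", "shore"}},
    ],
}


def transfer_word(word, context, cue_bonus=1.0):
    """Pick the TL candidate with the best score: prior + bonus per matching cue."""
    entries = LEXICON.get(word)
    if entries is None:
        return word  # unknown word: pass it through unchanged
    context_set = set(context)

    def score(entry):
        return entry["prob"] + cue_bonus * len(entry["cues"] & context_set)

    return max(entries, key=score)["tl"]


def transfer_sentence(tokens, window=3):
    """Translate word by word, using a limited window of SL context."""
    out = []
    for i, tok in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        out.append(transfer_word(tok, context))
    return out
```

With no context, "bank" falls back to the higher-probability entry; a nearby "river" switches the choice. The sketch uses SL-side context only; a TL-side language model would be a separate component.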
Structural Differences
• Syntax of SL is different from syntax of the TL:
– Word order within constituents:
• English NPs: art adj n (the big boy)
• Hebrew NPs: art n art adj (ha yeled ha gadol)
– Constituent structure:
• English is SVO: Subj Verb Obj (I saw the man)
• Modern Arabic is VSO: Verb Subj Obj
– Different verb syntax:
• Verb complexes in English vs. in German:
I can eat the apple
Ich kann den Apfel essen
– Case marking and free constituent order
• German and other languages that mark case:
Den Apfel esse ich
the(acc) apple eat I(nom)
MT Handling of Structural Differences
• Direct MT Approaches:
– No explicit treatment: Phrasal Lexicons and sentence
level matches or templates
• Syntactic Transfer:
– Structural Transfer Grammars
• Trigger rule by matching against syntactic structure on
SL side
• Rule specifies how to reorder and re-structure the
syntactic constituents to reflect syntax of TL side
• Semantic Transfer:
– SL Semantic Representation abstracts away from SL
syntax to functional roles → done during analysis
– TL Generation maps semantic structures to correct
TL syntax
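
As a rough sketch of a structural transfer rule, the English-to-Hebrew NP reordering from the Structural Differences slide can be written as a pattern over part-of-speech tags plus a target-side ordering. The rule format, the names NP_RULE and apply_rule, and the three-word lexicon are assumptions for illustration only.

```python
# Toy lexicon covering only the example from the slides.
LEXICON = {"the": "ha", "big": "gadol", "boy": "yeled"}

# A transfer rule: an SL pattern of POS tags, and the TL order expressed as
# indices into the matched SL constituents (an index may be reused).
NP_RULE = {
    "sl_pattern": ("art", "adj", "n"),
    "tl_order": (0, 2, 0, 1),  # art n art adj
}


def apply_rule(rule, tagged):
    """tagged: list of (word, pos) pairs.

    Returns the reordered TL words, or None if the rule does not match.
    """
    if tuple(pos for _, pos in tagged) != rule["sl_pattern"]:
        return None
    return [LEXICON.get(tagged[i][0], tagged[i][0]) for i in rule["tl_order"]]
```

Applied to [("the", "art"), ("big", "adj"), ("boy", "n")], the rule yields ha yeled ha gadol. The rule fires on syntactic categories alone, independent of which words fill them.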
Syntax-to-Semantics Differences
• Meaning in TL is conveyed using a different
syntactic structure than in the SL
– Changes in verb and its arguments
– Passive constructions
– Motion verbs and state verbs
– Case creation and case absorption
• Main Distinction from Structural Differences:
– Structural differences are mostly independent of
lexical choices and their semantic meaning →
addressed by transfer rules that are syntactic in
nature
– Syntax-to-semantic mapping differences are
meaning-specific: require the presence of specific
words (and meanings) in the SL
Syntax-to-Semantics Differences
• Structure-change example:
I like swimming
“Ich schwimme gern”
(lit: “I swim gladly”)
Syntax-to-Semantics Differences
• Verb-argument example:
Jones likes the film.
“Le film plait à Jones.”
(lit: “the film pleases to Jones”)
• Use of case roles can eliminate the need
for this type of transfer
– Jones = Experiencer
– film = Theme
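
To illustrate how case roles can remove the need for this kind of transfer, here is a minimal sketch in which both sentences are generated from one shared frame. The frame layout, the predicate label LIKE, and the per-language templates are assumptions for the example, not a real generation grammar.

```python
# Hypothetical case frame for the example: roles are language-neutral.
FRAME = {
    "predicate": "LIKE",
    "roles": {"experiencer": "Jones", "theme": "film"},
}

# Each language decides on its own how to realize the roles syntactically.
# English puts the Experiencer in subject position; French puts the Theme there.
REALIZATION = {
    ("LIKE", "en"): "{experiencer} likes the {theme}.",
    ("LIKE", "fr"): "Le {theme} plait à {experiencer}.",
}


def generate(frame, lang):
    """Realize a case frame in the given language using its template."""
    template = REALIZATION[(frame["predicate"], lang)]
    return template.format(**frame["roles"])
```

Because the frame never mentions subject or object, no rule has to swap the arguments between the two languages; each generator simply maps roles to its own syntax.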
Syntax-to-Semantics Differences
• Passive Constructions
• Example: French reflexive passives:
Ces livres se lisent facilement
*“These books read themselves easily”
These books are easily read
Same intention, different syntax
• rigly bitiwgacny
my leg hurts
• candy wagac fE rigly
I have pain in my leg
• rigly bitiClimny
my leg hurts
• fE wagac fE rigly
there is pain in my leg
• rigly bitinqaH calya
my leg bothers on me
Romanization of Arabic from CallHome Egypt.
MT Handling of Syntax-to-Semantics Differences
• Direct MT Approaches:
– No Explicit treatment: Phrasal Lexicons and sentence
level matches or templates
• Syntactic Transfer:
– “Lexicalized” Structural Transfer Grammars
• Trigger rule by matching against “lexicalized” syntactic
structure on SL side: lexical and functional features
• Rule specifies how to reorder and re-structure the
syntactic constituents to reflect syntax of TL side
• Semantic Transfer:
– SL Semantic Representation abstracts away from SL
syntax to functional roles → done during analysis
– TL Generation maps semantic structures to correct
TL syntax
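
A “lexicalized” transfer rule differs from a purely structural one in that it fires only when a specific word is present. The sketch below handles the earlier structure-change example (I like swimming → Ich schwimme gern). The clause representation, the rule name like_gerund_rule, and the small lexicons are assumptions; the verb table holds first-person singular forms only, so the sketch covers just that case.

```python
# Hypothetical toy lexicons (English -> German).
SUBJECTS = {"I": "Ich"}
# English gerund -> German first-person singular verb form (assumed coverage).
VERBS_1SG = {"swimming": "schwimme", "reading": "lese"}


def like_gerund_rule(clause):
    """Lexicalized rule: 'X like V-ing' -> 'X V gern'.

    Fires only when the verb lemma is 'like' and its complement is a gerund.
    Returns the TL words, or None if the rule does not apply.
    """
    if clause.get("verb") != "like" or clause.get("comp_pos") != "gerund":
        return None
    subj = SUBJECTS.get(clause.get("subj"))
    verb = VERBS_1SG.get(clause.get("comp"))
    if subj is None or verb is None:
        return None  # outside the toy lexicon
    # The SL complement becomes the TL main verb; 'like' becomes the adverb 'gern'.
    return [subj, verb, "gern"]
```

The same syntactic configuration with a different verb (for example "hate") does not trigger the rule, which is what distinguishes it from the syntactic-only rules used for structural differences.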
Idioms and Constructions
• Main Distinction: meaning of the whole is
not directly compositional from the meaning
of its sub-parts → no compositional
translation
• Examples:
– George is a bull in a china shop
– He kicked the bucket
– Can you please open the window?
Formulaic Utterances
• Good night.
• tisbaH cala xEr
(lit: “waking up on good”)
Romanization of Arabic from CallHome Egypt.
Constructions
• Identifying speaker intention rather
than literal meaning for formulaic and
task-oriented sentences.
How about … → suggestion
Why don’t you… → suggestion
Could you tell me… → request info.
I was wondering… → request info.
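
One simple way to picture this mapping is a table of surface patterns paired with intentions. The sketch below uses regular expressions over the four constructions listed above; the function name speaker_intention and the pattern table are assumptions, and a real system would need far broader coverage.

```python
import re

# Construction patterns from the slide, paired with the speaker's intention.
CONSTRUCTIONS = [
    (re.compile(r"^how about\b", re.IGNORECASE), "suggestion"),
    (re.compile(r"^why don['’]t you\b", re.IGNORECASE), "suggestion"),
    (re.compile(r"^could you tell me\b", re.IGNORECASE), "request info"),
    (re.compile(r"^i was wondering\b", re.IGNORECASE), "request info"),
]


def speaker_intention(utterance):
    """Return the intention of the first matching construction, else None."""
    text = utterance.strip()
    for pattern, intention in CONSTRUCTIONS:
        if pattern.search(text):
            return intention
    return None
```

The point of the sketch is that the label is attached to the construction as a whole: "Why don't you call her?" is tagged as a suggestion even though its literal form is a why-question.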
MT Handling of Constructions and Idioms
• Direct MT Approaches:
– No Explicit treatment: Phrasal Lexicons and sentence level
matches or templates
• Syntactic Transfer:
– No effective treatment
– “Highly Lexicalized” Structural Transfer rules can handle
some constructions
• Trigger rule by matching against entire construction, including
structure on SL side
• Rule specifies how to generate the correct construction on the
TL side
• Semantic Transfer:
– Analysis must capture non-compositional representation of
the idiom or construction → specialized rules
– TL Generation maps construction semantic structures to
correct TL syntax and lexical words
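
The analysis side of this can be sketched as a greedy longest-match lookup that replaces a whole idiom with a single concept before any word-by-word processing. The idiom table, the concept labels DIE(past) and CLUMSY-PERSON, and the function name analyze are all hypothetical, chosen only to illustrate the two idioms from the earlier slide.

```python
# Hypothetical idiom table: token tuples -> a non-compositional concept label.
IDIOMS = {
    ("kicked", "the", "bucket"): "DIE(past)",
    ("a", "bull", "in", "a", "china", "shop"): "CLUMSY-PERSON",
}
MAX_LEN = max(len(key) for key in IDIOMS)


def analyze(tokens):
    """Greedy longest-match: replace idiom spans with concept labels.

    Returns a list of ("CONCEPT", label) and ("WORD", token) pairs.
    """
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, then shrink.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in IDIOMS:
                out.append(("CONCEPT", IDIOMS[span]))
                i += n
                break
        else:
            # No idiom starts here: keep the word for compositional handling.
            out.append(("WORD", tokens[i]))
            i += 1
    return out
```

TL generation would then choose a word or construction for the concept as a unit, rather than translating "kicked", "the", and "bucket" separately.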
Take Home Messages
• Remember these types of language divergences as you
learn about and apply the various steps in the MT
system pipelines of different approaches!
• Ask yourself how capable these various steps and
approaches are in addressing these types of
divergences!
– Can the step/approach handle these divergences?
– If so, does it model the divergence at the appropriate level
of abstraction?
• Keep these language divergences in mind when you
analyze the errors of the MT system that you have put
together and trained!
– Are the errors attributable to a particular divergence?
– What would be required for the system to address this
type of error?
Summary
• Main challenges for current state-of-the-art MT
approaches are coverage and accuracy:
– Acquiring broad-coverage, high-accuracy translation
lexicons (for words and phrases)
– Learning syntactic mappings between languages from
parallel word-aligned data
– Overcoming syntax-to-semantics differences and
dealing with constructions
– Effective target language modeling
Homework Assignment #1
Questions…