AVENUE Poster#2

AVENUE / LETRAS
Rule-based MT, whether transfer or interlingual, requires several
computational-linguist decades to build an MT system from a new less
commonly taught language (LCTL) into English, and comparable effort to debug
and refine it. Moreover, it may prove difficult or impossible to find
computational linguists skilled in each LCTL of interest. The currently
favored MT research paradigms are corpus-based methods, whether statistical
or example-based, but these require large quantities of professionally
translated, aligned parallel text for training, typically 1 to 10 million
words or more. Such quantities of high-quality parallel text are simply not
available for most LCTLs. In contrast, the LETRAS approach requires neither
LCTL-versed computational linguists nor large quantities of parallel text.
Instead, LETRAS requires a small (10-20 thousand word), linguistically
balanced, translated and aligned elicitation corpus, a modest monolingual
corpus in the LCTL, and access to a bilingual native informant, who need not
have linguistic skills.
Working with computer tools in Temuco, Chile
LETRAS Architecture (Grey areas = existing components or data)
[Figure: architecture diagram. Box labels, in extraction order: Source Language Text; Morphological Analyzer; Lexemes and Grammatical Features; Lattice of Partial Translations; Transfer Engine; Decoder; Task 4; Morphology Rules; Task 1; Additional source-language Text; Target Language Text]
Transfer Rule Format
In order to present the learning algorithms, we must first explain
the learning objective, i.e. the transfer rules. The rules in the
Avenue system follow a specific formalism. The transfer rule below,
between Chinese and English, illustrates this formalism:
; Rule to transfer Chinese question sentences
{S,3} ; Unique rule identifier
; Production rule: SL and TL type and constituent or POS sequences
S::S : [NP VP "吗"] -> [AUX NP VP]
(
  ; Constituent alignments
  (x1::y2) ; NP to NP
  (x2::y3) ; VP to VP
  ; Parsing (x-side) constraints, build feature structure
  ((x0 subj) = x1) ; Assign NP's features to subj
  ((x0 subj case) = nom)
  ((x0 act) = quest)
  (x0 = x2)
  ; Transfer (xy) constraints
  ((y2 case) = (x0 subj case))
  ; Generation (y-side) constraints
  ; Insert AUX on target side based on value constraints
  ((y1 form) = do)
  ; Enforce value and agreement restrictions on y-side
  ((y3 vform) =c inf) ; verb must be infinitive
  ((y1 agr) = (y2 agr))
)
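As a rough illustration of how the constituent alignments and constraints in the Chinese question rule fit together, the following Python sketch applies the S::S rule to an already-parsed sentence. It is not the AVENUE transfer engine: `Node` and `apply_question_rule` are hypothetical names, the constituents are assumed to have been translated already by lower-level rules, feature structures are plain dictionaries rather than unified structures, and `=c` is treated simply as a check that the value is already present.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A constituent: label, surface string, and a flat feature structure."""
    label: str
    text: str
    feats: dict = field(default_factory=dict)


def apply_question_rule(x_side):
    """Apply S::S : [NP VP "吗"] -> [AUX NP VP]; return y-side nodes or None."""
    if len(x_side) != 3:
        return None
    x1, x2, x3 = x_side
    if (x1.label, x2.label, x3.text) != ("NP", "VP", "吗"):
        return None

    # Parsing (x-side) constraints: build the feature structure x0.
    x0 = dict(x2.feats)                        # (x0 = x2)
    x0["subj"] = {**x1.feats, "case": "nom"}   # (x0 subj) = x1, case = nom
    x0["act"] = "quest"                        # (x0 act) = quest

    # Constituent alignments: x1 -> y2 (NP), x2 -> y3 (VP).
    y2 = Node("NP", x1.text, dict(x1.feats))
    y3 = Node("VP", x2.text, dict(x2.feats))

    # Transfer (xy) constraint: (y2 case) = (x0 subj case).
    y2.feats["case"] = x0["subj"]["case"]

    # Generation (y-side) constraints.
    if y3.feats.get("vform") != "inf":         # (y3 vform) =c inf
        return None
    # Insert AUX with form "do", agreeing with the subject NP.
    y1 = Node("AUX", "do", {"form": "do", "agr": y2.feats.get("agr")})
    return [y1, y2, y3]


# Hypothetical input: constituents already carry English strings.
sentence = [
    Node("NP", "you", {"agr": "2sg"}),
    Node("VP", "like tea", {"vform": "inf"}),
    Node("PART", "吗"),
]
result = apply_question_rule(sentence)
print(" ".join(n.text for n in result))  # do you like tea
```

The question particle has no alignment, so it is dropped on the target side, while AUX has no source and is introduced purely by the y-side constraints.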
[Figure: diagram labels, in extraction order: Task 4; Transfer Rules; Task 2; Morphology Rule Induction; Transfer Rule Induction; Rule Refinement Module; Translation Correction Tool; Informant; Task 3; Word-aligned Parallel Text; Feature Detection; Readable Grammar; Algorithm; Data; Informant; Elicitation Tool]
[Figure: Spanish CIC lattice boxes, listed as c-suffixes (c-stem count): sample c-stems]
a.as.o.os.tro (1): cas
a.as.o.os (43): african, cas, jurídic, l, ...
a.as.o (59): cas, citad, jurídic, l, ...
MILE Architecture (provides data feeds for LETRAS)
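[Figure: diagram labels: Elicitation Corpus; Navigation]
[Figure: Spanish CIC lattice boxes, listed as c-suffixes (c-stem count): sample c-stems]
a.tro (2): cas, cen
a (1237): huelg, ib, id, iglesi, ...
[Figure: English CIC lattice boxes, listed as c-suffixes: c-stems]
me.mes.med: bla
e.es.ed: blam
Ø.s.d: blame
e.es: blam, solv
me.mes: bla
me.med: bla
Ø.d: blame
e.ed: blam
mes.med: bla
e: blam, solv
es: blam, solv
mes: bla
ed: blam, roam
s: blame, roam, solve
d: blame, roame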
Portion of a CIC lattice consisting of the word forms: blame, blames,
blamed, roams, roamed, roaming, solve, solves, solving.
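The boxes of this lattice can be reproduced with a few lines of Python. The sketch below assumes, based on the figure rather than on any stated definition, that a c-stem/c-suffix pair is any split of a word form into a non-empty prefix and a (possibly empty) remainder, and that the c-stems of a CIC are those that occur with every c-suffix in its set. The function names are hypothetical, and the empty string stands in for Ø.

```python
from collections import defaultdict


def suffix_to_stems(words):
    """Map each candidate suffix to the set of candidate stems it follows."""
    table = defaultdict(set)
    for word in words:
        # Split after every character; the last split yields the empty suffix.
        for i in range(1, len(word) + 1):
            table[word[i:]].add(word[:i])
    return table


def cic_stems(table, suffixes):
    """Return the c-stems that occur with every c-suffix in `suffixes`."""
    stem_sets = [table.get(s, set()) for s in suffixes]
    if not stem_sets:
        return set()
    return set.intersection(*stem_sets)


words = ["blame", "blames", "blamed", "roams", "roamed", "roaming",
         "solve", "solves", "solving"]
table = suffix_to_stems(words)

print(sorted(cic_stems(table, ["s"])))          # ['blame', 'roam', 'solve']
print(sorted(cic_stems(table, ["e", "es"])))    # ['blam', 'solv']
print(sorted(cic_stems(table, ["", "s"])))      # ['blame', 'solve']
print(sorted(cic_stems(table, ["me", "mes", "med"])))  # ['bla']
```

Because every split is considered, linguistically implausible boxes such as me.mes.med with c-stem bla appear alongside plausible ones such as Ø.s.d with c-stem blame.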
[Figure: Spanish CIC lattice boxes, listed as c-suffixes (c-stem count): sample c-stems]
a.os (134): impedid, impuest, indonesi, inundad, ...
as (404): huelg, huelguist, incluid, industri, ...
a.o.os (105): impuest, indonesi, italian, jurídic, ...
as.os (68): cas, implicad, inundad, jurídic, ...
o (1139): hub, hug, human, huyend, ...
as.o.os (54): cas, implicad, jurídic, l, ...
o.os (268): human, implicad, indici, indocumentad, ...
os (534): humorístic, human, hígad, impedid, ...
Hierarchical CIC lattice derived from Spanish. Each CIC
box contains the c-suffixes comprising the CIC, the c-stem
count of the CIC, and a sample of the CIC’s c-stems.
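The hierarchical structure of this lattice can be sketched as follows. The c-stem counts are the ones shown in the Spanish CIC boxes; the link rule used here (a child CIC adds exactly one c-suffix to its parent) is an assumption made for illustration, and `parse` and `inclusion_links` are hypothetical names. Since a c-stem that occurs with a larger c-suffix set also occurs with every subset of it, counts should never grow when moving from parent to child.

```python
def parse(name):
    """Turn a box label such as 'a.as.o' into a frozenset of c-suffixes."""
    return frozenset(name.split("."))


def inclusion_links(cics):
    """Return (parent, child) pairs where child = parent + one c-suffix."""
    return [
        (parent, child)
        for parent in cics
        for child in cics
        if len(child) == len(parent) + 1 and parent < child
    ]


# c-stem counts copied from the Spanish CIC boxes.
counts = {
    "a": 1237, "as": 404, "o": 1139, "os": 534, "tro": 16,
    "a.as": 199, "a.o": 214, "a.os": 134, "a.tro": 2,
    "as.o": 85, "as.os": 68, "o.os": 268,
    "a.as.o": 59, "a.as.os": 50, "a.o.os": 105, "as.o.os": 54,
    "a.as.o.os": 43, "a.as.o.os.tro": 1,
}
cics = {parse(name): n for name, n in counts.items()}
links = inclusion_links(cics)

# Adding a c-suffix can only keep or shrink the set of c-stems.
assert all(cics[parent] >= cics[child] for parent, child in links)

for parent, child in links:
    if parent == parse("a"):
        print(".".join(sorted(child)), cics[child])
```

The loop lists the boxes directly below the single-suffix CIC a, which in this data are a.as, a.o, a.os, and a.tro.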
[Figure: English CIC lattice, Ø box: blame, blames, blamed, roams, roamed, roaming, solve, solves, solving]
[Figure legend: hierarchical c-suffix set inclusion links; morpheme boundary links]
Example of initial screen with incorrect translation (left), and
the same example screen with sentence in the process of
being corrected (right) with the TCTool.
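[Figure: Spanish CIC lattice boxes, listed as c-suffixes (c-stem count): sample c-stems]
tro (16): catas, ce, cen, cua, ...
a.o (214): id, indi, indonesi, inmediat, ...
as.o (85): intern, jurídic, just, l, ...
a.as (199): huelg, incluid, industri, inundad, ...
a.as.os (50): afectad, cas, jurídic, l, ...
[Figure: English CIC lattice boxes, listed as c-suffixes: c-stems]
s.d: blame
es.ed: blam
me: bla
med: bla, roa
Ø.s: blame, solve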
The left side shows an example of compositionality. The right side shows a
successful application of the Seeded Version Space algorithm.