Machine Translation
Introduction
Jan Odijk
LOT Winterschool
Amsterdam January 2011
1
Overview
• MT: What is it
• MT: What is not possible (yet?)
• MT: Why is it so difficult?
• MT: Can we make it possible?
• MT: Evaluation
• MT: What is (perhaps) possible
• Conclusions
2
MT: What is it?
• Input: text in source language
• Output: text in target language that is a
translation of the input text
3
MT: What is it?
[Figure: translation triangle with three routes from input to output: direct translation (input → output), transfer (analyzed input → analyzed output), and interlingua (at the apex)]
4
MT: System Types
• Direct:
– Earliest systems (1950s)
• Direct word-to-word translation
– Recent statistical MT systems
• Transfer
– Almost all research and commercial systems up to
1990
• Interlingual
5
MT: System Types
• Interlingual
– A few research systems in the 1980s
• Rosetta (Philips), based on Montague Grammar
– Semantic derivation trees of attuned grammars
• Distributed Translation (BSO)
– (enriched) Esperanto
• Sometimes logical representations
• Hybrid Interlingual/Transfer
– Transfer for lexicons; IL for rules
6
Rule-Based Systems
• Most systems
– explicit source language grammar
– parser yields analysis of source language input
– transfer component turns it into target language
structure
– no explicit grammar of target language (except
morphology)
7
Rule-Based Systems
• Some systems (Eurotra)
– explicit source and target language grammar
• sometimes reversible
– parser yields analysis of source language input
– transfer component turns it into target language
structure
– generation of translation by target language
grammar
8
Rule-Based Systems
• Some systems (Rosetta, DLT)
– explicit source and target language grammar
• in some cases reversible
– parser yields interlingual representation
– generation of translation by target language
grammar from interlingual representation
9
MT: Is it difficult?
• FAHQT: Fully Automatic High Quality
Translation
– Fully Automatic: no human intervention
– High Quality: close or equal to human
translation
• Even acceptable quality is difficult to
achieve
10
MT: Why is it so difficult?
• Ambiguity
– Real
– Temporary
• Computational Complexity
• Complexity of language
• Divergences
• Language Competence v. Language Use
• Require large and rich lexicons
11
MT: Why is it so difficult?
• De jongen sloeg het meisje met de gitaar
• Hij heeft boeken gelezen
– He has been reading books
– *He has been reading for books
• Hij heeft uren gelezen
– *He has been reading hours
– He has been reading for hours
12
MT: Why is it so difficult?
• Not only uren, but also:
– dagen, de hele dag, weken, …
– (Words expressing units of time)
• But also:
– De hele vergadering, meeting, bijeenkomst, les,
…
– (words expressing events)
13
MT: Why is it so difficult?
• Hij draagt een bruin pak
– Dragen: wear or carry
– Pak: suit or package
• Hij draagt een bruin pak en zwarte
schoenen
• Hij draagt een bruin pak onder zijn arm
14
MT: Why is it so difficult?
• Voert uw bedrijf sloten uit?
– Uitvoeren: execute, or export?
– Bedrijf: act, or company?
– Sloten: ditches, or locks?
15
MT: Why is it so difficult?
• Temporary Ambiguity
– Hij heeft boeken gelezen
• Heeft: main or auxiliary verb?
• Boeken: noun or verb
– Voert uw bedrijf sloten uit?
• Voert: form of voeren or of uitvoeren,
• Bedrijf: noun or verb form?
• Sloten uit: noun+particle or PP: out of ditches/locks
16
Why is MT difficult?
• Summary: ambiguity of natural language
– requires modeling of knowledge of the world/situation
• by rule systems, and/or
• by statistics
17
MT: Why is it so difficult?
• Computational Complexity
– High demands on processing capacity
– High demands on memory
• Complexity of language
– Many different construction types
– All interacting with each other
18
Why is MT difficult?
• Divergences between languages
– require deep syntactic analysis
– Or very sophisticated statistical techniques
19
Divergences:
Category mismatches
• Simple category mismatches
– woonachtig (zijn) v. reside (Adj – Verb)
– zich ergeren v. (be) annoyed (Verb – Adj)
– verliefd v. in love (Adj – Prep+Noun)
– kunnen v. (be) able
– kunnen v. (be) possible
– door- v. continue (to)
20
Divergences:
Category mismatches
• More complex category mismatches
– graag vs. like (Adv vs. Verb)
• hij zwemt graag vs. he likes to swim
– toevallig vs. happen
• hij viel toevallig vs. he happened to fall
21
Divergences:
Category mismatches
• Phrasal category mismatches
– de zieke vrouw
– the woman who is ill (* the ill woman)
– I expect her to leave
• ik verwacht dat zij vertrekt
– She is likely to come
• het is waarschijnlijk dat zij komt
22
Conflational Divergences:
• prepositional complements
– houden van vs. love
• existential er vs. Ø
– er passeerde een auto vs.
– a car passed
• verbal particles
– blow (something) up vs. volar
23
Conflational Divergences:
• reflexive verbs
– zich scheren vs. shave
• composed vs. simple tense forms
– he will do it vs. lo hará
• split negatives vs. composed negatives
– he does not see anyone vs.
– hij ziet niemand
24
Functional Divergences:
• I like these apples
– me gustan estas manzanas
• se venden manzanas aqui
– hier verkoopt men appels
• er werd door de toeschouwers gejuicht
– the spectators were cheering
25
Divergences: MWEs
• semi-fixed MWEs
– nuclear power plant vs. kerncentrale
• flexible idioms
– de plaat poetsen vs. bolt
– de pijp uit gaan v. to kick the bucket
26
Divergences: MWEs
• semi-idioms (collocations)
– zware shag vs. strong tobacco
• semi-idioms (support verbs)
– aandacht besteden aan
– pay attention to
27
MT: Why is it so difficult?
• Language Competence v. Language Use
– Earlier systems implemented idealized reality
– But not the really occurring language use
– In some cases
• focus on theoretically interesting difficult
constructions
• That do occur in reality
• But other constructions are more important to deal
with in practical systems
28
MT: Why is it so difficult?
• Large and rich lexicons
– Existing human-oriented dictionaries are not
suited as such
– All information must be available in a
formalized way
– Much more information is needed than in a
traditional dictionary
29
MT: Why is it so difficult?
• Multi-word Expressions (MWEs)
– Are in current dictionaries only in a very
informal way
– No standards on how to represent them
lexically
– Many different types requiring different
treatment in the grammar
– Huge numbers!!
– Domain and company-specific terminology are
often MWEs
30
MT: Can we make it possible?
• Probably not,
• but we can still improve significantly
– Lexicons
– Selection restrictions
– Approximating analyses
• Statistical MT
31
MT: Can we make it possible?
• Large and rich lexicons
– widely accepted and used (de facto) standards
– Methods and tools to quickly adapt to domain
or company specific vocabulary
– Better treatment of MWEs and standards for
lexical representation of MWEs
32
MT: Can we make it possible?
• Selection restrictions with type system to
approach modeling of world knowledge
– Requires sophisticated syntactic analysis
• Boek: info (legible)
• Uur: time unit → duration
• Vergadering: event → duration
• Lezen: subject=human; object=info (legible)
• Durational adjunct must be a duration phrase
33
MT: Can we make it possible?
• Selection restrictions
– Pak (1) (suit): clothes
– Pak (2) (package): entity
– Dragen (1) (wear): subj=animate; object=clothes
– Dragen (2) (carry): subj=animate; object=entity
– Schoen: clothes
– Entity > clothes
– Identity preferred over subsumption
– Homogeneous object preferred over heterogeneous one
34
MT: Can we make it possible?
• Selection restrictions
– Hij draagt een bruin pak
• He wears a brown suit (1: clothes=clothes)
• He carries a brown package (1: entity=entity)
• He carries a brown suit (2: entity > clothes)
• *He wears a brown package (clothes ¬> entity)
– Hij draagt een bruin pak en zwarte schoenen
• He wears a brown suit and black shoes (1: homogeneous and
clothes=clothes)
• He carries a brown suit and black shoes (2: homogeneous but
entity > clothes)
• He carries a brown package and black shoes (2:
heterogeneous but entity=entity)
• *He wears a brown package and black shoes (clothes ¬> entity)
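The ranking of readings for "Hij draagt een bruin pak" can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `SUPERTYPE`, `NOUNS`, `VERBS`, `match_rank` and `rank_readings` are hypothetical, the subject restriction (animate) is left out, and the homogeneity preference for coordinated objects is not modelled.

```python
# Type hierarchy from the slide: entity > clothes (clothes is a subtype of entity).
SUPERTYPE = {"clothes": "entity", "entity": None}

# Toy lexicon: each Dutch word maps to (English translation, semantic type) pairs.
NOUNS = {"pak": [("suit", "clothes"), ("package", "entity")]}
# For verbs the type is the one required of the object.
VERBS = {"dragen": [("wear", "clothes"), ("carry", "entity")]}


def match_rank(required, actual):
    """Return 1 for identical types, 2 if `required` subsumes `actual`, None otherwise."""
    if required == actual:
        return 1
    ancestor = SUPERTYPE.get(actual)
    while ancestor is not None:
        if ancestor == required:
            return 2
        ancestor = SUPERTYPE.get(ancestor)
    return None  # selection restriction violated


def rank_readings(verb, noun):
    """List admissible (rank, verb translation, noun translation) triples, best first."""
    readings = []
    for verb_en, required in VERBS[verb]:
        for noun_en, actual in NOUNS[noun]:
            rank = match_rank(required, actual)
            if rank is not None:
                readings.append((rank, verb_en, noun_en))
    return sorted(readings)


print(rank_readings("dragen", "pak"))
```

The two rank-1 readings (wear/suit, carry/package) come out ahead of the rank-2 reading (carry/suit), and wear/package is rejected, matching the judgements listed above.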
35
MT: Can we make it possible?
• Approximating analyses
– Ignore certain ambiguities to begin with
– Use only limited amount of relevant
information
– Cut off analysis when there are too many
alternatives
– This is currently actually done in all practical
systems
– Need new ways of doing this without affecting
quality too seriously
36
MT: Can we make it possible?
• Statistical MT
• Derives MT-system automatically
– From statistics taken from
• Aligned parallel corpora (→ translation model)
• Monolingual target language corpora (→ language
model)
• Being worked on since the early 1990s
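The slide does not say how the two models are combined. In the statistical MT literature of that period the usual formulation is the noisy-channel model, shown here as background rather than as part of the original slide. The symbols are assumptions of this note: f is the source sentence, e a candidate target sentence.

```latex
\[
  \hat{e} \;=\; \arg\max_{e} P(e \mid f)
          \;=\; \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}}
                \,\underbrace{P(e)}_{\text{language model}}
\]
```

The translation model is estimated from the aligned parallel corpora, the language model from the monolingual target language corpora.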
37
MT: Can we make it possible?
• Plus:
– No or very limited grammar development
– Includes language and world knowledge automatically
(but implicitly)
– Based on actually occurring data
– Currently many experimental and commercial systems
• Minus:
– Requires large aligned parallel corpora
– Unclear how much linguistics will be needed anyway
– Probably restricted to very limited domains only
38
MT: Can we make it possible?
• Google Translate (statistical MT)
• Hij draagt een pak. → √ He wears a suit.
• Hij draagt schoenen. → √ He wears shoes.
• Hij draagt bruine schoenen en een pak.
– → √ He wears a suit and brown shoes. (!!)
• Hij draagt het pakket → √ He carries the package
• Hij heeft een pak aan. → *He has a suit.
• Voert uw bedrijf sloten uit?
– → *Does your company locks out?
39
MT: Can we make it possible?
• Euromatrix esp. “the Euromatrix”
– Lists data and tools for European language pairs
– Goals
• Translation systems for all pairs of EU languages
• Organization, analysis and interpretation of a competitive annual international
evaluation of machine translation
• The provision of open source machine translation technology including
research tools, software and data
• A systematically compiled and constantly updated detailed survey of the state
of MT technology for all EU language pairs
• Efficient inclusion of linguistic knowledge into statistical machine
translation
• The development and testing of hybrid architectures for the integration of
rule-based and statistical approaches
• Successor project EuromatrixPlus
41
MT: Can we make it possible?
• META-NET 2010-2013 (EU-funding)
– Building a community with shared vision and strategic
research agenda
– Building META-SHARE, an open resource exchange
facility
– Building bridges to neighbouring technology fields
• Bringing more Semantics into Translation
• Optimising the Division of Labour in Hybrid MT
• Exploiting the Context for Translation
• Empirical Base for Machine Translation
42
MT: Can we make it possible?
• PACO-MT 2008-2011
• Investigates hybrid approach to MT
– Rule-based and statistical
– Uses existing parser for source language
analysis
– Uses statistical n-gram language models for
generation
– Uses statistical approach to transfer
43
MT Evaluation
• Evaluation depends on purpose of MT and how it
is used
– application, domain, controlled language
• Many aspects can be evaluated
– functionality, efficiency, usability, reliability,
maintainability, portability
– translation quality
– embedding in work flow
• post-editing options/tools
44
MT Evaluation
• Focus here:
– does the system yield good translations
according to human judgement
– in the context of developing a system
• Again, many aspects:
– fidelity (how close), correctness, adequacy,
informativeness, intelligibility, fluency
– and many ways to measure these aspects
45
MT Evaluation
• Test suite
– Reference =
• list of (carefully selected) sentences
• with their translations (ordered by score)
– translations judged correct by human (usually developer)
– upon every update of the system, output of the new system is compared to the
reference
• if different: system has to be adapted, or reference has to be adapted
• Advantages
– focus on specific translation problems possible
– excellent for regression testing
– Manual judgement needed only once for each new output
• other comparisons are automatic
• Disadvantages
– not really independent
– particularly suited for pure rule-based systems
– human judgement needed if output differs from reference
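The test-suite procedure amounts to a regression check. A minimal sketch, assuming the system under test is available as a function from source sentence to translation; the name `regression_check` and the toy data are hypothetical, not from any particular system:

```python
def regression_check(translate, test_suite):
    """Compare new system output with approved references.

    `translate` maps a source sentence to a translation; `test_suite` is a list
    of (source, approved reference) pairs. Returns the cases that differ, which
    are exactly the ones that need human judgement.
    """
    needs_review = []
    for source, reference in test_suite:
        output = translate(source)
        if output != reference:
            needs_review.append((source, reference, output))
    return needs_review


# Toy "system": a lookup table standing in for a real MT engine.
toy_system = {
    "Hij draagt een pak.": "He wears a suit.",
    "Hij heeft een pak aan.": "He has a suit.",
}
suite = [
    ("Hij draagt een pak.", "He wears a suit."),
    ("Hij heeft een pak aan.", "He is wearing a suit."),
]
for source, reference, output in regression_check(toy_system.get, suite):
    print(f"DIFF {source!r}: expected {reference!r}, got {output!r}")
```

For every reported difference a human decides whether the system or the reference has to be adapted; identical outputs need no further attention.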
46
MT Evaluation
• Comparison against
– translation corpus
– independently created by human translators
– possibly multiple equivalently correct translations of a sentence
• Advantages
– truly independent
– also suited for data-driven systems
• Disadvantage
– requires human judgement (every time there is a system update)
• high effort by highly skilled people, high costs, requires a lot of time
– human judgement is not easy (unless there is a perfect match)
• Useful
– for a one-time evaluation of a stable system
– not for evaluation during development
47
MT Evaluation
• Edit-Distance (Word Accuracy)
– metric to determine closeness of translations
automatically
– the least number of edit operations to turn the
translated sentence into the reference sentence
– Alshawi et al. 1998
48
MT Evaluation
• WA = 1 - ((d+s+i)/max(r,c))
• d = number of deletions
• s = number of substitutions
• i = number of insertions
• r = reference sentence length
• c = candidate sentence length
• easy to calculate using Levenshtein distance
algorithm (dynamic programming)
• various extensions have been proposed
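The word accuracy formula can be sketched as follows, assuming whitespace tokenization and unit cost for each deletion, substitution and insertion. The function names are illustrative, and the convention that two empty sentences score 1.0 is an assumption of this sketch.

```python
def edit_distance(candidate, reference):
    """Levenshtein distance between two token lists (dynamic programming)."""
    previous = list(range(len(reference) + 1))
    for i, cand_word in enumerate(candidate, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            substitution = previous[j - 1] + (0 if cand_word == ref_word else 1)
            current.append(min(previous[j] + 1,      # drop a candidate word
                               current[j - 1] + 1,   # add a reference word
                               substitution))
        previous = current
    return previous[-1]


def word_accuracy(candidate, reference):
    """WA = 1 - (d + s + i) / max(r, c), computed on whitespace tokens."""
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    if not cand_tokens and not ref_tokens:
        return 1.0  # assumption: two empty sentences count as a perfect match
    distance = edit_distance(cand_tokens, ref_tokens)
    return 1 - distance / max(len(ref_tokens), len(cand_tokens))


print(word_accuracy("He has a suit", "He wears a suit"))  # one substitution in four words
```

Swapping two adjacent words costs two edit operations under this definition, so a two-word sentence in the wrong order already scores 0.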
49
MT Evaluation
• Advantages
– fully automatic given a reference set
• Disadvantages
– penalizes candidates if a synonym is used
– penalizes swaps of words and blocks of words
too much
50
MT Evaluation
• BLEU (method to automate MT Evaluation)
– the closer a machine translation is to a
professional human translation, the better it is
– BiLingual Evaluation Understudy
• Required:
– corpus of good quality human reference
translations
– a “closeness” metric
51
MT Evaluation
• Two candidate translations from Chinese
source
– C1: It is a guide to action which ensures that the
military always obeys the commands of the
party
– C2: It is to insure the troops forever hearing the
activity guidebook that party direct
• Intuitively: C1 is better than C2
52
MT Evaluation
• Three reference translations
– R1: It is a guide to action that ensures that the
military will forever heed Party commands
– R2: It is the guiding principle which guarantees
the military forces always being under the
command of the Party
– R3: It is the practical guide for the army always
to heed the directions of the party
53
MT Evaluation
• Basic idea:
– a good candidate translation shares many words
and phrases with reference translations
– comparing n-gram matches can be used to
rank candidate translations
• n-gram: a sequence of n word occurrences
– in BLEU n=1,2,3,4
- 1-grams give a measure of adequacy
- longer n-grams give a measure of fluency
54
MT Evaluation
• For unigrams:
– count the number of matching unigrams
• in all references
– divide by the total number of unigrams (in the
candidate sentence)
55
MT Evaluation
• Problem
– C1: the the the the the the the (=7/7=1)
– R1: the cat is on the mat
• Solution:
– clip matching count (7) by maximum reference
count (2) → 2 (CountClip)
– → modified unigram precision = 2/7 = 0.29
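Clipping can be sketched with Python's `collections.Counter`. The function names are illustrative, and lowercasing plus whitespace tokenization are assumptions of this sketch, not part of the metric's definition.

```python
from collections import Counter


def ngrams(tokens, n):
    """All n-grams (as tuples) of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Return (clipped matches, total candidate n-grams) for one sentence."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    max_ref_counts = Counter()
    for reference in references:
        # Counter union keeps, per n-gram, the maximum count over all references.
        max_ref_counts |= Counter(ngrams(reference.lower().split(), n))
    # Counter intersection takes the minimum, i.e. clips each candidate count.
    clipped = sum((cand_counts & max_ref_counts).values())
    total = sum(cand_counts.values())
    return clipped, total


clipped, total = modified_precision("the the the the the the the",
                                    ["the cat is on the mat"])
print(f"{clipped}/{total}")  # 2/7
```

Passing `n=2` gives the modified bigram precision in the same way.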
56
MT Evaluation
• Example (unigrams)
– C1: It is a guide to action which ensures that the
military always obeys the commands of the party
(17/18=0.94)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
57
MT Evaluation
• Example (unigrams)
– C2: It is to insure the troops forever hearing the activity
guidebook that party direct (8/14=0.57)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
58
MT Evaluation
• Example (bigrams)
– C1: It is a guide to action which ensures that the
military always obeys the commands of the party
(10/17=0.59)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
59
MT Evaluation
• Example (bigrams)
– C2: It is to insure the troops forever hearing the activity
guidebook that party direct (1/13=0.08)
– R1: It is a guide to action that ensures that the military
will forever heed Party commands
– R2: It is the guiding principle which guarantees the
military forces always being under the command of the
Party
– R3: It is the practical guide for the army always to heed
the directions of the party
60
MT Evaluation
• Extend to a full multi-sentence corpus
• compute n-gram matches sentence by sentence
• sum the clipped n-gram counts for all candidates
• divide by the number of n-grams in the text corpus
• $p_n = \dfrac{\sum_{C \in \{Candidates\}} \sum_{n\text{-gram} \in C} Count_{clip}(n\text{-gram})}{\sum_{C' \in \{Candidates\}} \sum_{n\text{-gram}' \in C'} Count(n\text{-gram}')}$
61
MT Evaluation
• Combining n-gram precision scores
• weighted linear average works reasonably well
– $\sum_{n=1}^{N} w_n p_n$
• but: n-gram precision decays exponentially
with n (so take logs to compensate for this)
– $\exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$
• weights in BLEU: $w_n = 1/N$
62
MT Evaluation
• BLEU is a precision measure
– #(C ∩ R) / #C
• Recall is difficult to define because of
multiple reference translations
– e.g. #(C ∩ Rs) / # Rs
• where Rs = ∪i Ri
– will not work
63
MT Evaluation
• C1: I always invariably perpetually do
• C2: I always do
• R1: I always do
• R2: I invariably do
• R3: I perpetually do
• Recall of C1 over R1-3 is better than C2
• but C2 is a better translation
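The effect is easy to reproduce. This is a small sketch of the naive recall #(C ∩ Rs) / #Rs over word types; the function name `naive_recall` is illustrative and the set-based counting is a simplifying assumption.

```python
def naive_recall(candidate, references):
    """#(C ∩ Rs) / #Rs over word types, where Rs is the union of all references."""
    cand_words = set(candidate.lower().split())
    ref_words = set()
    for reference in references:
        ref_words |= set(reference.lower().split())
    return len(cand_words & ref_words) / len(ref_words)


references = ["I always do", "I invariably do", "I perpetually do"]
for candidate in ["I always invariably perpetually do", "I always do"]:
    print(candidate, "->", naive_recall(candidate, references))
```

C1 covers all five word types in the pooled references and C2 only three, so this recall rewards C1 for stacking up synonyms that no single reference uses together.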
64
MT Evaluation
• But without Recall:
– C1: of the
– compared with R1-3 as before
– modified unigram precision = 2/2
– modified bigram precision = 1/1
– which is the wrong result
65
MT Evaluation
• Length
– n-gram precision penalizes translations longer
than the reference
– but not translations shorter than the reference
– → Add Brevity Penalty (BP)
66
MT Evaluation
• $b_i$ = best match length = reference sentence
length closest to candidate sentence i's
length (e.g. r: 12, 15, 17; c: 12 → 12)
• r = test corpus effective reference length =
$\sum_i b_i$
• c = total length of candidate translation
corpus
67
MT Evaluation
• BP =
– $1$ if $c > r$
– $e^{1 - r/c}$ if $c \le r$
– computed over the corpus
– not sentence by sentence and averaged
• $BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$
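Putting the pieces together, a corpus-level BLEU can be sketched as below. This is not a reference implementation. The function names are illustrative, and the sketch makes several assumptions of its own: lowercasing and whitespace tokenization, uniform weights $w_n = 1/N$, ties in best match length broken in favour of the shorter reference, and a score of 0 whenever some $p_n$ is zero or undefined.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_bleu(candidates, references, max_n=4):
    """candidates: list of sentences; references: one list of reference sentences per candidate."""
    clipped = [0] * max_n   # summed clipped n-gram counts, per n
    totals = [0] * max_n    # summed candidate n-gram counts, per n
    c_len = 0               # c: total candidate length
    r_len = 0               # r: effective reference length (sum of best match lengths)
    for candidate, refs in zip(candidates, references):
        cand_tokens = candidate.lower().split()
        refs_tokens = [ref.lower().split() for ref in refs]
        c_len += len(cand_tokens)
        # Best match length: closest reference length; ties go to the shorter one.
        r_len += min((len(ref) for ref in refs_tokens),
                     key=lambda length: (abs(length - len(cand_tokens)), length))
        for n in range(1, max_n + 1):
            cand_counts = Counter(ngrams(cand_tokens, n))
            max_ref_counts = Counter()
            for ref in refs_tokens:
                max_ref_counts |= Counter(ngrams(ref, n))  # per n-gram maximum
            clipped[n - 1] += sum((cand_counts & max_ref_counts).values())
            totals[n - 1] += sum(cand_counts.values())
    if c_len == 0 or 0 in totals or 0 in clipped:
        return 0.0  # assumption: no smoothing, so any empty p_n gives BLEU = 0
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    log_mean = sum(math.log(k / t) for k, t in zip(clipped, totals)) / max_n
    return bp * math.exp(log_mean)


refs = [
    "It is a guide to action that ensures that the military will forever heed Party commands",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party",
    "It is the practical guide for the army always to heed the directions of the party",
]
# "of the" has perfect unigram and bigram precision, but the brevity penalty
# (r = 16, c = 2) pushes the score close to zero.
print(corpus_bleu(["of the"], [refs], max_n=2))
```

Because the counts and lengths are summed before dividing, the score is defined for the corpus as a whole and not as an average of sentence scores.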
68
MT Evaluation
• BLEU:
– claim: BLEU closely matches human judgement
• when averaged over a test corpus
• not necessarily on individual sentences
• shown extensively in Papineni et al. 2001
– → multiple reference translations are desirable
• to cancel out translation styles of individual translators
• (e.g. East Asian economy v. economy of East Asia)
69
MT Evaluation
• Variants on BLEU
– NIST
• http://www.nist.gov/speech/tests/mt/doc/ngramstudy.pdf
• different weights
• different BP
– ROUGE (Lin and Hovy 2003)
• for text summarization
• Recall-Oriented Understudy for Gisting Evaluation
70
MT Evaluation
• Main Advantage of BLEU
– automatic evaluation
• good for use during development
• particularly useful for data-based systems
• Disadvantage
– defined for a whole test corpus
– not for individual sentences
– just measures difference with reference
71
MT: What is (perhaps) possible
• Cross-Language Information Retrieval
• Low Quality MT for Gist extraction
• MT and Speech Technology
• Controlled Language
• Limited Domain
• Interaction with author
• Combinations of the above
• Computer-aided translation
72
MT: What is (perhaps) possible
• Cross-Language Information Retrieval
(CLIR)
– Input query: in own language
– Input query translated into target languages
– Search in target language documents
– Results in target language
• Translation of individual words only
• Growing need (growing multilingual Web)
• No perfect translation required
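A toy sketch of why word-by-word translation is enough for CLIR: every translation of every query word is kept, and documents are ranked by how many of those terms they contain. The lexicon, the documents and the function names are invented for illustration, and a real system would use a proper retrieval engine rather than set overlap.

```python
def translate_query(query, lexicon):
    """Replace each query word by all its translations; unknown words pass through."""
    terms = set()
    for word in query.lower().split():
        terms.update(lexicon.get(word, [word]))
    return terms


def search(query, lexicon, documents):
    """Rank target-language documents by the number of translated query terms they contain."""
    terms = translate_query(query, lexicon)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    return [doc_id for overlap, doc_id in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical Dutch-English lexicon and English document collection.
lexicon = {"bruin": ["brown"], "pak": ["suit", "package"]}
documents = {
    "d1": "a brown suit for sale",
    "d2": "package delivery times",
    "d3": "weather report for Amsterdam",
}
print(search("bruin pak", lexicon, documents))
```

The ambiguity of "pak" is never resolved: both translations go into the query, and the ranking leaves it to the documents to sort out which one was relevant.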
73
MT: What is (perhaps) possible
• Low quality MT for Gist extraction
• Low quality but still useful
• If the text is interesting, a high-quality human translation
can be requested (has to be paid for)
75
MT: What is (perhaps) possible
• CLIR
– Fills a growing need in the market
– Is technically feasible
– Creates need for translation of found
documents
• Solved partially by low quality MT
• Potentially creates need for more human translation
• Stimulates (funds) research into more sophisticated
MT
78
MT: What is (perhaps) possible
• Combine MT (statistical or rule-based) with OCR
technology
– Make a picture of a text with your phone
– Text is OCR-ed
– Text is translated
– (usually a short and simple text)
• Linguatec Shoot & Translate
• Word Lens
79
MT: What is (perhaps) possible
• Combine MT (statistical or rule-based) with
Speech technology
– Complicates the problem on the one hand but
– Speech technology (ASR) is currently limited to very
limited domains (makes MT simpler)
– Many useful applications for speech technology
currently in the market
• Directory assistance
• Tourist information
• Tourist communication
• Call centers
• Navigation
• Hotel reservations
– Some will profit from in-built automatic translation
80
MT: What is (perhaps) possible
• Large EC FP6 project TC-STAR (2004-)
– (http://www.tc-star.org/)
– Research into improved speech technology
(ASR and TTS)
– Research into statistical MT
– Research in combining both (speech-to-speech
translation)
– In a few selected limited domains
81
MT: What is (perhaps) possible
• Commercial Speech2Speech Translation
• Jibbigo
– http://www.jibbigo.com
• Speech-to-speech translation (iPhone, Android)
• http://www.phonedog.com/2009/10/30/iphone-appjibbigo-speech-translator
• Talk to Me (Android phones)
82
MT: What is (perhaps) possible
• Controlled Language
– Authoring System limits vocabulary and syntax
of document authors
– Often desirable in companies to get consistent
documentation (e.g. aircraft maintenance
manuals)
• AECMA Simplified English
• GIFAS Rationalized French
– Makes MT easier (language well-defined)
83
MT: What is (perhaps) possible
• Limited Domain
– Translation of
• Weather reports (TAUM-Meteo, Canada)
• Avalanche warnings (Switzerland)
– Fast adaptation to domain/company-specific
vocabulary and terminology
84
MT: What is (perhaps) possible
• Interaction with author
– No fully automatic translation
– Document author resolves
• Ambiguities unresolved by the system
• In a dialogue between the author and the system in
the source language
• Approach taken in Rosetta project (Philips)
• Will only work if the
– #unresolved ambiguities is low
– Questions to resolve ambiguity are clear
85
MT: What is (perhaps) possible
• Hij droeg een bruin pak
– Wat bedoelt u met “pak”
• (1) kostuum
• (2) pakket
• Hij droeg een bruin pak
– Wat bedoelt u met “dragen (droeg)”
• (1) aan of op hebben (kleding)
• (2) bij zich hebben (bijv. in de hand)
86
MT: What is (perhaps) possible
• Combinations of the above
87
MT: What is (perhaps) possible
• Computer-aided translation
– For end-users
– For professional translators/localization industry
• Limited functionality
– Specific terminology
• Bootstrap translation automatically
– Human revision and correction (Post-edit)
• Only if
– MT Quality is such that it reduces effort
– The system is fully integrated in the workflow system
88
Conclusions
• FAHQT not possible (yet?)
• MT is really very difficult!
• Several constrained versions do yield usable
technology with state-of-the-art MT
• In some cases: even potentially creates additional
needs for MT and human translation
89
Conclusions
• Statistical MT yields practical systems that are relatively
quick to produce (but low-quality)
• More research and lots of hard work is needed to
get better systems
• Will probably require hybrid systems (mixed
statistically based/knowledge based); the focus of
research is here (PACO-MT, META-NET,…)
• Needs to be financed by niches where current
state-of-the-art MT yields usable technology and
there is a market.
90