
Machine Translation
Dai Xinyu
2006-10-27

Outline

Introduction
Architecture of MT
Rule-Based MT vs. Data-Driven MT
Evaluation of MT
Development of MT
MT problems in general
Some Thoughts on MT from Recognition
Introduction
"I have a text in front of me which is written
in Russian but I am going to pretend that it is
really written in English and that it has been
coded in some strange symbols. All I need do
is strip off the code in order to retrieve the
information contained in the text"
machine translation - the use of computers to translate from one language to another
•The classic acid test for natural language processing.
•Requires capabilities in both interpretation and generation.
•About $10 billion spent annually on human translation.
http://www.google.com/language_tools?hl=en
3

Introduction - MT past and present

mid-1950s - 1965: great expectations
The dark ages for MT: academic research projects
1980s - 1990s: successful specialized applications
1990s: human-machine cooperative translation
1990s - now: statistical MT; hybrid-strategy MT
Future prospects: ???

Interest in MT

Commercial interest:
• The U.S. has invested in MT for intelligence purposes.
• MT is popular on the web: it is the most used of Google's special features.
• The EU spends more than $1 billion on translation costs each year.
• (Semi-)automated translation could lead to huge savings.

Interest in MT

Academic interest:
• One of the most challenging problems in NLP research.
• Requires knowledge from many NLP sub-areas, e.g. lexical semantics, parsing, morphological analysis, statistical modeling, ...
• Being able to establish links between two languages allows resources to be transferred from one language to another.

Related Areas to MT

• Linguistics
• Computer Science
  • AI
  • Compilation
  • Formal Semantics
  • ...
• Mathematics
  • Probability
  • Statistics
  • ...
• Informatics
• Recognition

Architecture of MT
-- (Levels of Transfer)

Rule-Based MT vs. Data-Driven MT

• Rule-Based MT
• Data-Driven MT
  • Example-Based MT
  • Statistics-Based MT

Rule-Based MT

[Diagram: experts draw on linguistics, semantics, cognitive science, and artificial intelligence to write rules; a natural-language input x passes through the rule-based translation system, which applies those rules to produce the translation output.]

Rule-Based MT

[Cartoon: a person working through a stack of translated documents thinks:] "Man, this is so boring. Hmm, every time he sees 'banco', he either types 'bank' or 'bench' ... but if he sees 'banco de ...', he always types 'bank', never 'bench' ..."

Example-Based MT

• Origins: Nagao (1981)
• First motivation: collocations and bilingual differences of syntactic structures
• Basic idea:
  • human translators search for analogies (similar phrases) in previous translations
  • MT should seek matching fragments in a bilingual database and extract their translations (see the sketch below)
• Aims for less complex dictionaries, grammars, and procedures
• Improved generation (using actual examples of TL sentences)
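
To make the fragment-matching idea concrete, here is a minimal Python sketch, assuming a toy in-memory example base whose fragments and translations are invented for illustration; real EBMT systems use similarity measures, thesauri, and recombination of partial matches rather than exact greedy lookup.

```python
# Toy bilingual example base: hypothetical source fragments -> stored translations.
EXAMPLE_BASE = {
    ("banco", "de", "espana"): ("bank", "of", "spain"),
    ("banco", "de", "madera"): ("wooden", "bench"),
    ("el", "banco"): ("the", "bank"),
}

def translate_by_examples(words):
    """Greedy longest-match over the example base, left to right."""
    out, i = [], 0
    while i < len(words):
        # Try the longest source fragment starting at position i first.
        for j in range(len(words), i, -1):
            fragment = tuple(words[i:j])
            if fragment in EXAMPLE_BASE:
                out.extend(EXAMPLE_BASE[fragment])
                i = j
                break
        else:
            out.append(words[i])  # no matching example: pass the word through
            i += 1
    return out

print(translate_by_examples(["banco", "de", "espana"]))
# -> ['bank', 'of', 'spain']: the three-word fragment wins over shorter matches
```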

EBMT still going

• Bilingual corpus collection
• Storage
• Searching and matching
• ...

Statistical MT Basics

• Based on the assumption that translation exhibits observable statistical regularities
• Origins: Warren Weaver (1949); Shannon's information theory
• The core process is a probabilistic 'translation model' that takes SL words or phrases as input and produces TL words or phrases as output
• A succeeding stage applies a probabilistic 'language model' that assembles the TL words into 'meaningful' TL sentences

Statistical MT

[Diagram: statistical learning. A learning system builds a model from natural-language inputs x1, x2, ..., xn; the resulting probabilistic model is then used by a prediction system to compute the estimate p̂(x_{n+1}) for a new natural-language input x_{n+1}.]
Statistical MT schema

Statistical MT processes

• Bilingual corpora: originals and their translations
• Little or no linguistic 'knowledge'; based on word co-occurrences in SL and TL texts (of a corpus), the relative positions of words within sentences, and sentence lengths
• Alignment: sentences aligned statistically (according to sentence length and position)
• Decoding: compute the probability that a TL string is the translation of an SL string (the 'translation model'), based on:
  • frequency of co-occurrence in the aligned texts of the corpus
  • position of SL words in the SL string
• Adjustment: compute the probability that a TL string is a valid TL sentence (based on a 'language model' of allowable bigrams and trigrams)
• Search for the TL string that maximizes these probabilities (a sketch follows this slide):

  e* = argmax_e P(e | f) = argmax_e P(f | e) P(e)
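
A minimal sketch of this decision rule, assuming a hypothetical shortlist of candidate translations with made-up model scores; a real decoder searches an enormous space of candidate strings rather than ranking a fixed list.

```python
import math

# Noisy-channel decision rule: e* = argmax_e P(f|e) * P(e),
# computed in log space for a hypothetical Spanish input "banco".
candidates = {
    # candidate e: (log P(f|e) translation model, log P(e) language model)
    "the bank":  (math.log(0.6), math.log(0.010)),
    "the bench": (math.log(0.4), math.log(0.002)),
}

def decode(cands):
    # Pick the candidate maximizing log P(f|e) + log P(e).
    return max(cands, key=lambda e: cands[e][0] + cands[e][1])

print(decode(candidates))  # -> 'the bank'
```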
Language Modeling
 Determines the probability of some English
l
sequence e1 of length l
 P(e) is normally approximated as:
i1
P(e )  P(e1 )P(e2 | e1 ) P(ei | eim
)
l
1
l
i 3
m is size of the context, i.e. number
where

of previous words that are considered,
 m=1, bi-gram language model
m=2, tri-gram language model
19
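
The following sketch instantiates the formula for m = 1 (a bigram model), estimating the probabilities as raw relative frequencies over a toy corpus; real language models add smoothing so that unseen n-grams do not receive zero probability.

```python
from collections import Counter

# Toy corpus for estimating a bigram language model.
corpus = [["the", "bank", "is", "open"],
          ["the", "bench", "is", "green"],
          ["the", "bank", "is", "closed"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))

def sentence_probability(sentence):
    """P(e_1..e_l) ~= P(e_1) * product over i of P(e_i | e_{i-1})."""
    p = unigrams[sentence[0]] / sum(unigrams.values())       # P(e_1)
    for prev, cur in zip(sentence, sentence[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]            # P(e_i | e_{i-1})
    return p

print(sentence_probability(["the", "bank", "is", "open"]))   # ~0.056
print(sentence_probability(["the", "bench", "is", "open"]))  # ~0.028, less typical of the corpus
```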

Translation Modeling

• Determines the probability P(f | e) that the foreign word f is a translation of the English word e
• How can P(f | e) be computed from a parallel corpus?
• Statistical approaches rely on the co-occurrence of e and f in the parallel data: if e and f tend to co-occur in parallel sentence pairs, they are likely to be translations of one another (a co-occurrence sketch follows this slide)
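
A rough sketch of that co-occurrence idea, assuming a hypothetical three-pair parallel corpus; it also shows why plain co-occurrence counting is only a starting point and why word-alignment models (such as the IBM models behind 'Candide', mentioned on the next slide) are needed to refine it.

```python
from collections import Counter, defaultdict

# Hypothetical (English, foreign) sentence pairs.
parallel = [
    (["the", "bank"],         ["el", "banco"]),
    (["the", "bench"],        ["el", "banco"]),
    (["bank", "of", "spain"], ["banco", "de", "espana"]),
]

cooc = defaultdict(Counter)   # cooc[e][f]: pairs whose two sides contain both e and f
e_count = Counter()           # number of pairs whose English side contains e
for eng, foreign in parallel:
    for e in set(eng):
        e_count[e] += 1
        cooc[e].update(set(foreign))

def p_f_given_e(f, e):
    # Crude co-occurrence estimate of P(f | e).
    return cooc[e][f] / e_count[e]

print(p_f_given_e("banco", "bank"))   # 1.0: 'banco' appears alongside every 'bank'
print(p_f_given_e("banco", "of"))     # also 1.0: raw co-occurrence cannot tell 'of' from 'bank'
```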

SMT issues

• Ignores previous MT research (new start, new 'paradigm')
• Basically a 'direct' approach:
  • replaces each SL word by the most probable TL word
  • reorders the TL words
  • decoding is effectively a kind of 'back translation'
• Originally wholly word-based (IBM 'Candide', 1988); now predominantly phrase-based (i.e. alignment of word groups); some research on syntax-based models
• Mathematically simple, but requires a huge amount of training data (large databases)
• Problems for SMT:
  • translation is not just selecting the most frequent 'equivalent' (wider context matters)
  • no quality control of corpora
  • lack of monolingual data for some languages
  • insufficient bilingual data (the Internet as a resource)
  • lack of structural information about language
• Merit of SMT: evaluation is an integral part of system development
Rule-Based MT & SMT
 SMT black box: no way of finding how it works in
particular cases, why it succeeds sometimes and not
others
 RBMT: rules and procedures can be examined
 RBMT and SMT are apparent polar opposites, but
gradually ‘rules’ incorporated in SMT models
 first, morphology (even in versions of first IBM model)
 then, ‘phrases’ (with some similarity to linguistic
phrases)
 now also, syntactic parsing
22
Rule-Based MT & SMT
 Comparison from following perspectives:






Theory background
Knowledge expression
Knowledge discovery
Robust
Extension
Development Cycle
23

Evaluation of MT

• Manual evaluation:
  • accuracy / fluency / completeness
  • faithfulness, expressiveness, elegance (信达雅)
• Automatic evaluation:
  • BLEU: the percentage of word sequences (n-grams) in the output that also occur in reference texts (a sketch follows this slide)
  • NIST
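
A simplified sketch of the n-gram overlap behind BLEU, computing clipped bigram precision for one candidate against one reference; full BLEU combines precisions for n = 1..4 with a geometric mean and a brevity penalty, and usually uses multiple references.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of a candidate sentence against one reference."""
    cand_ngrams = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref_ngrams = Counter(zip(*[reference[i:] for i in range(n)]))
    overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    return overlap / max(sum(cand_ngrams.values()), 1)

reference = "the bank of spain is closed today".split()
candidate = "the bank of spain closed today".split()
print(ngram_precision(candidate, reference, n=2))  # 0.8: 4 of the 5 candidate bigrams occur in the reference
```
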
Development of MT - MT System
MT Development - Research

[Chart: MT research approaches plotted along two axes. One axis is the knowledge acquisition strategy, ranging from all-manual (hand-built by experts, hand-built by non-experts) to fully automated (learned from annotated data, learned from un-annotated data). The other axis is the knowledge representation strategy, ranging from shallow/simple (word-based only, electronic dictionaries, phrase tables) to deep/complex (syntactic constituent structure, semantic analysis, interlingua). Plotted systems include the original direct approach, the typical transfer system, the classic interlingual system, the original statistical MT, and example-based MT; a "New research goes here!" marker points toward the region combining deep representations with automatic acquisition.]

MT problems in general

• Characteristics of language:
  • ambiguous
  • dynamic
  • flexible
• Knowledge:
  • how to express it
  • how to discover it
  • how to use it

Some Thoughts on MT from Recognition

• The human cerebrum:
  • memory
  • progress - learning
  • model
  • pattern
• Translation by humans ...
• Translation by machines ...

Further Reading

• Arturo Trujillo, Translation Engines: Techniques for Machine Translation, Springer-Verlag London Limited, 1999.
• P. F. Brown et al., A Statistical Approach to Machine Translation, Computational Linguistics, 1990, 16(2).
• P. F. Brown et al., The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 1993, 19(2).
• Bonnie J. Dorr et al., Survey of Current Paradigms in Machine Translation.
• Makoto Nagao, A Framework of a Mechanical Translation between Japanese and English by Analogy Principle, in A. Elithorn and R. Banerji (Eds.), Artificial and Human Intelligence, NATO Publications, 1984.
• W. J. Hutchins, Machine Translation: Past, Present, Future, Chichester: Ellis Horwood, 1986.
• Daniel Jurafsky & James H. Martin, Speech and Language Processing, Prentice-Hall, 2000.
• Christopher D. Manning & Hinrich Schutze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
• James Allen, Natural Language Understanding, The Benjamin/Cummings Publishing Company, Inc., 1987.