Machine Translation
Dai Xinyu
2006-10-27
1
Outline
Introduction
Architecture of MT
Rule-Based MT vs. Data-Driven MT
Evaluation of MT
Development of MT
MT problems in general
Some Thoughts on MT from a Cognitive Perspective
2
Introduction
"I have a text in front of me which is written in Russian, but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need do is strip off the code in order to retrieve the information contained in the text." (Warren Weaver, 1949)
machine translation - the use of computers to translate from one language to another
•The classic acid test for natural language processing.
•Requires capabilities in both interpretation and generation.
•About $10 billion spent annually on human translation.
http://www.google.com/language_tools?hl=en
3
Introduction - MT past and present
mid-1950's - 1965:
Great expectations
The dark ages for MT:
Academic research projects
1980's - 1990's:
Successful specialized applications
1990's:
Human-machine cooperative translation
1990's - now:
Statistical-based MT
Hybrid-strategies MT
Future prospects:
???
4
Interest in MT
Commercial interest:
The U.S. has invested in MT for intelligence
purposes
MT is popular on the web: it is the most
used of Google's special features
The EU spends more than $1 billion on
translation costs each year.
(Semi-)automated translation could lead
to huge savings
5
Interest in MT
Academic interest:
One of the most challenging problems in NLP
research
Requires knowledge from many NLP sub-areas,
e.g., lexical semantics, parsing, morphological
analysis, statistical modeling,…
Being able to establish links between two
languages allows for transferring resources from
one language to another
6
Related Areas to MT
Linguistics
Computer Science
AI
Compilers
Formal Semantics
…
Mathematics
Probability
Statistics
…
Informatics
Cognition
7
Architecture of MT
-- (Levels of Transfer)
8
Rule-Based MT vs. Data-Driven MT
Rule-Based MT
Data-Driven MT
Example-Based MT
Statistics-Based MT
9
Rule-Based MT
[Diagram: Linguistics, Semantics, Cognitive Science, AI
→ (experts) write rules → Rules
Natural-language input x → Translation system → Translation result]
10
Rule-Based MT
11
Man, this is so boring.
Hmm, every time he sees
“banco”, he either types
“bank” or “bench” … but if
he sees “banco de…”,
he always types “bank”,
never “bench”…
Translated documents
12
Example-Based MT
origins: Nagao (1981)
first motivation: collocations, bilingual
differences of syntactic structures
basic idea:
human translators search for analogies (similar
phrases) in previous translations
MT should seek matching fragments in a bilingual
database and extract their translations
aim to have less complex dictionaries,
grammars, and procedures
improved generation (using actual
examples of TL sentences)
13
EBMT is still active
Bilingual corpus collection
Storage
Searching and matching
…
14
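The searching-and-matching step above can be sketched as a nearest-example lookup over a stored bilingual database; the word-overlap (Jaccard) similarity here is a deliberately simple stand-in for real EBMT matching measures, and the toy database is hypothetical:

```python
def word_overlap(a, b):
    """Similarity: fraction of shared words (Jaccard), a crude analogy measure."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def ebmt_lookup(source, bilingual_db):
    """Return the stored translation whose source side best matches the input."""
    best = max(bilingual_db, key=lambda pair: word_overlap(source, pair[0]))
    return best[1]

db = [("he opened a bank account", "abrio una cuenta bancaria"),
      ("she sat on the bench", "se sento en el banco")]
print(ebmt_lookup("he opened an account", db))  # -> "abrio una cuenta bancaria"
```

A real EBMT system matches sub-sentential fragments and recombines their translations; this sketch only retrieves the single closest whole example.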
Statistical MT Basics
Based on the assumption that translations
exhibit statistical regularities
origins: Warren Weaver (1949)
Shannon’s information theory
core process is the probabilistic ‘translation
model’ taking SL words or phrases as input,
and producing TL words or phrases as
output
succeeding stage involves a probabilistic
‘language model’ which synthesizes TL
words as ‘meaningful’ TL sentences
15
Statistical MT
[Diagram: Statistical learning - building a model:
Natural-language input x_1, x_2, …, x_n → Learning system → Probabilistic model
Prediction:
Natural-language input x_{n+1} → Prediction system → p̂(x_{n+1})]
16
Statistical MT schema
17
Statistical MT processes
Bilingual corpora: original and translation
little or no linguistic 'knowledge'; based on word co-occurrences in SL and TL
texts (of a corpus), relative positions of words within sentences, length of sentences
Alignment: sentences aligned statistically (according to
sentence length and position)
Decoding: compute probability that a TL string is the
translation of an SL string ('translation model'), based on:
frequency of co-occurrence in aligned texts of the corpus
position of SL words in the SL string
Adjustment: compute probability that a TL string is a valid
TL sentence (based on a 'language model' of allowable
bigrams and trigrams)
Search for the TL string that maximizes these probabilities:
argmax_e P(e | f) = argmax_e P(f | e) P(e)
18
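The search in the last step can be sketched as a toy noisy-channel decoder over a fixed candidate list; the probability tables, words, and candidates below are hypothetical illustrations, not a real model:

```python
import math

# Hypothetical toy tables: P(f | e) (translation model) and P(e) (language model).
translation_model = {
    ("la", "the"): 0.9, ("casa", "house"): 0.8, ("casa", "home"): 0.2,
}
language_model = {"the house": 0.05, "the home": 0.01}

def score(f_words, e_words, e_sentence):
    """log P(f|e) + log P(e), assuming a word-for-word alignment."""
    log_p = math.log(language_model.get(e_sentence, 1e-9))
    for f, e in zip(f_words, e_words):
        log_p += math.log(translation_model.get((f, e), 1e-9))
    return log_p

def decode(f_sentence, candidates):
    """Pick the TL candidate maximizing P(f|e) * P(e)."""
    f_words = f_sentence.split()
    return max(candidates, key=lambda e: score(f_words, e.split(), e))

print(decode("la casa", ["the house", "the home"]))  # -> "the house"
```

Real decoders search over reorderings and phrase segmentations rather than a handed-in candidate list; the scoring itself, though, is exactly the product in the argmax equation.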
Language Modeling
Determines the probability of an English word
sequence e_1^l of length l.
P(e) is normally approximated as:
P(e_1^l) = P(e_1) P(e_2 | e_1) ∏_{i=3}^{l} P(e_i | e_{i-m}^{i-1})
where m is the size of the context, i.e. the number
of previous words that are considered:
m = 1: bi-gram language model
m = 2: tri-gram language model
19
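The n-gram approximation can be made concrete with a minimal bigram (m = 1) model estimated by relative frequency; the two-sentence training corpus below is a hypothetical stand-in, and no smoothing is applied:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate P(w_i | w_{i-1}) by relative frequency, with <s> as start symbol."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split()
        unigrams.update(words[:-1])          # contexts
        bigrams.update(zip(words, words[1:]))
    return lambda prev, w: bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(p, sentence):
    """P(e_1^l) ≈ product over i of P(e_i | e_{i-1}), the bigram approximation."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, w in zip(words, words[1:]):
        prob *= p(prev, w)
    return prob

lm = train_bigram_lm(["the house is big", "the house is small"])
print(sentence_prob(lm, "the house is big"))  # -> 0.5
```

Only the final word differs across the two training sentences, so every factor is 1 except P(big | is) = 1/2, giving 0.5.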
Translation Modeling
Determines the probability that the foreign
word f is a translation of the English word e
How to compute P(f | e) from a parallel
corpus?
Statistical approaches rely on the co-occurrence of e and f in the parallel data: if
e and f tend to co-occur in parallel sentence
pairs, they are likely to be translations of
one another
20
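A deliberately crude sketch of the co-occurrence idea (real systems estimate P(f | e) with EM, as in the IBM models, rather than raw counts); the toy parallel corpus below is hypothetical and reuses the banco example:

```python
from collections import Counter

def cooccurrence_table(parallel_pairs):
    """Count co-occurrences of (e, f) across aligned sentence pairs,
    then normalize per English word to get a rough P(f | e)."""
    pair_counts, e_counts = Counter(), Counter()
    for e_sent, f_sent in parallel_pairs:
        for e in e_sent.split():
            for f in f_sent.split():
                pair_counts[(e, f)] += 1
                e_counts[e] += 1
    return {(e, f): c / e_counts[e] for (e, f), c in pair_counts.items()}

corpus = [("the bank", "el banco"), ("the bench", "el banco"),
          ("the bank of spain", "el banco de espana")]
t = cooccurrence_table(corpus)
# "banco" co-occurs with both "bank" and "bench"; context, not raw
# frequency, is what lets a real model pick the right sense.
print(round(t[("bank", "banco")], 3), t[("bench", "banco")])  # -> 0.333 0.5
```

Raw co-occurrence counts conflate translation with mere adjacency; EM training resolves this by redistributing probability mass across competing alignments.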
SMT issues
ignores previous MT research (new start, new ‘paradigm’)
basically ‘direct’ approach:
replaces SL word by most probable TL word,
reorders TL words
decoding is effectively a kind of 'back translation'
originally wholly word-based (IBM ‘Candide’ 1988) ; now predominantly
phrase-based (i.e. alignment of word groups); some research on
syntax-based
mathematically simple, but requires huge amounts of training data (large databases)
problems for SMT:
translation is not just selecting the most frequent ‘equivalent’
(wider context)
no quality control of corpora
lack of monolingual data for some languages
insufficient bilingual data (Internet as resource)
lack of structural information about language
merit of SMT: evaluation as integral process of system development
21
Rule-Based MT & SMT
SMT black box: no way of finding how it works in
particular cases, why it succeeds sometimes and not
others
RBMT: rules and procedures can be examined
RBMT and SMT are apparent polar opposites, but
gradually ‘rules’ incorporated in SMT models
first, morphology (even in versions of first IBM model)
then, ‘phrases’ (with some similarity to linguistic
phrases)
now also, syntactic parsing
22
Rule-Based MT & SMT
Comparison from the following perspectives:
Theoretical background
Knowledge representation
Knowledge acquisition
Robustness
Extensibility
Development cycle
23
Evaluation of MT
Manual:
Precision / fluency / completeness
信达雅 (faithfulness, expressiveness, elegance)
Automatic evaluation:
BLEU: percentage of word sequences
(n-grams) occurring in reference texts
NIST
24
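The core of BLEU, clipped n-gram precision against a reference, can be sketched as follows; this omits the brevity penalty and the geometric mean over several n-gram orders that full BLEU uses:

```python
from collections import Counter

def ngrams(words, n):
    """All contiguous n-grams of a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = max(sum(cand_counts.values()), 1)
    return clipped / total

ref = "the cat is on the mat"
print(modified_precision("the cat sat on the mat", ref, 1))  # -> 5/6 ≈ 0.833
```

Of the six candidate unigrams, only "sat" is absent from the reference, hence 5/6; clipping prevents a candidate from being rewarded for repeating a reference word more often than the reference itself does.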
Development of MT - MT System
25
MT Development - Research
[Diagram: MT research approaches plotted along two axes.
Vertical axis - depth of analysis: shallow/simple (word-based only: electronic dictionaries, phrase tables) → syntactic constituent structure → semantic analysis → interlingua (deep/complex knowledge representation).
Horizontal axis - knowledge acquisition strategy: all manual (hand-built by experts; hand-built by non-experts) → learn from annotated data → learn from un-annotated data (fully automated).
Systems placed on the diagram: original direct approach; typical transfer system; classic interlingual system; original statistical MT; example-based MT.
"New research goes here!" - deep representations acquired automatically.]
26
MT problems in general
Characteristics of language
Ambiguity
Dynamism
Flexibility
Knowledge
How to represent it
How to discover it
How to use it
27
Some Thoughts on MT from a Cognitive Perspective
The human cerebrum
Memory
Progress - Learning
Model
Pattern
Translation by human…
Translation by machine…
28
Further Reading
Arturo Trujillo, Translation Engines: Techniques for Machine Translation,
Springer-Verlag London Limited 1999
P.F. Brown, et al., A Statistical Approach to MT, Computational Linguistics,
1990,16(2)
P.F. Brown, et al., The Mathematics of Statistical Machine Translation:
Parameter Estimation, Computational Linguistics, 1993, 19(2)
Bonnie J. Dorr, et al, Survey of Current Paradigms in Machine Translation
Makoto Nagao, A Framework of a Mechanical Translation between Japanese
and English by Analogy Principle, in A. Elithorn and R. Banerji (Eds.),
Artificial and Human Intelligence, NATO Publications, 1984
Hutchins WJ, Machine Translation: Past, Present, Future. Chichester: Ellis
Horwood, 1986
Daniel Jurafsky & James H. Martin, Speech and Language Processing,
Prentice-Hall, 2000
Christopher D. Manning & Hinrich Schütze, Foundations of Statistical
Natural Language Processing, MIT Press, 1999
James Allen, Natural Language Understanding, The Benjamin/Cummings
Publishing Company, Inc. 1987
29