Transcript Experiment

國立雲林科技大學
National Yunlin University of Science and Technology
Iterative Translation Disambiguation for Cross-Language Information Retrieval
Advisor: Dr. Hsu
Presenter: Yu-San Hsieh
Authors: Christof Monz and Bonnie J. Dorr
SIGIR 2005, pp. 520-527
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Outline
Motivation
Objective
Approach
Experiment Result
Introduction
Experiment
Conclusions
Motivation

Many words or phrases in one language can be translated into another language in a number of ways, so translation ambiguity is very common and reduces the effectiveness of information retrieval.
Example: Penalty (English) => Elfmeter (soccer) or Strafe (punishment)
Objective

Find a distribution of translation probabilities that resolves the translation ambiguity problem.
Approach
Find a proper distribution of translation probabilities.
[Figure: graph linking candidate translations; node labels: europa, europe, gewerbe, geschaeft, handel, gewerkschaft, union, trade]
Computing Term Weight
─ Initialization Step
─ Iteration Step
─ Normalization Step
ex: $w_T^1(t_{i,1} \mid s_i) = 0.0833 + 2 \cdot 1 + 0.2 \cdot 2 + 0.2 \cdot 2$
─ All term weights in a vector
─ Iteration Stop
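The steps listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the iteration step adds, to each candidate translation, the association-weighted weights of the candidates of the other query terms. The function name `disambiguate`, the English-to-German direction of the example, and all association values are hypothetical.

```python
def disambiguate(candidates, assoc, max_iter=100, tol=1e-6):
    """candidates: source term -> list of candidate translations.
    assoc: frozenset({t, t2}) -> association strength (made-up values here)."""
    # Initialization step: uniform weight over each source term's candidates.
    weights = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(max_iter):
        updated = {}
        for s, ts in candidates.items():
            raw = {}
            for t in ts:
                # Iteration step (assumed form): previous weight plus
                # association-weighted support from the candidates of
                # the other source terms.
                support = sum(
                    assoc.get(frozenset((t, t2)), 0.0) * w2
                    for s2, ws in weights.items() if s2 != s
                    for t2, w2 in ws.items()
                )
                raw[t] = weights[s][t] + support
            # Normalization step: weights of one source term sum to 1.
            total = sum(raw.values())
            updated[s] = {t: w / total for t, w in raw.items()}
        # Iteration stop: treat all term weights as one vector and stop
        # once it changes by less than tol.
        change = sum(
            abs(updated[s][t] - weights[s][t])
            for s in candidates for t in candidates[s]
        )
        weights = updated
        if change < tol:
            break
    return weights


# Hypothetical query "trade union europe" with German candidates.
candidates = {
    "trade": ["handel", "gewerbe", "geschaeft"],
    "union": ["union", "gewerkschaft"],
    "europe": ["europa"],
}
# Made-up association strengths between candidate translations.
assoc = {
    frozenset(("handel", "union")): 0.2,
    frozenset(("handel", "europa")): 0.2,
    frozenset(("handel", "gewerkschaft")): 0.1,
    frozenset(("gewerbe", "europa")): 0.05,
    frozenset(("gewerkschaft", "europa")): 0.3,
    frozenset(("union", "europa")): 0.1,
}
result = disambiguate(candidates, assoc)
```

With these invented values, the candidates that are better connected to the other query terms end up with the larger share of each source term's probability mass.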
Approach
Measuring association strength
─ Pointwise mutual information
─ Dice coefficient
─ Log likelihood ratio
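The three measures above can be computed from co-occurrence counts. The sketch below uses their standard textbook definitions, which may differ in detail (for example, the log base) from what the paper uses. The parameter names are assumptions: `f_xy` is the co-occurrence count of two terms, `f_x` and `f_y` are their individual counts, and `n` is the total number of observations.

```python
import math


def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information: log2 of p(x,y) / (p(x) * p(y))."""
    if f_xy == 0:
        return float("-inf")  # the terms never co-occur
    return math.log2((f_xy * n) / (f_x * f_y))


def dice(f_xy, f_x, f_y):
    """Dice coefficient: 2 * f(x,y) / (f(x) + f(y))."""
    return 2.0 * f_xy / (f_x + f_y)


def log_likelihood_ratio(f_xy, f_x, f_y, n):
    """Log likelihood ratio (G^2) over the 2x2 contingency table."""
    observed = [f_xy, f_x - f_xy, f_y - f_xy, n - f_x - f_y + f_xy]
    row_totals = [f_x, f_x, n - f_x, n - f_x]
    col_totals = [f_y, n - f_y, f_y, n - f_y]
    g2 = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        if o > 0:
            # Expected count under independence is r * c / n.
            g2 += o * math.log(o * n / (r * c))
    return 2.0 * g2
```

When two terms co-occur exactly as often as independence predicts, PMI and the log likelihood ratio are both 0.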
Experiment Result
[Results: baseline vs. improved runs, and differences for individual queries (topics)]
Introduction

Two techniques for cross-language retrieval
─ Translate the document collection into the query language and apply monolingual retrieval
─ Translate the query into the target language and retrieve with the translated query
Three approaches may be used to produce the translations
─ Machine translation system
─ Dictionary
─ Parallel corpus to estimate the translation probabilities
Introduction

A word in one language can be translated into another language in a number of ways.
─ Penalty (English) => Elfmeter (soccer) or Strafe (punishment)
Introduction


One approach to the word selection problem is to use co-occurrences between terms.
Problem (a larger number of terms)
─ Data sparseness
  Use very large corpora for counting co-occurrence frequencies
  Use internet search engines
  Smoothing
Experiment


Test Data
─ CLEF 2003 English to German bilingual data
─ 56 topics selected (title, description, narrative)
Morphological Normalization
─ Source-language words (topics) are normalized to match entries in the bilingual dictionary
─ De-compounding: 5-grams
─ Assign weights to 5-gram substrings
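The 5-gram step above can be illustrated by splitting a compound into overlapping character 5-grams. This is a minimal sketch. The helper name `char_ngrams` is hypothetical, and the slide does not say how weights are assigned to the substrings, so no weighting is shown.

```python
def char_ngrams(word, n=5):
    """Return the overlapping character n-grams of a word."""
    word = word.lower()
    if len(word) <= n:
        return [word]  # too short to split; keep the word itself
    return [word[i:i + n] for i in range(len(word) - n + 1)]


grams = char_ngrams("Gewerkschaft")
```

Here `grams` holds eight substrings, starting with "gewer" and ending with "chaft".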
Experiment


Retrieval Model
─ Lnu.ltc weighting scheme
─ Weighted document similarity
Statistical Significance
─ Bootstrap method
  Bootstrap sample
  One-tailed significance testing (comparing two retrieval methods)
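A one-tailed bootstrap comparison of two retrieval methods can be sketched as below. This is a common paired-bootstrap variant and may not be the exact procedure used in the paper. The function name `bootstrap_one_tailed`, its parameters, and the per-topic average precision values are all made up for illustration.

```python
import random


def bootstrap_one_tailed(scores_a, scores_b, num_samples=10000, seed=0):
    """Estimate how often method B fails to beat method A.

    scores_a, scores_b: per-topic scores (e.g. average precision) in the
    same topic order. Returns the fraction of bootstrap samples whose
    mean difference (B - A) is <= 0; a small value suggests B is better.
    """
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    not_better = 0
    for _ in range(num_samples):
        # Bootstrap sample: draw n topic differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return not_better / num_samples


# Hypothetical per-topic average precision values.
baseline = [0.20, 0.35, 0.10, 0.50]
improved = [0.25, 0.40, 0.15, 0.55]
p_value = bootstrap_one_tailed(baseline, improved)
```

Resampling the paired per-topic differences, rather than the two score lists independently, keeps each topic's two scores together.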
Experiment

Some problems were found in the experiment
─ With the log likelihood ratio, individual average precision decreases for a number of queries.
  Unknown word
  The original word from the source language is included in the target-language query.
  Example
  Women's Conference Beijing
  Women (normalized) => Woman not found => treated as a proper noun => Women kept with weight = 1
  Result
  1. Women dominates document similarity.
  2. Most top-ranked documents contain Women as the only matching term.
Conclusions


Our approach improves retrieval effectiveness compared to a baseline using bilingual dictionary lookup.
Experimental results show that the log likelihood ratio has the strongest positive impact.
My opinion

Advantage:
  It only requires a bilingual dictionary and a monolingual corpus in the target language.
Disadvantage:
  Unknown words
Apply