French in the Cross-language Task


Cross-Language French-English Question Answering using the DLT System at CLEF 2003
Aoife O’Gorman
Igal Gabbay
Richard F.E. Sutcliffe
Documents and Linguistic Technology Group
University of Limerick
Outline
• Objectives
• System architecture
• Key components
• Task performance evaluation
• Findings
Objectives
• Learn the issues involved in multilingual QA
• Combine the components of our existing
English and French monolingual QA systems
System architecture
[Pipeline diagram: Query classification → Query translation (Google) & re-formulation → Text retrieval (dtSearch) → Named entity recognition → Answer entity selection]
Query classification
• Categories based on translated TREC 2002 queries
• Keyword-based classification
  what_country
  De quel pays le jeu de croquet est-il originaire ?
  (From what country does the game of croquet originate?)
  De quelle nation… ?
  (From what nation…?)
• Unknown (fallback when no keyword matches)
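
As a rough illustration of the keyword approach, a classifier can be sketched as a first-match scan over category patterns. This is a sketch only: the category inventory and keyword rules below are hypothetical, not the actual DLT patterns.

    import re

    # Hypothetical keyword patterns; the real DLT category rules are
    # not shown on the slides.
    CATEGORY_PATTERNS = [
        ("what_country", re.compile(r"\bquel(le)?\s+(pays|nation)\b", re.I)),
        ("when",         re.compile(r"\bquand\b|\bquelle année\b", re.I)),
        ("who",          re.compile(r"\bqui\b", re.I)),
    ]

    def classify_query(query: str) -> str:
        """Return the first category whose keywords match the French
        query, falling back to 'unknown' when nothing matches."""
        for category, pattern in CATEGORY_PATTERNS:
            if pattern.search(query):
                return category
        return "unknown"

    print(classify_query("De quel pays le jeu de croquet est-il originaire ?"))
    # -> what_country
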
Query translation and re-formulation
• Submitting the French query in its original
form on the Google Language Tools page
• Tokenisation
• Selective removal of stopwords
• Example:
Qui a été élu gouverneur de la California?
Who was elected governor of California?
[ ‘elected’, ‘governor’, ‘California’]
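
The re-formulation step can be sketched as tokenisation plus selective stopword removal. The stopword list below is a hypothetical subset; the slides do not give the actual list used.

    # Hypothetical English stopword subset.
    STOPWORDS = {"who", "was", "the", "of", "a", "an", "in", "on", "to", "is"}

    def reformulate(translated_query: str) -> list[str]:
        """Tokenise the Google translation and selectively drop
        stopwords, keeping the content words as search terms."""
        tokens = translated_query.strip("?.! ").split()
        return [t for t in tokens if t.lower() not in STOPWORDS]

    print(reformulate("Who was elected governor of California?"))
    # -> ['elected', 'governor', 'California']
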
Text Retrieval:
Submitting queries to dtSearch
• dtSearch indexed the document collection based on <DOC> tags
• Inserting a w/1 connector between two capitalised words
• Submitting untranslated quotations for exact match
• Inserting an AND connector between all other terms (Boolean)
• Limited verb expansion based on common verbs used in TREC questions
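
A minimal sketch of how such a Boolean query string might be assembled from the search terms, assuming the dtSearch w/1 and AND connectors listed above; grouping runs of consecutive capitalised terms is an assumption here, not the actual DLT code.

    def build_dtsearch_query(terms: list[str]) -> str:
        """Join consecutive capitalised terms with the dtSearch w/1
        (within one word) connector, and everything else with AND."""
        parts: list[str] = []
        i = 0
        while i < len(terms):
            if terms[i][0].isupper():
                # Collect a run of capitalised terms into one proximity group.
                run = [terms[i]]
                while i + 1 < len(terms) and terms[i + 1][0].isupper():
                    run.append(terms[i + 1])
                    i += 1
                parts.append(" w/1 ".join(run))
            else:
                parts.append(terms[i])
            i += 1
        return " and ".join(parts)

    print(build_dtsearch_query(["Robert", "Frost", "born"]))
    # -> Robert w/1 Frost and born
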
Named Entity Recognition:
General Names
• Captures any instances of general names in
cases where we are not sure what to look for.
• A general_name is defined in our system to
be up to five capitalised terms interspersed
with optional prepositions.
• Examples: Limerick City
University of Limerick
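
The general_name definition above translates naturally into a pattern over capitalised tokens. A sketch, in which the preposition list is a hypothetical subset:

    import re

    # Up to five capitalised terms interspersed with optional prepositions.
    PREPOSITIONS = r"(?:of|in|for|on|at)"
    CAP = r"[A-Z][a-z]+"
    GENERAL_NAME = re.compile(
        rf"\b{CAP}(?:\s+(?:{PREPOSITIONS}\s+)?{CAP}){{0,4}}\b"
    )

    for text in ["Limerick City", "University of Limerick"]:
        print(GENERAL_NAME.search(text).group())
    # -> Limerick City
    # -> University of Limerick
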
Answer entity selection
• highest_scoring
  What year was Robert Frost born?
  in entity(date,[1,8,7,5],[[],[],[],[],[1,8,7,5]],[],[],[]), poet
  target([Robert]) target([Frost]) was target([born]) in San Francisco
• most_frequent
  When did “The Simpsons” first appear on television?
  When target([The]) target([Simpsons]) was target([first]) broadcast in
  entity(date,[1,9,8,9],[[],[],[],[],[],[1,9,8,9],[],[]])
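
The two strategies can be sketched over hypothetical scored candidates, each an entity text paired with a score; the slides do not show how the scores themselves are computed.

    from collections import Counter

    def highest_scoring(candidates: list[tuple[str, float]]) -> str:
        """Return the candidate entity with the highest score."""
        return max(candidates, key=lambda c: c[1])[0]

    def most_frequent(candidates: list[tuple[str, float]]) -> str:
        """Return the entity text that occurs most often."""
        return Counter(text for text, _ in candidates).most_common(1)[0][0]

    candidates = [("1989", 2.0), ("1987", 1.0), ("1989", 1.5)]
    print(highest_scoring(candidates))  # -> 1989
    print(most_frequent(candidates))    # -> 1989
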
Task performance evaluation

Group   Run name      MRR               No. of Q. with at      NIL questions
                      strict   lenient  least one right answer returned  correct
                                        strict   lenient
CSCMU   lumoex031bf   .153     .170     38       42             92        8
CSCMU   lumoex032bf   .131     .149     31       35             91        7
DLTG    dltgex031bf   .115     .120     23       24             119       10
DLTG    dltgex032bf   .110     .115     22       23             119       10
RALI    udemex032bf   .140     .160     38       42             3         1

Adapted from Magnini (2003)
Findings
• Query classification: unexpected formulation of queries; too few categories
• Translation: problems with names and titles
  - We need better query-specific translation
  - Localisation of names/titles
  - Possibly limit translation to search terms
Findings
• Text retrieval: allow relaxation and more
sophisticated expansion of search queries
• Named entity recognition: find better
alternatives to answer questions of type
Unknown
• Answer entity selection: take into account
distance and density of query terms
• Usability issue: answers may need to be
translated back to French