Question Focus and Dependency Relations from Web Search Results for Question Classification
Wen-Hsiang Lu (盧文祥)
[email protected]
Web Mining and Multilingual Knowledge System Laboratory,
Department of Computer Science and Information Engineering,
National Cheng Kung University
2015/7/20
Research Interests
Web Mining
Natural Language Processing
Information Retrieval
Research Issues
Unknown Term Translation & Cross-Language Information Retrieval
A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results
Question Answering & Machine Translation
Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification
Using Phrase and Fluency to Improve Statistical Machine Translation
User Modeling & Web Search
Learning Question Structure based on Website Link Structure to Improve Natural Language Search
Improving Short-Query Web Search based on User Goal Identification
Cross-Language Medical Information Retrieval
MMODE: http://mmode.no-ip.org/
雅各氏症候群 (Jacob's syndrome)
Outline
Introduction
Related Work
Approach
Experiment
Conclusion
Future Work
Question Answering (QA) System
1. Question Analysis: question classification, keyword extraction.
2. Document Retrieval: retrieve related documents.
3. Answer Extraction: extract an exact answer.
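A minimal sketch of this three-stage pipeline, with toy stand-ins for each stage (none of the names or heuristics below come from the slides):

```python
# Toy three-stage QA pipeline: question analysis -> document retrieval
# -> answer extraction. All heuristics are illustrative stand-ins.

def analyze_question(question: str) -> tuple[str, list[str]]:
    """Stage 1: classify the question and extract keywords (toy rules)."""
    qtype = "Person" if question.lower().startswith("who") else "Unknown"
    keywords = [w for w in question.rstrip("?").split() if w[0].isupper()]
    return qtype, keywords

def retrieve_documents(keywords: list[str]) -> list[str]:
    """Stage 2: retrieve related documents (stubbed with a one-doc corpus)."""
    corpus = ["Barack Obama was the 44th President of the United States."]
    return [doc for doc in corpus if any(k in doc for k in keywords)]

def extract_answer(docs: list[str]) -> str:
    """Stage 3: extract an exact answer (stub: first retrieved document)."""
    return docs[0] if docs else "No answer found."

qtype, keywords = analyze_question("Who was the 44th President of the United States?")
print(qtype, "->", extract_answer(retrieve_documents(keywords)))
```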
Motivation (1/3)
Importance of Question Classification
Moldovan et al. reported on how question classification affects overall QA performance [Dan Moldovan 2000].
Motivation (2/3)
Rule-based Question Classification
Rules are written manually, which is unrealistic to maintain.
Machine Learning-based Question Classification
Support Vector Machine (SVM):
Needs a large amount of training data.
Too many features may introduce noise.
Motivation (3/3)
A new method for question classification:
Observe useful features of the question.
Solve the problem of insufficient training data.
Idea of Approach (1/4)
Many questions have ambiguous question words.
Hence the importance of the Question Focus (QF).
Use QF identification for question classification.
Idea of Approach (2/4)
Even if we do not have enough information to identify the type of the QF alone, its dependency features can help.
[Diagram: a Question decomposes into a QF and dependency features (Dependency Verb, Dependency Quantifier, Dependency Noun), which map to a Question Type through (unigram and bigram) semantic dependency relations.]
Idea of Approach (3/4)
Example
Idea of Approach (4/4)
Use the QF and dependency features to classify questions.
Learn the QF and other dependency features from the Web.
Propose a Semantic Dependency Relation Model (SDRM).
Outline
Introduction
Related Work
Approach
Experiment
Conclusion
Future Work
Rule-based Question Classification
[Richard F. E. Sutcliffe 2005] [Kui-Lam Kwok 2005] [Ellen Riloff 2000]
5W (Who, When, Where, What, Why):
Who → Person.
When → Time.
Where → Location.
What → a difficult type to map.
Why → Reason.
Machine Learning-based Question Classification
Several methods based on SVM [Zhang, 2003; Suzuki, 2003; Day, 2005].
Pipeline: Question → feature vector (KDAG kernel) → SVM → Question Type.
Web-based Question Classification
Use a Web search engine to identify the question type [Solorio, 2004].
Example: “Who is the President of the French Republic?”
Statistics-based Question Classification
Language Model for question classification [Li, 2002].
Drawback: too many features may introduce noise.
Outline
Introduction
Related Work
Approach
Experiment
Conclusion
Future Work
Architecture of Question Classification
Question Type
6 types of questions
Person
Location
Organization
Number
Date
Artifact
Basic Classification Rules
We define 17 basic rules for simple questions.
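The 17 rules themselves are not reproduced in the transcript. As a minimal sketch of what such basic pattern rules typically look like (hypothetical patterns, not the deck's actual rule set):

```python
import re

# Hypothetical question-word patterns mapped to the deck's question types;
# NOT the actual 17 rules, which are not in the transcript.
BASIC_RULES = [
    (re.compile(r"^(who|whom)\b", re.I), "Person"),
    (re.compile(r"^where\b", re.I), "Location"),
    (re.compile(r"^when\b", re.I), "Date"),
    (re.compile(r"^how (many|much)\b", re.I), "Number"),
]

def classify_by_rules(question: str) -> str | None:
    """Return a question type if a basic rule fires, else None."""
    for pattern, qtype in BASIC_RULES:
        if pattern.search(question):
            return qtype
    return None  # harder questions fall through to the SDRM

print(classify_by_rules("Who invented the telephone?"))  # -> Person
```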
Learning Semantic Dependency Features (1/3)
Architecture for Learning Dependency Features
Extracting Dependency Features Algorithm
Learning Semantic Dependency Features (2/3)
Architecture for Learning Dependency Features
Learning Semantic Dependency Features (3/3)
Extracting Dependency Features Algorithm
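The extraction algorithm's steps did not survive the transcript. A minimal sketch of the general idea, assuming features are harvested from search-result sentences that contain a candidate QF (the search call and word tests are toy stand-ins):

```python
from collections import Counter

def search_snippets(query: str) -> list[str]:
    """Stand-in for a Web search API call; returns canned snippets."""
    return ["The president won the election.",
            "One president signed three treaties."]

def extract_features(snippets: list[str], qf: str) -> dict[str, Counter]:
    """Count dependency verbs (DV), quantifiers (DQ), and nouns (DN)
    co-occurring with the QF in each sentence (crude word tests)."""
    feats = {"DV": Counter(), "DQ": Counter(), "DN": Counter()}
    for sentence in snippets:
        words = sentence.lower().rstrip(".").split()
        if qf not in words:
            continue  # only sentences containing the QF contribute
        for w in words:
            if w.endswith("ed") or w == "won":      # crude verb test
                feats["DV"][w] += 1
            elif w in {"one", "two", "three"}:      # crude quantifier test
                feats["DQ"][w] += 1
            elif w != qf:
                feats["DN"][w] += 1
    return feats

print(extract_features(search_snippets("president"), "president"))
```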
Question Focus Identification Algorithm (1/2)
Algorithm
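The identification steps are likewise not recoverable from the transcript. Judging from the deck's later examples (QF “國” from “何國”, “地” from “何地”), the QF appears to be the noun attached to the question word; a minimal sketch under that assumption:

```python
# Sketch: take the token right after the question word as the QF candidate.
# The question-word list is illustrative, not the deck's.
QUESTION_WORDS = {"which", "what"}

def identify_qf(question: str) -> str | None:
    tokens = question.rstrip("?").lower().split()
    for i, token in enumerate(tokens[:-1]):
        if token in QUESTION_WORDS:
            return tokens[i + 1]  # "which country ..." -> "country"
    return None

print(identify_qf("Which country hosted the 2004 Olympic Games?"))  # -> country
```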
Question Focus Identification Algorithm (2/2)
Example
Semantic Dependency Relation Model (SDRM) (1/12)
Unigram-SDRM
Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (2/12)
Unigram-SDRM
Direct formulation: P(C|Q), mapping a question Q to a question type C.
Problem: estimating P(C|Q) directly needs many training questions.
Semantic Dependency Relation Model (SDRM) (3/12)
Unigram-SDRM
Reformulation: generate the question from the type through an intermediate dependency-feature class DC learned from Web search results: C → DC (P(DC|C)) → Q (P(Q|DC)).
P(DC|C): collect related search results for every question type.
P(Q|DC): use DC to determine the question type.
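The slide's equations did not survive the transcript; a plausible reconstruction of the decomposition implied by the two probabilities above (the summation over DC is an assumption):

```latex
% Unigram-SDRM: route P(C|Q) through the dependency-feature class DC
% learned from Web search results (reconstruction, not the verbatim slide).
P(C \mid Q) \;\propto\; P(C)\,P(Q \mid C)
            \;=\; P(C)\sum_{DC} P(DC \mid C)\,P(Q \mid DC)
```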
Semantic Dependency Relation Model (SDRM) (4/12)
Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (5/12)
Unigram-SDRM
Q = {QF, QD}, QD = {DV, DQ, DN}
DV: Dependency Verb
DQ: Dependency Quantifier
DN: Dependency Noun
Semantic Dependency Relation Model (SDRM) (6/12)
Unigram-SDRM
DV = {dv_1, dv_2, …, dv_i}, DQ = {dq_1, dq_2, …, dq_j}, DN = {dn_1, dn_2, …, dn_k}
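Given this decomposition, the unigram model presumably factors P(Q|DC) over the QF and the individual dependency features, matching the parameters listed on the next slide (a reconstruction under a conditional-independence assumption):

```latex
% Unigram-SDRM factorization (reconstruction): features treated as
% conditionally independent given the dependency-feature class DC.
P(Q \mid DC) \;=\; P(QF \mid DC)
  \prod_{m=1}^{i} P(dv_m \mid DC)
  \prod_{m=1}^{j} P(dq_m \mid DC)
  \prod_{m=1}^{k} P(dn_m \mid DC)
```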
Semantic Dependency Relation Model (SDRM) (7/12)
Parameter Estimation of Unigram-SDRM
Parameters: P(DC|C); P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC)
N(QF): the number of occurrences of the QF.
N_QF(DC): the total number of all QFs collected from the search results.
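From these counts the maximum-likelihood estimate follows directly; the dv, dq, and dn estimates presumably use analogous count ratios:

```latex
% Unigram-SDRM parameter estimate from search-result counts.
P(QF \mid DC) \;=\; \frac{N(QF)}{N_{QF}(DC)}
```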
Semantic Dependency Relation Model (SDRM) (8/12)
Parameter Estimation of Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (9/12)
Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (10/12)
Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (11/12)
Parameter Estimation of Bigram-SDRM
P(DC|C): the same as in Unigram-SDRM.
P(QF|DC): the same as in Unigram-SDRM.
New parameters: P(dv|QF,DC), P(dq|QF,DC), P(dn|QF,DC)
N_sentence(dv, QF): the number of sentences containing both dv and QF.
N_sentence(QF): the total number of sentences containing QF.
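The corresponding sentence-level co-occurrence estimate (shown for dv; the dq and dn cases presumably mirror it):

```latex
% Bigram-SDRM: dependency features are conditioned on the QF, estimated
% from sentence-level co-occurrence in the search results.
P(dv \mid QF, DC) \;=\; \frac{N_{\mathrm{sentence}}(dv, QF)}{N_{\mathrm{sentence}}(QF)}
```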
Semantic Dependency Relation Model (SDRM) (12/12)
Parameter Estimation of Bigram-SDRM
Outline
Introduction
Related Work
Approach
Experiment
Conclusion
Future Work
Experiment
SDRM Performance Evaluation
Unigram-SDRM vs. Bigram-SDRM
Combination with different weights
SDRM vs. Language Model
Use questions as training data
Use Web search results as training data
Questions vs. Web
Experimental Data
Questions collected from NTCIR-5 CLQA.
4-fold cross-validation.
Unigram-SDRM vs. Bigram-SDRM (1/2)
Result
Unigram-SDRM vs. Bigram-SDRM (2/2)
Example
For the unigram model: “人” (person), “創下” (set), and “駕駛” (drive) are trained successfully.
For the bigram model: “人_創下” (person_set) is not trained successfully.
Combination with Different Weights (1/3)
Assign different weights to different features:
α: the weight of QF; β: the weight of dv;
γ: the weight of dq; δ: the weight of dn.
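The combination equation itself is not in the transcript; a natural form (an assumption, not the verbatim slide) scales each feature's log-probability contribution by its weight:

```latex
% Assumed form of the weighted feature combination.
\mathrm{score}(C \mid Q) \;=\;
    \alpha \log P(QF \mid DC)
  + \beta  \sum_m \log P(dv_m \mid DC)
  + \gamma \sum_m \log P(dq_m \mid DC)
  + \delta \sum_m \log P(dn_m \mid DC)
```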
Combination with Different Weights (2/3)
Comparison of the 4 dependency features
Combination with Different Weights (3/3)
16 experiments.
Best weighting: 0.23 QF, 0.29 DV, 0.48 DQ.
The weights are derived mathematically from each feature's standalone error rate, normalized to sum to 1 (here 0.77 and 0.71 appear to be the standalone accuracies of QF and DV).
Example with QF and DV:
α: the weight of QF; β: the weight of DV.
α = (1 - 0.77) / [(1 - 0.77) + (1 - 0.71)]
β = (1 - 0.71) / [(1 - 0.77) + (1 - 0.71)]
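A quick check of this arithmetic (assuming 0.77 and 0.71 are the standalone accuracies of QF and DV):

```python
# Weights derived from standalone error rates, normalized to sum to 1.
accuracy = {"QF": 0.77, "DV": 0.71}  # assumed standalone accuracies
error = {k: 1 - v for k, v in accuracy.items()}
total = sum(error.values())
weights = {k: e / total for k, e in error.items()}
print(weights)  # ~{'QF': 0.44, 'DV': 0.56}
```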
Use questions as training data (1/2)
Result
Use questions as training data (2/2)
Example
For the LM: “網球選手” (tennis player) and “選手為” (the player is) are not trained successfully.
For the SDRM: “選手” (player) and “奪得” (win) are trained successfully.
Use Web search results as training data (1/2)
Result
Use Web search results as training data (2/2)
Example
For the LM: “何國” (which country) is not trained successfully.
For the SDRM: “國” (country) and “設於” (located in) are trained successfully.
Questions vs. Web (1/3)
Result
Trained question: one whose QF the LM was able to learn from the training questions.
Untrained question: one whose QF the LM was not able to learn.
Questions vs. Web (2/3)
Example of a trained question
For the LM: “何地” (where) is trained successfully.
For the SDRM: “地” (place) and “舉行” (hold) are trained successfully, but these terms are also trained for other types.
Questions vs. Web (3/3)
Example of an untrained question
For the LM: “女星” (actress) and “獲得” (win) are not trained successfully.
For the SDRM: “女星” (actress) and “獲得” (win) are trained successfully.
Conclusion
Discussion
We need to enhance our learning method and its performance.
We need a better smoothing method.
Conclusion
We propose a new model, SDRM, which uses the question focus and dependency features for question classification.
We use Web search results as training data to solve the problem of insufficient training data.
Future Work
Enhance the performance of the learning method.
Consider the importance of each feature in the question.
Question focus and dependency features may also be used in other processing steps of question answering systems.
Thank You