Learning Question Focus and Dependency Relations from Web Search Results for Question Classification
Wen-Hsiang Lu (盧文祥)
[email protected]
Web Mining and Multilingual Knowledge System Laboratory,
Department of Computer Science and Information Engineering,
National Cheng Kung University
Research Interest
[Diagram: three overlapping research areas: Web Mining, Natural Language Processing, Information Retrieval]
Research Issues
- Unknown Term Translation & Cross-Language Information Retrieval
  - A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results
- Question Answering & Machine Translation
  - Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification
  - Using Phrase and Fluency to Improve Statistical Machine Translation
- User Modeling & Web Search
  - Learning Question Structure based on Website Link Structure to Improve Natural Language Search
  - Improving Short-Query Web Search based on User Goal Identification
- Cross-Language Medical Information Retrieval
  - MMODE: http://mmode.no-ip.org/
雅各氏症候群 (Jacob's syndrome)
Outline
- Introduction
- Related Work
- Approach
- Experiment
- Conclusion
- Future Work
Outline
- Introduction
- Related Work
- Approach
- Experiment
- Conclusion
- Future Work
Question Answering (QA) System
1. Question Analysis: question classification, keyword extraction.
2. Document Retrieval: retrieve related documents.
3. Answer Extraction: extract an exact answer.
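As a quick illustration of how the three stages fit together, here is a minimal Python sketch; the function names and the toy keyword scoring are illustrative only, not the system described in this talk.

```python
# Minimal three-stage QA pipeline skeleton (illustrative, not the original system).

def analyze_question(question):
    """Stage 1: toy question classification + keyword extraction."""
    qtype = "Person" if question.lower().startswith("who") else "Other"
    stop = {"who", "what", "when", "where", "why", "is", "the", "of", "a"}
    keywords = [w.strip("?").lower() for w in question.split()
                if w.strip("?").lower() not in stop]
    return qtype, keywords

def retrieve_documents(keywords, corpus):
    """Stage 2: rank documents by keyword overlap."""
    return sorted(corpus, key=lambda d: -sum(k in d.lower() for k in keywords))

def extract_answer(docs, qtype):
    """Stage 3: placeholder answer extraction from the top document."""
    return docs[0] if docs else None

corpus = ["Jacques Chirac is the President of the French Republic."]
qtype, kws = analyze_question("Who is the President of the French Republic?")
print(extract_answer(retrieve_documents(kws, corpus), qtype))
```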
Motivation (1/3)
- Importance of Question Classification
  - Dan Moldovan presented a report on this [Dan Moldovan 2000].
Motivation (2/3)
- Rule-based Question Classification
  - Manual rule construction is labor-intensive and unrealistic.
- Machine Learning-based Question Classification
  - Support Vector Machine (SVM)
    - Needs a large amount of training data.
    - Too many features may introduce noise.
Motivation (3/3)
- A new method for question classification:
  - Observe useful features of questions.
  - Solve the problem of insufficient training data.
Idea of Approach (1/4)
- Many questions have ambiguous question words.
- Hence the importance of the Question Focus (QF).
- Use QF identification for question classification.
Idea of Approach (2/4)
- If we do not have enough information to identify the type from the QF alone, dependency features help.
[Diagram: a Question links its QF to a Dependency Verb, a Dependency Quantifier, and a Dependency Noun; these dependency features jointly indicate the Question Type via (unigram and bigram) semantic dependency relations]
Idea of Approach (3/4)
- Example
Idea of Approach (4/4)
- Use the QF and dependency features to classify questions.
- Learn the QF and other dependency features from the Web.
- Propose a Semantic Dependency Relation Model (SDRM).
Outline
- Introduction
- Related Work
- Approach
- Experiment
- Conclusion
- Future Work
Rule-based Question Classification
- [Richard F. E. Sutcliffe 2005] [Kui-Lam Kwok 2005] [Ellen Riloff 2000]
- 5W (Who, When, Where, What, Why), mapped in the sketch below:
  - Who → Person
  - When → Time
  - Where → Location
  - What → hard to map to a single type
  - Why → Reason
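The 5W rules amount to a lookup on the leading question word. A minimal sketch (the mapping is from this slide; the code itself is illustrative):

```python
# Rule-based 5W classification: map the leading question word to a type.
# "What" is left unresolved, since the slide notes it is hard to type.
RULES = {"who": "Person", "when": "Time", "where": "Location", "why": "Reason"}

def classify_5w(question):
    first = question.strip().lower().split()[0]
    return RULES.get(first, "Unknown")  # "what" and others fall through

print(classify_5w("Who is the President of the French Republic?"))  # Person
```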
Machine Learning-based Question Classification
- Several methods based on SVM [Zhang, 2003; Suzuki, 2003; Day, 2005]
[Pipeline: Question → KDAG Kernel → Feature Vector → SVM → Question Type]
Web-based Question Classification
- Use a Web search engine to identify the question type [Solorio, 2004].
  - Example: “Who is the President of the French Republic?”
Statistics-based Question Classification
- Language Model for Question Classification [Li, 2002]
  - Drawback: too many features may introduce noise.
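The formula image is lost in this transcript. A common unigram language-model formulation of question classification, in the spirit of [Li, 2002] (a reconstruction, not necessarily the exact model), scores each type C using every word w_i of the question:

$$\hat{C} = \arg\max_{C} P(C) \prod_{i=1}^{n} P(w_i \mid C)$$

Because every word acts as a feature, rare or irrelevant words contribute noise, which is the drawback noted above.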
Outline
- Introduction
- Related Work
- Approach
- Experiment
- Conclusion
- Future Work
Architecture of Question Classification
Question Type
- 6 types of questions:
  - Person
  - Location
  - Organization
  - Number
  - Date
  - Artifact
Basic Classification Rules
- We define 17 basic rules for simple questions.
Learning Semantic Dependency Features (1/3)
- Architecture for Learning Dependency Features
- Extracting Dependency Features Algorithm
Learning Semantic Dependency Features (2/3)
- Architecture for Learning Dependency Features
Learning Semantic Dependency Features (3/3)
- Extracting Dependency Features Algorithm
Question Focus Identification Algorithm (1/2)
- Algorithm
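The algorithm itself is an image lost in this transcript. Purely as an illustration of what QF identification can look like, here is a sketch that takes the first noun after the question word as the focus; this heuristic is an assumption, not the algorithm on the slide.

```python
# Illustrative QF identification: pick the first noun after the question
# word as the focus. This is an assumption, NOT the algorithm on the slide.
QUESTION_WORDS = {"which", "what", "who", "where", "when", "how"}

def identify_qf(tagged_question):
    """tagged_question: list of (word, pos) pairs, e.g. from any POS tagger."""
    seen_qword = False
    for word, pos in tagged_question:
        if word.lower() in QUESTION_WORDS:
            seen_qword = True
        elif seen_qword and pos.startswith("N"):  # first noun after the q-word
            return word
    return None

print(identify_qf([("Which", "WDT"), ("tennis", "NN"), ("player", "NN"),
                   ("won", "VBD"), ("the", "DT"), ("title", "NN")]))  # tennis
```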
Question Focus Identification Algorithm (2/2)
- Example
Semantic Dependency Relation Model (SDRM) (1/12)
- Unigram-SDRM
- Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (2/12)
- Unigram-SDRM
[Diagram: Question Q → P(C|Q) → Question Type C]
- P(C|Q) needs many questions to train directly.
Semantic Dependency Relation Model (SDRM) (3/12)
- Unigram-SDRM
[Diagram: Question Type C → P(DC|C) → DC (Web search results) → P(Q|DC) → Question Q]
- P(DC|C): collect related search results for every type.
- P(Q|DC): use DC to determine the question type.
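The formula images on this and the previous slide are lost. Read together, a plausible reconstruction is the usual Bayes-style rewrite, assuming Q depends on the type C only through the class-specific search results DC:

$$\hat{C} = \arg\max_{C} P(C \mid Q) = \arg\max_{C} P(C)\, P(D_C \mid C)\, P(Q \mid D_C)$$

so the scarce labeled questions are replaced by Web search results as the training source.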
Semantic Dependency Relation Model (SDRM) (4/12)
- Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (5/12)
- Unigram-SDRM
- Q = {QF, QD}, where QD = {DV, DQ, DN}
  - DV: Dependency Verb
  - DQ: Dependency Quantifier
  - DN: Dependency Noun
Semantic Dependency Relation Model (SDRM) (6/12)
- Unigram-SDRM
- DV = {dv1, dv2, ⋯, dvi}, DQ = {dq1, dq2, ⋯, dqj}, DN = {dn1, dn2, ⋯, dnk}
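Given these sets and the unigram independence assumption, the lost formula presumably factorizes the question likelihood feature by feature (a reconstruction consistent with the parameters estimated on the next slides):

$$P(Q \mid D_C) = P(QF \mid D_C) \prod_{i} P(dv_i \mid D_C) \prod_{j} P(dq_j \mid D_C) \prod_{k} P(dn_k \mid D_C)$$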
Semantic Dependency Relation Model (SDRM) (7/12)
- Parameter Estimation of Unigram-SDRM
  - P(DC|C)
  - P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC)
  - N(QF): the number of occurrences of the QF.
  - NQF(DC): the total number of all QFs collected from the search results.
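The estimation formula itself is an image lost here; from the two counts just defined (presumably both taken over the search results DC), the natural relative-frequency estimate is

$$P(QF \mid D_C) = \frac{N(QF)}{N_{QF}(D_C)}$$

and analogously for P(dv|DC), P(dq|DC), and P(dn|DC) with their own counts.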
Semantic Dependency Relation Model (SDRM) (8/12)
- Parameter Estimation of Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (9/12)
- Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (10/12)
- Bigram-SDRM
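The Bigram-SDRM formula is an image lost in this transcript. Consistent with the parameters listed on the next slide, it presumably conditions each dependency feature on the QF:

$$P(Q \mid D_C) = P(QF \mid D_C) \prod_{i} P(dv_i \mid QF, D_C) \prod_{j} P(dq_j \mid QF, D_C) \prod_{k} P(dn_k \mid QF, D_C)$$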
Semantic Dependency Relation Model (SDRM) (11/12)
- Parameter Estimation of Bigram-SDRM
  - P(DC|C): the same as in Unigram-SDRM
  - P(QF|DC): the same as in Unigram-SDRM
  - P(dV|QF,DC), P(dQ|QF,DC), P(dN|QF,DC)
  - Nsentence(dv, QF): the number of sentences containing both dv and the QF.
  - Nsentence(QF): the total number of sentences containing the QF.
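From these two sentence counts, the estimate is presumably the relative frequency

$$P(d_V \mid QF, D_C) = \frac{N_{sentence}(d_V, QF)}{N_{sentence}(QF)}$$

and analogously for P(dQ|QF,DC) and P(dN|QF,DC).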
Semantic Dependency Relation Model (SDRM) (12/12)
- Parameter Estimation of Bigram-SDRM
Outline
- Introduction
- Related Work
- Approach
- Experiment
- Conclusion
- Future Work
Experiment
- SDRM Performance Evaluation
  - Unigram-SDRM vs. Bigram-SDRM
  - Combination with different weights
- SDRM vs. Language Model
  - Use questions as training data
  - Use the Web as training data
  - Questions vs. Web
Experimental Data
- Collect questions from NTCIR-5 CLQA.
- 4-fold cross-validation (see the sketch below).
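For reference, a minimal 4-fold cross-validation skeleton with scikit-learn; the placeholder data and the shuffling are assumptions, since the slide states only "4-fold cross-validation":

```python
# 4-fold cross-validation skeleton (illustrative; not the original setup).
from sklearn.model_selection import KFold

questions = [f"q{i}" for i in range(200)]   # placeholder question list
kf = KFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(questions)):
    train = [questions[i] for i in train_idx]
    test = [questions[i] for i in test_idx]
    # train SDRM on `train`, evaluate classification accuracy on `test`
    print(f"fold {fold}: {len(train)} train / {len(test)} test")
```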
Unigram-SDRM vs. Bigram-SDRM (1/2)
- Result
Unigram-SDRM vs. Bigram-SDRM (2/2)
- Example
  - For the unigram model: “人” (person), “創下” (set [a record]), and “駕駛” (drive) are trained successfully.
  - For the bigram model: “人_創下” is not trained successfully.
Combination with different weights (1/3)
- Different weights for different features:
  - α: the weight of QF, β: the weight of DV,
  - γ: the weight of DQ, δ: the weight of DN.
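The combination formula is an image lost here; a plausible form is a log-linear mix of the four feature scores, with the weights summing to one:

$$\mathrm{score}(C) = \alpha \log P(QF \mid D_C) + \beta \log P(DV \mid D_C) + \gamma \log P(DQ \mid D_C) + \delta \log P(DN \mid D_C), \quad \alpha + \beta + \gamma + \delta = 1$$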
Combination with different weights (2/3)
- Comparison of the 4 dependency features
Combination with different weights (3/3)
- 16 experiments.
- Best weighting: 0.23 QF, 0.29 DV, 0.48 DQ.
- The weights come from a simple calculation: each weight is the feature's error rate (1 - accuracy), normalized over the features being combined.
- Example with QF and DV:
  - α: the weight of QF, β: the weight of DV
  - α = (1 - 0.77) / [(1 - 0.77) + (1 - 0.71)]
  - β = (1 - 0.71) / [(1 - 0.77) + (1 - 0.71)]
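Working the example through (the arithmetic is mine; only the fractions appear on the slide):

$$\alpha = \frac{0.23}{0.23 + 0.29} = \frac{0.23}{0.52} \approx 0.44, \qquad \beta = \frac{0.29}{0.52} \approx 0.56$$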
Use questions as training data (1/2)
- Result
Use questions as training data (2/2)
- Example
  - For the LM: “網球選手” (tennis player) and “選手為” (player is) are not trained successfully.
  - For SDRM: “選手” (player) and “奪得” (win) are trained successfully.
Use Web search results as training data (1/2)
- Result
Use Web search results as training data (2/2)
- Example
  - For the LM: “何國” (which country) is not trained successfully.
  - For SDRM: “國” (country) and “設於” (located in) are trained successfully.
Questions vs. Web (1/3)
- Result
  - Trained question: the LM can train on the QF of the question.
  - Untrained question: the LM cannot train on the QF of the question.
Questions vs. Web (2/3)
- Example of a trained question
  - For the LM: “何地” (where) is trained successfully.
  - For SDRM: “地” (place) and “舉行” (hold) are trained successfully, but these terms are also trained on other types.
Questions vs. Web (3/3)
- Example of an untrained question
  - For the LM: “女星” (actress) and “獲得” (win) are not trained successfully.
  - For SDRM: “女星” (actress) and “獲得” (win) are trained successfully.
Conclusion
- Discussion
  - We need to enhance our learning method and its performance.
  - We need a better smoothing method.
- Conclusion
  - We propose a new model, SDRM, which uses the question focus and dependency features for question classification.
  - We use Web search results as training data to solve the problem of insufficient training data.
Future Work
- Enhance the performance of the learning method.
- Consider the importance of each feature in the question.
- Question focus and dependency features may also be used in other processing steps of question answering systems.
Thank You