
Extracting Opinion Topics for Chinese
Opinions using Dependence Grammar
Guang Qiu, Kangmiao Liu, Jiajun Bu*,
Chun Chen, Zhiming Kang
Zhejiang University 浙江大學
SIGKDD Workshop on Data Mining and
Audience Intelligence for Advertising (ADKDD '07)
Reporter: Chia-Ying Lee
Advisor: Prof. Hsin-Hsi Chen
Introduction

Problem definition: identify opinion sentences,
and extract the topic from each opinion sentence.
Advertisement recommendation systems select ads
without considering the sentiment polarity of the
text.
• A reasonable advertisement should promote a
rival product or a solution to the user's complaint.

Kim, Soo-Min and Hovy, Eduard: an opinion is
described as a quadruple of Topic, Holder,
Claim, and Sentiment.
• Most previous work focuses only on sentiment
classification, assuming the topics are given in advance.

Related Work (1/2)
Sentiment Classification
1. Hatzivassiloglou and McKeown, 1997
Infers the orientation of adjectives from pairs of
adjectives conjoined by and, or, but, either-or,
or neither-nor.
2. Wiebe, 2000
Focuses on subjectivity tagging, which distinguishes
opinion sentences from factual ones.
Related Work (2/2)
3. Esuli and Sebastiani, 2005
Determines the orientation of terms by classifying
their glosses.
4. Ku, Liang, and Chen, 2006
The sentiment orientation of a sentence can be
derived from that of its words.
5. Pang, Lee and Vaithyanathan, 2002
Machine learning methods: Naive Bayes,
Maximum Entropy, and SVM.
Method - Acquire sentiment words
(1/2)


Assumptions:
1. Sentences that contain sentiment words are
regarded as opinion sentences.
2. A topic is assumed to exist in each of these
opinion sentences.
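The first assumption can be sketched as a simple lexicon lookup. This is only an illustration, assuming the sentence has already been segmented into words; the function name and the toy lexicon are hypothetical and are not taken from the paper.

```python
def is_opinion_sentence(words, sentiment_lexicon):
    """Return True if any segmented word appears in the sentiment lexicon."""
    return any(word in sentiment_lexicon for word in words)


# Toy lexicon and sentences (illustrative only, not the NTUSD data).
lexicon = {"清晰", "糟糕"}                 # "clear", "terrible"
opinion = ["屏幕", "很", "清晰"]           # "the screen is very clear"
neutral = ["手机", "有", "屏幕"]           # "the phone has a screen"

print(is_opinion_sentence(opinion, lexicon))
print(is_opinion_sentence(neutral, lexicon))
```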
Data sets of sentiment words:
• WS1: NTUSD
2812 positive words and 8276 negative words

D1: Emotion classification by Bruce and Wiebe
1256 positive blog articles and 1238 negative ones
Method - Acquire sentiment words
(2/2)

D2: Blog search results of Baidu
372 product names used as queries, 24146 snippets
• 1685 snippets manually labeled as POS, NEG, or NEU
• Select adjectives correlated with the adjectives in D1.
• Calculate the probability that each word occurs in each
sentiment category
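The last step can be sketched as a relative-frequency estimate over the labeled snippets. The slides do not give the formula, so this assumes the probability is the share of a word's snippet occurrences that fall in each category; the function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def category_probabilities(labeled_snippets):
    """Estimate P(category | word) from (words, label) pairs.

    Assumption: each word is counted at most once per snippet.
    """
    counts = defaultdict(Counter)
    for words, label in labeled_snippets:
        for word in set(words):
            counts[word][label] += 1

    probabilities = {}
    for word, per_label in counts.items():
        total = sum(per_label.values())
        probabilities[word] = {label: n / total for label, n in per_label.items()}
    return probabilities


# Toy labeled snippets (illustrative only).
snippets = [
    (["good", "phone"], "POS"),
    (["bad", "phone"], "NEG"),
    (["good"], "POS"),
]
probs = category_probabilities(snippets)
print(probs["phone"])
```

A word could then be added to the positive or negative list when its probability for that category is high enough; the cut-off is not stated in the slides.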


WS1+D1+D2:
3269 positive words and 9621 negative words
Method - Extracting the topics using
rules (1/4)

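Rule format: <ROLE_SENTI, RELA, ROLE_TOPIC>
ROLE_SENTI: dependency role of the sentiment word
RELA: relationship between the sentiment word and the topic in the dependency tree
ROLE_TOPIC: dependency role of the topic word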
Method - Extracting the topics using
rules (2/4)
1. <VOB, SIBLING, SBV>
2. <DE, GRANDPARENT-SIBLING, SBV>
Method - Extracting the topics using
rules (3/4)
3. <ATT, PARENT-SIBLING, SBV>
4. <HED, CHILD, VOB>
Method - Extracting the topics using
rules (4/4)
5. <HED, CHILD, SBV>
6. <ADV, SIBLING, VOB>
7. <ANY, NEAREST, ANYNOUN>
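The seven rules can be sketched as a lookup over a dependency tree. This is a minimal illustration, not the authors' implementation. Several things here are assumptions: the `Token` structure, trying the rules in slide order, reading SIBLING as "shares the same head", PARENT-SIBLING and GRANDPARENT-SIBLING as siblings of the sentiment word's parent and grandparent, and NEAREST as the noun closest by word position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    idx: int    # 0-based position in the sentence
    word: str
    pos: str    # coarse part-of-speech tag, "n" for nouns (assumed tag set)
    head: int   # index of the parent token, -1 for the root
    role: str   # dependency label, e.g. "SBV", "VOB", "HED"


# Rules 1-6 from the slides as (ROLE_SENTI, RELA, ROLE_TOPIC).
RULES = [
    ("VOB", "SIBLING", "SBV"),
    ("DE", "GRANDPARENT-SIBLING", "SBV"),
    ("ATT", "PARENT-SIBLING", "SBV"),
    ("HED", "CHILD", "VOB"),
    ("HED", "CHILD", "SBV"),
    ("ADV", "SIBLING", "VOB"),
]

# How many levels to climb before looking at siblings (assumed reading).
LEVELS = {"SIBLING": 0, "PARENT-SIBLING": 1, "GRANDPARENT-SIBLING": 2}


def ancestor(tokens: List[Token], node: Token, levels: int) -> Optional[Token]:
    """Climb `levels` steps towards the root; None if the tree is too shallow."""
    for _ in range(levels):
        if node.head < 0:
            return None
        node = tokens[node.head]
    return node


def candidates(tokens: List[Token], senti: Token, rela: str) -> List[Token]:
    """Tokens that stand in relationship `rela` to the sentiment word."""
    if rela == "CHILD":
        return [t for t in tokens if t.head == senti.idx]
    anchor = ancestor(tokens, senti, LEVELS[rela])
    if anchor is None:
        return []
    return [t for t in tokens if t.head == anchor.head and t.idx != anchor.idx]


def extract_topic(tokens: List[Token], senti_idx: int) -> Optional[str]:
    senti = tokens[senti_idx]
    for role_senti, rela, role_topic in RULES:
        if senti.role != role_senti:
            continue
        for cand in candidates(tokens, senti, rela):
            if cand.role == role_topic:
                return cand.word
    # Rule 7 <ANY, NEAREST, ANYNOUN>: fall back to the closest noun.
    nouns = [t for t in tokens if t.pos == "n" and t.idx != senti_idx]
    if nouns:
        return min(nouns, key=lambda t: abs(t.idx - senti_idx)).word
    return None


# "屏幕 很 清晰" (the screen is very clear); the parse is hand-written.
sentence = [
    Token(0, "屏幕", "n", 2, "SBV"),
    Token(1, "很", "d", 2, "ADV"),
    Token(2, "清晰", "a", -1, "HED"),
]
print(extract_topic(sentence, 2))  # → 屏幕
```

In practice the `head` and `role` fields would come from a dependency parser rather than being written by hand.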
Experiments and Results (1/2)
• Data:
Blog search results of Baidu
372 queries
22461 snippets not labeled as POS, NEG, or NEU
51661 sentences
Two annotators annotate the topics and the POS,
NEG, NEU labels
570 sentences
250 opinion sentences and 320 neutral sentences
Experiments and Results (2/2)

Opinion sentence
• SVM (using unigram words as the features)
Topic extraction
• Topics are correctly extracted for 218 of the
250 opinion sentences, an accuracy of
87.2%.
• Exact match
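The reported figure follows from 218 / 250 = 0.872. An exact-match evaluation can be sketched as below, assuming one predicted topic and one annotated topic per sentence; the function name and the toy data are hypothetical.

```python
def exact_match_accuracy(predicted, gold):
    """Share of sentences whose predicted topic equals the annotated topic."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no sentences to evaluate")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy example: two of three topics match exactly.
print(exact_match_accuracy(["屏幕", "电池", "价格"], ["屏幕", "电池", "手机"]))

# The figure on the slide: 218 correct out of 250 opinion sentences.
print(round(218 / 250 * 100, 1))
```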
Conclusion
• Proposed a rule-based approach to
extracting topics from opinion sentences.
• Applies syntactic parsing to sentences
and takes advantage of the syntactic roles
of words and their dependency
relationships to extract the topic.
Future Work
• Negation words
• Noise filtering method
• Co-reference resolution
• Extend the current rules to cover more
situations
HIT LTP system

http://ir.hit.edu.cn/demo/ltp/ltp_v2.0.py
Thank you!