Extracting Opinion Topics for Chinese
Opinions using Dependence Grammar
Guang Qiu, Kangmiao Liu, Jiajun Bu*,
Chun Chen, Zhiming Kang
Zhejiang University 浙江大學
SIGKDD Workshop on Data Mining and
Audience Intelligence for Advertising (ADKDD '07)
Reporter: Chia-Ying Lee
Advisor: Prof. Hsin-Hsi Chen
Introduction
Problem Definition: Determining whether a sentence is an opinion
sentence, and extracting the topic from an opinion sentence.
Advertisement-promotion systems make recommendations
without considering the sentiment polarity of the
texts.
A reasonable advertisement should be that of a rival,
or one offering a solution to the user's complaint.
Kim, Soo-Min and Hovy, Eduard: An opinion is
described as a quadruple of Topic, Holder,
Claim and Sentiment.
Most previous work focuses only on sentiment
classification, assuming topics are given a priori.
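The opinion quadruple can be sketched as a small record type. This is only an illustration: the field types, the label strings, and the helper `needs_topic_extraction` are assumptions, not part of the original talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """Opinion as a quadruple (Topic, Holder, Claim, Sentiment)."""
    topic: Optional[str]   # what the opinion is about; None if not yet extracted
    holder: Optional[str]  # who expresses the opinion; None if unknown
    claim: str             # the opinionated text itself
    sentiment: str         # assumed label set: "POS", "NEG" or "NEU"


def needs_topic_extraction(opinion: Opinion) -> bool:
    """Hypothetical helper: True when the topic still has to be found."""
    return opinion.topic is None


# Sentiment classification alone fills in `sentiment`; the topic is missing.
example = Opinion(topic=None, holder=None,
                  claim="the battery dies too quickly", sentiment="NEG")
```

Prior work that assumes the topic is given corresponds to records whose `topic` field is already filled in; the task here is to fill it from the sentence.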
Related Work (1/2)
Sentiment Classification
1. Hatzivassiloglou and McKeown, 1997
Predict the orientation of adjectives from pairs conjoined by
and, or, but, either-or, or neither-nor
2. Wiebe, 2000
Focuses on subjectivity tagging, which distinguishes
opinion sentences from factual ones
Related Work (2/2)
3. Esuli and Sebastiani, 2005
Determine the orientation of terms by classifying their glosses.
4. Ku, Liang, and Chen, 2006
Sentiment orientation of sentences can be
inferred from that of their words
5. Pang, Lee and Vaithyanathan, 2002
Machine learning methods: Naive Bayes,
Maximum Entropy and SVM.
Method - Acquire sentiment words
(1/2)
Assumptions:
1. Sentences containing sentiment words are regarded as
opinion sentences
2. A topic is assumed to exist in each of these opinion
sentences
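Assumption 1 amounts to a lexicon lookup. A minimal sketch, assuming the sentence is already word-segmented; the four lexicon entries are toy examples chosen for illustration and are not taken from NTUSD, and the function names are hypothetical.

```python
# Toy sentiment lexicon (illustrative entries only, not from NTUSD).
SENTIMENT_WORDS = {
    "漂亮": "POS",  # pretty
    "好": "POS",    # good
    "差": "NEG",    # poor
    "失望": "NEG",  # disappointed
}


def find_sentiment_words(tokens, lexicon=SENTIMENT_WORDS):
    """Return (position, word, polarity) for every lexicon hit in the sentence."""
    return [(i, tok, lexicon[tok]) for i, tok in enumerate(tokens) if tok in lexicon]


def is_opinion_sentence(tokens, lexicon=SENTIMENT_WORDS):
    """Assumption 1: a sentence with at least one sentiment word is an opinion."""
    return bool(find_sentiment_words(tokens, lexicon))
```

The positions returned by `find_sentiment_words` are what a later topic-extraction step would start from.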
Data set of sentiment words:
WS1: NTUSD
2812 positive words and 8276 negative words
D1: Emotion classification by Bruce and Wiebe
1256 positive blog articles and 1238 negative
Method - Acquire sentiment words
(2/2)
D2: Blog search results of Baidu
372 names of products as queries, 24146 snippets
Manually label 1685 snippets as POS, NEG, or NEU
Select adjectives correlated with the adjectives in D1.
Calculate the probability that each word occurs in each
sentiment category
WS1+D1+D2:
3269 positive words and 9621 negative words
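The per-category probability step can be sketched as a relative-frequency estimate over the labeled snippets. The slide does not give the exact formula, so the estimate below (share of a word's snippets that fall in each category, counting a word once per snippet) is an assumption, as is the function name.

```python
from collections import Counter, defaultdict


def category_probabilities(labeled_snippets):
    """Estimate P(category | word) from (tokens, label) pairs.

    Assumed estimator: number of snippets of that category containing the
    word, divided by the number of all snippets containing the word.
    """
    counts = defaultdict(Counter)
    for tokens, label in labeled_snippets:
        for tok in set(tokens):  # count each word once per snippet
            counts[tok][label] += 1

    probs = {}
    for tok, per_label in counts.items():
        total = sum(per_label.values())
        probs[tok] = {label: n / total for label, n in per_label.items()}
    return probs
```

A word could then be added to the positive or negative list when one category clearly dominates; the threshold for "clearly" is not stated in the slides.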
Method - Extracting the topics using
rules (1/4)
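Rule format: <ROLE_SENTI, RELA, ROLE_TOPIC>
(syntactic role of the sentiment word, its relationship to the
topic word, syntactic role of the topic word)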
Method - Extracting the topics using
rules (2/4)
1. <VOB, SIBLING, SBV>
2. <DE, GRANDPARENT-SIBLING, SBV>
Method - Extracting the topics using
rules (3/4)
3. <ATT, PARENT-SIBLING, SBV>
4. <HED, CHILD, VOB>
Method - Extracting the topics using
rules (4/4)
5. <HED, CHILD, SBV>
6. <ADV, SIBLING, VOB>
7. <ANY, NEAREST, ANYNOUN>
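The seven rules can be sketched as a lookup over a dependency tree. Several points here are assumptions rather than statements from the slides: the `Token` layout, the reading of SIBLING as "shares the same head", PARENT-SIBLING and GRANDPARENT-SIBLING as "sibling of the sentiment word's parent or grandparent", NEAREST as nearest by word position, "n" as the noun tag, and trying the rules in the listed order with the first match winning.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pos: str   # part-of-speech tag; "n" is assumed to mean noun
    head: int  # index of the head token, -1 for the root
    role: str  # dependency label to the head, e.g. SBV, VOB, ATT, HED


RULES = [
    ("VOB", "SIBLING", "SBV"),
    ("DE", "GRANDPARENT-SIBLING", "SBV"),
    ("ATT", "PARENT-SIBLING", "SBV"),
    ("HED", "CHILD", "VOB"),
    ("HED", "CHILD", "SBV"),
    ("ADV", "SIBLING", "VOB"),
    ("ANY", "NEAREST", "ANYNOUN"),
]

# How many steps to climb from the sentiment word before taking siblings.
LEVELS = {"SIBLING": 0, "PARENT-SIBLING": 1, "GRANDPARENT-SIBLING": 2}


def candidates(tokens, senti, rela):
    """Indices of tokens standing in relation `rela` to the sentiment word."""
    if rela == "CHILD":
        return [i for i, t in enumerate(tokens) if t.head == senti]
    if rela not in LEVELS:
        return []
    anchor = senti
    for _ in range(LEVELS[rela]):
        anchor = tokens[anchor].head
        if anchor < 0:  # ran past the root
            return []
    parent = tokens[anchor].head
    if parent < 0:      # the root has no siblings
        return []
    return [i for i, t in enumerate(tokens) if i != anchor and t.head == parent]


def extract_topic(tokens, senti):
    """Return the topic word for the sentiment word at index `senti`, or None."""
    for role_senti, rela, role_topic in RULES:
        if rela == "NEAREST":  # fallback rule: nearest noun, any roles
            nouns = [i for i, t in enumerate(tokens) if t.pos == "n" and i != senti]
            if nouns:
                return tokens[min(nouns, key=lambda i: abs(i - senti))].word
            continue
        if tokens[senti].role != role_senti:
            continue
        for i in candidates(tokens, senti, rela):
            if tokens[i].role == role_topic:
                return tokens[i].word
    return None


# "屏幕 很 漂亮" (the screen is very pretty); hand-built toy parse.
sentence = [
    Token("屏幕", "n", 2, "SBV"),
    Token("很", "d", 2, "ADV"),
    Token("漂亮", "a", -1, "HED"),
]
print(extract_topic(sentence, 2))  # 屏幕, via rule 5 <HED, CHILD, SBV>
```

In practice the `head` and `role` fields would come from a dependency parser rather than being written by hand.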
Experiments and Results (1/2)
Data:
Blog search results of Baidu
372 queries
22461 snippets not labeled as POS, NEG, or NEU
51661 sentences
Two annotators annotate topics and POS,
NEG, NEU labels
570 sentences
250 sentiment (opinion) sentences and 320 neutral ones
Experiments and Results (2/2)
Opinion sentence
SVM (using unigram words as the features)
Topic extraction
Topics are correctly extracted from 218 of the
250 opinion sentences, an accuracy of
87.2%.
Exact match
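The reported accuracy follows directly from the counts. A minimal sketch, assuming "exact match" means an extracted topic counts as correct only when it is identical to the annotated topic; the function name is hypothetical.

```python
def exact_match_accuracy(predicted, gold):
    """Fraction of sentences whose extracted topic equals the annotated one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one sentence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Checking the reported figure from its counts: 218 correct out of 250.
reported = 218 / 250
print(f"{reported:.1%}")  # 87.2%
```

Under exact match, a partially overlapping topic (for example a head noun without its modifier) would count as an error.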
Conclusion
Proposed a rule-based approach to
extracting topics in opinion sentences.
Employs syntactic parsing on sentences
and takes advantage of the syntactic roles
of words and their dependency
relationships to extract the topic.
Future Work
Negation words
Noise filtering method
Co-reference resolution
Enlarge current rules to cover more
situations
HIT LTP system
http://ir.hit.edu.cn/demo/ltp/ltp_v2.0.py
Thank you!