Movie Review Mining and Summarization
Li Zhuang, Feng Jing, and Xiao-Yan Zhu
ACM CIKM 2006
Speaker: Yu-Jiun Liu
Date : 2007/01/10
Outline
Introduction
The characteristics of movie review mining
Definition
Approach
Experiment
Introduction
Reviews are useful for both information promulgators and readers.
However, many reviews are lengthy, with only a few sentences expressing the author’s opinions.
Goal: automatically generate a summary of reviews.
Product Review vs. Movie Review
The characteristics of movie review mining
Promulgators probably comment on other movie-related elements as well.
Readers probably want more information.
Movie review mining must generate a richer summary than product review mining.
Proposed: a multi-knowledge based approach.
Definition 1
Movie Feature
A movie feature is a movie element or a movie-related person that has been commented on.
According to IMDB, feature classes are divided
into two groups: ELEMENT and PEOPLE.
ELEMENT: OA, ST (screenplay), SE (special effects), etc.
PEOPLE: PPR, PDR, PAC, etc.
Example: “story”, “script”, and “screenplay” belong to the ST class; “actor”, “actress”, and “supporting cast” belong to the PAC class.
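The keyword-to-class assignment in the example can be sketched as a simple lookup. This is a minimal illustration, not the authors' implementation: the table holds only the six keywords named on the slide, and the names `FEATURE_CLASSES`, `CLASS_GROUPS`, and `classify_feature` are hypothetical.

```python
# Hypothetical keyword-to-class table; only the entries named on the slide.
FEATURE_CLASSES = {
    "story": "ST",
    "script": "ST",
    "screenplay": "ST",
    "actor": "PAC",
    "actress": "PAC",
    "supporting cast": "PAC",
}

# The slide lists ST under ELEMENT and PAC under PEOPLE.
CLASS_GROUPS = {"ST": "ELEMENT", "PAC": "PEOPLE"}


def classify_feature(word):
    """Return (feature class, group) for a keyword, or None if unknown."""
    cls = FEATURE_CLASSES.get(word.lower().strip())
    if cls is None:
        return None
    return cls, CLASS_GROUPS[cls]
```

For instance, `classify_feature("Screenplay")` resolves to the ST class in the ELEMENT group, while a word outside the table yields `None`.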
Definition 2
Relevant Opinion of A Feature
The relevant opinion of a feature is a set of
words or phrases that expresses a positive
(PRO) or negative (CON) opinion on the
feature.
The polarity of the same opinion word may vary across domains.
Example: “predictable” is neutral in a product review but sounds negative in a movie review.
Definition 3
Feature-Opinion Pair
A feature-opinion pair consists of a feature
and a relevant opinion.
An explicit F-O pair: both the feature and the opinion appear in the sentence.
Example: “The movie is excellent.”
An implicit F-O pair: the feature or the opinion does not appear in the sentence.
Example: “When I watched this film, I hoped it ended as soon as possible.” (no opinion word)
Approach – multi-knowledge based
Keyword list generation
Build a keyword list to capture main
feature/opinion words in movie reviews.
Divide the list into two classes: features
and opinions.
Feature Keywords
The set of feature words converges: few new feature words appear as more reviews are examined.
Special part: people names, which occur in multiple formats
(ex: Liu Yu Jiun; Liu Y.J.; L. Y. Jiun, etc.)
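One way to cope with multi-format names is to generate the written variants of each full name and add them all to the keyword list. The sketch below covers only the three formats shown in the example; the function name `name_variants` is hypothetical, and the slide does not say how the authors actually matched names.

```python
def name_variants(full_name):
    """Generate the three name formats shown on the slide."""
    parts = full_name.split()
    variants = {" ".join(parts)}
    if len(parts) >= 2:
        # Keep the first token, abbreviate the rest: "Liu Y.J."
        variants.add(parts[0] + " " + "".join(p[0] + "." for p in parts[1:]))
        # Abbreviate all but the last token: "L. Y. Jiun"
        variants.add(" ".join(p[0] + "." for p in parts[:-1]) + " " + parts[-1])
    return variants
```

Calling `name_variants("Liu Yu Jiun")` produces the full name together with “Liu Y.J.” and “L. Y. Jiun”.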
Opinion Keywords
Do not rely on the statistical results alone.
The first 100 positive/negative words are selected as seeds.
For each substantive in WordNet, look up the synsets of its first two meanings. If one of the seed words is in those synsets, the substantive is added to the opinion word list.
Remaining opinion words with high frequency are added as domain-specific words.
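The seed-expansion step can be sketched as follows. To stay self-contained, the sketch replaces WordNet with a small hand-made dictionary (`toy_synsets`, invented for illustration and not real WordNet data); `expand_opinion_words` and `max_senses` are hypothetical names. A real run would read the synsets from WordNet itself.

```python
def expand_opinion_words(seeds, synsets_by_word, max_senses=2):
    """Add every word whose first `max_senses` synsets contain a seed word."""
    seeds = set(seeds)
    opinion_words = set(seeds)
    for word, senses in synsets_by_word.items():
        # Only the first two meanings are inspected, as on the slide.
        for synset in senses[:max_senses]:
            if seeds & set(synset):
                opinion_words.add(word)
                break
    return opinion_words


# Toy stand-in for WordNet: word -> list of synsets, ordered by sense.
toy_synsets = {
    "superb": [["superb", "excellent"], ["superb", "brilliant"]],
    "table": [["table", "tabular_array"], ["table", "furniture"]],
    "fine": [["fine", "mulct"], ["fine", "penalty"], ["fine", "excellent"]],
}
expanded = expand_opinion_words({"excellent"}, toy_synsets)
```

In this toy data, “superb” is added because a seed appears in its first sense, while “fine” is skipped because the seed appears only in its third sense.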
Mining Explicit F-O Pairs
In each sentence, use the keyword list to find all feature/opinion words.
Use the dependency grammar graph to detect the path between each feature word and each opinion word.
Stanford Parser
(http://www-nlp.stanford.edu/software/lex-parser.shtml)
Mining Explicit F-O Pairs II
Example: “This movie is a masterpiece.”
Path: “movie (NN) – nsubj – is (VBZ) – dobj – masterpiece (NN)”
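Finding such a path amounts to a shortest-path search over the dependency graph. The sketch below runs a breadth-first search over hand-written edges that mirror the example path; it does not call the Stanford Parser, and the function name `dependency_path` and the edge format `(head, relation, dependent)` are assumptions made for illustration.

```python
from collections import deque


def dependency_path(edges, source, target):
    """Breadth-first search over an undirected, labeled dependency graph.

    Returns a list alternating words and relation labels, or None.
    """
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel, head))
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [rel, nxt])
    return None


# Edges written by hand to match the path on the slide.
edges = [("is", "nsubj", "movie"), ("is", "dobj", "masterpiece")]
path = dependency_path(edges, "movie", "masterpiece")
```

Here `path` alternates words and relations from “movie” to “masterpiece”, passing through “is” as in the slide example.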
Mining Implicit F-O Pairs
This problem is difficult, so only two simple cases are handled, both with an opinion word present.
Very short sentences that appear at the beginning or end of a review and contain obvious opinion words.
Ex: “Great!” → “movie-great” or “film-great”
A specific mapping from opinion word to feature word.
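The two cases can be sketched as a pair of rules. Several details are assumptions, since the slide does not specify them: the length threshold `max_words`, the default feature word “movie”, and the sample mapping entry are all hypothetical, as is the function name `implicit_pairs`.

```python
def implicit_pairs(sentences, opinion_words, opinion_to_feature, max_words=3):
    """Return (feature, opinion) pairs for the two simple implicit cases."""
    pairs = []
    last = len(sentences) - 1
    for i, sentence in enumerate(sentences):
        words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
        words = [w for w in words if w]
        for w in words:
            if w in opinion_to_feature:
                # Case 2: the opinion word maps to a specific feature word.
                pairs.append((opinion_to_feature[w], w))
            elif w in opinion_words and i in (0, last) and len(words) <= max_words:
                # Case 1: very short opening/closing sentence about the movie.
                pairs.append(("movie", w))
    return pairs


# Toy review and a hypothetical mapping entry, for illustration only.
review = ["Great!", "I watched it last night with friends.", "Everything felt well-acted."]
found = implicit_pairs(review, {"great"}, {"well-acted": "actor"})
```

On the toy review, the opening sentence yields a movie-great pair and the closing sentence yields an actor pair through the mapping.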
Summary Generation
1. Collect all the sentences that express opinions on a feature class.
2. Identify the semantic orientation of the relevant opinion in each sentence.
3. List the organized sentences as the summary.
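The three steps above can be sketched as a grouping operation. The sketch assumes each sentence has already been labeled with a feature class and a PRO/CON orientation by the earlier stages; `build_summary` and the sample sentences are hypothetical.

```python
from collections import defaultdict


def build_summary(labeled_sentences):
    """Group sentences by feature class, then by PRO/CON orientation."""
    summary = defaultdict(lambda: {"PRO": [], "CON": []})
    for sentence, feature_class, polarity in labeled_sentences:
        summary[feature_class][polarity].append(sentence)
    return dict(summary)


# Toy input: (sentence, feature class, orientation), invented for illustration.
labeled = [
    ("The script is sharp.", "ST", "PRO"),
    ("The story drags in the middle.", "ST", "CON"),
    ("The supporting cast is excellent.", "PAC", "PRO"),
]
summary = build_summary(labeled)
```

The result maps each feature class to its positive and negative sentences, which can then be printed class by class as the summary.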
Experiments
Performance measure
Data
Select 11 movies from the IMDB Top 250 list.
For each movie, the first 100 reviews are downloaded.
In total, more than 16,000 sentences and more than 260,000 words.
Four movie fans were asked to label F-O pairs and to give the classes of the feature word and the opinion word, respectively.
Results
Use 880 reviews as training data, and
220 reviews as testing data.
Results II