Transcript ppt
Deeper Sentiment Analysis
Using Machine Translation Technology
Kanayama Hiroshi, Nasukawa Tetsuya
Tokyo Research Laboratory, IBM Japan
COLING 2004
abstract
This paper proposes a new paradigm for
sentiment analysis: translation from text
documents to a set of sentiment units,
making use of an existing transfer-based
machine translation engine.
introduction
Sentiment analysis (SA) is a task to obtain
someone’s feelings as expressed in positive or
negative comments (favorable or unfavorable),
questions, and requests.
SA is becoming a useful tool for commercial
activities.
This paper describes a method to extract a set
of sentiment units from sentences, which is the
key component of SA.
introduction
A sentiment unit is a tuple of a sentiment, a
predicate, and its arguments.
It has excellent lens, but the price is too high. I don’t
think the quality of the recharger has any problem.
[favorable] excellent (lens)
[unfavorable] high (price)
[favorable] problematic+neg (recharger)
Three sentiment units indicate that the camera has good
features in its lens and recharger, and a bad feature in its
price.
The extraction of these sentiment units is not a trivial
task because many syntactic and semantic operations are
required.
A sentiment unit should be constructed as the smallest
possible informative unit so that it is easy to handle for
the organizing processes after extraction.
introduction
We implemented an accurate sentiment analyzer by
making use of an existing transfer-based machine
translation engine (Watanabe, 1992), replacing the
translation patterns and bilingual lexicons with
sentiment patterns and a sentiment polarity
lexicon.
Use deep analysis techniques
such as those used for
machine translation
where all of the syntactic
and semantic phenomena
must be handled.
introduction
Our SA system attaches importance to each
individual sentiment expression, rather than to
the quantitative tendencies of reputation.
Sentiment Unit
A predicate is a word, typically a verb or an
adjective, which conveys the main notion of the
sentiment unit.
An argument is also a word, typically a noun,
which modifies the predicate with a case
postposition in Japanese. Arguments roughly
correspond to the subject and object of the
predicate in English.
For example, the sentence “ABC123 has an
excellent lens.” yields: [fav] excellent <ABC123, lens>
Sentiment Unit
Semantically similar representations should be
aggregated to organize extracted sentiments.
Predicates may have features, such as negation,
facility, difficulty, etc.
“ABC123 doesn’t have an excellent lens.”
[unf] excellent + neg <ABC123, lens>
Easy to break. [unf] break + facil
Difficult to learn [unf] learn + diff
The surface string is the corresponding part in
the original text. It is used for reference in the
view of the output of SA.
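The slides describe a sentiment unit as a polarity, a predicate with optional features, its arguments, and a surface string. A minimal sketch of that record follows. The class name `SentimentUnit`, its field names, and the `render` method are assumptions made for illustration, not the authors' data structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SentimentUnit:
    """Illustrative record for one sentiment unit (names are hypothetical)."""
    polarity: str                      # "fav" or "unf"
    predicate: str                     # e.g. "excellent"
    arguments: Tuple[str, ...] = ()    # e.g. ("ABC123", "lens")
    features: Tuple[str, ...] = ()     # e.g. ("neg",), ("facil",), ("diff",)
    surface: str = ""                  # corresponding part of the original text

    def render(self) -> str:
        """Format the unit in the bracket notation used on the slides."""
        pred = "+".join((self.predicate,) + self.features)
        args = f" <{', '.join(self.arguments)}>" if self.arguments else ""
        return f"[{self.polarity}] {pred}{args}"


unit = SentimentUnit(
    polarity="unf",
    predicate="excellent",
    arguments=("ABC123", "lens"),
    features=("neg",),
    surface="ABC123 doesn't have an excellent lens.",
)
print(unit.render())
```

Keeping the features separate from the predicate makes it straightforward to aggregate units that share a predicate but differ in negation, which is the kind of organizing step the slides mention.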
Implementation: Transfer-based
Machine Translation Engine
The transfer-based machine translation
system consists of three parts:
a source language syntactic parser,
a bilingual transfer which handles the syntactic
tree structures,
a target language generator.
Implementation
Techniques Required for
Sentiment Analysis
Full syntactic parsing plays an important role in extracting
sentiments correctly, because the results of a shallow parser
alone are not always reliable. For example, expressions such as “I
don’t think X is good” are not favorable opinions about X,
even though “X is good” appears on the surface. Therefore
we use top-down pattern matching on the tree structures
from the full parsing in order to find each sentiment
fragment.
In our method, initially the top node is examined to see
whether or not the node and its combination of children
nodes match one of the patterns in the pattern
repository. In this top-down manner, the nodes “don’t think”
in the above example are examined before “X is good”.
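The top-down matching described above can be sketched on a toy tree. Everything here is an assumption for illustration: the `Node` class, the English glosses used as node labels, and the tiny `PRINCIPAL` and `AUXILIARY` tables stand in for the real parser output and pattern repository, which the slides do not show.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    word: str
    kind: str                                   # "pred" or "noun" (toy categories)
    children: List["Node"] = field(default_factory=list)


# Hypothetical toy resources, not the actual sentiment patterns.
PRINCIPAL = {"good": "fav", "bad": "unf"}       # predicate -> polarity
AUXILIARY = {"not-think"}                       # wrappers that negate their complement
FLIP = {"fav": "unf", "unf": "fav"}


def match(node: Node) -> Optional[Tuple[str, str, List[str]]]:
    """Match patterns starting at `node`; the top node is always examined first."""
    if node.word in AUXILIARY:
        # The auxiliary node is handled before the predicate embedded under it.
        for child in node.children:
            if child.kind == "pred":
                inner = match(child)
                if inner is not None:
                    polarity, predicate, args = inner
                    return FLIP[polarity], predicate + "+neg", args
        return None
    if node.word in PRINCIPAL:
        args = [c.word for c in node.children if c.kind == "noun"]
        return PRINCIPAL[node.word], node.word, args
    # No pattern matches here, so nothing below this node is reachable.
    return None


x_is_good = Node("good", "pred", [Node("X", "noun")])
negated = Node("not-think", "pred", [x_is_good])      # "I don't think X is good"
embedded = Node("say", "pred", [x_is_good])           # no pattern for the top node

print(match(x_is_good))
print(match(negated))
print(match(embedded))
```

Because matching never starts from an inner node, "X is good" under an unmatched top node produces no unit at all, rather than a misleading favorable one.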
Techniques Required for
Sentiment Analysis
There are three types of patterns: principal, auxiliary, and
nominal patterns.
principal patterns
One pattern converts the Japanese expression “noun
ga warui” to the sentiment unit “[unf] bad <noun>”.
Another pattern converts the expression “noun wo ki-ni
iru” to the sentiment unit “[fav] like <noun>”.
Techniques Required for
Sentiment Analysis
auxiliary patterns
An auxiliary pattern expands the scope of matching.
One such pattern matches phrases such as “X-wa yoi-to omowa-nai. (I don’t think X is good.)” and
produces a sentiment unit with the negation feature.
When this pattern is attached to a principal pattern,
the favorability is inverted.
nominal patterns
Using this kind of pattern, a noun phrase such as “renzu-no
shitsu (quality of the lens)” is converted into just “lens”.
EX: The quality of the lens is good.
[fav] good <lens>
?[fav] good <quality>
Another pattern is used for compound nouns such as “juuden
jikan (recharging time)”. A sentiment unit “long
<time>” is not informative, but “long <recharging
time>” can be regarded as an [unf] sentiment.
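The two nominal rewrites above can be sketched as a small normalization function. The word lists `ATTRIBUTE_HEADS` and `GENERIC_HEADS`, the function name, and the use of English glosses instead of Japanese input are all assumptions for illustration; the actual nominal patterns operate on parse trees.

```python
from typing import List

# Hypothetical toy word lists, written with English glosses.
ATTRIBUTE_HEADS = {"quality"}   # "quality of the lens" -> "lens"
GENERIC_HEADS = {"time"}        # "recharging time" must be kept whole


def normalize_argument(modifiers: List[str], head: str) -> str:
    """Pick an informative argument for a noun phrase 'modifier(s) + head'."""
    if head in ATTRIBUTE_HEADS and modifiers:
        # The head adds little information, so the modifier becomes the argument.
        return modifiers[-1]
    if head in GENERIC_HEADS and modifiers:
        # The head alone is uninformative, so the compound is kept together.
        return " ".join(modifiers + [head])
    return head


print(normalize_argument(["lens"], "quality"))
print(normalize_argument(["recharging"], "time"))
print(normalize_argument([], "price"))
```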
Disambiguation of
sentiment polarity
Some adjectives and verbs may be used
for both favorable and unfavorable
predicates. This variation of sentiment
polarity can be disambiguated naturally in
the same manner as the word sense
disambiguation in machine translation.
The resolution is high. [fav]
ABC123 is expensive. [unf]
The semantic category assigned to a noun
holds the information used for this type of
disambiguation.
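One way to picture this disambiguation is a lookup keyed on the predicate and the semantic category of its argument noun. The category names, both tables, and the function below are hypothetical; the slides only state that the noun's semantic category carries the needed information.

```python
from typing import Dict, Optional, Tuple

# Hypothetical semantic categories for a few nouns.
NOUN_CATEGORY: Dict[str, str] = {
    "resolution": "capability",
    "ABC123": "product",
    "price": "cost",
}

# Hypothetical polarity table: (predicate, noun category) -> polarity.
POLARITY_BY_CATEGORY: Dict[Tuple[str, str], str] = {
    ("high/expensive", "capability"): "fav",
    ("high/expensive", "product"): "unf",
    ("high/expensive", "cost"): "unf",
}


def disambiguate(predicate: str, noun: str) -> Optional[str]:
    """Return the polarity of `predicate` given its argument, or None if unknown."""
    category = NOUN_CATEGORY.get(noun)
    if category is None:
        return None
    return POLARITY_BY_CATEGORY.get((predicate, category))


print(disambiguate("high/expensive", "resolution"))
print(disambiguate("high/expensive", "ABC123"))
```

Returning `None` for an unknown noun, instead of a default polarity, keeps the sketch from guessing where the lexicon has no evidence.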
Resources
Principal patterns: verbal and adjectival patterns, with
a sentiment polarity assigned to each word (3752 words
in total).
Auxiliary/Nominal patterns: 95 auxiliary patterns
and 36 nominal patterns were created manually.
Polarity lexicon: Some nouns were assigned
sentiment polarity, e.g. [unf] for ‘noise’ (as in “There
are many ...”).
Some patterns and lexicons are domain
dependent. Fortunately the translation engine
used here has a function to selectively use
domain-dependent dictionaries, and thus we can
prepare patterns which are especially suited for
the domain of digital cameras.
Evaluation
Bulletin boards on the WWW that are
discussing digital cameras.
A total of 200 randomly selected
sentences were analyzed by our system.
The resources were created by looking at
other parts of the texts from the same domain.
Experiment 1
To assess the reliability of the extracted sentiment
polarity, 3 metrics are used: Weak / Strong
Precision, Recall
Two methods are compared:
(a) the method based on the machine translation engine
(b) the lexicon-only method, which emulates the shallow
parsing approach.
It uses a simple polarity lexicon of adjectives and verbs.
No disambiguation is done.
It handles direct negation of an adjective or verb.
Experiment 1
The MT method outputs a sentiment unit only when the
expression is reachable from the root node of the syntactic
tree through the combination of sentiment fragments, while
the lexicon-only method picks up sentiment units from any
node in the syntactic tree.
The following sentence is an example where the lexicon-only
method output a wrong sentiment unit, while the MT method did
not output it:
gashitsu-ga kirei-da-to iu hyouka-ha uke-masen-deshi-ta.
‘There was no opinion that the picture was sharp.’
[fav] clear <picture>
In the lexicon-only method, some errors also occurred due to the
ambiguity in sentiment polarity of an adjective or a verb, e.g.
“Capabilities are high.”, since high/expensive is always assigned
the [unf] feature.
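The two error types of the lexicon-only baseline can be reproduced with a small sketch. This is not the evaluated system: the token lists are simplified English glosses, and the `LEXICON` table and `lexicon_only` function are assumptions that keep only the properties listed on the slide (one fixed polarity per word, direct negation handled, no other context).

```python
from typing import List, Tuple

# Hypothetical fixed-polarity lexicon: each word has exactly one polarity.
LEXICON = {"clear": "fav", "high/expensive": "unf"}
FLIP = {"fav": "unf", "unf": "fav"}


def lexicon_only(tokens: List[str]) -> List[Tuple[str, str]]:
    """Emit a (polarity, word) pair for every lexicon word, wherever it appears."""
    units = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            polarity = LEXICON[token]
            # Only a directly preceding negation is taken into account.
            if i > 0 and tokens[i - 1] == "not":
                polarity = FLIP[polarity]
            units.append((polarity, token))
    return units


# "There was no opinion that the picture was clear": the embedded clause is
# picked up although the sentence as a whole expresses no favorable opinion.
print(lexicon_only(["no", "opinion", "that", "picture", "clear"]))

# "Capabilities are high": the fixed [unf] polarity is wrong for this noun.
print(lexicon_only(["capabilities", "high/expensive"]))
```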
Experiment 2
Compare the scope of the extracted sentiment
units between (a) the MT method and (c) a method that supports
only naïve predicate-argument structures and
doesn’t use nominal patterns.
The output of the MT method was less redundant and
more informative than that of the naïve method.
Ex: It seems the function was enhanced last May.
(A) [fav] enhance <function>
(C) [fav] enhance <function, May>
Ex: A zoom is more desirable.
(A) [fav] desirable <zoom>
(C) [fav] desirable <hou>
conclusion
We have shown that the deep syntactic and
semantic analysis makes possible the reliable
extraction of sentiment units, and the outlining of
sentiments became useful because of the
aggregation of the variations in expressions, and
the informative outputs of the arguments.
When we regard the extraction of sentiment units
as a kind of translation, many techniques which
have been studied for the purpose of machine
translation, such as word sense disambiguation and
anaphora resolution, can accelerate the further
enhancement of sentiment analysis.