Sentiment Analysis and Lifelong Learning
Bing Liu
University of Illinois at Chicago
[email protected]
My Goal

 Introduce sentiment analysis (SA)
 Introduce lifelong machine learning (LML)
 Focus on problems
   Classic isolated ML has serious limitations
   Lifelong learning: retain knowledge learned in the past to help future learning & problem solving
   Without it, an AI system will never be “intelligent”
 Solving SA using lifelong learning
Introduction

 Sentiment analysis or opinion mining
   the computational study of opinion, sentiment, appraisal, evaluation, and emotion.
 Why is it important?
   Opinions are key influencers of our behaviors.
   Our beliefs and perceptions of reality are conditioned on how others see the world.
   Whenever we need to make a decision, we often seek out the opinions of others.
 SA has spread from CS to management, health, finance, medicine, and the political and social sciences.
Outline

 Introduction to sentiment analysis
   Problem of sentiment analysis
   Document/sentence sentiment classification
   Aspect-based sentiment analysis
   Some other difficult problems and sentences
 Introduction to lifelong machine learning
   Lifelong learning for aspect extraction
   Lifelong learning for sentiment classification
 Summary
Terms defined - Merriam-Webster

 Sentiment: an attitude, thought, or judgment prompted by feeling.
   A sentiment is more of a feeling.
   “I am concerned about the current state of the economy.”
 Opinion: a view, judgment, or appraisal formed in the mind about a particular matter.
   A concrete view of a person about something.
   “I think the economy is not doing well.”
Sentiment Analysis (SA) problem
(Hu and Liu 2004; Liu, 2010; 2012)

 Id: John on 5-1-2008 -- “I bought an iPhone yesterday. It is such a nice phone. The touch screen is really cool. The voice quality is great too. It is much better than my old Blackberry. …”
 Definition: An opinion is a quadruple,
   (target, sentiment, holder, time)
 A more practical definition is a quintuple:
   (entity, aspect, sentiment, holder, time)
   E.g., (iPhone, touch_screen, +, John, 5-1-2008)
 SA goal: Given an opinion doc, mine all quintuples. (A minimal data-structure sketch follows.)
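A minimal sketch of the quintuple in code, using a Python dataclass; the field names simply mirror the definition above, and the example instance is the one from the iPhone review:

```python
# A minimal sketch of the opinion quintuple defined above.
from dataclasses import dataclass

@dataclass
class Opinion:
    entity: str      # e.g., a product
    aspect: str      # a feature/attribute of the entity
    sentiment: str   # "+", "-", or "neutral"
    holder: str      # who expressed the opinion
    time: str        # when it was expressed

op = Opinion("iPhone", "touch_screen", "+", "John", "5-1-2008")
print(op)
```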
Opinion Reason
(Liu, 2015; Wang et al. 2016)

 We can perform an even finer-grained analysis of opinions.
 Opinion reason: the cause of an opinion.
   E.g., “This car is too small for a tall person”
 Only reporting the negative sentiment about size does not tell the whole story because
   it can mean too small or too big.
 We call “too small for a tall person” the reason for the negative sentiment about size.
Opinion summarization
(Hu and Liu, 2004)

 Classic text summarization is not suitable.
   An opinion summary can be defined conceptually, independent of how the summary is produced.
 An opinion summary needs to be quantitative:
   60% positive about X is very different from 90% positive about X.
 One main form of opinion summary is
   aspect-based opinion summary
Opinion summary
(Hu and Liu, 2004)

Aspect/feature-based summary of opinions about iPhone:

  Aspect: touch screen
    Positive: 212
       The touch screen was really cool.
       The touch screen was so easy to use and can do amazing things.
      …
    Negative: 6
       The screen is easily scratched.
       I have a lot of difficulty in removing finger marks from the touch screen.
      …
  Aspect: voice quality
  …

(Liu et al. 2005) [Bar charts omitted: an opinion summary of one phone over the aspects voice, screen, battery, size, and weight (+/- bars per aspect), and an opinion comparison of two phones.]
Aspect-based opinion summary

[Visualization slide; figure omitted.]
Summarization for BestBuy (Samsung)
(AddStructure.com)

[Screenshot omitted: aspect-based summaries of Samsung product reviews on BestBuy, showing aspect labels with mention counts, e.g., Good Sound Quality (895), Great Price (518), Picture Quality (256), Easy Setup (138), Good Sound Quality (77), Easy Setup (60), Speakers (9), Inputs (8), Little product flaws (8), Great Remote (5), Volume (4), Changing Channels (4), Volume (3).]
Two main types of opinions
(Jindal and Liu 2006; Liu, 2010)

 Regular opinions: sentiment/opinion expressions on some target entities
   Direct opinions: “The touch screen is really cool.”
   Indirect opinions: “After taking the drug, my pain has gone.”
 Comparative opinions: comparisons of more than one entity.
   E.g., “iPhone is better than Blackberry.”
 We focus on regular opinions in this talk.
Subjectivity

 Subjective sentences come in many forms, e.g., opinions, beliefs, allegations, desires, suspicions, speculations (Wiebe et al. 2004).
 A subjective sentence may or may not contain a positive or negative opinion/sentiment, e.g.,
   “This car is great”
   “I think he went home yesterday”
 Objective sentences express factual statements
   They can imply opinions (Liu, 2010)
Subjective and fact-implied opinions
(Liu, 2012)

 Subjective opinion: a regular/comparative opinion given in a subjective statement, e.g.,
   “Coke tastes great.”
 Fact-implied opinion: a regular/comparative opinion implied by an objective statement, e.g.,
   “The battery life of this phone is longer than my previous Samsung phone.”
   “We bought the mattress yesterday, and a body impression has formed.”
Affect, emotion, and mood

 Affect: a neurophysiological state consciously accessible as a primitive feeling, not directed at an object.
 Emotion: the indicator of affect; an emotion is a compound (rather than primitive) feeling concerned with a specific object.
 Mood: like emotion, a feeling or affective state, but it typically lasts longer than emotion and tends to be more unfocused and diffuse.
Emotion

 There is no agreed set of basic human emotions.
   Based on Parrott (2001), people have six basic emotions: love, joy, surprise, anger, sadness, and fear.
 Although related, emotions and opinions are not equivalent.
   Opinion: a personal (+/-) view on something
     One cannot just say “I like” (a target is needed).
   Emotion: a mental state, an inner feeling
     But one can say “I am angry” (no target needed).
Definition of Emotion
(Liu, 2015)

 Definition (emotion): an emotion is a quintuple,
   (entity, aspect, emotion_type, feeler, time)
 E.g., “I am so mad with the hotel manager because he refused to refund my booking fee”
   Entity: hotel
   Aspect: manager
   emotion_type: anger
   feeler: I
   time: unknown
 The definition can also include the cause.
Author and Reader Standpoint
(Liu, 2015)

 We can look at an opinion from two perspectives, that of
   the author (opinion holder) who posted the opinion, and that of
   the reader who reads the opinion.
 For example,
   “This car is too small for me.”
   “Google’s profits went up by 30%.”
Document sentiment classification

 Classify a whole opinion doc (e.g., a review) based on the overall sentiment (Pang & Lee, 2008)
   Classes: positive, negative (and possibly neutral)
 Assumption: the doc contains opinions about a single entity.
   Reviews usually satisfy the assumption
     Positive: 4/5 stars; negative: 1/2 stars
   But forum discussions often do not
Solution methods

 Supervised learning: all kinds of supervised learning methods have been applied, e.g., NB, SVM, DNN (Pang et al. 2002; Dave et al. 2003; Gamon, 2004; Li et al. 2010; Paltoglou & Thelwall, 2010; Xia et al. 2013; Socher et al. 2013; etc.)
   Features: n-grams, sentiment words/phrases, POS tags, negation, position, dependency, word embeddings
   IR weighting schemes
 Unsupervised methods
   Based on predefined patterns (Turney, 2002)
   Lexicon-based methods (Taboada et al. 2011)
     A list of positive and negative words with weighting and combination rules. (A minimal sketch of the supervised approach follows.)
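A minimal sketch of supervised document sentiment classification, assuming scikit-learn; the toy data, TF-IDF n-gram features (approximating the n-gram and IR-weighting features above), and the LinearSVC classifier are illustrative choices, not the exact setups of the cited papers:

```python
# Hypothetical minimal sketch of document-level sentiment classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = ["The phone is great", "Terrible battery, do not buy"]  # toy data
train_labels = ["positive", "negative"]

# TF-IDF over unigrams and bigrams: n-gram features with IR weighting.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_docs, train_labels)
print(clf.predict(["The battery life is great"]))
```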
Sentence sentiment classification

 Classify each sentence into 3 classes:
   positive, negative, neutral
 Ignore mixed sentences, e.g., “Apple is doing well in this poor economy”
 Supervised learning (Wiebe et al. 2004; Wilson et al. 2004, etc.)
   Using similar features as for documents
 Lexicon-based methods (Hu and Liu 2004; Kim and Hovy, 2004)
Aspect-based sentiment analysis

 Document/sentence sentiment classification does not give details (Hu and Liu, 2004).
   It helps but does not solve the problem of (entity, aspect, sentiment, holder, time):
     It does not identify entity, aspect, holder, or time.
     It does not assign sentiment to an entity/aspect.
 For applications, we often need to solve the full problem, i.e., aspect-based analysis.
Aspect sentiment classification

 A sentence can have multiple aspects with different opinions.
   “I love the picture quality but not the battery life.”
 Almost all approaches make use of sentiment words (“good,” “bad”) and phrases. But
   some sentiment words have context-independent orientations, e.g., “good” and “bad” (almost), while
   most other words have context-dependent orientations, e.g., “sucks” (+ for a vacuum cleaner)
     A big problem in practice
Supervised learning

 Supervised learning is tricky:
   “Apple is doing very well in this lousy economy”
 Some approaches
   Compute the word feature weight based on the distance between the word feature and the target entity/aspect (Boiy and Moens, 2009)
   Use a parse tree to generate a set of target-dependent features (e.g., Jiang et al. 2011)
Lexicon-based approach

 Need parsing to deal with simple sentences, compound sentences, conditional sentences, questions, different verb tenses, etc. (Ding et al. 2008; Narayanan et al. 2009; Liu, 2015).
   Negation (not), contrary (but), comparisons, etc.
   A large opinion lexicon, context dependency, etc.
 Easy case: “Apple is doing well in this poor economy.” Aggregate the sentiment of the opinion words $w_i$ around aspect $a$, weighted by distance:

   $$\mathit{score}(a) = \sum_{i=1}^{n} \frac{w_i.\mathit{sentiment}}{\mathit{dist}(w_i, a)}$$

 (A minimal sketch of this scoring follows.)
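A minimal sketch of the distance-weighted aggregation above; the tiny lexicon and whitespace tokenization are illustrative assumptions, not the full method of Ding et al. (2008):

```python
# Toy opinion lexicon; real lexicons have thousands of entries.
LEXICON = {"well": +1, "good": +1, "poor": -1, "bad": -1}

def aspect_score(tokens, aspect_index):
    """Sum opinion-word sentiment weighted by 1/distance to the aspect."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON and i != aspect_index:
            score += LEXICON[tok] / abs(i - aspect_index)
    return score

tokens = "apple is doing well in this poor economy".split()
print(aspect_score(tokens, tokens.index("apple")))    # > 0: "well" is closer
print(aspect_score(tokens, tokens.index("economy")))  # < 0: "poor" is adjacent
```

Because each opinion word contributes most to the nearest aspect, the same sentence comes out positive about Apple and negative about the economy.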
Aspect extraction

 “The battery life is long, but pictures are poor.”
   Aspects: battery life, picture
 Many approaches
   Frequency-based: frequent noun phrases (Hu & Liu, 2004)
   Syntactic dependency: opinion-and-target relations (Hu & Liu 2004; Zhuang, Jin & Zhu 2006; Wang & Wang, 2008; Wu et al. 2009; Blair-Goldensohn et al. 2008; Qiu et al. 2009; Kessler & Nicolov, 2009; etc.)
   Supervised sequence labeling (e.g., CRF) (Jin and Ho 2009; Jakob and Gurevych, 2010, etc.)
   Topic modeling (Mei et al. 2007; Titov et al. 2008; Li, Huang & Zhu, 2010, …)
   Many others (Kobayashi et al. 2006; Fang & Huang 2012; Liu, Xu & Zhao, 2013; Zhu, Wan & Xiao 2013, etc.)
Extract aspects using DP
(Qiu et al. 2009)

 Double propagation (DP): by definition, an opinion has a target, an entity/aspect.
 There are syntactic dependencies between them.
 Knowing one helps find the other.
   E.g., “The rooms are spacious”
 DP extracts both targets and opinion words. (A minimal sketch follows.)
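A minimal sketch of one DP iteration, assuming spaCy with its small English model installed; the seed lexicon and the two rules shown are illustrative, a small subset of the rule set in Qiu et al. (2009):

```python
import spacy

nlp = spacy.load("en_core_web_sm")     # assumes the model is installed
opinion_words = {"spacious", "great"}  # seed opinion lexicon
aspects = set()

doc = nlp("The rooms are spacious and the staff is great.")
for tok in doc:
    # Opinion -> target: a known opinion adjective in "X is ADJ" reveals
    # the noun subject X as an aspect.
    if tok.lemma_ in opinion_words and tok.dep_ == "acomp":
        for child in tok.head.children:
            if child.dep_ == "nsubj" and child.pos_ == "NOUN":
                aspects.add(child.lemma_)
    # Target -> opinion: an adjective directly modifying a known aspect
    # noun (amod) is extracted as a new opinion word.
    if tok.dep_ == "amod" and tok.head.lemma_ in aspects:
        opinion_words.add(tok.lemma_)

print(aspects)  # e.g., {'room', 'staff'}
```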
Rules from dependency grammar

[Table of dependency-relation extraction rules omitted.]
Topic modeling for aspect extraction

 Aspect extraction actually has two tasks:
   (1) extract aspect terms,
   (2) cluster them (synonym grouping).
     Same aspect: “picture,” “photo,” “image”
 Topic modeling (Blei et al. 2003) performs both tasks at the same time. A topic is an aspect.
   A document is a distribution over topics.
   A topic is a distribution over terms/words. (A minimal sketch follows.)
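A minimal sketch, assuming scikit-learn: LDA over a toy review corpus, where each learned topic (a word distribution) is read as an aspect; the corpus and topic count are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "great picture and photo quality",
    "the photo and image look sharp",
    "battery life is too short",
    "battery power drains fast",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each topic groups co-occurring terms, e.g., {picture, photo, image}
# vs. {battery, power, life}: extraction and clustering at once.
words = vec.get_feature_names_out()
for t, topic in enumerate(lda.components_):
    print(f"aspect/topic {t}:", [words[i] for i in topic.argsort()[::-1][:4]])
```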
Joint model of sentiment and aspect
(Zhao et al., 2010)

 A review consists of
   aspect words: picture, price
   general sentiment words: good, bad, great
   aspect-specific sentiment words: clear, expensive
   background words: all other words
 Joint modeling of all: extraction and clustering
   MaxEnt is used to initially classify them.
   Modeling is unsupervised.
Graphical model (plate)

[Plate diagram omitted. Key variables:]
 y_{d,s,n} indicates whether a word is a background word, an aspect word, or an opinion word.
 u_{d,s,n} indicates whether an opinion word is general or aspect-specific.
 x_{d,s,n} is the feature vector; MaxEnt is used to train a model using a training set.
Semi-supervised modeling (seeds)
(Mukherjee and Liu, 2012)

 An unsupervised aspect-sentiment model has difficulty producing aspects that suit the user's need.
 Semi-supervised modeling allows the user to give some seed aspect terms for a subset of aspects
   to produce aspects that meet the user's need.
Graphical model (plate)

[Plate diagram omitted.]
Explicit and implicit aspects
(Hu and Liu, 2004)

 Explicit aspects: aspects explicitly mentioned as nouns or noun phrases in a sentence
   “The picture quality of this phone is great.”
 Implicit aspects: aspects not explicitly mentioned but implied
   “This car is so expensive.” => price
   “Included 16MB is stingy.” => memory
   “This phone will not fit in a pocket.” => size
Fact-implied opinions

 Such cases are very hard to deal with, and every domain has some such sentences.
 These sentences are from paint reviews:
   “For paintX, one coat can cover the wood color.”
   “For paintY, we need three coats to cover the wood color”
   We know that paintX is good and paintY is not, but how can a system know that?
 This is from a mattress review:
   “We bought the mattress yesterday, and a body impression has formed.”
Conditional & interrogative sentences

 Conditional sentences are hard to deal with (Narayanan et al. 2009)
   “If I can find a good camera, I will buy it.”
 But conditional sentences can have opinions
   “If you are looking for a good phone, buy Nokia”
 Questions are also hard to handle
   “Are there any great perks for employees?”
   “Any idea how to fix this lousy Sony camera?”
Opinions in sarcastic sentences

 Sarcastic sentences
   “What a great car, it stopped working in the second day.”
   “Great for insomniacs” (book)
   “Can he read?” (Obama)
 Sarcastic sentences are common in political blogs, comments and discussions.
   They make political opinions difficult to handle
 Some initial work by (Tsur et al. 2010; Riloff et al. 2013)
Coreference resolution

 Little work has been done.
 Some semantic analysis is needed (Ding and Liu, 2010)
   “This Sharp tv’s picture quality is so bad. Our old Sony tv is much better. It is also so expensive.”
     “it” means “Sharp”
   “This Sharp tv’s picture quality is so bad. Our old Sony tv is much better. It is also more reliable.”
     “it” means “Sony”
 Sentiment consistency.
Some other interesting sentences

 “Trying out Chrome because Firefox keeps crashing.”
   Firefox - negative; no opinion about Chrome.
   We need to segment the sentence into clauses to decide that “crashing” only applies to Firefox(?).
 But how about these:
   “I changed to Audi because BMW is so expensive.”
   “The laptop next to Lenovo looks so ugly.”
   “I am so happy that my iPhone is nothing like my old ugly Droid.”
Some interesting sentences (contd)

 Consider these two sentences from a medical domain:
   “I come to see my doctor because of severe stomach pain”
   “After taking the drug, I got severe stomach pain”
 The first sentence has no opinion about the drug or doctor, but the second implies a negative opinion about the drug.
   Some understanding is needed
Some more interesting sentences

 “The top of the picture was brighter than the bottom.”
 “When I first got the airbed a couple of weeks ago it was wonderful as all new things are, however as the weeks progressed I liked it less and less.”
 “My goal is to get a TV with good picture quality”
 “Google steals ideas from Bing, Bing steals market shares from Google.”
Sentiment analysis is hard!

 “This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous Samsung phone. The battery life was short too. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”
ML 1.0: classic machine learning

 Classic ML: learning in isolation
   Run an ML algorithm on a dataset to learn a function,
   without considering any related information or past learned knowledge.
 Existing ML algorithms such as deep NNs, SVM, NB, CRF, topic models, etc.
   have been very successful.
   Clearly, they can still be improved.
 But the isolated learning paradigm has fundamental limitations.
Knowledge is not cumulative

 No memory:
   Knowledge learned is not retained.
   Thus, the learner cannot leverage past learned knowledge.
 Humans always seem to learn new things based on what we already know.
   Whenever we see a new situation, a large part of it is known to us.
     Nothing is completely new.
Need large training data

 Due to the lack of prior knowledge,
   isolated learning needs a large number of training examples.
 Humans can learn effectively from a few examples.
   When did anyone ever give you 1000 positive docs and 1000 negative docs, and ask you to manually learn a classifier to classify new documents?
Learning is not compositional

 Isolated learning does not learn basic components and how they work together to give the full meaning.
 This kind of learning is crucial, e.g., for NLP and image processing.
   The learned knowledge can easily transfer across domains and tasks because
     any sentence or document is made up of individual words and phrases following syntactic rules.
   Without such bottom-up learning, it is impossible to handle the infinite number of possible sentences.
No Self-Learning

 Isolated learning needs externally provided training data.
 Humans in most cases learn by themselves:
   We find our own training data (only a few examples).
   We even set up experiments to gather data.
   It is all done based on our existing knowledge and our reasoning capability.
 But can a computer system perform
   simple self-learning continuously, and
   become more and more knowledgeable?
ML 2.0: lifelong learning
(Thrun, 1996; Silver et al. 2013; Chen and Liu, 2014)

 Learn as humans do: lifelong machine learning (LML)
   Retain learned knowledge from previous tasks & use it to help future learning.
   Let us call this paradigm Machine Learning 2.0.
 Big data provides a great opportunity for LML
   E.g., big text data
   Extensive sharing of concepts across tasks/domains due to the nature of natural language
Lifelong Machine Learning (LML)
(Chen and Liu 2014; Chen, Ma and Liu, 2015)

Definition: LML is a continuous learning process where the learner performs a sequence of learning tasks T1, T2, …, Tn.

 When faced with a new task Tn+1 with data Dn+1, the learner makes use of prior knowledge K in its knowledge base (KB) to help learn Tn+1.
 The KB contains knowledge accumulated from the past learning of tasks T1, T2, …, Tn.
 The KB is updated with the learned (intermediate as well as final) results from Tn+1. (A minimal sketch of this loop follows.)
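A minimal sketch of this definition as code; the LifelongLearner class, its knowledge representation (a plain dict), and the mine_knowledge hook are hypothetical placeholders, not a published implementation:

```python
# Hypothetical skeleton of the LML loop defined above.
class LifelongLearner:
    def __init__(self, base_learner):
        self.base_learner = base_learner
        self.kb = {}  # knowledge base accumulated over tasks T1..Tn

    def learn_task(self, task_id, data):
        # Use prior knowledge K from the KB to help learn the new task Tn+1
        model = self.base_learner(data, prior_knowledge=self.kb)
        # Update the KB with results (intermediate and final) from Tn+1
        self.kb[task_id] = self.mine_knowledge(model, data)
        return model

    def mine_knowledge(self, model, data):
        # Placeholder: extract reusable knowledge (e.g., topics, must-links)
        return {"summary": getattr(model, "summary", None)}
```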
Related: Transfer Learning

 Transfer learning has been studied extensively (survey by Pan & Yang, 2010).
 Problem statement:
   Source domain(s) (usually 1 source domain/task)
     with labeled training data
   Target domain (assumed to be related)
     with little or no labeled training data, but with unlabeled data
 Goal: leverage the information from the source domain(s) to help learning in the target domain
   Only optimizes the target domain/task learning
Related: Multitask Learning (MTL)

 Problem statement: co-learn multiple related tasks simultaneously:
   All tasks have labeled data and are treated equally.
 Goal: optimize learning/performance across all tasks through shared knowledge.
 Rationale: introduce inductive bias in the joint hypothesis space of all tasks (Caruana, 1997)
   by exploiting the task relatedness structure, or shared knowledge.
Transfer, Multitask → Lifelong

 Lifelong learning extends transfer learning & multitask learning with knowledge accumulation.
 Different from transfer learning:
   Transfer learning is not continuous.
   There is no retention of knowledge.
 Different from multitask learning:
   MTL retains no knowledge except data.
   It is hard to re-learn everything when faced with a new task.
Aspect Extraction: Topic Modeling

 “The battery life is long, but pictures are poor.”
   Aspect terms: battery life, picture
 Aspect extraction actually has two tasks:
   (1) extract aspect terms, e.g., “picture,” “photo,” “battery,” “power”
   (2) cluster them (synonym grouping), e.g., same aspects: {“picture,” “photo”}, {“battery,” “power”}
 Topic modeling (Blei et al. 2003) performs both tasks at the same time. A topic is an aspect.
   E.g., {price, cost, cheap, expensive, …}
Key observation in practice
(Chen and Liu, ICML-2014)

 There is a fair amount of aspect overlap across reviews of different products or domains:
   Every product review domain has the aspect price,
   most electronic products share the aspect battery, and
   many also share the aspect screen.
 This sharing of concepts/knowledge across domains is true in general, not just for SA.
   It is “silly” not to exploit such sharing in learning.
Big data and aspect sharing

 Why use SA for lifelong learning?
   Online reviews: excellent data with extensive sharing of aspects/concepts across domains
   It is hard to find suitable data in other application areas.
 Why big (and diverse) data?
   To learn a broad range of reliable knowledge. More knowledge makes future learning easier.
Lifelong Topic Modeling (LTM)
(Chen and Liu, ICML-2014)

 For aspect extraction:
   Topic modeling (Blei et al. 2003) finds topics in a collection of documents. Topics are aspects.
     A document is a distribution over topics.
     A topic is a distribution over terms/words, e.g., {price, cost, cheap, expensive, …}
 Questions:
   How do we find good past knowledge, and how do we use it to help future extraction?
What Knowledge?
(Chen and Liu, 2014a, 2014b; Wang et al. 2016)

 Words that should be in the same aspect/topic => must-links,
   e.g., {picture, photo}
 Words that should not be in the same aspect/topic => cannot-links,
   e.g., {battery, picture}
Lifelong Topic Modeling (LTM)
(Chen and Liu, ICML-2014)

 In LTM, must-links are mined dynamically.
LTM model

 Step 1: Run a topic model (e.g., LDA) on each domain corpus D_i ∈ D to produce a set of topics S_i, called p-topics.
 Step 2: (1) Mine prior knowledge (must-links) from the p-topics; (2) use the prior knowledge to guide modeling.
Knowledge Mining

 Topic match: for each current topic, find the set of similar topics (denoted M) among the p-topics.
 Pattern mining: find frequent itemsets in M; these itemsets are the must-links.
An Example

 Given a newly discovered topic:
   {price, book, cost, seller, money},
 we find 3 matching topics in topic base S:
   Domain 1: {price, color, cost, life, picture}
   Domain 2: {cost, screen, price, expensive, voice}
   Domain 3: {price, money, customer, service, expensive}
 If we require words to appear together in at least two domains, we get two must-links (knowledge):
   {price, cost} and {price, expensive}.
   Each set is likely to belong to the same aspect/topic. (A minimal mining sketch follows.)
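A minimal sketch that reproduces this example: mine length-2 frequent itemsets (must-links) from the matched past topics with a minimum support of two domains:

```python
from itertools import combinations
from collections import Counter

matched_topics = [
    {"price", "color", "cost", "life", "picture"},            # domain 1
    {"cost", "screen", "price", "expensive", "voice"},         # domain 2
    {"price", "money", "customer", "service", "expensive"},    # domain 3
]

# Count in how many matched topics each word pair co-occurs.
pair_support = Counter()
for topic in matched_topics:
    for pair in combinations(sorted(topic), 2):
        pair_support[pair] += 1

min_support = 2  # a pair must appear in at least two domains
must_links = [p for p, s in pair_support.items() if s >= min_support]
print(must_links)  # [('cost', 'price'), ('expensive', 'price')]
```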
Model Inference: Gibbs Sampling

 How do we use the must-link knowledge, e.g., {price, cost} & {price, expensive}?
   Graphical model: same as LDA
   But the model inference is very different:
     the Generalized Pólya Urn model (GPU)
 Idea: when assigning a topic t to a word w, also assign a fraction of t to the words that share must-links with w.
Simple Pólya Urn model (SPU) vs. Generalized Pólya Urn model (GPU)

[Urn illustrations omitted: in SPU, a drawn ball is returned with another ball of the same color; in GPU, returning the ball also adds balls of certain other related colors.]
Gibbs Sampler for GPU

 The conditional for sampling topic t at word position i:

$$P(z_i = t \mid \mathbf{z}^{-i}, \mathbf{w}, \alpha, \beta) \propto \frac{n_{m,t}^{-i} + \alpha}{\sum_{t'=1}^{T}\left(n_{m,t'}^{-i} + \alpha\right)} \times \frac{\sum_{w'=1}^{V} \mathbb{A}_{w',w_i}\, n_{t,w'}^{-i} + \beta}{\sum_{v=1}^{V}\left(\sum_{w'=1}^{V} \mathbb{A}_{w',v}\, n_{t,w'}^{-i} + \beta\right)}$$

 where n_{m,t} is the number of words in document m assigned to topic t, n_{t,w'} is the number of times word w' is assigned to topic t (the superscript -i excludes the current position i), and 𝔸 is the promotion matrix derived from the must-links. (A minimal update sketch follows.)
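A minimal sketch of the GPU idea in code: when a sampled assignment increments the count of word w under topic t, a fraction of that count is also given to the words sharing a must-link with w. The data structures and the promotion weight 0.3 are illustrative assumptions:

```python
from collections import defaultdict

must_links = {"price": {"cost", "expensive"}, "cost": {"price"},
              "expensive": {"price"}}
PROMOTION = 0.3                      # fraction of mass given to linked words
topic_word = defaultdict(float)      # (topic, word) -> pseudo-count

def gpu_increment(topic, word, amount=1.0):
    """Generalized Polya urn update for one sampled assignment."""
    topic_word[(topic, word)] += amount
    for linked in must_links.get(word, ()):
        topic_word[(topic, linked)] += PROMOTION * amount

gpu_increment(topic=3, word="price")
print(dict(topic_word))  # 'price' gets 1.0; 'cost'/'expensive' each get 0.3
```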
Experiment Results

[Results charts omitted.]
AMC: Modeling with Small Datasets
(Chen and Liu, KDD-2014)

 The LTM model is not sufficient when the data for each task is small, because
   it cannot produce good initial topics for matching to identify relevant past topics.
 AMC mines must-links differently:
   It mines must-links from the knowledge (topic) base without considering the task/data.
     Task/domain independent.
     It uses frequent itemset mining (FIM) over all past topics.
Cannot-Links

 AMC also needs to mine cannot-links, which is tricky because
   there is a huge number of cannot-links, O(V^2),
     where V is the vocabulary size.
 Thus we need to focus only on the terms that are relevant to the target data D^t.
   That is, we need to embed the process of finding cannot-links in the sampling.
Overall Algorithm

 Sampling becomes much more complex:
   the paper proposed the M-GPU model (multi-generalized Pólya urn model).
Lifelong Topic Modeling – AMC

 In AMC, must-links are mined offline and cannot-links are mined online, dynamically.
Word vector and aspect associations
(Liu et al., AAAI-2016)

 Goal: improving aspect extraction.
 Uses the syntactic dependency method DP (Qiu et al. 2011) as the base to produce two aspect sets:
   R1: using a set of strict dependencies, for high precision
   R2: using a large set of dependencies, for high recall.
 Based on past knowledge, pick candidate aspects from R2 and add them to R1.
   Recommended: use word vectors trained on past data.
   Recommended: use aspect associations of past aspects. (A minimal similarity sketch follows.)
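A minimal sketch of the word-vector recommendation, assuming NumPy and some pre-trained vectors from past domains; the toy vectors and the 0.5 cosine threshold are illustrative assumptions, not the exact scoring of Liu et al. (AAAI-2016):

```python
import numpy as np

word_vec = {                      # toy 3-d "pre-trained" vectors
    "photo":    np.array([0.9, 0.1, 0.0]),
    "picture":  np.array([0.8, 0.2, 0.1]),
    "warranty": np.array([0.0, 0.1, 0.9]),
}
past_aspects = ["photo"]          # aspects learned in past domains
r1, r2_candidates = {"battery"}, ["picture", "warranty"]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Promote a high-recall R2 candidate into R1 if its vector is close
# to some known past aspect.
for cand in r2_candidates:
    if any(cos(word_vec[cand], word_vec[p]) > 0.5 for p in past_aspects):
        r1.add(cand)

print(r1)  # {'battery', 'picture'}: 'warranty' is unlike any past aspect
```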
Lifelong Sentiment Classification
(Chen, Ma and Liu, 2015)

 “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is great too. ....”
 Goal: classify docs or sentences as + or -.
 We need to manually label a lot of training data for each domain, which is highly labor-intensive.
   Can we avoid labeling for every domain, or at least label far fewer docs/sentences?
Exploiting Past Information/Data

 It is “well-known” that a sentiment classifier (SC) built for domain A will not work for domain B.
   E.g., an SC built for “camera” will not work for “earphone”.
 Classic solution: transfer learning
   Use labeled data from a past domain S (camera) to help learning in the target domain T (earphone).
   If S and T are very similar, S can help.
 But transfer learning may not be the best solution!
Lifelong Supervised Learning

 Imagine we have worked on a large number of past domains, with their training data D.
 Do we need any data from the new domain T?
   No, in many cases:
     a naive lifelong learning method works wonders,
     improving accuracy by as much as 19% (= 80% - 61%).
   Yes, in some others: e.g., we build an SC using D, but it works poorly for toy reviews.
     Why? Because of the word “toy”. (A minimal sketch of the naive method follows.)
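A minimal sketch of one naive strategy, assuming scikit-learn: pool the labeled reviews of all past domains into a single Naive Bayes classifier and apply it to the new domain; the toy data and simple pooling are illustrative, not the exact method of Chen, Ma and Liu (2015):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

past_domains = {
    "camera":   (["sharp pictures, great buy", "blurry and bad"], ["+", "-"]),
    "earphone": (["crisp sound, great fit", "broke in a week, bad"], ["+", "-"]),
}

# Pool all past-domain training data instead of transferring from one source.
docs, labels = [], []
for d_docs, d_labels in past_domains.values():
    docs += d_docs
    labels += d_labels

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["great sound and great pictures"]))  # new-domain doc
```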
Exploiting Knowledge via Penalties

 Handle domain-dependent sentiment words via penalty terms.
   Domain-level knowledge: if a word appears in only one or two past domains, the knowledge associated with it is probably not reliable or general.
One Result

[Results chart omitted.]
Summary

 SA is a well-defined semantic analysis problem.
   Two key concepts form its core: (1) sentiment and (2) the sentiment target, or aspect.
 SA is still extremely challenging.
   Novel ideas are needed.
 Lifelong machine learning (LML) is promising.
   We discussed solving SA problems using LML.
   Key observation: SA has a significant amount of sharing of sentiment expressions and aspects across domains.
     This should be true in general for NLP.
More Information

 B. Liu. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.
 B. Liu. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press, 2015.
 My sentiment analysis page:
   www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
 My lifelong machine learning page:
   www.cs.uic.edu/~liub/lifelong-learning.html