Towards Personalized
Context-Aware Recommendation
by Mining Context Logs
through Topic Models
KUIFEI YU, BAOXIAN ZHANG, HENGSHU ZHU, HUANHUAN CAO, AND JILEI TIAN
SPRINGER-VERLAG BERLIN HEIDELBERG 2012
Goal
To mine common context-aware preferences (CCPs) from many users’ context
logs through topic models and represent each user’s personal context-aware
preferences as a distribution of the mined common context-aware preferences.
User Context Logs
A real-world data set of context logs collected from 443 mobile phone users
over several months. It contains more than 8.8 million context
records and 665 different interactions (activities) in 12 content categories.
Related Work
Approaches that leverage only an individual user's historical context data to learn preferences
do not address the problem of insufficient personal training data:
◦ A personalized recommender system that recommends travel-related information
◦ A location-based personalized recommender system built on Bayesian networks
◦ Recommending points of interest (POIs) to users in an automotive scenario with Multi-Criteria Decision Making (MCDM)
Approaches based on mobile users' rating logs aim to predict accurate ratings for
unobserved items under different contexts:
◦ Collaborative filtering (CF) based approaches
◦ Decision-tree classification rules used to understand users' personal preferences
◦ Modeling user, location and activity as a 3-dimensional matrix, namely a tensor
◦ Modeling rich contextual information together with items as an N-dimensional tensor, with a novel tensor factorization algorithm
Context logs, which contain users' historical context data and activity
records, are easier to collect on users' mobile devices than rating data.
Preliminary
Capture the historical context data and corresponding activity records as contextual feature-value pairs:
◦ contextual features (e.g., Day name, Time range, and Location)
◦ corresponding values (e.g., Saturday, AM8:00-9:00, and Home)
Transform raw location-based context data, such as GPS coordinates or cell IDs, into social locations using existing
location mining approaches:
◦ “Home” and “Work Place”
Transform raw activity records by mapping the use of a particular application to an activity category:
◦ The two raw activity records “Play Angry Birds” and “Play Fruit Ninja” both become the activity record “Play action games”
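The preprocessing described above can be sketched as a small normalization step. This is only an illustration: the lookup tables, the raw record keys (`day`, `hour`, `cell_id`, `app_event`) and the function names are hypothetical and not taken from the paper, which relies on existing location mining approaches for the location step.

```python
# Hypothetical lookup tables. A real system would derive them from
# location mining and from an application catalogue.
ACTIVITY_CATEGORY = {
    "Play Angry Birds": "Play action games",
    "Play Fruit Ninja": "Play action games",
}
SOCIAL_LOCATION = {"cell:1001": "Home", "cell:2002": "Work Place"}


def time_range(hour):
    """Format an hour (0-23) as a one-hour range such as 'AM8:00-9:00'."""
    prefix = "AM" if hour < 12 else "PM"
    return f"{prefix}{hour}:00-{(hour + 1) % 24}:00"


def normalize(raw):
    """Turn one raw record into contextual feature-value pairs plus an activity."""
    context = {
        "Day name": raw["day"],
        "Time range": time_range(raw["hour"]),
        "Location": SOCIAL_LOCATION.get(raw["cell_id"], "Unknown"),
    }
    # Unmapped application events are kept unchanged.
    activity = ACTIVITY_CATEGORY.get(raw["app_event"], raw["app_event"])
    return context, activity


raw_record = {"day": "Saturday", "hour": 8, "cell_id": "cell:1001",
              "app_event": "Play Fruit Ninja"}
context, activity = normalize(raw_record)
```

Both game titles collapse to one activity, so the later mining step sees a single "Play action games" activity instead of two sparse ones.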
User u with C prefers activity a :
Example of activity records
“Play action games”: ACP-Feature1, ACP-Feature2, ACP-Feature3, …, ACP-Feature n

User   Activity          Context
User1  Play action game  Saturday, evening, Home
       Play video        Saturday, evening, Home
       Use facebook      Monday, morning, WorkPlace
       …
User2  Play action game  Sunday, evening, On the way
       Plan timetable    Saturday, evening, Home
       Play song list    Monday, afternoon, WorkPlace
       …
P(z|u)
       Games Topic (z1)  Business Topic (z2)  Music Topic (z3)  Social Topic (z4)  Others…
User1  38/56             3/56                 5/56              8/56               …
User2  36/125            24/125               12/125            15/125             …
P(a,p|z)
                                                     z1    z2    z3    z4    Others…
Games - Saturday, evening, Home                      0.78  0.12  0.22  0.09  …
Games - Sunday, evening, On the way                  0.77  0.11  0.17  0.03  …
Social network service - Monday, morning, WorkPlace  0.25  0.23  0.21  0.53  …
…
LDA (Latent Dirichlet Allocation)
Suppose you have the following set of sentences:
◦ I like to eat broccoli and bananas.
◦ I ate a banana and spinach smoothie for breakfast.
◦ Chinchillas and kittens are cute.
◦ My sister adopted a kitten yesterday.
◦ Look at this cute hamster munching on a piece of broccoli.
LDA is a way of automatically discovering topics that these sentences contain.
For example, given these sentences and asked for 2 topics, LDA might produce something like:
Sentences 1 and 2: 100% Topic A
Sentences 3 and 4: 100% Topic B
Sentence 5: 60% Topic A, 40% Topic B
Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, … (at which point, you could interpret topic A to be about food)
Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, … (at which point, you could interpret topic B to be about cute animals)
Explanation of LDA
Corpus D: each document di = {w1, w2, …, wn} is a bag of words. The corpus is the input of LDA.
VOC = {w1, w2, …, wm} is the vocabulary: the set of distinct words in D.
The outputs of LDA:
θd = <pt1, pt2, …, ptz>: document d's probability distribution over topics (α is its Dirichlet prior)
φz = <pw1, pw2, …, pwm>: topic z's probability distribution over words (β is its Dirichlet prior)
Core idea of LDA: p(w|d) = Σz p(w|z) * p(z|d)
First, initialize θd and φz.
For a word w in di, the probability that it belongs to topic z is proportional to p(w|z) * p(z|d), i.e. the word's entry in φz times topic z's entry in θd.
According to this probability, reassign the word's topic and update θd and φz.
Repeat until the result converges.
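The iterative procedure above can be made concrete with collapsed Gibbs sampling, one common way to fit LDA. The sketch below is a minimal pure-Python version written for illustration, not the implementation used by the authors. The function name `lda_gibbs`, the hyperparameter defaults and the iteration count are assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dz = [[0] * K for _ in docs]      # topic counts per document
    n_zw = [[0] * V for _ in range(K)]  # word counts per topic
    n_z = [0] * K                       # total word count per topic
    assign = []                         # topic of every word occurrence
    for d, doc in enumerate(docs):      # random initial assignment
        topics = []
        for w in doc:
            z = rng.randrange(K)
            topics.append(z)
            n_dz[d][z] += 1
            n_zw[z][index[w]] += 1
            n_z[z] += 1
        assign.append(topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, z = index[w], assign[d][i]
                # Remove the current assignment from the counts.
                n_dz[d][z] -= 1
                n_zw[z][wi] -= 1
                n_z[z] -= 1
                # Weight of topic k: p(w|k) * p(k|d), up to a constant.
                weights = [
                    (n_zw[k][wi] + beta) / (n_z[k] + V * beta)
                    * (n_dz[d][k] + alpha)
                    for k in range(K)
                ]
                z = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = z
                n_dz[d][z] += 1
                n_zw[z][wi] += 1
                n_z[z] += 1
    theta = [[(n_dz[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_zw[k][v] + beta) / (n_z[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


docs = [
    "i like to eat broccoli and bananas".split(),
    "i ate a banana and spinach smoothie for breakfast".split(),
    "chinchillas and kittens are cute".split(),
    "my sister adopted a kitten yesterday".split(),
    "look at this cute hamster munching on a piece of broccoli".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is a document's topic distribution θd and each row of `phi` is a topic's word distribution φz. On a corpus this small the topics are noisy and depend on the seed, so the split will not necessarily match the food versus cute animals example.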
Mining Common Context-Aware
Preferences through Topic Models
Context records:
<T1, C={Sunday, evening, Home …}, play action game>
<T2, C={Monday, morning, Work place …}, use facebook>
<T3, C={Saturday, morning, Home …}, browse web news>
General form: <Ti, C={p1, p2, p3, …, pl}, a>
Each record is split into ACP-features (a, pl):
(Play action game, Sunday), (Play action game, evening), (Play action game, Home), …
Corpus D: one document di per user (User1, User2, User3, …); each document is a bag of words whose words are ACP-features (a, p)
Topics Z: CCP1, CCP2, CCP3, …; each CCP is a distribution over ACP-features (a,p),(a,p)…
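Turning context records into the per-user bag of ACP-features can be sketched as follows. The record layout `(timestamp, context, activity)` mirrors the <Ti, C, a> triples on this slide. The function names and the use of a dictionary for C are assumptions made for illustration.

```python
from collections import defaultdict


def acp_features(record):
    """Split one context record <T, C, a> into ACP-features (a, p)."""
    _timestamp, context, activity = record
    return [(activity, value) for value in context.values()]


def build_corpus(logs):
    """Map each user to the bag of ACP-features from all of their records."""
    corpus = defaultdict(list)
    for user, record in logs:
        corpus[user].extend(acp_features(record))
    return dict(corpus)


logs = [
    ("User1", ("T1", {"Day name": "Sunday", "Time range": "evening",
                      "Location": "Home"}, "Play action game")),
    ("User1", ("T2", {"Day name": "Monday", "Time range": "morning",
                      "Location": "Work place"}, "Use facebook")),
]
corpus = build_corpus(logs)
```

Each user's list plays the role of one document di, and each (a, p) pair plays the role of a word, so a topic model can be trained on the corpus without further changes.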
PCR-LDA
The estimated values of the two distributions {p(a, p|z)} and {p(z|u)}:

$$p(a, p \mid z) = \frac{n_z^{(a,p)} + \beta}{n_z^{(\cdot)} + A\,\beta}$$

where $n_z^{(a,p)}$ is the number of times ACP-feature (a, p) has been assigned to CCP z, $n_z^{(\cdot)}$ is the number of all ACP-features assigned to CCP z, and $A$ is the number of distinct ACP-features.

$$p(z \mid u) = \frac{n_u^{(z)} + \alpha}{n_u^{(\cdot)} + Z\,\alpha}$$

where $n_u^{(z)}$ is the number of ACP-features from user u's context log that have been assigned to CCP z, $n_u^{(\cdot)}$ is the number of all ACP-features in user u's context log, and $Z$ is the number of CCPs.
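As a small worked example, the two smoothed estimates can be computed directly from assignment counts. This sketch assumes that the denominator of p(a, p|z) is smoothed by the number of distinct ACP-features, as in standard LDA. The function names and the counts below are made up for illustration.

```python
def estimate_p_ap_given_z(n_z_ap, beta):
    """n_z_ap[z][f]: times ACP-feature f was assigned to CCP z."""
    num_features = len(n_z_ap[0])  # number of distinct ACP-features
    return [[(c + beta) / (sum(row) + num_features * beta) for c in row]
            for row in n_z_ap]


def estimate_p_z_given_u(n_u_z, alpha):
    """n_u_z[u][z]: ACP-features of user u that were assigned to CCP z."""
    num_ccps = len(n_u_z[0])
    return [[(c + alpha) / (sum(row) + num_ccps * alpha) for c in row]
            for row in n_u_z]


# Made-up counts: 2 CCPs over 3 distinct ACP-features, and 1 user.
p_ap_given_z = estimate_p_ap_given_z([[2, 0, 2], [0, 5, 1]], beta=1.0)
p_z_given_u = estimate_p_z_given_u([[3, 1]], alpha=0.5)
```

With these counts the first CCP gives (2 + 1) / (4 + 3 * 1) = 3/7 to its first feature, and the user gives (3 + 0.5) / (4 + 2 * 0.5) = 0.7 to the first CCP. The smoothing terms keep unseen features and unused CCPs at a small non-zero probability.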
Experiments
Static Data Set
Uses 10 activity categories:
◦ Web, Multimedia, Management, Games, System, Navigation, Business, Reference, Social Network Service (SNS), Utility
Contains 618 activities that appear in a total of 408,299 activity-context records
Experiments
Benchmark methods
CPR (Context-aware Popularity based Recommendation):
◦ predicts a user's preferred activities as the activities that appear most
frequently under C in all users' historical context logs
PCR-i (Personalized Context-aware Recommendation
by only leveraging Individual user's context logs):
◦ ranks each activity a by probability =
(number of occurrences of ACP-feature (a, p)) / (number of all ACP-features in the context log of u)
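The two baselines can be sketched in a few lines. The slide does not say how a context C is matched or how the features of C are combined, so this sketch makes two assumptions: CPR counts only records whose context equals C exactly, and PCR-i sums the relative frequencies of the features (a, p) with p in C. The function names and data layout are hypothetical.

```python
from collections import Counter


def cpr_rank(all_records, context):
    """CPR sketch: rank activities by frequency under `context`, all users."""
    counts = Counter(a for c, a in all_records if c == context)
    return [a for a, _ in counts.most_common()]


def pcr_i_rank(user_features, context):
    """PCR-i sketch: rank activities using one user's ACP-features only."""
    total = len(user_features)
    scores = Counter()
    for activity, p in user_features:
        if p in context:
            scores[activity] += 1 / total  # relative frequency of (a, p)
    return [a for a, _ in scores.most_common()]


home_evening = frozenset({"Saturday", "evening", "Home"})
all_records = [
    (home_evening, "Play action games"),
    (home_evening, "Play action games"),
    (home_evening, "Play video"),
    (frozenset({"Monday", "morning", "WorkPlace"}), "Use facebook"),
]
popular = cpr_rank(all_records, home_evening)

user_features = [("Play video", "Home"), ("Play video", "evening"),
                 ("Use facebook", "Home"), ("Use facebook", "morning")]
personal = pcr_i_rank(user_features, home_evening)
```

CPR ignores who the user is, while PCR-i ignores everyone else, which is why it suffers when a single user's log is small.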
MAP@K:
◦ Mean Average Precision at the top K recommendations
MAR@K:
◦ Mean Average Recall at the top K recommendations
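A minimal sketch of the two metrics follows. The slide does not give the exact formulas, so the normalization of average precision by min(K, number of relevant activities) and the reading of MAR@K as recall at K averaged over test cases are assumptions, as are the function names.

```python
def average_precision_at_k(ranked, relevant, k):
    """Average of precision@r over the ranks r <= k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(k, len(relevant))


def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def mean_over_cases(metric, rankings, truths, k):
    """Average a per-case metric over all test cases (MAP@K or MAR@K)."""
    return sum(metric(r, t, k) for r, t in zip(rankings, truths)) / len(rankings)


rankings = [["Watch Movie", "Play game", "Browse website"],
            ["Play game", "Watch Movie", "Set timetable"]]
truths = [{"Watch Movie", "Browse website"}, {"Set timetable"}]
map_at_3 = mean_over_cases(average_precision_at_k, rankings, truths, 3)
mar_at_3 = mean_over_cases(recall_at_k, rankings, truths, 3)
```

Average precision rewards placing the relevant activities near the top of the list, while recall only checks whether they appear in the top K at all.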
Case Study
Training
Users: User1, User2, …, User151, User152, …, User446

User151 (ACP-features)           TopicZ1 (media)                         TopicZ2 (internet)
(Watch Movie, Monday)            (Watch Movie, Monday) - 0.71%           (Watch Movie, Monday) - 0.21%
(Watch Movie, PM18:00-19:00)     (Watch Movie, Saturday) - 2.55%         (Watch Movie, Saturday) - 1.24%
(Watch Movie, Home)              (Watch Movie, Sunday) - 2.74%           (Watch Movie, Sunday) - 2.11%
(Play game, Monday)              (Take picture, Monday) - 1.23%          (Browse website, Monday) - 6.12%
(Play game, PM20:00-21:00)       …                                       …
(Play game, Home)                (Watch Movie, AM0:00-1:00) - 0.81%      (Watch Movie, AM0:00-1:00) - 0.03%
(Browse website, Monday)         (Watch Movie, PM18:00-19:00) - 0.44%    (Watch Movie, PM18:00-19:00) - 0.14%
(Browse website, PM21:00-22:00)  (Watch Movie, PM22:00-23:00) - 0.97%    (Watch Movie, PM22:00-23:00) - 0.92%
(Browse website, Home)           (Take picture, PM16:00-17:00) - 1.43%   (Browse website, PM16:00-17:00) - 4.14%
(Set timetable, Tuesday)         (Take picture, PM17:00-18:00) - 1.67%   (Browse website, PM21:00-22:00) - 5.56%
(Set timetable, AM6:00-7:00)     …                                       …
(Set timetable, Home)            (Watch Movie, Home) - 4.71%             (Watch Movie, Home) - 0.57%
(Use google map, Tuesday)        (Watch Movie, On the way) - 2.47%       (Watch Movie, On the way) - 0.01%
(Use google map, AM7:00-8:00)    (Watch Movie, Work place) - 1.24%       (Watch Movie, Work place) - 0.11%
(Use google map, On the way)     (Take picture, On the way) - 3.67%      (Browse website, Home) - 3.19%
…                                …                                       …
Testing
User152 (ACP-features):
(Watch Movie, Monday)
(Watch Movie, PM21:00-22:00)
(Watch Movie, Home)
(Listen music, Monday)
(Listen music, PM22:00-23:00)
(Listen music, Home)
(Browse website, Monday)
(Browse website, PM22:00-23:00)
(Browse website, Home)
(Play online game, Tuesday)
(Play online game, AM9:00-10:00)
(Play online game, Home)
(Play action game, Tuesday)
(Play action game, AM10:00-11:00)
(Play action game, On the way)
…
p(a,p|u) = Σz p(a,p|z) * p(z|u)
How strongly user152 prefers the multimedia activity is determined by:
◦ the probability that (a,p) appears in topic z1
◦ the probability of topic z1 in user152's topic distribution
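The scoring step for the test user can be sketched as below. The probabilities of the two topics are copied from the case-study table. The topic mix assumed for user152 (0.8 media, 0.2 internet) is made up, and multiplying p(a,p|u) over the features of a context to rank activities is an assumption of this sketch, not something the slides state.

```python
import math


def p_feature_given_user(p_ap_given_z, p_z_given_u, feature):
    """p(a, p | u) = sum over z of p(a, p | z) * p(z | u)."""
    return sum(topic.get(feature, 0.0) * weight
               for topic, weight in zip(p_ap_given_z, p_z_given_u))


def rank_activities(p_ap_given_z, p_z_given_u, activities, context):
    """Rank activities by the product of p(a, p | u) over p in context."""
    scores = {
        a: math.prod(
            p_feature_given_user(p_ap_given_z, p_z_given_u, (a, p))
            for p in context)
        for a in activities
    }
    return sorted(activities, key=scores.get, reverse=True)


# Excerpt of p(a, p | z) from the case study: z1 (media), z2 (internet).
p_ap_given_z = [
    {("Watch Movie", "Monday"): 0.0071, ("Watch Movie", "Home"): 0.0471},
    {("Watch Movie", "Monday"): 0.0021, ("Watch Movie", "Home"): 0.0057,
     ("Browse website", "Monday"): 0.0612, ("Browse website", "Home"): 0.0319},
]
p_z_given_user152 = [0.8, 0.2]  # hypothetical topic mix for user152

ranking = rank_activities(p_ap_given_z, p_z_given_user152,
                          ["Browse website", "Watch Movie"],
                          ["Monday", "Home"])
```

Because the assumed user leans heavily towards the media topic, "Watch Movie" outranks "Browse website" on a Monday at home, even though the internet topic alone would prefer browsing.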