Capturing User Interests by Both Exploitation and Exploration
Download
Report
Transcript Capturing User Interests by Both Exploitation and Exploration
Capturing User Interests by
Both Exploitation and
Exploration
Richard Sia (Joint work with NEC)
Feb 9 2007
Background
The paid experiment carried out in Fall 2006
11 participants
Personal information manager
N-armed bandit problem
Overview of the PIM
Recorder
Learner
Crawler
Recommendation Provider
N-armed bandit problem
Well-studied problem in reinforcement learning / statistics
Problem statement
Background: You are given n different options
Decision: For each choice you receive a numerical reward
chosen from an unknown stationary probability distribution
Goal: maximize the total reward over some time period
Solutions
Action-value methods (greedy & ε-greedy)
Softmax Action Selection (decaying)
Pursuit methods
Associative search
Reward vs plays
ε-greedy give better reward than greedy
exploration is important
How are they related?
You are the n-armed bandit
PIM is the player
PIM recommends you topics, you give it
reward (by clicking)
Model
Assumptions
Θi – P(click | read, topic i)
K topics (K-armed bandit)
Each page is classified into one topic
it follows a Bernoulli distribution with parameter p, we assume p
follows a beta distribution with parameters α, β
g(j) – P(read | j)
From various search result click studies
K
Utility function: U(R; Θ) =
i 1
R: ranking of topics
i
g ( ri )
Model
Updating posteriori distribution after observation
Not clicked: βnew=βold + g(ri)
Clicked: αnew=αold + 1
Ranking function of topics
Exploitation + λ*exploration
Mean + λ*variance
( ) 2 ( 1)
Experiments
Simulation
45 topics (assign degree of interest randomly)
Size of display list: 7
Read probability: [1 0.95 0.9 0.85 0.8 0.75 0.7]
3 strategies
E&E
Greedy
Random
User study
Utility
λ = 10
λ= 20
Utility under interest drift (non-stationary
probability distribution)
λ = 10
λ= 20
Estimation accuracy of θ
Random > E&E > greedy
Experiments
Simulation
User study
45 categories from dmoz.org
Arts/Architecture
Computers/E-books
Science/Biology
etc.
Survey of user interest before experiment
Present 7 items each time
Interleave 3 strategies randomly
Click utility improvement
First 25 iterations
Drift at 25th iteration
Conclusion
Modeling of a learning framework to infer
user’s interest from user’s clicks
Some possible future works
One page (or item) belongs to multiple classes
Dependency between classes