Capturing User Interests by Both Exploitation and Exploration

Download Report

Transcript Capturing User Interests by Both Exploitation and Exploration

Capturing User Interests by
Both Exploitation and
Exploration
Richard Sia (Joint work with NEC)
Feb 9 2007
Background

The paid experiment carried out in Fall 2006

11 participants

Personal information manager

N-armed bandit problem
Overview of the PIM




Recorder
Learner
Crawler
Recommendation Provider
N-armed bandit problem

Well-studied problem in reinforcement learning / statistics

Problem statement
 Background: You are given n different options
 Decision: For each choice you receive a numerical reward
chosen from an unknown stationary probability distribution
 Goal: maximize the total reward over some time period

Solutions
 Action-value methods (greedy & ε-greedy)
 Softmax Action Selection (decaying)
 Pursuit methods
 Associative search
Reward vs plays

ε-greedy give better reward than greedy 
exploration is important
How are they related?

You are the n-armed bandit

PIM is the player

PIM recommends you topics, you give it
reward (by clicking)
Model

Assumptions



Θi – P(click | read, topic i)


K topics (K-armed bandit)
Each page is classified into one topic
it follows a Bernoulli distribution with parameter p, we assume p
follows a beta distribution with parameters α, β
g(j) – P(read | j)

From various search result click studies
K

Utility function: U(R; Θ) =

i 1

R: ranking of topics
i
g ( ri )
Model

Updating posteriori distribution after observation



Not clicked: βnew=βold + g(ri)
Clicked: αnew=αold + 1
Ranking function of topics

Exploitation + λ*exploration

Mean + λ*variance



 
(   ) 2 (    1)
Experiments

Simulation




45 topics (assign degree of interest randomly)
Size of display list: 7
Read probability: [1 0.95 0.9 0.85 0.8 0.75 0.7]
3 strategies




E&E
Greedy
Random
User study
Utility

λ = 10
λ= 20
Utility under interest drift (non-stationary
probability distribution)

λ = 10
λ= 20
Estimation accuracy of θ

Random > E&E > greedy
Experiments

Simulation

User study

45 categories from dmoz.org







Arts/Architecture
Computers/E-books
Science/Biology
etc.
Survey of user interest before experiment
Present 7 items each time
Interleave 3 strategies randomly
Click utility improvement

First 25 iterations
Drift at 25th iteration
Conclusion

Modeling of a learning framework to infer
user’s interest from user’s clicks

Some possible future works


One page (or item) belongs to multiple classes
Dependency between classes