Gradual Adaption Model for Estimation of
User Information Access Behavior
J. Chen, R.Y. Shtykh and Q. Jin
Graduate School of Human Sciences,
Waseda University, Japan
Background
• Why do we need information
– In leisure: search for a route or map for a tour
– In work: search for business information
– In learning: search for academic papers
• Where do we get information
– From traditional media such as books, magazines, etc.
– From the Internet
• How do we get information
– By searching in a bookstore, library, etc.
– By searching on the Internet
• What do we have to face in information search
– Too many search results, including irrelevant ones
July 17, 2015
Waseda University
Information Recommendation
• Information recommendation
– Web mining approaches
• Usage mining
• Structure mining
• Content mining
• Semantic mining
– Web mining data
• Content data: text and multimedia provided by web sites.
• Structure data: organization inside a web page, internal and
external links, and the web site hierarchy.
• Usage data: access log data of web sites.
• User profile: information about users.
• Semantic data: data that describe the structure and definition
of semantic web sites.
Study Approach
• Proposing a gradual adaption model for
estimation of user information access behavior
• Analyzing a variety of users' information access
data over short, medium, and long periods,
by remarkable and exceptional categories,
using Full Bayesian Estimation
• Conducting an experimental simulation to show the
operability and effectiveness of the proposed
model
Related Works
• WUM (Web Usage Mining)
– Based on users’ implicit feedback
• A new document representation model (Poblete and Baeza-Yates,
WWW2008)
– Evaluated on a web site with a small vocabulary that is
specific to certain topics.
• Identifying relevant web sites from user activities (Bilenko and White,
WWW2008)
– Requires more time to train the system
– Personalizes information recommendation
• Dynamic Link Generation (Yan, et al., WWW1996)
– Consists of off-line and on-line modules
• SUGGEST 3.0 (Baraglia and Silvestri, WT2004)
– Designed for large web sites, with only an on-line module. However, the logs
used to evaluate the system were small and limited.
• LinkSelector (Fang and Sheng, ACM2004)
– Uses the hyperlink structure and its access logs.
Definitions of Keyword, Link and Concept
• Keyword
– Keywords in web pages
• Link
– Links in web pages
• Concept
– Consists of a number of keywords and links
[Figure: three example concepts (Literature, Art, Philosophy), shown with keywords such as Andersen, cartoon, nature, culture, Leonardo, painting, and Aristotle, and with links (Link a, Link b).]
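The definition of a concept as a set of keywords and links can be pictured as a small data structure. The sketch below is only illustrative: the class name `Concept`, the `matches` rule (a page matches if it shares at least one keyword or link), and the assignment of the example keywords to concepts are assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept groups a number of keywords and links."""
    name: str
    keywords: set = field(default_factory=set)
    links: set = field(default_factory=set)

    def matches(self, page_keywords: set, page_links: set) -> bool:
        """Assumed rule: a page matches if it shares any keyword or link."""
        return bool(self.keywords & page_keywords or self.links & page_links)


# Illustrative grouping only; which keyword belongs to which concept is assumed.
art = Concept("Art", keywords={"painting", "Leonardo"}, links={"Link a"})
philosophy = Concept("Philosophy", keywords={"Aristotle"}, links={"Link b"})

page_keywords = {"Leonardo", "culture"}
print(art.matches(page_keywords, set()))         # True
print(philosophy.matches(page_keywords, set()))  # False
```

Using sets keeps the overlap check a single intersection, which is enough for a sketch; a real concept knowledge base would likely need weighting or stemming of keywords.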
Full Bayesian Estimation
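\[
P(D_{m+1} = t \mid Ð) = \int \theta \, p(\theta \mid Ð) \, d\theta
= \int \theta \, \frac{\Gamma(d_t + \alpha_t + d_f + \alpha_f)}{\Gamma(d_t + \alpha_t)\,\Gamma(d_f + \alpha_f)} \, \theta^{d_t + \alpha_t - 1} (1 - \theta)^{d_f + \alpha_f - 1} \, d\theta
= \frac{d_t + \alpha_t}{d_t + d_f + \alpha_t + \alpha_f}
\]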
• Full Bayesian Estimation
• Ð is the data collected for a concept
• d_t is the current number of times the concept was clicked
• d_f is the current number of times the concept was not clicked
• α_t is the historical number of times the concept was clicked
• α_f is the historical number of times the concept was not clicked
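The estimate reduces to a ratio of counts, which a few lines of code make concrete. This is a minimal sketch: the function name `full_bayesian_estimate` and the example counts are hypothetical, and the guard against an all-zero total is an added assumption.

```python
def full_bayesian_estimate(d_t, d_f, alpha_t, alpha_f):
    """Return (d_t + alpha_t) / (d_t + d_f + alpha_t + alpha_f).

    d_t, d_f: current counts of "clicked" and "not clicked".
    alpha_t, alpha_f: historical counts of "clicked" and "not clicked".
    """
    total = d_t + d_f + alpha_t + alpha_f
    if total <= 0:
        raise ValueError("at least one count must be positive")
    return (d_t + alpha_t) / total


# Hypothetical counts: 3 of 10 current accesses and 20 of 90 past accesses
# went to the concept.
p = full_bayesian_estimate(d_t=3, d_f=7, alpha_t=20, alpha_f=70)
print(p)  # 0.23
```

With a large history, a handful of current clicks moves the estimate only slightly; with a small history, the same clicks move it a lot.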
Gradual Adaption Model
[Figure: architecture of the gradual adaption model, with on-line and off-line parts. Components shown: Search Input and Click, Search Query, Web Documents, Access Logs, Concept Analyser, Concept KB, Probability Estimator, Estimation Base (Short, Medium, Long; Remarkable / Exceptional), Gradual Adaption, Matching, and Recommender.]
Gradual Adaption Model
• We divide users’ interests into three periods
(short, medium, and long) and into
remarkable and exceptional categories.
• This model is an adaptive one.
– It can adapt to a transition in users’
information access behaviors.
• In the model, no separate training is needed, since
Full Bayesian Estimation updates its estimate
as new access data arrive.
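The "learning function" can be pictured as ordinary sequential updating: after each estimate, the current counts are folded into the history counts. The sketch below assumes that update rule, a starting history of one click and one non-click, and made-up daily counts; none of these come from the slides.

```python
def update_history(alpha_t, alpha_f, d_t, d_f):
    """Assumed update rule: add the current counts to the history counts."""
    return alpha_t + d_t, alpha_f + d_f


alpha_t, alpha_f = 1, 1  # assumed starting history (no preference either way)
daily_counts = [(5, 5), (0, 10), (0, 10)]  # hypothetical (clicked, not clicked) per day

estimates = []
for d_t, d_f in daily_counts:
    # Same ratio as the Full Bayesian Estimation formula.
    estimates.append((d_t + alpha_t) / (d_t + d_f + alpha_t + alpha_f))
    alpha_t, alpha_f = update_history(alpha_t, alpha_f, d_t, d_f)

print([round(e, 4) for e in estimates])  # [0.5, 0.2727, 0.1875]
```

The estimate follows the user's behavior day by day without a separate training phase: once the concept stops being clicked, its probability falls gradually.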
Simulation and Evaluation
• Environment
– Java, Tomcat, MySQL, and NekoHTML
• Data
– Wikipedia on DVD Version 0.5
• more than 2000 web pages that belong to more
than 180 concepts
Simulation and Evaluation
• Short period (such as 7 days / 1 week)
– Test case
• The user has two interests, and these interests are
easily affected by some factors.
• The expectation is that the probability of the related concepts
can change greatly in the short or medium period,
but not in the long period.
• Two concepts, “Art” and “Artists”, are assumed to be used, and their
number of clicks varies dynamically.
– Test result
[Figure: "short period" plot of Probability against Date (08/2/22 to 08/5/16, weekly ticks), with series Art, Artists, Philosophers, and Philosophical thought movements.]
• The concepts’ probabilities change frequently.
• On some days, the probability of a concept in the short period is higher
than in the long period.
Simulation and Evaluation
• Medium period (such as 30 days / 1 month)
– Test case
• The user has a temporary interest.
• The user accesses the concept of temporary interest only occasionally.
• The expectation is that this concept ought to keep a low probability in
all three periods.
• One concept, “Philosophers”, is assumed to be accessed once every three days.
– Test result
[Figure: "medium period" plot of Probability against Date (08/2/22 to 08/5/16, weekly ticks), with series Art, Artists, Philosophers, and Philosophical thought movements.]
• The changes become smaller.
• But on some days, the probability of a concept in the
short period is higher than in the medium period.
Simulation and Evaluation
• Long period (such as 90 days / 3 months)
– Test case
• The user has a long-term interest.
• The expectation is that the probability of the concept of interest
ought to stay high in the long period.
• One concept, “Philosophical thought movements”, is assumed to be
accessed every day.
– Test result
[Figure: "long period" plot of Probability against Date (08/2/22 to 08/5/16, weekly ticks), with series Art, Artists, Philosophers, and Philosophical thought movements.]
• The probability becomes quite stable.
• There is no big change in the
long period.
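The three test cases can be mimicked with a small simulation that computes one estimate per period. This is a sketch under stated assumptions, not the authors' implementation: the window lengths (7, 30, 90 days) come from the slides, but the function `period_estimates`, the choice to use only the data inside each window (last day as current counts, earlier days in the window as history), and the click data are all hypothetical.

```python
PERIODS = {"short": 7, "medium": 30, "long": 90}  # window lengths in days


def period_estimates(daily_clicks, daily_total, periods=PERIODS):
    """Estimate one concept's click probability for each period.

    daily_clicks[i]: clicks on the concept on day i (most recent day last).
    daily_total[i]: all clicks by the user on day i.
    Assumption: within each window, the last day gives (d_t, d_f) and the
    earlier days give (alpha_t, alpha_f).
    """
    result = {}
    for name, days in periods.items():
        clicks = daily_clicks[-days:]
        totals = daily_total[-days:]
        d_t = clicks[-1]
        d_f = totals[-1] - clicks[-1]
        alpha_t = sum(clicks[:-1])
        alpha_f = sum(totals[:-1]) - alpha_t
        denom = d_t + d_f + alpha_t + alpha_f
        result[name] = (d_t + alpha_t) / denom if denom else 0.0
    return result


# Hypothetical user: 10 clicks a day for 90 days. The concept gets 1 click a
# day, except in the last 7 days, when it gets 8 clicks a day.
daily_total = [10] * 90
daily_clicks = [1] * 83 + [8] * 7
estimates = period_estimates(daily_clicks, daily_total)
print({k: round(v, 4) for k, v in estimates.items()})
# {'short': 0.8, 'medium': 0.2633, 'long': 0.1544}
```

In this made-up run, a sudden burst of interest dominates the short-period estimate while the long-period estimate barely moves, which is the qualitative pattern the three slides describe.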
Conclusion
• In this study, we have proposed a gradual
adaption model (GAM) for estimation of
user information access behavior.
• The three periods of GAM can correctly
distinguish users' long-term and temporary
interests, even without any system
training.
Future Works
• To try more settings for the short,
medium, and long periods to find more
reasonable ones.
• To evaluate the proposed model with
users' involvement.
• To compare our proposed approach with
other related recommendation models.
Thank you for your attention.