Discovery of Aggregate Usage Profiles for
Web Personalization
WebKDD 2000
Bamshad Mobasher, Honghua Dai, Tao Luo, Miki
Nakagawa, Yuqing Sun, Jim Wiltshire
System Architecture
Data Abstractions
• Drafts from the W3C Web Characterization Activity (WCA)
TERM: DEFINITION
user: A single individual that is accessing files from one or more Web servers through a browser.
pageview: Every file that contributes to the display on a user's browser at one time. It is usually associated with a single user action.
clickstream: A sequential series of pageview requests.
user session: The clickstream of pageviews for a single user across the entire Web.
server session: The set of pageviews in a user session for a particular Web site.
episode: Any semantically meaningful subset of a user or server session.
Typical Web Usage Mining Preprocessing
Example (figure: site structure graph with pageview nodes A through T)
USER1 : A B F O G A D
USER2 : A B C J
USER3 : L R
Usage Mining
• After preprocessing, we will have
– A set of n pageview records, P = { p1, p2, … , pn }
– A set of m user transactions, T = { t1, t2, … , tm }
• Each transaction can be viewed as an n-dimensional vector over the pageviews
t = <w(p1,t), w(p2,t), … , w(pn,t)>
where w(pi,t) is the weight of pageview pi in transaction t
• Goal of Usage Mining
– Aggregate usage profiles that represent different groups of user behavior.
– Each item in a usage profile is a URL representing a relevant
pageview object, and can have an associated weight representing its
significance within the profile.
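As a minimal sketch of the vector view above, the following turns each transaction into a vector over the pageview set. The pageview list and the binary weighting (w(p,t) = 1 if p occurs in t, else 0) are assumptions made for illustration; the slides do not say how the weights are chosen. The sessions are taken from the preprocessing example.

```python
# Assumed pageview set P (a subset of the example site's pages).
pageviews = ["A", "B", "C", "D", "F", "G", "J", "L", "O", "R"]

# Transactions T, taken from the example sessions.
transactions = {
    "USER1": ["A", "B", "F", "O", "G", "A", "D"],
    "USER2": ["A", "B", "C", "J"],
    "USER3": ["L", "R"],
}


def to_vector(transaction, pageviews):
    """Return <w(p1,t), ..., w(pn,t)> using binary weights (an assumption)."""
    visited = set(transaction)
    return [1 if p in visited else 0 for p in pageviews]


vectors = {user: to_vector(t, pageviews) for user, t in transactions.items()}

for user, vector in vectors.items():
    print(user, vector)
```

Other weightings, such as time spent on a page, would fit the same vector layout; only `to_vector` would change.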
Transaction Clustering
• Use the k-means algorithm to partition the transactions in this pageview space
into clusters.
• PACT (Profile Aggregations on Clustering Transactions)
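Given a transaction cluster c, construct a usage profile pr_c:
$pr_c = \{ \langle p, \mathrm{weight}(p, pr_c) \rangle \mid p \in P,\ \mathrm{weight}(p, pr_c) \ge \mu \}$
$\mathrm{weight}(p, pr_c) = \frac{1}{|c|} \sum_{t \in c} w(p, t)$
where $\mu$ is a minimum weight threshold.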
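The profile construction can be sketched as follows: average each pageview's weight over the transactions in one cluster and keep the pageviews whose mean weight reaches a threshold. The function name `pact_profile`, the parameter name `mu`, its value, and the sample cluster are all hypothetical; the clustering step itself (k-means) is assumed to have already produced the cluster.

```python
def pact_profile(cluster, pageviews, mu):
    """Build a usage profile from one transaction cluster.

    cluster   -- list of transaction vectors, one weight per pageview
    pageviews -- pageview identifiers, in the same order as the vector entries
    mu        -- assumed minimum mean weight for a pageview to enter the profile
    """
    size = len(cluster)
    profile = {}
    for i, p in enumerate(pageviews):
        # Mean weight of pageview p across the transactions in the cluster.
        weight = sum(t[i] for t in cluster) / size
        if weight >= mu:
            profile[p] = weight
    return profile


# Hypothetical cluster of four binary transaction vectors over pages A..D.
pageviews = ["A", "B", "C", "D"]
cluster = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]

profile = pact_profile(cluster, pageviews, mu=0.5)
print(profile)
```

With this data, pages C and D each appear in only one of four transactions, so they fall below the assumed threshold and are left out of the profile.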
Pageview Clustering (1/2)
• Use the Apriori algorithm to find frequent itemsets.
• Use Association Rule Hypergraph Partitioning (ARHP) to
find aggregate profiles.
Hypergraph H = (V,E)
V : pageview set
E : weighted frequent itemsets
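A small sketch of how the hypergraph input could be prepared: a level-wise Apriori search finds the frequent itemsets, and each itemset with two or more pageviews becomes a hyperedge. The hyperedge weight used here, the mean confidence of the rules (e minus v) → v for each v in e, is one simplified reading of "average confidence" and is an assumption, as are the function names and the sample transactions. The partitioning step of ARHP is not shown.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: occurrence count} for all itemsets seen at least min_count times."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    singletons = {frozenset([p]) for t in transactions for p in t}
    current = {s for s in singletons if count(s) >= min_count}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: count(s) for s in current})
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and count(c) >= min_count
        }
    return frequent


def hyperedges(frequent):
    """Weight each multi-item itemset by the mean confidence of its single-consequent rules."""
    edges = {}
    for itemset, cnt in frequent.items():
        if len(itemset) < 2:
            continue
        confidences = [cnt / frequent[itemset - {v}] for v in itemset]
        edges[itemset] = sum(confidences) / len(confidences)
    return edges


# Hypothetical transactions, each reduced to its set of pageviews.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

frequent = apriori(transactions, min_count=3)
edges = hyperedges(frequent)
for itemset, weight in edges.items():
    print(sorted(itemset), weight)
```

In this sample each pair of pages occurs three times and the triple {A, B, C} only twice, so the resulting hypergraph has three two-page hyperedges.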
(Figure: example hypergraph over pageviews A through R; hyperedges are labeled with their average confidence: 0.6, 0.4, 0.7, 0.6)
Pageview Clustering (2/2)
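$\mathrm{Fitness}(C) = \frac{\sum_{e \subseteq C} \mathrm{Weight}(e)}{\sum_{|e \cap C| > 0} \mathrm{Weight}(e)}$
$\mathrm{Connectivity}(v) = \frac{|\{ e \mid e \subseteq C,\ v \in e \}|}{|\{ e \mid e \subseteq C \}|}$
(Figure: the example hypergraph over pageviews A through R with hyperedge weights 0.6, 0.4, 0.7, 0.6, used to illustrate fitness and connectivity)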
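The two measures can be sketched on a toy hypergraph. This reads Fitness(C) as the weight of hyperedges lying entirely inside cluster C divided by the weight of all hyperedges that share at least one vertex with C, and Connectivity(v) as the fraction of hyperedges inside C that contain v. That reading, the function names, and the edge weights below are assumptions for illustration.

```python
def fitness(cluster, edges):
    """Weight of hyperedges fully inside the cluster over weight of those touching it."""
    inside = sum(w for e, w in edges.items() if e <= cluster)
    touching = sum(w for e, w in edges.items() if e & cluster)
    return inside / touching if touching else 0.0


def connectivity(v, cluster, edges):
    """Fraction of the cluster's internal hyperedges that contain vertex v."""
    inside = [e for e in edges if e <= cluster]
    if not inside:
        return 0.0
    return sum(1 for e in inside if v in e) / len(inside)


# Hypothetical weighted hyperedges over pageviews A..D.
edges = {
    frozenset("AB"): 0.5,
    frozenset("BC"): 0.25,
    frozenset("CD"): 0.25,
}
cluster = frozenset("ABC")

print(fitness(cluster, edges))
print(connectivity("A", cluster, edges))
print(connectivity("B", cluster, edges))
```

Here the hyperedge {C, D} crosses the cluster boundary, so it counts toward the denominator of the fitness but not the numerator. Vertex B sits in both internal hyperedges, while A sits in only one of them.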
Recommendation
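• Given a usage profile C, we can represent C as a vector
$C = \langle w_1^C, w_2^C, \ldots, w_n^C \rangle$
$w_i^C = \mathrm{weight}(p_i, C)$ if $p_i \in C$, and $0$ otherwise
• Given the current active session S, $S = \langle s_1, s_2, \ldots, s_n \rangle$
$\mathrm{match}(S, C) = \frac{\sum_k w_k^C s_k}{\sqrt{\sum_k (s_k)^2 \times \sum_k (w_k^C)^2}}$
$\mathrm{Rec}(S, p) = \sqrt{\mathrm{weight}(p, C) \times \mathrm{match}(S, C)}$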
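A minimal sketch of the scoring step, assuming match(S,C) is the normalized dot product (cosine) of the session and profile vectors and Rec(S,p) is the square root of weight(p,C) times match(S,C). The binary session vector, the sample weights, and the choice to score only pages the session has not yet visited are assumptions for illustration.

```python
import math


def match(session, profile):
    """Normalized dot product between the session vector and the profile vector."""
    dot = sum(s * w for s, w in zip(session, profile))
    norm = math.sqrt(sum(s * s for s in session) * sum(w * w for w in profile))
    return dot / norm if norm else 0.0


def recommend(session, profile, pageviews):
    """Score each profile page not yet in the session (filtering is an assumption)."""
    m = match(session, profile)
    return {
        p: math.sqrt(w * m)
        for p, w, s in zip(pageviews, profile, session)
        if w > 0 and s == 0
    }


# Hypothetical profile weights and active session over pages A..D.
pageviews = ["A", "B", "C", "D"]
profile = [1.0, 0.5, 0.5, 0.0]   # weight(p, C), 0 for pages outside the profile
session = [1, 1, 0, 0]           # the user has visited A and B so far

m = match(session, profile)
recs = recommend(session, profile, pageviews)
print(m, recs)
```

Only page C receives a score in this sample: A and B are already in the session, and D has zero weight in the profile.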