Example of fuzzy web mining algorithm
Mining linguistic browsing patterns in
the world wide web
Authors: Tzung-Pei Hong, Kuei-Ying Lin, Shyue-Liang Wang
Source: Soft Computing – A Fusion of Foundations, Methodologies
and Applications, Vol. 6, No. 5, pp. 329 – 336, August 2002
Speaker: Hui-Lin Weng
Date : 12/13/2005
1
Outline
• Introduction
– Web-content mining
– Web-usage mining
• The proposed algorithm
• The fuzzy data mining approach
• Example of fuzzy web mining algorithm
• Conclusions
• Comments
2
Introduction
• Web-content mining
– Focus on information discovery from sources across the
world wide web
– e.g. mining page-keyword relations from web pages
• Web-usage mining
– Focus on the automatic discovery of user access patterns
from web servers
– e.g. mining page browsing patterns from log files
3
The proposed algorithm
• A novel web-mining algorithm that finds linguistic
browsing behaviors from log data on web servers.
• Goals:
– Use fuzzy concepts to analyze the time a customer
spends browsing each web page.
– Focus on the most important linguistic terms
to reduce time complexity.
4
The fuzzy data mining approach
• The approach consists of three main steps:
– Step 1:
• Transform each quantitative value in the transaction data into a
fuzzy set using the given membership functions.
– Step 2:
• Generate large itemsets by calculating the fuzzy cardinality of
each candidate itemset.
– Step 3:
• Induce fuzzy association rules from the large itemsets found in
step 2.
5
Example of fuzzy web mining algorithm (1/9)
• Input
– Log data
• Includes date, time, client-ip, and file name
– Membership functions
• For converting browsing durations into linguistic terms
– Minimum support (min-sup)
• Output
– Fuzzy browsing patterns
6
Example of fuzzy web mining algorithm (2/9)
• Step 1:
– Only log entries of the following types are selected
• .asp, .htm, .html, .jva, .cgi, and closing connection
– The following four fields are kept
• date, time, client-ip, and file-name
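Step 1 could be sketched in Python roughly as follows. The slides do not give the log format, so the record layout (dicts with `date`, `time`, `client_ip`, `file_name` keys), the `closing connection` marker string, and the sample entries are all assumptions made for illustration.

```python
# Extensions named on the slide; the marker string for a closing
# connection is an assumption, since the log format is not shown.
KEPT_EXTENSIONS = (".asp", ".htm", ".html", ".jva", ".cgi")
CLOSING_MARKER = "closing connection"
KEPT_FIELDS = ("date", "time", "client_ip", "file_name")

def select_entries(raw_entries):
    """Keep page requests and closing-connection records, with four fields only."""
    selected = []
    for entry in raw_entries:
        name = entry["file_name"].lower()
        if name == CLOSING_MARKER or name.endswith(KEPT_EXTENSIONS):
            selected.append({field: entry[field] for field in KEPT_FIELDS})
    return selected

# Hypothetical raw log entries (not taken from the paper).
raw = [
    {"date": "2001/03/01", "time": "05:39:56", "client_ip": "10.0.0.1",
     "file_name": "index.html", "status": "200"},
    {"date": "2001/03/01", "time": "05:39:57", "client_ip": "10.0.0.1",
     "file_name": "logo.gif", "status": "200"},
    {"date": "2001/03/01", "time": "05:40:26", "client_ip": "10.0.0.1",
     "file_name": "closing connection", "status": "-"},
]
cleaned = select_entries(raw)
```

Here the image request is dropped and the extra `status` field is removed from the two remaining records.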
7
Example of fuzzy web mining algorithm (3/9)
• Step 2:
– The values of the client-ip field are transformed into contiguous
integers for convenience
• Step 3:
– The log data are sorted first by encoded client ID and then by
date and time
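A minimal sketch of Steps 2 and 3, assuming each log entry is a dict with `client_ip`, `date`, `time`, and `file_name` keys (a hypothetical layout) and that dates and times are zero-padded strings, so that plain string comparison orders them chronologically. IDs are assigned in order of first appearance, which is one possible choice; the slides do not say how the integers are assigned.

```python
def encode_and_sort(entries):
    """Replace client IPs with contiguous integer IDs, then sort by ID, date, time."""
    ids = {}
    encoded = []
    for entry in entries:
        # A new IP gets the next unused integer; a known IP keeps its ID.
        client_id = ids.setdefault(entry["client_ip"], len(ids) + 1)
        encoded.append({
            "client_id": client_id,
            "date": entry["date"],
            "time": entry["time"],
            "file_name": entry["file_name"],
        })
    return sorted(encoded, key=lambda e: (e["client_id"], e["date"], e["time"]))

# Hypothetical entries, deliberately out of order.
entries = [
    {"date": "2001/03/01", "time": "05:40:26", "client_ip": "10.0.0.7", "file_name": "b.htm"},
    {"date": "2001/03/01", "time": "05:41:00", "client_ip": "10.0.0.3", "file_name": "c.asp"},
    {"date": "2001/03/01", "time": "05:39:56", "client_ip": "10.0.0.7", "file_name": "a.htm"},
]
ordered = encode_and_sort(entries)
```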
8
Example of fuzzy web mining algorithm (4/9)
• Step 4:
– The time durations of the web pages browsed by each
encoded client ID are calculated
• e.g. from 2001/03/01, 05:39:56 to 2001/03/01, 05:40:26, the time
duration is 30 seconds.
• Step 5:
– The web pages browsed by each client are listed to form
a browsing sequence
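Steps 4 and 5 might look like the sketch below. It assumes that a page's duration is the gap between its record and the same client's next record (a closing-connection record ends the last page and has no duration of its own); the slides only show the resulting 30-second example, so this rule, the field names, and the file names are assumptions.

```python
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"
CLOSING_MARKER = "closing connection"  # assumed marker string

def timestamp(entry):
    return datetime.strptime(f'{entry["date"]} {entry["time"]}', TIME_FORMAT)

def browsing_sequences(sorted_entries):
    """Return {client_id: [(page, seconds), ...]} from entries sorted by client and time."""
    sequences = {}
    for current, following in zip(sorted_entries, sorted_entries[1:]):
        if current["client_id"] != following["client_id"]:
            continue  # last record of a client: nothing to measure against
        if current["file_name"] == CLOSING_MARKER:
            continue  # a closing record is not a browsed page
        seconds = int((timestamp(following) - timestamp(current)).total_seconds())
        sequences.setdefault(current["client_id"], []).append(
            (current["file_name"], seconds))
    return sequences

# The two timestamps come from the slide; everything else is hypothetical.
sorted_log = [
    {"client_id": 1, "date": "2001/03/01", "time": "05:39:56", "file_name": "b.htm"},
    {"client_id": 1, "date": "2001/03/01", "time": "05:40:26", "file_name": CLOSING_MARKER},
    {"client_id": 2, "date": "2001/03/01", "time": "05:41:00", "file_name": "c.asp"},
]
sequences = browsing_sequences(sorted_log)
```

Client 2 has a single record and therefore no measurable duration, so only client 1 appears in the result.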
9
Example of fuzzy web mining algorithm (5/9)
• Step 6:
– The time durations are represented as fuzzy sets
• Using the given membership functions
• e.g. the second item (B, 30) in Client 1 becomes
(0.8 / B.Short + 0.2 / B.Middle)
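The membership functions themselves are not reproduced in this transcript. The sketch below assumes piecewise-linear functions with breakpoints at 20, 70, 80, and 130 seconds, chosen because they reproduce the membership values quoted on the slides (for example, 30 seconds gives 0.8 Short and 0.2 Middle); the real functions in the paper may differ.

```python
def fuzzify(seconds, a=20.0, b=70.0, c=80.0, d=130.0):
    """Return the nonzero memberships of a duration in Short / Middle / High.

    The breakpoints a, b, c, d are assumptions, not values from the slides.
    """
    x = float(seconds)
    # Short: 1 up to a, falling linearly to 0 at b.
    short = 1.0 if x <= a else max(0.0, (b - x) / (b - a))
    # Middle: rises from a to b, flat from b to c, falls from c to d.
    if a < x < b:
        middle = (x - a) / (b - a)
    elif b <= x <= c:
        middle = 1.0
    elif c < x < d:
        middle = (d - x) / (d - c)
    else:
        middle = 0.0
    # High: 0 up to c, rising linearly to 1 at d.
    high = 0.0 if x <= c else min(1.0, (x - c) / (d - c))
    memberships = {"Short": short, "Middle": middle, "High": high}
    return {term: round(value, 4) for term, value in memberships.items() if value > 0}

item = fuzzify(30)  # the (B, 30) item from the slide
```

Prefixing each term with the page name (for example `B.Short`) gives the regions used in the later steps.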
10
Example of fuzzy web mining algorithm (6/9)
• Step 7:
– The maximum membership value for each region in each
sequence is found
• e.g. client 2: (0.2/D.Short + 0.8/D.Middle) (0.8/B.Short +
0.2/B.Middle) (0.6/D.Middle + 0.4/D.High)
• D.Middle: max(0.8, 0.0, 0.6) = 0.8
• Step 8:
– The support value of each region is calculated
• e.g. D.Middle: client 1: max(0, 0, 0.6, 0) + client 2:
max(0.8, 0, 0.6) + client 3: max(0, 0.8) + client 4:
max(0, 0, 0, 0, 0) + client 5: max(1.0, 0, 0) + client 6:
max(1.0, 0, 0, 0) = 0.6 + 0.8 + 0.8 + 0.0 + 1.0 + 1.0 = 4.2
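Steps 7 and 8 can be sketched as below. Only the fuzzified sequences of clients 2 and 4 appear in this transcript, so the `clients` dict holds just those two; the per-client maxima for D.Middle are copied from the slide to reproduce the full sum. The dict-of-lists data layout and the function names are assumptions.

```python
def max_memberships(sequence):
    """Step 7: highest membership value of each region within one client's sequence."""
    best = {}
    for fuzzy_set in sequence:
        for region, value in fuzzy_set.items():
            best[region] = max(best.get(region, 0.0), value)
    return best

def region_supports(sequences):
    """Step 8: sum the per-client maxima of each region over all clients."""
    support = {}
    for sequence in sequences.values():
        for region, value in max_memberships(sequence).items():
            support[region] = support.get(region, 0.0) + value
    return support

# The two client sequences that are written out on the slides.
clients = {
    2: [{"D.Short": 0.2, "D.Middle": 0.8},
        {"B.Short": 0.8, "B.Middle": 0.2},
        {"D.Middle": 0.6, "D.High": 0.4}],
    4: [{"B.Short": 1.0},
        {"C.Middle": 0.6, "C.High": 0.4},
        {"E.Middle": 0.2, "E.High": 0.8},
        {"B.Short": 1.0},
        {"C.Short": 0.6, "C.Middle": 0.4}],
}
partial_support = region_supports(clients)  # covers clients 2 and 4 only

# Per-client maxima of D.Middle for clients 1 to 6, as listed on the slide.
d_middle_maxima = [0.6, 0.8, 0.8, 0.0, 1.0, 1.0]
d_middle_support = round(sum(d_middle_maxima), 1)
```

With all six clients the D.Middle support comes to 4.2, matching the slide; the partial result over clients 2 and 4 contributes 0.8 of that.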
11
Example of fuzzy web mining algorithm (7/9)
• Steps 9-11:
– Large 1-sequences are generated
• e.g. assume min-sup = 2
• B.Short, C.Middle, D.Middle
• Steps 12-15:
– Large k-sequences are generated, starting from candidate
2-sequences such as:
• (B.Short, C.Middle)
• (C.Middle, B.Short)
• (B.Short, D.Middle)
• (D.Middle, B.Short)
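One way to generate the candidate 2-sequences is to take every ordered pair of distinct large 1-sequences, as in this sketch. Whether the original algorithm also admits pairs that repeat a region is not stated on the slides, so excluding them here is an assumption.

```python
from itertools import permutations

def candidate_2_sequences(large_1):
    """Ordered pairs of distinct large 1-sequences (order matters in a sequence)."""
    return list(permutations(large_1, 2))

large_1 = ["B.Short", "C.Middle", "D.Middle"]
candidates = candidate_2_sequences(large_1)
```

For three large 1-sequences this yields six ordered pairs, including the four listed on the slide.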
12
Support value of composite regions
• The support value of each composite region is
calculated
– For example: client 4, composite region (B.Short, C.Middle)
– (1.0/B.Short) (0.6/C.Middle + 0.4/C.High) (0.2/E.Middle +
0.8/E.High) (1.0/B.Short) (0.6/C.Short + 0.4/C.Middle)
– max[min(1.0, 0.6), min(1.0, 0.4)] = 0.6
Client ID    Membership value of (B.Short, C.Middle)
1            0.8
2            0.0
3            0.0
4            0.6
5            0.8
6            0.0
The support value: 0.8 + 0.0 + 0.0 + 0.6 + 0.8 + 0.0 = 2.2
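The max-min computation for a composite region can be sketched as follows. The sketch pairs each occurrence of the first region with every later occurrence of the second, not only the adjacent one; that is an assumption, although for client 4 both readings give 0.6. The function name and data layout are illustrative.

```python
def sequence_membership(sequence, first, second):
    """Max over ordered occurrences of min(membership of first, membership of second)."""
    best = 0.0
    for i, earlier in enumerate(sequence):
        if first not in earlier:
            continue
        for later in sequence[i + 1:]:
            if second in later:
                best = max(best, min(earlier[first], later[second]))
    return best

# Client 4's fuzzified sequence, as written on the slide.
client_4 = [
    {"B.Short": 1.0},
    {"C.Middle": 0.6, "C.High": 0.4},
    {"E.Middle": 0.2, "E.High": 0.8},
    {"B.Short": 1.0},
    {"C.Short": 0.6, "C.Middle": 0.4},
]
value_4 = sequence_membership(client_4, "B.Short", "C.Middle")

# Per-client values for clients 1 to 6, copied from the slide's table.
table_values = [0.8, 0.0, 0.0, 0.6, 0.8, 0.0]
support = round(sum(table_values), 1)
```

Client 4 contributes 0.6, and summing the six table values gives a support of 2.2, which exceeds the assumed min-sup of 2.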
13
Example of fuzzy web mining algorithm (8/9)
• Large 2-sequences:
– (B.Short, C.Middle)
– (D.Middle, B.Short)
– (D.Middle, C.Middle)
• In this example, no large 3-sequences exist.
14
Conclusions
• The duration of each web page browsed by a client is
calculated from the log data; these durations are numeric.
• The authors' web-mining algorithm uses fuzzy concepts to
form linguistic terms and focuses on the most important
terms, which reduces its time complexity.
15
Comments
• How should the algorithm handle a user who is idle?
• How should it handle a user who stays on a page for only
a few seconds?
16
Thank you for listening!
17