Tallinn University of Technology
Department of Computer Engineering
Applying User Profile Ontology for
Mining Web Site Adaptation
Recommendations
Tarmo Robal, Ahto Kalja
[email protected], [email protected]
Outline
Introduction
» Web Mining & Adaptive Web Sites
» Recommender Systems
Web Usage Data Capturing
User Profiles Extraction
Recommendations Generation
Summary
Tarmo Robal, Ahto Kalja
ADBIS'07, Bulgaria, Varna 29.09-03.10.2007
Introduction
The electronic age
» Internet – enormous source of information
» Competition over users
» Browsing affected by many factors
System feedback
» What is actually going on within the system
» Observe users’ actions & preferences
Constant need for web improvement!
Reaching the Aim
Make browsing easier - better user experience
Collect usage data
» Exploit a log system
Apply web mining techniques on the collected data to:
» Analyse & Reason
Employ the mining results
» Construct users’ profile ontology
» Adaptive websites & Recommender systems
Continue collecting usage data
Introduction
Research based on the access data of the website of
our department
» Dynamic website
» Run by system kernel developed at our lab
» Contains 118 pages
» Average access rate 250 sessions daily
» Average number of operations per session 1.9
(4.3 in sessions with more than 1 page request)
» http://ati.ttu.ee
Web Mining
... is the use of data mining techniques to automatically
discover and extract information from Web documents
and services (Perkowitz and Etzioni 2001)
Content Mining
discovery of document content patterns
Structure Mining
discovery of hypertext/linking structure patterns
Usage Mining
discovery of access patterns
Profile Mining
discovery of user profiles
Adaptive Websites
... sites that automatically improve their organization and
presentation by learning from visitor access patterns
Tactical
» Adaptations triggered in real time
» Adding value to provided information
» Highlighting items
» Recommending items
» Easier browsing
Strategic
» Adaptations triggered on the structure
» Offline & with approval
Towards enhanced web experience!
Recommender Systems
To assist users during browsing
Improved user experience
More relevant information for the user
Based on site’s usage:
» Transparent i.e. general
» Personalized (i-banking)
Why?
» Constant competition over rating
» Marketing, e-commerce, information portals, ...
Recommender Systems
Users implicitly use a concept model based on their
own knowledge of the domain or topic searched, even
though mostly they do not know how to represent it!
(Li & Zhong)
If we are able to track down users’ actions, we are also
able to produce dynamically discovered
recommendations
Step towards intelligent web
Basis for adaptive web
Collecting Web Usage Data
Explicit data collection
Implicit data collection
» Transparent to end-user
» Monitor accessed pages
» Time spent on a particular page
» Discover navigational paths
Need for a special log system
» Ability to capture distinct and recurrent user
sessions
Web Server Logs NOT Suitable?
Reasons:
» suffer from insufficiencies
» do not allow visitor sessions to be identified
» make it impossible to track recurrent visits
» hold no information about users’ screen resolution
» are not kept for a long period of time
» are of large size
» contain a lot of detailed information about every element
accessed on the web server
The Log System
Data collected:
» Page requested
» Client identifier (session ID)
» Request time
» IP and host
» Browser and OS
» Query method and query string
» Site referrer
» Page load time and server load
» Recurrent visit ID (session ID)
» Screen resolution
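The collected fields could be held in a record like the one below. This is only a sketch: the class name `LogRecord`, the helper `is_recurrent`, and all field names and types are assumptions made for illustration, not the schema of the authors' log system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    """One page request as a hypothetical log system might store it."""
    page_id: int                        # page requested
    session_id: str                     # client identifier for the session
    request_time: datetime              # request time
    ip: str                             # client IP address
    host: str                           # client host name
    user_agent: str                     # browser and OS
    method: str                         # query method, e.g. "GET"
    query_string: str                   # query string
    referrer: Optional[str]             # site referrer, if any
    load_time_ms: float                 # page load time
    server_load: float                  # server load at request time
    recurrent_visit_id: Optional[str]   # ties together visits of a returning user
    screen_resolution: Optional[str]    # e.g. "1280x1024"


def is_recurrent(record: LogRecord) -> bool:
    """True when the request carries a recurrent visit identifier."""
    return record.recurrent_visit_id is not None


example = LogRecord(
    page_id=410, session_id="s1", request_time=datetime(2007, 9, 29, 10, 0),
    ip="192.0.2.1", host="client.example", user_agent="ExampleBrowser/1.0",
    method="GET", query_string="", referrer=None, load_time_ms=120.0,
    server_load=0.3, recurrent_visit_id="v42", screen_resolution="1280x1024",
)
```

Keeping the session identifier and the recurrent visit identifier as separate fields is what lets later steps tell a single session apart from a returning visitor.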
User Profiles Extraction
Construct user navigational paths from
session data $s = \langle p_i, p_{i+1}, \ldots, p_n \rangle$, $p_i \in P$
» 269 782 paths
Apply further processing
» 87 953 paths
Apply the Locality Model onto discovered
paths
» Extract localities L
$L = \langle p_j, p_{j+1}, \ldots, p_m \rangle$,
where $p_j \neq p_{j+1} \neq \ldots \neq p_m$
» Size of locality window w?
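Constructing navigational paths from session data can be sketched as grouping requests by session and ordering them by time. The tuple layout `(session_id, request_time, page_id)` and the function name `build_paths` are assumptions for illustration; the slides do not show how the authors' system stores or queries its log.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (session_id, request_time, page_id); this layout is an assumption.
Request = Tuple[str, float, int]


def build_paths(requests: Iterable[Request]) -> Dict[str, List[int]]:
    """Group requests by session and order each group by request time."""
    by_session: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for session_id, request_time, page_id in requests:
        by_session[session_id].append((request_time, page_id))
    return {
        sid: [page for _, page in sorted(items, key=lambda item: item[0])]
        for sid, items in by_session.items()
    }


requests = [
    ("a", 3.0, 410),
    ("b", 1.0, 100),
    ("a", 1.0, 100),
    ("a", 2.0, 400),
]
paths = build_paths(requests)
```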
The Locality Model
If a large number of users frequently access a set of
pages, then these pages must be related
The locality L is defined by the user's nearest sequential
activity history within the site during a session
L is constructed based on navigational paths
Users are moving from one locality L to another, which
can be represented by the w latest operations
(requests for pages)
[Figure: a locality window of size w sliding over the path
100 – 400 – 410 – 400 – 410 – 4110 – 410 – 460 – 430 ...]
w = 3
L = CalculateLocality(st, w)
N = FindNextItem(st, w)
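The sliding window can be sketched as follows. The slides name `CalculateLocality` and `FindNextItem` but do not define them, so the functions below, their signatures, and the use of a position index `t` are assumptions about what they might do.

```python
from typing import Optional, Sequence, Tuple


def calculate_locality(path: Sequence[int], t: int, w: int) -> Optional[Tuple[int, ...]]:
    """Return the w requests ending at position t (inclusive), or None when
    t is out of range or fewer than w requests have been made by then."""
    if t + 1 < w or t >= len(path):
        return None
    return tuple(path[t - w + 1 : t + 1])


def find_next_item(path: Sequence[int], t: int) -> Optional[int]:
    """Return the request following position t, or None at the end of the path."""
    return path[t + 1] if t + 1 < len(path) else None


path = [100, 400, 410, 400, 410, 4110, 410, 460, 430]
w = 3
# One (locality, next item) pair for every position where a full window exists.
steps = [
    (calculate_locality(path, t, w), find_next_item(path, t))
    for t in range(w - 1, len(path))
]
```

Each pair records where the user was (the last w requests) and where the user went next, which is the information a recommender needs in order to suggest a next page.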
The Locality Model
What’s the size of w?
w has to cover a reasonable
amount of page requests
Attributes observed:
» cover percentage for the number of combinations computed
from the paths
» average frequency of finding these combinations in paths
» average number of possible localities in path
» the availability of next item for each locality (progress)
The size of w is correlated to the absolute menu depth
Properties observed                  Studied window size w
                                     2      3      4      5
(1) Combination coverage [%]         31.2   35.5   20.7   12.6
(2) Combination frequency            1.1    1.0    1.0    1.0
(3) No of localities in path         6.3    6.6    6.5    5.9
(4) Availability of next item [%]    76.6   77.4   74.1   76.3
User Profiles Extraction
User session from DW
Navigational path sequence construction
100 – 400 – 410 – 410 – 400 – 410 – 4110 – 410 – 460 – 430 – 430
Path minimization: removal of redundant operations
100 – 400 – 410 – 400 – 410 – 4110 – 410 – 460 – 430
Filtering of non-relevant paths (e.g. paths with 1 item)
100 – 400 – 410 – 400 – 410 – 4110 – 410 – 460 – 430
Extracting localities L with size w
100 – 400 – 410
400 – 410 – 400
410 – 400 – 410
400 – 410 – 4110
410 – 4110 – 410
4110 – 410 – 460
410 – 460 – 430
Removal of cyclic localities
100 – 400 – 410
400 – 410 – 4110
4110 – 410 – 460
410 – 460 – 430
Extracted user profiles
Ontology
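The extraction steps on this slide can be traced with a short sketch. It assumes that "redundant operations" means immediate repeats of the same page and that a "cyclic" locality is one in which a page occurs more than once; both readings fit the example path on the slide but are not stated there. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def minimize(path: Sequence[int]) -> List[int]:
    """Drop immediate repeats of the same page (assumed meaning of 'redundant')."""
    out: List[int] = []
    for page in path:
        if not out or out[-1] != page:
            out.append(page)
    return out


def extract_localities(path: Sequence[int], w: int) -> List[Tuple[int, ...]]:
    """Every run of w consecutive requests in the path."""
    return [tuple(path[i : i + w]) for i in range(len(path) - w + 1)]


def drop_cyclic(localities: Sequence[Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Keep only localities whose pages are all distinct (assumed meaning of 'cyclic')."""
    return [loc for loc in localities if len(set(loc)) == len(loc)]


session = [100, 400, 410, 410, 400, 410, 4110, 410, 460, 430, 430]
minimized = minimize(session)
# Filtering step: paths with a single item carry no navigation information.
localities = extract_localities(minimized, 3) if len(minimized) > 1 else []
profiles = drop_cyclic(localities)
```

For the slide's example session this yields seven localities of size 3, of which four remain after the cyclic ones are removed.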
User Profiles Ontology
Frequent user profiles discovered from web usage
Predefined user profile classes
Mapping of Extracted Profiles onto Concepts of Web Ontology
Concepts of Web Ontology for ati.ttu.ee
Inferred Ontology
Definitions for predefined user profile classes
Profiles inferred for predefined user profile class
Users profiled as Students are interested in ...
Producing Recommendations
RE determines the type of user online
RE computes recommendations
» User’s recent actions
» Knowledge from ontologies
» Page ranking
Pages ranked with inverse time weighting algorithm
$\mathrm{Rank}_p = \sum_{i=1}^{n} \frac{\text{Interest value}(i)}{\text{Age}(i)}$
Interest value(i): no. of hits during Age(i)
Age(i): days into the past
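A minimal sketch of inverse time weighting, assuming the rank of a page is the sum over past days of the hits on that day divided by the day's age, with the age counted from 1 for the most recent day. The function name `rank` and the mapping from age to hit count are illustrative choices, not the authors' implementation.

```python
from typing import Mapping


def rank(hits_by_age: Mapping[int, float]) -> float:
    """Sum of hits divided by age in days, so recent hits weigh more.

    hits_by_age maps Age(i), in days into the past (>= 1), to the number
    of hits the page received during that day.
    """
    if any(age < 1 for age in hits_by_age):
        raise ValueError("age must be at least 1 day")
    return sum(hits / age for age, hits in hits_by_age.items())


# 10 hits yesterday, 4 hits two days ago, 4 hits four days ago.
page_rank = rank({1: 10, 2: 4, 4: 4})
```

In this example the same 4 hits contribute 2.0 when they are two days old but only 1.0 when they are four days old.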
Producing Recommendations
[Figure: architecture for producing recommendations. Labels: Log System; Usage Data Capturing; Data Mining; Detection of Locality Window Size w; Extracted User Profiles; Web Ontology; Web Site Ontology; Mapping; Profiles Ontology; Ranked Pages; Recommendation Engine (RE); The Locality of User Online (recent w actions); Recommended Sub-Topology; Refined Topology; Tactical Adaptation; Strategic Adaptation; Web Site]
Tactical Recommendations
Raising / highlighting items during
user’s online session
Adding recommended items to existing
topology
Providing sub-topologies for targeted
user groups
Enhanced (semi-personalized)
User Experience
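One way to pick items to raise or highlight during a session is to look up which pages most often followed the user's current locality in past paths. The sketch below is a plain frequency count and deliberately leaves out the ontology knowledge and page ranking that the recommendation engine also uses; `build_next_item_index` and `recommend` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Index = Dict[Tuple[int, ...], Counter]


def build_next_item_index(paths: Sequence[Sequence[int]], w: int) -> Index:
    """Count, for every locality of size w, which page followed it."""
    index: Index = defaultdict(Counter)
    for path in paths:
        for i in range(len(path) - w):
            index[tuple(path[i : i + w])][path[i + w]] += 1
    return index


def recommend(index: Index, recent: Sequence[int], w: int, k: int = 3) -> List[int]:
    """Return up to k pages that most often followed the last w requests."""
    locality = tuple(recent[-w:])
    followers = index.get(locality, Counter())
    return [page for page, _ in followers.most_common(k)]


history = [
    [100, 400, 410, 4110],
    [100, 400, 410, 460],
    [100, 400, 410, 4110],
]
index = build_next_item_index(history, 3)
suggestions = recommend(index, [100, 400, 410], 3)
```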
Strategic Recommendations
Deriving recommendations for general
site improvement to adjust sites to their
users' preferences
Long-term
Discovering related page-sets according
to users' preferences
Improved Site Structure
Conclusions
Monitoring users' actions and producing
concept models from them makes it possible to
» Classify a user as an individual into one of
the conceptual user groups (predefined user
profiles)
» Produce recommendations that correlate to
that particular individual
» Tactical recommendations
» Strategic recommendations
Summary
Introduction
» Web Mining & Adaptive Web Sites
» Recommender Systems
Web Usage Data Capturing
User Profiles Extraction
Recommendations Generation
Summary
Tallinn University of Technology
Department of Computer Engineering
Thank you!
Questions?