Initial presentation

Automatic Classification of
Bookmarked Web Pages
Individual APT Presentation
January 2007
1
Introduction
• There are no prerequisites
• May be useful for students who intend to follow CSA3200 (Adaptive Hypertext Systems) in their 4th year
2
Aims
• Help users keep their bookmark files organised
• When a user chooses to bookmark a web page, the system recommends one of the user’s existing categories (instead of just the last location saved to, or the bookmark root)
3
How?
• Two algorithms to perform bookmark classification
– The first builds a representative document for each category (this algorithm will be provided)
– The second approach is up to you
• An additional utility may be proposed to improve results
– E.g., synonym recognition
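
The representative-document idea above can be sketched roughly as follows. This is only an illustration, not the algorithm that will be provided: it assumes the text of each bookmarked page is already available as a string, merges each category's pages into a single term-frequency vector, and recommends the category most similar to the new page by cosine similarity. All function names and the toy data are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_representatives(categories):
    """Merge the text of every page in a category into one term-frequency vector."""
    reps = {}
    for name, pages in categories.items():
        counts = Counter()
        for page_text in pages:
            counts.update(tokenize(page_text))
        reps[name] = counts
    return reps

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_category(page_text, reps):
    """Return the category whose representative document is most similar to the page."""
    page_vec = Counter(tokenize(page_text))
    return max(reps, key=lambda name: cosine(page_vec, reps[name]))

# Toy bookmark file (assumed data): category name -> text of its bookmarked pages.
categories = {
    "Cooking": ["pasta recipe with tomato sauce", "baking bread recipe flour yeast"],
    "Programming": ["python tutorial functions and loops",
                    "javascript language reference functions"],
}
reps = build_representatives(categories)
print(recommend_category("easy bread recipe", reps))
```

If the new page shares no terms with any category, every similarity is zero and this sketch simply returns the first category; a real system would probably fall back to the bookmark root instead. A synonym-recognition utility could plug in at the `tokenize` step.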
4
Why?
• Having organised bookmark files will enable us to do…
– Automatic query generation from bookmark files
– Web page recommendation based on other people’s bookmark files
– …
5
How?
• Start with the open-source framework provided by Ian Bugeja in his HyperBK project
• Build your algorithms
• Build an evaluation platform for your system
– I will provide 8 bookmark files for you to use
• You can remove some URLs at random to see if your algorithms classify them correctly
• You will also attempt to reconstruct each bookmark file from scratch!
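
The "remove some URLs at random" check above could look something like the sketch below. It assumes a bookmark file has already been loaded into a dictionary mapping each category to its bookmarks, and that `classify` is whatever classification function you have built; both of these, and all names used here, are hypothetical.

```python
import random

def hold_out_split(bookmarks, fraction, seed=0):
    """Randomly remove a fraction of the bookmarks from each category.

    Returns (remaining, removed): `remaining` maps category -> kept items,
    `removed` is a list of (item, true_category) pairs.
    """
    rng = random.Random(seed)
    remaining, removed = {}, []
    for category, items in bookmarks.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        # Keep at least one item so a non-empty category never becomes empty.
        n_removed = max(min(int(len(shuffled) * fraction), len(shuffled) - 1), 0)
        removed.extend((item, category) for item in shuffled[:n_removed])
        remaining[category] = shuffled[n_removed:]
    return remaining, removed

def accuracy(classify, remaining, removed):
    """Fraction of removed items that `classify` puts back in their true category."""
    if not removed:
        return 0.0
    correct = sum(1 for item, truth in removed
                  if classify(item, remaining) == truth)
    return correct / len(removed)

# Toy usage with a stand-in classifier that looks at the first letter of the item.
bookmarks = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2", "b3", "b4"]}
remaining, removed = hold_out_split(bookmarks, 0.5)
by_prefix = lambda item, rem: "A" if item.startswith("a") else "B"
print(accuracy(by_prefix, remaining, removed))
```

Fixing the seed makes a run repeatable, which helps when comparing two algorithms on the same removed URLs.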
6
Evaluation
• I will provide another 20 bookmark files (with some URLs randomly removed) for you to use to evaluate your algorithms
• Students who have the best-performing algorithms and best reports will have the opportunity to continue working on the system for their FYP and to submit a co-authored paper to a leading IR/Adaptive systems conference
7
Tools
• I recommend…
– Mozilla Firefox
– XUL (XML User Interface Language) and JavaScript
• A tutorial on XUL will be provided
– Google API
– You’ll be able to use Ian Bugeja’s framework and your plugin will be portable!
• But you’re free to use any other browser, platform, or language
8