EUBA: The Emory User Behavior
Analysis System
Eugene Agichtein, Qi Guo and Ryan Kelly
Intelligent Information Access Lab http://ir.mathcs.emory.edu
Math & Computer Science Department
Arthur Murphy, Selden Deemer, Kyle Fenton
Emory Libraries
Goals/Motivation
Evaluate effectiveness of search and discovery with
automatic behavioral metrics
Perform aggregate and longitudinal studies
Develop tools for usability studies “in the wild”
Scale (hundreds/thousands of “participants”)
Realistic behavior and tasks
On-demand playback of “interesting” sessions
Unified analysis/query framework for internal and
external resource access and usage statistics
Web-based query and statistics interface
Access auditing, privacy, anonymity enforced
Approach: Client-side instrumentation
Implemented on top of the Emory installation of the
LibX toolbar (http://www.libx.org)
Extended LibX to track UI events:
JavaScript patch to sample the mouse
movements and other events on pre-specified
web search pages. Events are encoded into a
string and buffered, and periodically sent to the
server (on internal library network).
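The buffering step described above can be sketched as follows. This is a minimal illustration in Python, not the actual LibX patch (which the slides describe as JavaScript). The class name `EventBuffer`, the comma/pipe string encoding, the flush thresholds, and the `send` callback are all assumptions made for the example.

```python
import time
from typing import Callable, List
from urllib.parse import quote


class EventBuffer:
    """Buffers encoded UI events and sends them to the server in batches."""

    def __init__(self, send: Callable[[str], None],
                 flush_interval_s: float = 5.0,
                 max_events: int = 200,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send                    # hypothetical transport callback
        self._flush_interval_s = flush_interval_s
        self._max_events = max_events
        self._clock = clock
        self._events: List[str] = []
        self._last_flush = clock()

    @staticmethod
    def encode(kind: str, t_ms: int, *fields: object) -> str:
        # Assumed encoding: "kind,t_ms,field1,field2,...".
        # Fields are percent-encoded so "," and "|" cannot break the format.
        return ",".join([kind, str(t_ms)] + [quote(str(f), safe="") for f in fields])

    def record(self, kind: str, t_ms: int, *fields: object) -> None:
        self._events.append(self.encode(kind, t_ms, *fields))
        too_many = len(self._events) >= self._max_events
        too_old = self._clock() - self._last_flush >= self._flush_interval_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        # Send all buffered events as one "|"-separated string, then reset.
        if self._events:
            self._send("|".join(self._events))
            self._events = []
        self._last_flush = self._clock()
```

Batching by both count and elapsed time keeps the number of requests low while bounding how long an event can sit on the client.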
Events captured (v0.5, Aug. 2008)
Button/link clicks, URL changes
Name of the button, link, other meta-info
Mouse movements
(x,y) coordinates sampled ~every 10ms
Scrolling
Start and stop position, sampled ~every 10ms
Text entry, keypress (ctrl-c, ctrl-v)
Query text, options changes
Menu item events
Print, bookmark, save (all of them)
Hover over important elements
Mouse-in/out of browser
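The "~every 10ms" sampling of mouse and scroll positions can be pictured as a simple rate limiter over the raw event stream. The sketch below is an assumption about how such sampling might work, not the toolbar's code; `IntervalSampler` and its parameters are hypothetical names.

```python
from typing import List, Optional, Tuple


class IntervalSampler:
    """Keeps at most one sample per interval from a stream of timestamped events."""

    def __init__(self, interval_ms: int = 10) -> None:
        self.interval_ms = interval_ms
        self._last_ms: Optional[int] = None

    def accept(self, t_ms: int) -> bool:
        # Accept the first event, then only events at least interval_ms later.
        if self._last_ms is None or t_ms - self._last_ms >= self.interval_ms:
            self._last_ms = t_ms
            return True
        return False


def sample_moves(moves: List[Tuple[int, int, int]],
                 interval_ms: int = 10) -> List[Tuple[int, int, int]]:
    """Filter (t_ms, x, y) mouse positions down to roughly one per interval."""
    sampler = IntervalSampler(interval_ms)
    return [m for m in moves if sampler.accept(m[0])]
```

For example, raw positions at 0, 3, 10, 14 and 25 ms would be reduced to those at 0, 10 and 25 ms.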
How it works (1 of 3)
On login to Learning Commons, Firefox is
started with
http://irlib.library.emory.edu/consent.cgi?user=USERID
If previously opted in (or out), go to homepage
Else show consent form
Store user choice in database; if opted in, also
store salted hash string for user login
Can track opted-in user behavior over “lifetime”
No way to recover login id by dictionary attack
Can be removed at any time by deleting mapping
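One way to picture the consent and pseudonym steps above is the sketch below. The slides do not say which hash or salting scheme is used; this example assumes a random per-user salt with SHA-256 and discards the salt after hashing, so the pseudonym cannot be recomputed from the login id and deleting the mapping unlinks the logs from the user. `ConsentStore` and its methods are hypothetical names, and an in-memory dict stands in for the database.

```python
import hashlib
import secrets
from typing import Dict, Optional


class ConsentStore:
    """Records opt-in/opt-out choices and a pseudonym for opted-in users."""

    def __init__(self) -> None:
        self._choice: Dict[str, bool] = {}     # login -> opted in?
        self._pseudonym: Dict[str, str] = {}   # login -> salted hash

    def needs_consent_form(self, login: str) -> bool:
        # Users who already chose (in or out) go straight to the homepage.
        return login not in self._choice

    def record_choice(self, login: str, opted_in: bool) -> None:
        self._choice[login] = opted_in
        if opted_in and login not in self._pseudonym:
            # Assumption: the random salt is not stored, so the hash cannot
            # be reproduced by hashing candidate login ids.
            salt = secrets.token_bytes(16)
            digest = hashlib.sha256(salt + login.encode("utf-8")).hexdigest()
            self._pseudonym[login] = digest
        elif not opted_in:
            self._pseudonym.pop(login, None)

    def pseudonym_for(self, login: str) -> Optional[str]:
        # Log records would be keyed by this value, never by the login id.
        return self._pseudonym.get(login)

    def forget(self, login: str) -> None:
        # Deleting the mapping breaks the link between user and logged data.
        self._pseudonym.pop(login, None)
```

Because the same pseudonym is returned on every later login, sessions from one opted-in user can be grouped over time without storing the login id in the logs.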
How it works (2 of 3): Consent
Request for Logging of Internet Use
To improve our web services, Emory Libraries are evaluating the use of our discovery tools (EUCLID,
Databases, eJournals, Research Guides, Reserves Direct, Google Scholar, etc.).
We would like to capture the web traffic of your browser session to enable us to log and evaluate our
patrons’ success in finding scholarly resources within the Learning Commons.
All data logged will be anonymous so that specific internet use will not be connected to a specific
individual. (Details of Research Protocol)
Despite the data capture safeguards, you may wish to “opt out” of this log file recording process.
Please select a choice:
Log my Internet use during this semester
Do not log my Internet use during this semester
Continue Logon
This study is being undertaken by the University Libraries under the auspices of Emory University’s
Institutional Review Board. To contact the Principal investigators of this study, please send email to:
[email protected] or [email protected].
http://irlib.library.emory.edu/
How it works (3 of 3): Which URLs?
For all visited URLs LibX notifies the server;
information varies by type of site:
Black list (known private sites): only domain name is saved
All “https://” and “mail.*” URLs
White list (known search/discovery sites):
EUCLID, Primo, Google, Google Scholar, Yahoo and
Live search engines, Wikipedia
All events captured
Gray list (search results and important public sites)
Mouse moves and clicks (no keypress/text)
The rest:
Only URL, button clicks, and menu items
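The per-list rules above amount to a small policy function. The sketch below is an illustration only: the host names in `WHITE_HOSTS` and `GRAY_HOSTS` are placeholders, the event-kind names are invented for the example, and checking the black list first is an assumption about precedence.

```python
from typing import FrozenSet, Tuple
from urllib.parse import urlsplit

# Hypothetical host lists; the real lists are not given in the slides.
WHITE_HOSTS = {"www.google.com", "scholar.google.com", "en.wikipedia.org"}
GRAY_HOSTS = {"results.example.org"}

# Hypothetical event-kind names.
ALL_EVENTS = frozenset({"click", "url_change", "mouse_move", "scroll",
                        "keypress", "text", "menu", "hover", "mouse_in_out"})


def classify(url: str) -> str:
    """Return 'black', 'white', 'gray' or 'other' for a visited URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Black list: all https:// and mail.* URLs (assumed to take precedence).
    if parts.scheme == "https" or host.startswith("mail."):
        return "black"
    if host in WHITE_HOSTS:
        return "white"
    if host in GRAY_HOSTS:
        return "gray"
    return "other"


def logging_policy(url: str) -> Tuple[str, FrozenSet[str]]:
    """Return (what to store for the URL, which event kinds to capture)."""
    kind = classify(url)
    if kind == "black":
        # Only the domain name is saved; no events are captured.
        return (urlsplit(url).hostname or "", frozenset())
    if kind == "white":
        return (url, ALL_EVENTS)
    if kind == "gray":
        # Mouse moves and clicks only: no keypresses or entered text.
        return (url, frozenset({"mouse_move", "click"}))
    # Everything else: URL, button clicks and menu items.
    return (url, frozenset({"click", "menu"}))
```

Deciding the policy from the URL alone means the client can drop disallowed events before they are ever buffered or sent.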
Emory User Behavior Analysis System
Combines client side instrumentation,
server-side caching, log
management, querying, and analysis
Client-side instrumentation, data
mining/machine learning (Qi Guo)
Log DB parsing, indexing, web-based
interface for querying, playback,
annotation (Ryan Kelly)
Plan: to release the system to
research/library community (2009?)
EUBA Web-based analysis interface
Prototype:
http://ir.mathcs.emory.edu/library/private/index.pl
user: test
password: notsafe
Future Plans
Incorporate log data for ranking, discovery,
query suggestion, collaborative filtering
Richer statistics and visualization
Streamline usability studies
Comments and suggestions welcome!