EUBA: The Emory User Behavior Analysis System
Eugene Agichtein, Qi Guo and Ryan Kelly
Intelligent Information Access Lab http://ir.mathcs.emory.edu
Math & Computer Science Department
Arthur Murphy, Selden Deemer, Kyle Fenton
Emory Libraries
Goals/Motivation
- Evaluate effectiveness of search and discovery with automatic behavioral metrics
- Perform aggregate and longitudinal studies
- Develop tools for usability studies “in the wild”
  - Scale (hundreds/thousands of “participants”)
  - Realistic behavior and tasks
  - On-demand playback of “interesting” sessions
- Unified analysis/query framework for internal and external resource access and usage statistics
  - Web-based query and statistics interface
  - Access auditing, privacy, anonymity enforced
Approach: Client-side instrumentation
- Implemented on top of the Emory installation of the LibX Toolbar (http://www.libx.org)
- Extended LibX to track UI events: a JavaScript patch samples mouse movements and other events on pre-specified web search pages. Events are encoded into a string, buffered, and periodically sent to the server (on the internal library network).
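The encode, buffer, and periodically send cycle described above can be sketched as follows. This is only an illustration written in Python rather than the toolbar's JavaScript; the `EventBuffer` class, the `kind:timestamp:fields` wire format, the `|` separator, and the flush thresholds are all assumptions and are not taken from the EUBA implementation.

```python
import time
from typing import Callable, List, Optional
from urllib.parse import quote


class EventBuffer:
    """Collects encoded UI events and sends them to the server in batches."""

    def __init__(self, send: Callable[[str], None], max_events: int = 50,
                 max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send              # hypothetical transport, e.g. an HTTP POST
        self._max_events = max_events  # flush when this many events are buffered
        self._max_age_s = max_age_s    # ... or when the oldest event is this old
        self._clock = clock
        self._events: List[str] = []
        self._first_ts: Optional[float] = None

    def record(self, kind: str, *fields: object) -> None:
        """Encode one event as 'kind:timestamp_ms:f1,f2' (assumed format)."""
        now = self._clock()
        # Percent-encode fields so free text cannot collide with separators.
        encoded = ",".join(quote(str(f), safe="") for f in fields)
        self._events.append(f"{kind}:{int(now * 1000)}:{encoded}")
        if self._first_ts is None:
            self._first_ts = now
        too_many = len(self._events) >= self._max_events
        too_old = now - self._first_ts >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send all buffered events as one string and reset the buffer."""
        if not self._events:
            return
        self._send("|".join(self._events))
        self._events = []
        self._first_ts = None
```

Batching keeps the number of requests low even when mouse movements are sampled at a high rate; the trade-off is that up to one batch of events can be lost if the browser closes before a flush.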
Events captured (v0.5, Aug. 2008)
- Button/link clicks, URL changes
  - Name of the button or link, other meta-info
- Mouse movements
  - (x, y) coordinates sampled ~every 10 ms
- Scrolling
  - Start and stop positions, sampled ~every 10 ms
- Text entry, keypresses (Ctrl-C, Ctrl-V)
  - Query text, option changes
- Menu item events
  - Print, bookmark, save (all of them)
- Hover over important elements
- Mouse-in/out of browser
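Sampling mouse and scroll positions roughly every 10 ms amounts to dropping any raw event that arrives too soon after the last one kept. A minimal sketch of that idea, assuming events arrive as `(timestamp_ms, x, y)` tuples; the `Sampler` class and `sample_moves` helper are hypothetical names, not part of LibX or EUBA:

```python
from typing import List, Optional, Tuple

Move = Tuple[int, int, int]  # (timestamp_ms, x, y), an assumed representation


class Sampler:
    """Accepts at most one event per interval_ms."""

    def __init__(self, interval_ms: int = 10) -> None:
        self.interval_ms = interval_ms
        self._last_ms: Optional[int] = None

    def accept(self, ts_ms: int) -> bool:
        """Return True if enough time has passed since the last kept event."""
        if self._last_ms is None or ts_ms - self._last_ms >= self.interval_ms:
            self._last_ms = ts_ms
            return True
        return False


def sample_moves(moves: List[Move], interval_ms: int = 10) -> List[Move]:
    """Keep only the mouse positions that pass the sampling interval."""
    sampler = Sampler(interval_ms)
    return [m for m in moves if sampler.accept(m[0])]
```

For example, raw positions at 0, 4, 9, 10, 15 and 21 ms would be reduced to those at 0, 10 and 21 ms.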
How it works (1 of 3)
- On login to Learning Commons, Firefox is started with http://irlib.library.emory.edu/consent.cgi?user=USERID
  - If previously opted in (or out), go to homepage
  - Else show consent form
- Store user choice in database; if opted in, also store a salted hash string for the user login
  - Can track opted-in user behavior over “lifetime”
  - No way to recover login id by dictionary attack
  - Can be removed at any time by deleting the mapping
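One way to read the consent and hashing steps above is sketched below. It assumes a random per-user salt that is discarded after hashing, so the stored pseudonym cannot be recomputed from a list of candidate logins; the slides do not say how the salt is actually chosen or stored. The `ConsentStore` class, its in-memory dictionaries (standing in for the database), and the page names are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, Optional


class ConsentStore:
    """In-memory stand-in for the consent database (assumed schema)."""

    def __init__(self) -> None:
        self.choices: Dict[str, bool] = {}    # login -> opted in?
        self.pseudonyms: Dict[str, str] = {}  # login -> salted hash (the mapping)

    def next_page(self, login: str) -> str:
        """Show the consent form only if no choice has been recorded yet."""
        return "homepage" if login in self.choices else "consent_form"

    def record_choice(self, login: str, opted_in: bool) -> None:
        """Store the choice; create a pseudonym only for opted-in users."""
        self.choices[login] = opted_in
        if not opted_in:
            self.pseudonyms.pop(login, None)
        elif login not in self.pseudonyms:
            # Assumption: a random salt is used once and then thrown away.
            salt = secrets.token_bytes(16)
            digest = hashlib.sha256(salt + login.encode("utf-8")).hexdigest()
            self.pseudonyms[login] = digest

    def log_key(self, login: str) -> Optional[str]:
        """Key under which this user's events are logged, or None."""
        return self.pseudonyms.get(login)

    def forget(self, login: str) -> None:
        """Delete the mapping; existing logs can no longer be linked to login."""
        self.pseudonyms.pop(login, None)
```

Because the pseudonym is created once and reused, sessions by the same opted-in user can be joined over time, while deleting the mapping entry severs the link between the login and the logged data.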
How it works (2 of 3): Consent
Request for Logging of Internet Use
To improve our web services, Emory Libraries are evaluating the use of our discovery tools (EUCLID,
Databases, eJournals, Research Guides, Reserves Direct, Google Scholar, etc.).
We would like to capture the web traffic of your browser session to enable us to log and evaluate our
patrons’ success in finding scholarly resources within the Learning Commons.
All data logged will be anonymous so that specific internet use will not be connected to a specific
individual. (Details of Research Protocol)
Despite the data capture safeguards, you may wish to “opt out” of this log file recording process.
Please select a choice:
Log my Internet use during this semester
Do not log my Internet use during this semester
Continue Logon
This study is being undertaken by the University Libraries under the auspices of Emory University’s
Institutional Review Board. To contact the Principal investigators of this study, please send email to:
[email protected] or [email protected].
http://irlib.library.emory.edu/
How it works (3 of 3): which URLs?
- For all visited URLs, LibX notifies the server; the information logged varies by type of site:
  - Black list (known private sites): only the domain name is saved
    - All “https://” and “mail.*” URLs
  - White list (known search/discovery sites): all events captured
    - EUCLID, Primo, Google, Google Scholar, Yahoo and Live search engines, Wikipedia
  - Gray list (search results and important public sites)
    - Mouse moves and clicks (no keypress/text)
  - The rest:
    - Only URL, button clicks, and menu items
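The per-site logging rules above can be expressed as a small policy table. The sketch below is illustrative only: the hostnames in `WHITE_HOSTS` and `GRAY_HOSTS`, the event kind names, and the choice to test the black list before the other lists are assumptions, not details from the EUBA implementation.

```python
from typing import Dict, FrozenSet, Optional
from urllib.parse import urlsplit

# Placeholder hostnames; the real lists are not given in the slides.
WHITE_HOSTS = frozenset({"www.google.com", "scholar.google.com", "en.wikipedia.org"})
GRAY_HOSTS = frozenset({"gray.example.org"})

# Event kinds allowed per list; None means "capture everything".
ALLOWED: Dict[str, Optional[FrozenSet[str]]] = {
    "black": frozenset(),                        # nothing beyond the domain name
    "white": None,                               # all events
    "gray": frozenset({"move", "click"}),        # no keypress/text
    "rest": frozenset({"url", "click", "menu"}),
}


def classify(url: str) -> str:
    """Assign a URL to the black, white, gray, or rest category."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Assumption: privacy rules win over the white and gray lists.
    if parts.scheme == "https" or host.startswith("mail."):
        return "black"
    if host in WHITE_HOSTS:
        return "white"
    if host in GRAY_HOSTS:
        return "gray"
    return "rest"


def logged_url(url: str) -> str:
    """Return what is stored for a visit: the domain only for black-listed URLs."""
    if classify(url) == "black":
        return (urlsplit(url).hostname or "").lower()
    return url


def should_capture(url: str, event_kind: str) -> bool:
    """Decide whether an event of this kind is recorded on this page."""
    allowed = ALLOWED[classify(url)]
    return True if allowed is None else event_kind in allowed
```

With these placeholder lists, `should_capture("http://gray.example.org/", "keypress")` is false while the same call with `"move"` is true, and a visit to `http://mail.example.edu/inbox?id=1` is stored as `mail.example.edu` only.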
Emory User Behavior Analysis System
- Combines client-side instrumentation, server-side caching, log management, querying, and analysis
  - Client-side instrumentation, data mining/machine learning (Qi Guo)
  - Log DB parsing, indexing, web-based interface for querying, playback, annotation (Ryan Kelly)
- Plan: release the system to the research/library community (2009?)
EUBA Web-based analysis interface
Prototype:
http://ir.mathcs.emory.edu/library/private/index.pl
user: test
password: notsafe
Future Plans
- Incorporate log data for ranking, discovery, query suggestion, collaborative filtering
- Richer statistics and visualization
- Streamline usability studies
- Comments and suggestions welcome!