Analyzing Web Logs
Sarah Waterson
18 April 2002
SIMS 213
Group for User Interface Research
Talk Outline
What is a web log?
Where do they come from?
Why are they relevant?
How can we analyze them?
Study
Discussion
What is a web log?
A record of a visit to a web page
Visitor (IP address)
URL
Time of visit
Time spent on a page
Browser used
Referring URL
Type of request
Reply code
Number of bytes in the reply
etc…
What is a clickstream?
A record of a path through web pages
Visitor (IP address)
URL
Time of visit
Time spent on a page
Browser used
Referring URL
Type of request
Reply code
Number of bytes in the reply
Next URL
etc…
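The difference between a log and a clickstream can be sketched in a few lines of Python. This is only an illustration: the visit records, the `build_clickstreams` helper, and the use of plain seconds for timestamps are all assumptions made for the example. It groups visits by visitor, orders them by time, and derives "time spent on a page" and "next URL" from consecutive requests.

```python
from collections import defaultdict

# Hypothetical visit records: (visitor, seconds since start of log, URL).
visits = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.2", 5, "/index.html"),
    ("10.0.0.1", 30, "/products.html"),
    ("10.0.0.1", 75, "/contact.html"),
    ("10.0.0.2", 20, "/about.html"),
]

def build_clickstreams(records):
    """Group visits by visitor and order them by time to recover each path."""
    by_visitor = defaultdict(list)
    for visitor, timestamp, url in records:
        by_visitor[visitor].append((timestamp, url))

    streams = {}
    for visitor, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by timestamp
        steps = []
        for i, (timestamp, url) in enumerate(hits):
            if i + 1 < len(hits):
                next_time, next_url = hits[i + 1]
                time_spent = next_time - timestamp
            else:
                # Last page of the path: no later request, so time spent is unknown.
                next_url, time_spent = None, None
            steps.append({"url": url, "time_spent": time_spent, "next_url": next_url})
        streams[visitor] = steps
    return streams

streams = build_clickstreams(visits)
for step in streams["10.0.0.1"]:
    print(step)
```

Note that time spent on the last page of a path cannot be recovered from request times alone, which is one reason it is left as `None` here.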
What is a Web Log?
Apache web log:
205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" "Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"
216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" "Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)"
202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510 "http://www.csua.berkeley.edu/~tahir/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/animate.js HTTP/1.1" 200 14261 "http://www.csua.berkeley.edu/~tahir/indextop.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
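Apache log entries like the ones on the previous slide can be split into the fields listed earlier (visitor, time, URL, reply code, bytes, referring URL, browser). The following is a minimal sketch that assumes the entries follow Apache's Combined Log Format and that each entry sits on a single line; the pattern and the `parse_line` helper are illustrative names, not part of any particular tool.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Example timestamp: 29/Mar/2002:03:00:14 -0800
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Apache writes "-" when no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = ('202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] '
          '"GET /~tahir/indextop.html HTTP/1.1" 200 3510 '
          '"http://www.csua.berkeley.edu/~tahir/" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
entry = parse_line(sample)
print(entry["host"], entry["url"], entry["status"], entry["size"])
```

A referrer of "-" (as in the Slurp crawler entry) means the request carried no referring URL, so any analysis of entry points should treat it as missing rather than as a page.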
Where do they come from?
Servers
Done on most web servers
Standard formats
Clients
Proxy Log
Browsers, loggers on client machine
Must send data back
Proxies
Similar to servers
Hang out in between client and server
Why are web logs relevant?
Lots of data
Quantitative analysis is much more fun!
User behavior, patterns
Real users, tasks
Or at least more realistic users and tasks
Leaving the usability lab
Testing effect
Fast, easy, cheap
Automatic or almost-automatic
Ed Chi asks…
Usage:
How has information been accessed?
How frequently?
What’s popular? What’s not?
How do people enter the site? Exit?
Where do people spend time?
How long do they spend there?
How do people travel within the site?
Who are the people visiting?
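Several of these usage questions reduce to simple counting once the log has been grouped into per-visitor paths. The sketch below is one possible way to do it; the sample paths and the `usage_summary` helper are made up for illustration and do not come from any of the tools discussed.

```python
from collections import Counter

# Hypothetical per-visitor paths (ordered lists of URLs).
paths = {
    "visitor_a": ["/index.html", "/products.html", "/contact.html"],
    "visitor_b": ["/index.html", "/about.html"],
    "visitor_c": ["/products.html", "/index.html", "/products.html"],
}

def usage_summary(paths):
    """Count page views, entry pages, exit pages, and page-to-page transitions."""
    page_views = Counter()   # What's popular? What's not?
    entries = Counter()      # How do people enter the site?
    exits = Counter()        # How do people exit?
    transitions = Counter()  # How do people travel within the site?
    for path in paths.values():
        if not path:
            continue
        page_views.update(path)
        entries[path[0]] += 1
        exits[path[-1]] += 1
        transitions.update(zip(path, path[1:]))
    return page_views, entries, exits, transitions

page_views, entries, exits, transitions = usage_summary(paths)
print(page_views.most_common(2))
print(entries.most_common(1))
print(transitions.most_common(1))
```

Counts like these answer "what" and "how often"; they say nothing about who the visitors are or what they intended.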
Ed Chi asks…
Structural:
What information has been added, deleted,
modified, moved?
Usage + Structural
What happens when the site changes?
Does navigation change?
Does popularity change?
What about missing data?
(Google)
How do you analyze web logs?
1. Data Mining: task or intent unknown
“Automated extraction of hidden predictive
information from (large) databases” – Kurt Thearling
Server log analysis
What are people doing?
2. Remote Usability Testing: task or intent known
Similar to traditional lab usability testing
Clickstream analysis
How well does the site support what people are doing?
How? Data Mining
Statistics and numbers galore!
Gazillions of tools for server log analysis
Computers>Software>Internet>Site Management> Log Analysis
Usually charts, graphs, numbers galore
Analog & NetTracker typical statistics
In 3D too (eBizinsights)
How? Data Mining cont’d
Other interesting work:
Web Ecologies (Chi)
Development over time
Information scent (Chi)
Behavior patterns
Understand how to organize info
“Information scent is made of cues that people use to decide whether a path is interesting.”
Useful for web designers?
Web Ecologies (Chi 1998)
How? Remote Usability Testing
Analyze clickstream in the context of the task and user intentions
Can be gathered on client, server, and via proxy
Varied granularities of interaction
From mouse movements to page access
Varied levels of user awareness
From interactive to invisible
Varied levels of access
From a single site to the entire web
How? Remote Usability Testing
WebVip and VisVip (NIST)
Server side logging
Javascript instrumentation
Individual paths within context of site
Animation/replay sessions
Questions:
What part of site used for a task? Not used?
How long to finish task? Per page?
What sorts of behavior for task?
How? Remote Usability Testing
ClickViz (Blue Martini)
Server side logging
Custom instrumentation
Aggregate paths based on file system
Include demographics, purchase history
Filtering
Questions:
How does visitor of type X compare to type Y?
Success vs. “failure”
How? Remote Usability Testing
Vividence ClickStreams
Not restricted to servers
Testing suites
Interesting aggregation methods
NetRaker Clickstream
How? Remote Usability Testing
WebQuilt (GUIR)
Logging Design Goals:
Extensible, Scalable
Allow for unobtrusive, “naturalistic” user interaction
Multi-platform, multi-device compatibility
Fast and easy to deploy on any website
Solution:
Proxy-based logger rewrites links
Nearly invisible to user
Independent of client browser
Infer actions (e.g. back button clicks)
Stand alone or use with other tools
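The idea of a proxy that rewrites links can be sketched as follows. This is not WebQuilt's actual implementation: the proxy address, the `url` query parameter, and the `rewrite_links` helper are hypothetical, and the regular expression only handles double-quoted `href` attributes. The point is that every link on a page is redirected through the proxy, so each click can be logged without changing the site or the browser.

```python
import re
from urllib.parse import urljoin, quote

PROXY_URL = "http://proxy.example.org/log"  # hypothetical proxy endpoint

# Simplification: only matches href="..." with double quotes.
HREF_PATTERN = re.compile(r'href="([^"]*)"', re.IGNORECASE)

def rewrite_links(html, page_url, proxy_url=PROXY_URL):
    """Point every link at the proxy, passing the real target as a parameter."""
    def replace(match):
        target = urljoin(page_url, match.group(1))  # resolve relative links
        return 'href="{}?url={}"'.format(proxy_url, quote(target, safe=""))
    return HREF_PATTERN.sub(replace, html)

page = ('<a href="specs.html">Specs</a> '
        '<a href="http://example.com/dealers">Dealers</a>')
rewritten = rewrite_links(page, "http://example.com/cars/index.html")
print(rewritten)
```

When the proxy receives a request it can record the target and the time, fetch the real page, rewrite that page's links in the same way, and return it, which is why the approach is independent of the client browser.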
How? Remote Usability Testing
WebQuilt (GUIR)
Visual Analysis Tool:
Put data within context of the design
Show deviations from expected paths
Interactive graph
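One simple way to flag deviations from an expected path is to compare the pages a participant visited with the pages the designer expected. The sketch below is an assumption-laden illustration, not the algorithm used by the visual analysis tool: the page names and the `compare_to_expected` helper are invented for the example.

```python
def compare_to_expected(expected_path, actual_path):
    """Return (pages visited off the expected path, expected pages never reached)."""
    expected_pages = set(expected_path)
    visited_pages = set(actual_path)
    off_path = [page for page in actual_path if page not in expected_pages]
    missed = [page for page in expected_path if page not in visited_pages]
    return off_path, missed

# Hypothetical task: reach the specifications page for one car model.
expected = ["/home", "/new-cars", "/sentra", "/sentra/specs"]
actual = ["/home", "/used-cars", "/home", "/new-cars", "/sentra"]

off_path, missed = compare_to_expected(expected, actual)
print("Off the expected path:", off_path)
print("Never reached:", missed)
```

A page that many participants visit off the expected path, or an expected page that few reach, is a candidate problem area to inspect in the graph.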
Study: Purpose
Exploratory comparison of lab and remote usability testing with mobile devices
What types of usability issues can we:
find with either method?
find with one that we can’t find with the other?
Design implications
testing tools
testing strategies
Study: The Mobile Web
Limited and/or new interaction methods
Small screens
Graffiti, keypads, thumb-pads
Beyond the desktop
Driving, traveling, walking
Noisy, public
Gathering good usability data is vital to making these interfaces, and subsequently these devices, successful.
Study: Design
10 users asked to find:
Anti-lock brake information on the latest Nissan Sentra
The closest Nissan dealer
http://pda.edmunds.com
Handspring Visor Edge with OmniSky wireless modem
5 users in the lab
5 users in the wild
Web-based questionnaires
Study: Identifying Usability Issues
Lab Data
Tester observations
Participant comments
Questionnaire
Remote Data
Clickstream analysis
Questionnaire
Four Categories
Device
Browser
Site Design
Test Design
Severity Levels
0 indicates a comment
1 to 5 (minor to critical)
Study: Caveats
Analysis and observation for both tests done by same person
Issues identified from remote tests first
Avoids biasing remote analysis tools
Looking for potential problem areas
Study: Results
Totals:
18 unique issues
7 found remotely
1/3 device or browser related
Site Design
5 of the 9 issues
3 of the 4 with severity level > 3
Test Design
2 of the 6 issues
2 of the 4 with severity level > 3

Category       Lab   Remote
Device         4     1
Browser        2     0
Test Design    6     2
Site Design    9     5
Study: Process Observations
Remote usability testing can capture some usability issues that lab testing already discovers
Lab testing gets me:
Qualitative observations
Thinking aloud comments
Non-content usability issues
Study: Process Observations
What can remote testing get us that labs can’t?
Lab effect
Quitting a task is easier when not in lab
Network problems more realistic
With more users
Patterns emerge
Can reduce uncertainty
Faster
Study: Conclusions
Remote usability testing is a promising technique for capturing realistic usage data for mobile web site design
Main concerns
Gathering user feedback on mobile devices is even more difficult because of limited input
Understanding users can be ambiguous
Potentially alleviated by ability to test larger number of users
Discussion
Comments
Questions
(Diagram: design cycle of Design, Prototype, Evaluate)
Where does web log analysis fit into a design cycle?
Understanding what methods to use when and where
Experiences?
These or other tools?