
Analyzing Web Logs
Sarah Waterson
18 April 2002
SIMS 213
Group for User Interface Research
Talk Outline
- What is a web log?
- Where do they come from?
- Why are they relevant?
- How can we analyze them?
- Study
- Discussion
What is a web log?
A record of a visit to a web page:
- Visitor (IP address)
- URL
- Time of visit
- Time spent on a page
- Browser used
- Referring URL
- Type of request
- Reply code
- Number of bytes in the reply
- etc.
What is a clickstream?
A record of a path through web pages:
- Visitor (IP address)
- URL
- Time of visit
- Time spent on a page
- Browser used
- Referring URL
- Type of request
- Reply code
- Number of bytes in the reply
- Next URL
- etc.
What is a Web Log?
Apache web log:

205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" "Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"

216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" "Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)"

202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510 "http://www.csua.berkeley.edu/~tahir/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/animate.js HTTP/1.1" 200 14261 "http://www.csua.berkeley.edu/~tahir/indextop.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
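Log lines in this combined format split into fixed fields, so a simple regular expression suffices to pull them apart. A minimal sketch (field names like `host` and `referrer` are my labels, not part of the format itself):

```python
import re

# Apache "combined" log format: host, identd, user, [timestamp],
# "request", status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Parse one combined-format log line into a dict, or None if it
    does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # "-" means no body was sent (e.g. redirects); treat as 0 bytes.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = ('216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] '
        '"GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" '
        '"Mozilla/5.0 (Slurp/cat)"')
entry = parse_line(line)
print(entry["host"], entry["status"], entry["bytes"])
# 216.35.116.26 200 2674
```

Parsing each line into a dict like this is the usual first step before any of the aggregate or clickstream analyses below.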
Where do they come from?
Servers
- Done on most web servers
- Standard formats
Clients
- Browsers, loggers on client machine
- Must send data back
Proxies (proxy log)
- Similar to servers
- Hang out in between client and server
Why are web logs relevant?
- Lots of data
  - Quantitative analysis is much more fun!
- User behavior, patterns
- Real users, tasks
  - Or at least more realistic users and tasks
- Leaving the usability lab
  - Testing effect
- Fast, easy, cheap
  - Automatic or almost-automatic
Ed Chi asks…
Usage:
- How has information been accessed?
- How frequently?
- What’s popular? What’s not?
- How do people enter the site? Exit?
- Where do people spend time?
- How long do they spend there?
- How do people travel within the site?
- Who are the people visiting?
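Several of these usage questions (entry and exit pages, travel within the site) require grouping raw hits into per-visitor sessions first. A minimal sketch, assuming hits are already sorted by time and using the common heuristic of splitting sessions at a 30-minute gap (using IP address as the visitor key, as the logs above do):

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; a common sessionizing heuristic

def sessionize(hits):
    """Group (ip, timestamp_seconds, url) hits, sorted by time, into
    per-visitor sessions: ip -> list of sessions (lists of URLs)."""
    sessions = defaultdict(list)
    last_seen = {}  # ip -> timestamp of that visitor's previous hit
    for ip, ts, url in hits:
        # Start a new session on first sight or after a long gap.
        if ip not in last_seen or ts - last_seen[ip] > SESSION_TIMEOUT:
            sessions[ip].append([])
        sessions[ip][-1].append(url)
        last_seen[ip] = ts
    return sessions

hits = [
    ("1.2.3.4", 0,    "/index.html"),
    ("1.2.3.4", 60,   "/about.html"),
    ("1.2.3.4", 7200, "/index.html"),  # two hours later: new session
]
print(sessionize(hits)["1.2.3.4"])
# [['/index.html', '/about.html'], ['/index.html']]
```

The first and last URL of each session then answer the entry/exit questions, and the sequences themselves show how people travel within the site.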
Ed Chi asks…
Structural:
- What information has been added, deleted, modified, moved?
Usage + Structural:
- What happens when the site changes?
- Does navigation change?
- Does popularity change?
- What about missing data?
(Google)
How do you analyze web logs?
1. Data Mining: task or intent unknown
   - “Automated extraction of hidden predictive information from (large) databases” – Kurt Thearling
   - Server log analysis
   - What are people doing?
2. Remote Usability Testing: task or intent known
   - Similar to traditional lab usability testing
   - Clickstream analysis
   - How well does the site support what people are doing?
How? Data Mining
Statistics and numbers galore!
- Gazillions of tools for server log analysis
  (Computers > Software > Internet > Site Management > Log Analysis)
- Usually charts, graphs, numbers galore
- Analog & NetTracker: typical statistics
- In 3D too (eBizinsights)
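The typical statistics these tools report (hit counts, popular pages, status code breakdowns, bytes transferred) are straightforward aggregations over parsed log entries. A minimal sketch, assuming entries are dicts with hypothetical keys `url`, `status`, and `bytes`:

```python
from collections import Counter

def summarize(entries):
    """Compute Analog/NetTracker-style summary statistics from a list
    of parsed log entries (dicts with url, status, bytes keys)."""
    return {
        "hits": len(entries),
        "total_bytes": sum(e["bytes"] for e in entries),
        "top_pages": Counter(e["url"] for e in entries).most_common(3),
        "status_counts": Counter(e["status"] for e in entries),
    }

entries = [
    {"url": "/index.html", "status": 200, "bytes": 3510},
    {"url": "/index.html", "status": 200, "bytes": 3510},
    {"url": "/missing.gif", "status": 404, "bytes": 0},
]
report = summarize(entries)
print(report["hits"], report["top_pages"][0])
# 3 ('/index.html', 2)
```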
How? Data Mining cont’d
Other interesting work:
- Web Ecologies (Chi)
  - Development over time
- Information scent (Chi)
  - Behavior patterns
  - Understand how to organize info
  - “Information scent is made of cues that people use to decide whether a path is interesting.”
  - Useful for web designers?
Web Ecologies (Chi 1998)
How? Remote Usability Testing
- Analyze clickstream in the context of the task and user intentions
- Can be gathered on client, server, and via proxy
- Varied granularities of interaction: mouse movements → page access
- Varied levels of user awareness: interactive → invisible
- Varied levels of access: site only → entire web
How? Remote Usability Testing
WebVip and VisVip (NIST)
- Server side logging
- JavaScript instrumentation
- Individual paths within context of site
- Animation/replay of sessions
Questions:
- What part of the site is used for a task? Not used?
- How long to finish a task? Per page?
- What sorts of behavior for a task?
How? Remote Usability Testing
ClickViz (Blue Martini)
- Server side logging
- Custom instrumentation
- Aggregate paths based on file system
- Include demographics, purchase history
- Filtering
Questions:
- How does visitor of type X compare to type Y?
- Success vs. “failure”
How? Remote Usability Testing
Vividence ClickStreams, NetRaker Clickstream
- Not restricted to servers
- Testing suites
- Interesting aggregation methods
How? Remote Usability Testing
WebQuilt (GUIR)
Logging design goals:
- Extensible, scalable
- Allow for unobtrusive, “naturalistic” user interaction
- Multi-platform, multi-device compatibility
- Fast and easy to deploy on any website
Solution:
- Proxy-based logger rewrites links
- Nearly invisible to user
- Independent of client browser
- Infer actions (e.g. back button clicks)
- Stand alone or use with other tools
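Back-button clicks never hit the server, but they can be inferred from referrer data: if a hit's referrer is not the page viewed immediately before it, yet appears earlier in the visitor's stream, the visitor likely navigated back before clicking onward. A hypothetical sketch of this idea (not WebQuilt's actual algorithm):

```python
def infer_back_clicks(stream):
    """stream: list of (url, referrer) pairs for one visitor, in time
    order. Returns the indices where a back-button jump is inferred."""
    back_at = []
    visited = []  # URLs seen so far, in order
    for i, (url, referrer) in enumerate(stream):
        # Referrer is not the immediately previous page, but was
        # visited earlier: the user probably went back to it first.
        if (visited and referrer
                and referrer != visited[-1]
                and referrer in visited):
            back_at.append(i)
        visited.append(url)
    return back_at

stream = [
    ("/a", ""),    # entry page, no referrer
    ("/b", "/a"),  # normal forward click
    ("/c", "/a"),  # referrer is /a, not /b: back to /a, then on to /c
]
print(infer_back_clicks(stream))
# [2]
```

Caching complicates this in practice (the back click itself is often served from the browser cache), which is part of why WebQuilt logs through a proxy rather than relying on server logs alone.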
How? Remote Usability Testing
WebQuilt (GUIR)
Visual analysis tool:
- Put data within context of the design
- Show deviations from expected paths
- Interactive graph
Study: Purpose
- Exploratory comparison of lab and remote usability testing with mobile devices
- What types of usability issues can we:
  - find with either method?
  - find with one that we can’t find with the other?
- Design implications
  - testing tools
  - testing strategies
Study: The Mobile Web
- Limited and/or new interaction methods
  - Small screens
  - Graffiti, keypads, thumb-pads
- Beyond the desktop
  - Driving, traveling, walking
  - Noisy, public
Gathering good usability data is vital to making these interfaces, and subsequently these devices, successful.
Study: Design
- 10 users asked to find:
  - Anti-lock brake information on the latest Nissan Sentra
  - The closest Nissan dealer
- http://pda.edmunds.com
- Handspring Visor Edge with OmniSky wireless modem
- 5 users in the lab
- 5 users in the wild
- Web-based questionnaires
Study: Identifying Usability Issues
Lab data:
- Tester observations
- Participant comments
- Questionnaire
Remote data:
- Clickstream analysis
- Questionnaire
Four categories:
- Device
- Browser
- Site Design
- Test Design
Severity levels:
- 0 indicates a comment
- 1–5 (minor → critical)
Study: Caveats
- Analysis and observation for both tests done by same person
- Issues identified from remote tests first
  - Avoids biasing remote analysis tools
- Looking for potential problem areas
Study: Results
Totals:
- 18 unique issues
- 7 found remotely
- 1/3 device or browser related

Issues found       Lab   Remote
Device              4      1
Browser             2      0
Test Design         6      2
Site Design         9      5

Site Design:
- 5 of the 9 issues found remotely
- 3 of the 4 with severity level > 3
Test Design:
- 2 of the 6 issues found remotely
- 2 of the 4 with severity level > 3
Study: Process Observations
Remote usability testing can capture some usability issues that lab testing already discovers.
Lab testing gets me:
- Qualitative observations
- Thinking-aloud comments
- Non-content usability issues
Study: Process Observations
What can remote testing get us that labs can’t?
- Lab effect
  - Quitting a task is easier when not in lab
  - Network problems more realistic
- With more users
  - Patterns emerge
  - Can reduce uncertainty
  - Faster
Study: Conclusions
Remote usability testing is a promising technique for capturing realistic usage data for mobile web site design.
Main concerns:
- Gathering user feedback on mobile devices is even more difficult because of limited input
- Understanding users can be ambiguous
  - Potentially alleviated by ability to test larger number of users
Discussion
Comments? Questions?
- Where does web log analysis fit into a design cycle (Design → Prototype → Evaluate)?
- Understanding what methods to use when and where
- Experiences? These or other tools?