Lies, damn lies and Web statistics

A brief introduction to using and abusing web statistics
Paul Smith, ILRT
July 2006
Overview
• Some key terms explained
• Log Analysers
– What you can find out
– What you can’t find out
– Cache for questions
• Trackers / counters
• Further reading
Key terms
• Log file
• IP address
• Hit
• Visitor / visit / user session
• Page request / view
• Referrer
• Cache server
• Proxy server
Log Analysers
• A few examples:
– from Google Directory listing
• What we use in ILRT:
– UNIX scripts / tools
– Analog [www.analog.cx]
– AWStats [awstats.sourceforge.net]
– WebTrends [www.webtrends.com]
What you can find out
• Number of requests made to your server
• When they were made
• Which files were asked for
• Which host asked you for them.
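As a rough illustration of the four items above, the sketch below pulls the host, time, file and status out of a single log line. It assumes the server writes the Common Log Format; the sample line, the `CLF_PATTERN` name and the `parse_clf_line` helper are made up for this example and are not taken from any of the tools named earlier.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_clf_line(line):
    """Return host, timestamp, path and status from one log line, or None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),      # which host asked
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "path": match.group("path"),      # which file was asked for
        "status": int(match.group("status")),
    }

# Invented sample line using a documentation-only IP address.
sample = '192.0.2.10 - - [10/Jul/2006:13:55:36 +0100] "GET /index.html HTTP/1.0" 200 2326'
record = parse_clf_line(sample)
print(record["host"], record["time"].isoformat(), record["path"])
```

Counting the records returned for a whole log file gives the number of requests. Note that the host is only the machine that connected to the server, which is not necessarily the user's own machine.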
What you can find out (cont’d)
• What people told you their browsers were
• What the referring pages were
You should be aware, though, that:
• Many browsers deliberately lie
• Users can configure the browser name
• Some people use "anonymisers", which deliberately send false browser names and referrers.
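To make the caveat concrete, here is a small sketch that tallies browser names exactly as clients reported them. It assumes the Combined Log Format, where the referrer and user agent are the last two quoted fields; the sample lines and the `claimed_browsers` helper are invented for illustration.

```python
import re
from collections import Counter

# In the Combined Log Format the last two quoted fields are the
# referrer and the user agent, both supplied by the client.
TAIL_PATTERN = re.compile(r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

def claimed_browsers(lines):
    """Count user-agent strings exactly as the clients reported them."""
    counts = Counter()
    for line in lines:
        match = TAIL_PATTERN.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts

# Invented sample lines; the agent strings are whatever each client chose to send.
sample_lines = [
    '192.0.2.10 - - [10/Jul/2006:13:55:36 +0100] "GET / HTTP/1.0" 200 2326 '
    '"http://example.org/links.html" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.11 - - [10/Jul/2006:13:56:02 +0100] "GET / HTTP/1.0" 200 2326 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.12 - - [10/Jul/2006:13:57:40 +0100] "GET / HTTP/1.0" 200 2326 '
    '"-" "ExampleBot/1.0"',
]
counts = claimed_browsers(sample_lines)
for agent, total in counts.most_common():
    print(total, agent)
```

The tally is only a count of claims. Nothing in the log can confirm that two lines carrying the same string came from the same kind of browser.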
What you can’t do
• You can't tell the identity of your users
• You can't tell how many visitors you've had
• You can't tell how many visits you've had
• Cookies don't solve these problems
• You can't follow a person's path through your site
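Many log analysers still report a "visits" figure. One common way of estimating it, sketched below, is to group requests by host and start a new visit after a period of inactivity. The 30-minute cut-off, the host names and the `estimate_visits` helper are assumptions for this example, not the method of any particular tool.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not a standard

def estimate_visits(requests):
    """Count 'visits' as runs of requests from one host with no gap over the timeout.

    requests: iterable of (host, datetime) pairs.
    """
    last_seen = {}
    visits = 0
    for host, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(host)
        if previous is None or when - previous > SESSION_TIMEOUT:
            visits += 1
        last_seen[host] = when
    return visits

t0 = datetime(2006, 7, 10, 9, 0)
requests = [
    ("proxy.example.net", t0),                          # person A, via a shared proxy
    ("proxy.example.net", t0 + timedelta(minutes=5)),   # person B, same proxy
    ("proxy.example.net", t0 + timedelta(minutes=9)),   # person C, same proxy
    ("192.0.2.10", t0 + timedelta(minutes=1)),          # person D
    ("192.0.2.77", t0 + timedelta(minutes=3)),          # person D again, new address
]
print(estimate_visits(requests))  # prints 3, although four people visited
```

Three people behind one proxy collapse into a single visit, while one person arriving from two addresses is counted twice. The estimate is wrong in both directions at once.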
What you can’t do (cont’d)
• You often can't tell where users entered your site, or where they linked to you from
• You can't tell how they left your site, or where they went next
• You can't tell how long people spent reading each page
• You can't tell how long people spent on your site
Cache for questions
• Caching proxy servers are the main problem:
– if users get your pages from a local cache server, you will never know
– many users can connect to your server using the same cache/proxy server
– one user can appear to connect from many different hosts (e.g. AOL)
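The first two effects can be seen in a toy model. The sketch below is a deliberately simplified stand-in for a shared cache, not a description of how any real proxy behaves; the `CachingProxy` class, the user names and the host name are all invented.

```python
class CachingProxy:
    """Toy model of a shared cache sitting between users and the origin server."""

    def __init__(self, origin_log):
        self.origin_log = origin_log   # list standing in for the server's log file
        self.cache = set()             # paths the proxy already holds

    def fetch(self, user, path):
        if path in self.cache:
            return "cache"             # served locally; the origin never hears of it
        # The origin only ever sees the proxy's host name, never the user.
        self.origin_log.append(("proxy.example.net", path))
        self.cache.add(path)
        return "origin"

origin_log = []
proxy = CachingProxy(origin_log)
for user in ["alice", "bob", "carol"]:
    proxy.fetch(user, "/index.html")

print(len(origin_log))   # requests the origin server actually logged
print(origin_log)
```

Three people read the page, but the server's log holds one request from one host.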
Trackers / Counters
• A more recent innovation:
– Code embedded in each of your web pages
– Makes a call directly to the tracking service's data server
– Can reveal more detail (screen size, screen colours, originating host name, etc.)
• Examples:
– SiteStat: [www.nedstat.co.uk/]
– Google Analytics: [analytics.google.com]
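The general idea of a tracker call can be sketched as a URL that carries the extra details as query parameters. In practice the page code runs in the browser; Python is used here only to show the shape of the request and what the data server can read back from it. The collector address, the parameter names and both helper functions are hypothetical and do not correspond to SiteStat or Google Analytics.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical collection endpoint; real services define their own URLs and fields.
COLLECTOR = "http://stats.example.com/collect"

def build_beacon_url(page, screen_width, screen_height, colour_depth):
    """Build the URL that the embedded page code would request from the collector."""
    params = {
        "page": page,
        "sw": screen_width,
        "sh": screen_height,
        "cd": colour_depth,
    }
    return COLLECTOR + "?" + urlencode(params)

def read_beacon(url):
    """What the collector could recover from one beacon request."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}

url = build_beacon_url("/index.html", 1024, 768, 32)
print(url)
print(read_beacon(url))
```

Because the page code typically issues this request each time the page is displayed, a copy served from a cache can still be counted, provided the beacon request itself is not cached or blocked.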
Further reading
• How the Web works by Stephen Turner
• Interpreting WWW Statistics by Doug Linder
• Measuring Web Site Usage: Log File Analysis by Susan Haigh and Janette Megarity
• Who Goes There?
• Measuring Library Web Site Usage by Kathleen Bauer
• Why Web Usage Statistics are (Worse Than) Meaningless by Jeff Goldberg