Web Analytics: A Brief Tutorial
by
Dr. Robert J. Boncella
Professor of Information Systems & Technology
School of Business
Washburn University
Presented to SAIS 2008, March 2008
Introduction
• Web analytics is the study of the behavior of
website visitors.
• In a commercial context, web analytics refers
to the use of data collected from a website to
determine which aspects of the website achieve
the business objectives.
• Tutorial Outline
– Web Analytics: Context
– Web Analytics: Technology & Terminology
– Web Analytics: Tools and Case Studies
Context for Web Analytics
• DSS – Decision Support System
– A conceptual framework for a process of supporting managerial
decision-making, usually by modeling problems and employing
quantitative models for solution analysis
• BI – Business Intelligence, a subset of DSS
– An umbrella term that combines architectures, tools, databases,
applications, and methodologies
• BA – Business Analytics, a subset of BI
– The application of models directly to business data
– Assists in making strategic decisions
• WA – Web Analytics, a subset of BA
– The application of business analytics activities to Web-based
processes, including e-commerce
Web Analytics - Details
• Relevant Technology
– Internet & TCP/IP
– Client/Server Computing
– HTTP (HyperText Transfer Protocol)
– Server Log Files & Cookies
– Web Bugs
• Data Collection
– The Clickstream
• Server Log Files
• Page Tagging
• Data Analysis
– Data Preparation
– Pattern Discovery
– Pattern Analysis
Client/Server Computing
[Diagram: the client sends a request to the server; the server sends a response back to the client.]
Internet & TCP/IP
• The Internet
– The infrastructure that provides for the
delivery of data between computer-based
processes
• TCP/IP
– The protocols that provide for reliable
delivery of data on the Internet
HTTP Protocol
• Client sends a request to a server
• Server sends a response to client
• Connectionless
– Client:
• Opens connection to server
• Sends request
– Server
• Responds to request
• Closes connection
• Stateless
– Client and server have no memory of prior
connections
– Server cannot distinguish one client's requests from
another client's
Cookies
• Used to solve the “Statelessness” of the HTTP
Protocol
• Used to store and retrieve user-specific
information on the web
• When an HTTP server responds to a request, it
may send additional information that is stored by
the client: “state information”
• When the client makes a later request to this server, the
client returns the “cookie” that contains its state
information
• State information may be a client ID that can be
used as an index to a client data record on the
server
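The cookie exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cookie name client_id, the handle_request function, and the in-memory client_records store are hypothetical and do not come from the slides. The standard-library SimpleCookie class is used to parse and format the cookie headers.

```python
from http.cookies import SimpleCookie
import uuid

# Hypothetical server-side store: client ID -> client data record.
client_records = {}

def handle_request(cookie_header):
    """Return (client_id, set_cookie_value_or_None) for one request."""
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    if "client_id" in cookie and cookie["client_id"].value in client_records:
        # Returning client: the cookie value indexes its existing record.
        client_id = cookie["client_id"].value
        client_records[client_id]["requests"] += 1
        return client_id, None
    # First request: create a record and ask the client to store its ID.
    client_id = uuid.uuid4().hex
    client_records[client_id] = {"requests": 1}
    out = SimpleCookie()
    out["client_id"] = client_id
    return client_id, out["client_id"].OutputString()

# First request carries no cookie; the server issues one.
first_id, set_cookie = handle_request(None)
# A browser would send the stored "client_id=<value>" back on the next request.
second_id, _ = handle_request(set_cookie)
```

Because the second request carries the cookie, the server can tie both requests to the same record even though HTTP itself keeps no state.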
Web Bug Process
[Diagram: web bug process, reconstructed from the slide labels]
• Client browser (My_Brwsr)
– 1. Render page
– 2. Click on URL
• Req: Page_A.html to Server A; Res: Page_A.html
• Pages A, B, and C (on Servers A, B, and C) each contain
– URLs & Img Src
– WebBug Img @ WBS.TRKSTRM.COM
• Req to WBS: WebBug IMG
– Referer header
– Any cookie for TRKSTRM.com
• Res from WBS: WebBug Img
– Cookie sent to the client browser on the 1st request
• Record held at WBS
– Cookie: My_Brwsr
– Pg A - Server A
– Pg B - Server B
– Pg C - Server C
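The process in the diagram can be simulated with a toy model. The sketch below is an assumption-laden illustration, not the mechanism of any real tracking product: the WebBugServer class, the visitor-N cookie format, and the example.com style page URLs are all hypothetical. It shows how one tracking server, reached from pages on three different servers, can tie the page requests to a single browser through its own cookie and the Referer header.

```python
import itertools

class WebBugServer:
    """Toy stand-in for the tracking server (WBS.TRKSTRM.COM in the slide)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.log = []  # one (cookie_id, referer) pair per web bug request

    def serve_image(self, referer, cookie_id=None):
        # On the first request the browser has no cookie, so issue one.
        if cookie_id is None:
            cookie_id = f"visitor-{next(self._ids)}"
        self.log.append((cookie_id, referer))
        return cookie_id  # models the cookie returned with the image

tracker = WebBugServer()
browser_cookie = None  # the browser starts with no cookie for the tracker

# Hypothetical pages on three different servers, each embedding the web bug.
pages = [
    "http://server-a.example/Page_A.html",
    "http://server-b.example/Page_B.html",
    "http://server-c.example/Page_C.html",
]
for page in pages:
    # Rendering the page triggers a request for the web bug image.
    browser_cookie = tracker.serve_image(referer=page, cookie_id=browser_cookie)

# Everything the tracker knows about this one browser.
pages_seen = [ref for cid, ref in tracker.log if cid == browser_cookie]
```

None of Servers A, B, or C needs to share data for this to work; the tracker's own log is enough to reconstruct the path.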
Common Clickstream Data Sources
• Server Log Files
– Passive data collection
– Normal part of the web browser/web server
transaction
• Page Tagging
– Active data collection
– Often requires a third party (a vendor) to
implement
Server Log Files
Each time a client requests a resource, the server of
that resource may record the following in its log files:
• The name & IP address of the client computer
• The time of the request
• The URL that was requested
• The time it took to send the resource
• If HTTP authentication is used, the username of
the user of the client
• Any errors that occurred
• The referer link
• The kind of web browser that was used
Server Log Files
• Example
– 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700]
"GET /apache_pb.gif HTTP/1.0" 200 2326
• 127.0.0.1 - remote host
• frank - user name
• [10/Oct/2000:13:55:36 -0700] - date & time
• "GET /apache_pb.gif HTTP/1.0" - request
• 200 - status
• 2326 - bytes
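As an illustration, the example entry can be split into its fields with a regular expression. This is a minimal sketch: the pattern name CLF_PATTERN, the function parse_clf, and the field names in the result are choices made here, and the pattern covers only the fields shown in the example line (not the referer or browser fields some servers append).

```python
import re
from datetime import datetime

# One named group per field of the example entry.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the text fields that have a natural typed form.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_clf(line)
```

Here entry["host"] holds the remote host, entry["user"] the user name, and entry["status"] and entry["bytes"] the numeric status code and response size.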
Server Log Files
• Technical issues for server log data
– Data Preparation
– Pageview Identification
– User Identification
– Session Identification
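Session identification is often approached with an inactivity timeout. The sketch below assumes a 30-minute cutoff and assumes that each request already carries a user key (for example an IP address or cookie ID); both are assumptions for illustration, not rules stated in the slides, and the function name sessionize is hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cutoff, not from the slides

def sessionize(requests):
    """Group (user_key, timestamp, url) tuples into per-user sessions.

    Returns {user_key: [session, ...]}, where each session is a list of
    (timestamp, url) pairs in time order.
    """
    sessions = {}
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        user_sessions = sessions.setdefault(user, [])
        last_ts = user_sessions[-1][-1][0] if user_sessions else None
        if last_ts is not None and ts - last_ts <= SESSION_TIMEOUT:
            user_sessions[-1].append((ts, url))   # continue current session
        else:
            user_sessions.append([(ts, url)])     # gap too long: new session
    return sessions

t0 = datetime(2008, 3, 1, 9, 0)
requests = [
    ("u1", t0, "/a"),
    ("u1", t0 + timedelta(minutes=10), "/b"),
    ("u1", t0 + timedelta(hours=2), "/c"),
    ("u2", t0, "/a"),
]
sessions = sessionize(requests)
```

With this data, user u1 ends up with two sessions (the two-hour gap exceeds the cutoff) and user u2 with one.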
Page Tags as Data Source
• Provided by Third Party - Vendor
– Vendor Supplies Page Tags
– Vendor Collects the Data
– Vendor Analyzes the Data
– Business Accesses the Data
• Online or
• Reports sent to Business
Web Data Abstractions
• Abstractions concerning Web usage, content,
and structure
• Establishes precise semantics for the concepts
– Web site
– Users or Visitors
– User Sessions
– Server Sessions or Visits
– Pageviews
– Clickstreams
Data Abstractions
• Web Site - collection of interlinked Web pages,
including a host page, residing at the same
network location.
• User or Visitor - principal using a client to
interactively retrieve and render resources or
resource manifestations
– in practice, an individual who is accessing files from a
Web server using a browser
• User Session - a delimited set of user clicks
across one or more Web servers
Data Abstractions
• Server Session or Visit - a collection of user
clicks to a single Web server during a user
session
• Pageview - the visual rendering of a Web page
in a specific environment at a specific point in
time
– a pageview consists of several items
• frames, text, graphics, and scripts that construct a single
Web page
• Clickstream - a sequential series of pageview
requests made by a single user
Web Data Abstractions
(High Level)
• Abstractions concerning Visitors
• Establishes precise semantics for the concepts
– Unique Visitor
– Conversion Rate
– Abandonment Rate
– Attrition
– Loyalty
– Frequency
– Recency
Data Abstractions
• Unique Visitor
– A unique visitor is counted when a human being uses
a web browser to visit a web site.
– A visitor may be “unique” for different periods of time.
– The individual is defined by a cookie in the visitor’s
web browser
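To illustrate how "unique" depends on the reporting period, the sketch below counts distinct cookie IDs per day and per month. The visit data, the cookie IDs, and the function name unique_visitors are made up for the example.

```python
from datetime import date

# Hypothetical visit records: (cookie_id, day of visit).
visits = [
    ("c1", date(2008, 3, 1)),
    ("c1", date(2008, 3, 2)),
    ("c2", date(2008, 3, 2)),
]

def unique_visitors(visits, period_key):
    """Count distinct cookie IDs within each period."""
    buckets = {}
    for cookie_id, day in visits:
        buckets.setdefault(period_key(day), set()).add(cookie_id)
    return {period: len(ids) for period, ids in buckets.items()}

daily = unique_visitors(visits, lambda d: d)
monthly = unique_visitors(visits, lambda d: (d.year, d.month))
```

The daily counts here are 1 and 2, while the monthly count is 2, so daily unique visitors cannot simply be summed to obtain the monthly figure.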
Data Abstractions
• Conversion Rate
– A conversion rate is the number of “completers”
divided by the number of “starters” for any online
activity that is more than one logical step in length
– Starting and finishing any activity
• Purchase
• Download a research article
• Etc.
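A conversion rate can be computed from session data along these lines. The sketch assumes each session is a list of requested page paths and that the activity has a known start page and finish page; the paths /cart and /receipt and the function name conversion_rate are hypothetical.

```python
def conversion_rate(sessions, start_page, finish_page):
    """Completers divided by starters for one multi-step activity."""
    # Starters: sessions that reached the first step of the activity.
    starters = [s for s in sessions if start_page in s]
    # Completers: starters that reached the final step after starting.
    completers = [
        s for s in starters
        if finish_page in s[s.index(start_page) + 1:]
    ]
    if not starters:
        return 0.0
    return len(completers) / len(starters)

sessions = [
    ["/home", "/cart", "/checkout", "/receipt"],  # starts and completes
    ["/home", "/cart"],                           # starts, does not complete
    ["/home"],                                    # never starts
    ["/cart", "/checkout"],                       # starts, does not complete
]
rate = conversion_rate(sessions, "/cart", "/receipt")
```

With this data there are three starters and one completer, so the rate is one third.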
Data Abstractions
• Abandonment Rate
– The abandonment rate for any step in a multi-step
process is one minus the number of units that make it
to “step n+1” divided by the number at “step n”
– The formula is 1 – (units at step n+1) / (units at step n)
– Consider a 10-step process to acquire a resource
• How many quit after step 1 or 2 or 3 or 4 or …
– Consider a 5-step process to acquire a resource
• How many quit after step 1 or 2 or 3 or 4 or …
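The per-step calculation can be sketched as follows. The step counts are invented for illustration, the function name abandonment_rates is hypothetical, and the code assumes every step has a positive count.

```python
def abandonment_rates(step_counts):
    """Abandonment rate for each step n: 1 - count(n+1) / count(n).

    step_counts[i] is the number of units that reached step i+1.
    Assumes every count is greater than zero.
    """
    rates = []
    for current, following in zip(step_counts, step_counts[1:]):
        rates.append(1 - following / current)
    return rates

# Hypothetical 4-step process: units reaching steps 1, 2, 3, and 4.
counts = [1000, 600, 450, 405]
rates = abandonment_rates(counts)  # approximately 0.40, 0.25, 0.10
```

Reading the rates step by step shows where in the process visitors drop out most heavily; in this made-up data the largest loss is after step 1.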
Data Abstractions
• Attrition
– Attrition is a measurement of the people you
converted successfully but were unable to retain to
convert again
– Consider the eBay website vs. a website for technical
information
Data Abstractions
• Loyalty
– Loyalty is a measure of the number of visits any
visitor is likely to make over their lifetime as a visitor
– Reported as number of visits per visitor
• 100 visitors made 3 visits each, 87 visitors made 4, etc.
• Avoid double counting (i.e. do not count the 87 in with the
100)
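A visits-per-visitor report of this kind can be sketched with two counters. The visit log format (visitor ID, visit ID) and the function name loyalty_distribution are assumptions for the example. Each visitor lands in exactly one bucket, which is how the double counting mentioned above is avoided.

```python
from collections import Counter

def loyalty_distribution(visit_log):
    """Map 'number of visits' -> 'number of visitors with exactly that many'."""
    # First count visits per visitor, then count visitors per visit total.
    visits_per_visitor = Counter(visitor for visitor, _visit in visit_log)
    return Counter(visits_per_visitor.values())

# Hypothetical log: (visitor_id, visit_id) pairs.
visit_log = [
    ("a", 1), ("a", 2), ("a", 3),
    ("b", 1), ("b", 2), ("b", 3),
    ("c", 1), ("c", 2), ("c", 3), ("c", 4),
]
distribution = loyalty_distribution(visit_log)
```

In this data two visitors made exactly 3 visits and one made exactly 4; the visitor with 4 visits is not also counted among those with 3.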
Data Abstractions
• Frequency
– Frequency is a measure of the activity a visitor
generates on a web site in terms of time between
visits
– Measured in terms of “days between visits”
Data Abstractions
• Recency
– Recency is the number of days since the last visit (or
purchase)
– Reported as the number of visitors who returned after
“n” days.
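Frequency and recency for a single visitor can both be derived from that visitor's visit dates. The sketch below uses made-up dates and an assumed reporting date; the function names days_between_visits and recency are hypothetical.

```python
from datetime import date

def days_between_visits(visit_dates):
    """Frequency: gaps in days between consecutive visits."""
    ordered = sorted(visit_dates)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]

def recency(visit_dates, today):
    """Recency: days since the most recent visit."""
    return (today - max(visit_dates)).days

# Hypothetical visit history for one visitor.
visit_dates = [date(2008, 3, 1), date(2008, 3, 4), date(2008, 3, 11)]
gaps = days_between_visits(visit_dates)           # [3, 7]
days_since = recency(visit_dates, date(2008, 3, 15))  # 4
```

Aggregating these values over all visitors would give the "days between visits" and "returned after n days" reports described above.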
Pyramid Model of Web Analytics Data
[Figure: pyramid of data levels, listed from top to bottom]
• Uniquely Identified Visitors
• Unique Visitors
• Visits
• Page Views
• Hits
[Figure label: Volume of Available Data]
Web Usage Mining
• Web usage mining applies statistical and data
mining techniques to the processed server log
data in order to discover useful patterns
• Data mining methods and algorithms that have
been adapted for the Web domain
– Association rules
– Sequential pattern discovery
– Clustering
– Classification
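As a small illustration of the first method, the sketch below derives page-to-page association rules from sessions by computing support and confidence for page pairs. It is a simplified, brute-force version restricted to pairs, not a full association rule mining algorithm; the sessions, the 0.5 support threshold, and the function name pair_rules are assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for page pairs."""
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both pages
        if support >= min_support:
            # Confidence: share of sessions with the antecedent that
            # also contain the consequent.
            rules.append((a, b, support, count / page_counts[a]))
            rules.append((b, a, support, count / page_counts[b]))
    return rules

# Hypothetical sessions, each a list of requested pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(sessions)
```

In this data every session that includes /cart also includes /products, so the rule from /cart to /products has confidence 1.0 at support 0.5.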
Web Usage Data Mining
• After patterns have been discovered from usage data,
further analysis has to be conducted.
• Common ways of analyzing such patterns
– Using a query mechanism on a database where the
results are stored
– Loading the results into a data cube and then
performing OLAP operations
– Using visualization techniques for easier
interpretation of the results
• Combining these results with content and structure
information about the Web site can yield useful
knowledge for modifying the site according to the
correlation between user and content groups.
Web Analytics:
Tools and Case Studies
• Tools
– VisiStat - www.visistat.com
• Web Analytics Case Studies
– Communications Provider - TuVox.com
– Online Retailer - TicketsByInternet.com
– Winery & Entertainment Venue - The Mountain Winery
– Non-Profit Organization - SFBallet.org
– Public Relations & Media Agency - BLASTmedia
– Technology Provider for Real Estate Professionals - Pullan.com
– Real Estate Agency - Intero Real Estate
– Start-Up Online Business - GuruPrint.com