Web Usage mining for E-Business Analytics
E-Metrics and E-Business Analytics
Bamshad Mobasher
DePaul University
Web Usage Mining & E-Business Analytics
The primary goal of e-business analytics is to understand and be
able to predict the behavior of online customers
Examples of questions we want to answer using the data
Where did visitors come from?
What do they do when they get to the site?
How happy are the visitors/customers?
What are the outcomes: conversions, repeat visits, loyalty?
What types of content attract which types of customers?
Which customers are profitable?
How profitable are different products or product categories?
Where do data-driven answers to these questions come from?
E-metrics – metrics/statistics that tell us something about the online behavior of
users on the site
Data mining – finding deeper patterns in the data and building models
2
Web Usage Mining & E-Business Analytics
Different Levels of Analysis
Session Analysis
Static Aggregation and Statistics
OLAP
Data Mining
3
Session Analysis
Simplest form of analysis: examine individual or groups of
user sessions and/or e-commerce transactions
Advantages:
Gain insight into typical customer behaviors
Trace specific problems with the site
Drawbacks:
LOTS of data
Difficult to generalize
4
Static Aggregation (Reports)
Most common form of analysis (e.g., Google Analytics,
WebTrends, etc.)
Data aggregated by predetermined units such as days or
sessions
Generally gives the most “bang for the buck.”
Advantages:
Gives a quick overview of how a site is being used.
Minimal disk space or processing power required.
Drawbacks:
No ability to “dig deeper” into the data.
Page View          Number of Sessions   Average View Count per Session
Home Page          50,000               1.5
Catalog Ordering   500                  1.1
Shopping Cart      9,000                2.3
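A static report of this kind is a simple group-by over the page-view log. The sketch below assumes a hypothetical log of (session_id, page) pairs and reads "Average View Count per Session" as the mean number of views of a page among the sessions that viewed it; a given reporting tool may define it differently.

```python
from collections import defaultdict

def page_report(views):
    """Aggregate a page-view log into a static per-page report.

    views: iterable of (session_id, page) pairs, one per page view.
    Returns {page: (number_of_sessions, average_view_count_per_session)},
    where the average is taken over the sessions that viewed the page.
    """
    counts = defaultdict(lambda: defaultdict(int))  # page -> session -> views
    for session_id, page in views:
        counts[page][session_id] += 1
    report = {}
    for page, per_session in counts.items():
        n_sessions = len(per_session)
        report[page] = (n_sessions, sum(per_session.values()) / n_sessions)
    return report

# Hypothetical log entries for illustration.
views = [
    ("s1", "Home Page"), ("s1", "Home Page"), ("s1", "Shopping Cart"),
    ("s2", "Home Page"),
    ("s3", "Catalog Ordering"), ("s3", "Shopping Cart"), ("s3", "Shopping Cart"),
]

for page, (n, avg) in sorted(page_report(views).items()):
    print(f"{page:<18} {n:>6} {avg:>6.2f}")
```

Because the units (page, session) are fixed in advance, the report is cheap to compute but cannot be sliced any further without going back to the raw log.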
5
Static Aggregation (Reports)
Typical tools:
Google Analytics
Urchin
WebTrends
6
Online Analytical Processing (OLAP)
Allows changes to aggregation level for multiple dimensions
Generally associated with a Data Warehouse
Advantages & Drawbacks
Very flexible
Requires significantly more resources than static reporting.
Page View              Number of Sessions   Average View Count per Session
Kid's Stuff Products   2,000                5.9

Page View                                               Number of Sessions   Average View Count per Session
Kid's Stuff Products > Electronics > Educational        63                   2.3
Kid's Stuff Products > Electronics > Radio-Controlled   93                   2.5
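Drilling down is the same aggregation applied at a finer level of a dimension hierarchy. As a minimal sketch, assuming hypothetical view records tagged with a category path (the "Toys" and "Puzzles" categories are invented for illustration), the `level` argument picks how many levels of the product hierarchy to group by. A real OLAP setup would add further dimensions, such as time or referrer, and would typically precompute these aggregates in the data warehouse.

```python
from collections import defaultdict

def aggregate(views, level):
    """Aggregate views at a chosen level of a category hierarchy.

    views: iterable of (session_id, category_path) pairs.
    level: number of leading path elements to group by; 1 rolls up to the
    top category, larger values drill down.
    Returns {path_prefix: (number_of_sessions, avg_views_per_session)}.
    """
    counts = defaultdict(lambda: defaultdict(int))  # prefix -> session -> views
    for session_id, path in views:
        counts[tuple(path[:level])][session_id] += 1
    return {
        prefix: (len(per_s), sum(per_s.values()) / len(per_s))
        for prefix, per_s in counts.items()
    }

KIDS = "Kid's Stuff Products"
views = [
    ("s1", (KIDS, "Electronics", "Educational")),
    ("s1", (KIDS, "Electronics", "Radio-Controlled")),
    ("s2", (KIDS, "Electronics", "Radio-Controlled")),
    ("s2", (KIDS, "Electronics", "Radio-Controlled")),
    ("s3", (KIDS, "Toys", "Puzzles")),  # hypothetical category
]

print(aggregate(views, 1))  # roll-up: one row for the top category
print(aggregate(views, 3))  # drill-down: one row per leaf category
```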
7
Data Mining: Going deeper
Prediction of next event: Sequence mining, Markov chains
Discovery of associated events or application objects: Association rules
Discovery of visitor groups with common properties and interests: Clustering
Discovery of visitor groups with common behaviour: Session clustering
Characterization of visitors with respect to a set of predefined classes: Classification
Anomaly/attack detection
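For next-event prediction, a first-order Markov chain is one of the simplest models: estimate the probability of each page-to-page transition from past sessions and predict the most frequent successor. The sketch below is illustrative only; the session data and page names are hypothetical.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Count page-to-page transitions for a first-order Markov model."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, page):
    """Return (most likely next page, estimated probability), or None
    if the page was never followed by another page in the training data."""
    counts = transitions.get(page)
    if not counts:
        return None
    nxt, n = counts.most_common(1)[0]
    return nxt, n / sum(counts.values())

# Hypothetical sessions, each an ordered list of page views.
sessions = [
    ["home", "catalog", "product", "cart"],
    ["home", "catalog", "product"],
    ["home", "search", "product", "cart"],
    ["home", "catalog", "cart"],
]
model = train_markov(sessions)
print(predict_next(model, "home"))
```

Higher-order chains (conditioning on the last two or three pages) or sequence mining capture longer patterns at the cost of sparser counts.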
How Data Mining is Used - Examples
Calibration of a Web server:
Prediction of the next page invocation over a group of concurrent Web
users under certain constraints
Sequence mining, Markov chains
Prefetching resources that are likely to be accessed next
Cross-selling of products:
Mapping of Web pages/objects to products
Discovery of associated products
Association rules, Sequence Mining
Placement of associated products on the same page
Determining which items or products to feature on specific pages
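Cross-selling rests on association rules A → B scored by support (share of transactions containing both items) and confidence (share of transactions with A that also contain B). The following is a deliberately simplified sketch that only mines rules between single items, not a full Apriori implementation; the baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine single-item rules lhs -> rhs from a list of item sets.

    Returns (lhs, rhs, support, confidence) tuples, highest confidence first.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

baskets = [  # hypothetical transactions
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "bag"},
    {"memory card"},
    {"tripod", "bag"},
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as camera → memory card would suggest placing the two products on the same page.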
9
How Data Mining is Used - Examples
Sophisticated cross-selling and up-selling of products:
Mapping of pages/objects to products of different price groups
Identification of Customer Groups or Segments
Clustering, Classification
Discovery of associated products of the same/different price
categories
Association rules, Sequence Mining
Formulation of recommendations to the end-user
Suggestions on associated products
Suggestions based on the preferences of similar users
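Suggestions based on the preferences of similar users are commonly produced by user-based collaborative filtering. This minimal sketch, assuming hypothetical binary preference profiles, uses cosine similarity to find a user's k most similar users and scores unseen items by the similarity-weighted preferences of those neighbors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse preference vectors (dicts)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, profiles, k=2, top_n=3):
    """Suggest unseen items, scored by the similarity-weighted
    preferences of the target's k most similar users."""
    me = profiles[target]
    neighbors = sorted(
        ((cosine(me, p), user) for user, p in profiles.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbors:
        for item, weight in profiles[user].items():
            if item not in me:
                scores[item] = scores.get(item, 0.0) + sim * weight
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

profiles = {  # hypothetical binary "viewed or bought" profiles
    "alice": {"camera": 1, "tripod": 1},
    "bob": {"camera": 1, "tripod": 1, "bag": 1},
    "carol": {"camera": 1, "memory card": 1},
    "dave": {"novel": 1},
}
print(recommend("alice", profiles))
```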
10
E-Metrics
Collection of aggregate statistics and metrics necessary to
Understand visitor/customer behavior
Understand how visitors are using the site
Measure e-business outcomes such as conversion, loyalty, etc.
Monitor factors that prevent successful outcomes
Basic Types of E-Metrics (not necessarily mutually exclusive)
Site e-metrics – metrics that tell us something about how the site as a whole or
specific components (pages, categories, tools, functions) are being used and
how to improve the site or its content
Customer e-metrics – metrics that characterize the behavior of visitors or
visitor segments and measure the propensity of visitors to convert
Basic business metrics – general metrics to measure how successfully overall
business objectives are being met (revenue, profitability, etc.).
11
E-Metrics Commonly Used by Industry
Number of customers: 100%
Visits resulting in purchase: 95%
Average order value: 91%
Number of registered users: 88%
Origin of visitors: 86%
Customer service response time: 79%
Purchases over the last six months: 79%
Number of repeat visitors: 74%
Revenue for repeat visitors: 63%
Origin of repeat visitors: 63%
New and repeat conversion rates: 60%
Customers in a loyalty program: 47%
12
Basic Site Metrics
• Which site “referred” them
  – Search engine
  – Affiliate site
  – Partner
  – Advertisement
  – Contribution to sales or other desired outcome
• Measures: allow evaluation of the referrer
  – What percentage of all referrals came from this source?
  – Calculation of the cost of acquisition of each visitor
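Both referrer measures reduce to simple ratios. The sketch below assumes one recorded referrer type per new visitor and a hypothetical spend figure per referrer type; cost of acquisition is taken here as that spend divided by the number of visitors acquired from the source.

```python
from collections import Counter

def referrer_report(referrers, spend):
    """Evaluate referrers by share of referrals and cost per acquired visitor.

    referrers: one referrer type per new visitor.
    spend: hypothetical acquisition spend per referrer type (missing = 0).
    Returns {referrer: (share_of_all_referrals, cost_per_visitor)}.
    """
    counts = Counter(referrers)
    total = sum(counts.values())
    return {ref: (n / total, spend.get(ref, 0.0) / n) for ref, n in counts.items()}

referrers = (["search engine"] * 50 + ["affiliate site"] * 20
             + ["partner"] * 10 + ["advertisement"] * 20)
spend = {"affiliate site": 400.0, "advertisement": 1000.0}
for ref, (share, cost) in referrer_report(referrers, spend).items():
    print(f"{ref:<15} {share:6.1%} ${cost:.2f} per visitor")
```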
13
Basic Site Metrics
• We can monitor
  – Which content is accessed by users
  – When they visit
  – How long they stay
  – Whether interaction with content leads to sales or other desired outcomes
• Measures, e.g.:
  – Bounce rate: proportion of visitors to a page who leave immediately
  – Stickiness: how long a visitor stays on the site, and how many repeat visits they make
  – Conversion rate: % of visitors who perform a desired action
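As a rough sketch, bounce rate and conversion rate can be computed from sessionized data. The session records and field names (`visitor`, `pages`, `actions`) are hypothetical; bounce rate is computed here over sessions that enter at the given page and view no second page, which is one common reading of "leave immediately", and conversion rate is counted per distinct visitor.

```python
def bounce_rate(sessions, page):
    """Share of sessions entering at `page` that view no second page."""
    entries = [s for s in sessions if s["pages"] and s["pages"][0] == page]
    if not entries:
        return 0.0
    return sum(1 for s in entries if len(s["pages"]) == 1) / len(entries)

def conversion_rate(sessions, action="purchase"):
    """Share of distinct visitors who performed `action` in any session."""
    visitors = {s["visitor"] for s in sessions}
    converted = {s["visitor"] for s in sessions if action in s["actions"]}
    return len(converted) / len(visitors) if visitors else 0.0

sessions = [  # hypothetical sessionized records
    {"visitor": "v1", "pages": ["home"], "actions": set()},
    {"visitor": "v2", "pages": ["home", "product", "cart"], "actions": {"purchase"}},
    {"visitor": "v3", "pages": ["product"], "actions": set()},
    {"visitor": "v1", "pages": ["home", "product"], "actions": set()},
]
print(bounce_rate(sessions, "home"), conversion_rate(sessions))
```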
14
Key Measures Needed to Compute Aggregate Site E-Metrics
Question                                     Measure                Definition
How many users? (audience reach)             Unique users           Identified by IP + user agent, or by cookie and/or registration
How often? (frequency and recency metrics)   Visit (user session)   A series of one or more page impressions served to one user (a gap of 30 minutes ends the visit)
How many views? (volume metric)              Page impression        File (or files) sent to a user as a result of a server request by that user
How many ad views?                           Ad impressions         A file (or files) sent to a user as an individual ad as a result of a server request by that user
What do they do?                             Ad clicks              An ad impression clicked on by a valid user
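The visit definition above (a 30-minute gap ends the visit) leads directly to a sessionization routine. This sketch assumes hypothetical hit tuples and identifies users by IP + user agent only, the weakest of the listed options; cookies or registration IDs would replace the `(ip, user_agent)` key. It treats a gap strictly longer than 30 minutes as the end of a visit.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # inactivity gap that ends a visit

def sessionize(hits):
    """Split raw hits into visits.

    hits: iterable of (ip, user_agent, timestamp, url) tuples.
    Returns a list of (user_key, [(timestamp, url), ...]) visits, where
    user_key = (ip, user_agent).
    """
    by_user = defaultdict(list)
    for ip, agent, ts, url in hits:
        by_user[(ip, agent)].append((ts, url))
    visits = []
    for user, user_hits in by_user.items():
        user_hits.sort()  # chronological order
        current = [user_hits[0]]
        for prev, hit in zip(user_hits, user_hits[1:]):
            if hit[0] - prev[0] > SESSION_GAP:
                visits.append((user, current))  # gap too long: close the visit
                current = []
            current.append(hit)
        visits.append((user, current))
    return visits

t0 = datetime(2024, 1, 1, 10, 0)
hits = [
    ("10.0.0.1", "Firefox", t0, "/home"),
    ("10.0.0.1", "Firefox", t0 + timedelta(minutes=10), "/catalog"),
    ("10.0.0.1", "Firefox", t0 + timedelta(minutes=50), "/home"),
    ("10.0.0.1", "Chrome", t0 + timedelta(minutes=5), "/home"),
]
visits = sessionize(hits)
unique_users = {user for user, _ in visits}
print(len(unique_users), "unique users,", len(visits), "visits")
```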
15
More on Basic Site Metrics
Stickiness
measures site effectiveness in retaining visitors within a specified time period
related to the duration and frequency of visits
Stickiness = Frequency x Duration x Total Site Reach
where
Frequency = (Visits in time period T) / (Unique users who visited in T)
Duration = (Total View Time) / (Visits in T)
Total Site Reach = (Unique users who visited in T) / (Total Unique Users)
This simplifies to:
Stickiness = (Total View Time) / (Total Unique Users)
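A quick numeric check of the stickiness formula, with made-up counts. Duration is taken here as total view time per visit in T, which is what makes the visits and users-in-T terms cancel so that the product equals total view time divided by total unique users.

```python
def stickiness(visits_in_t, total_view_time, users_in_t, total_unique_users):
    """Stickiness = Frequency x Duration x Total Site Reach."""
    frequency = visits_in_t / users_in_t          # visits per user active in T
    duration = total_view_time / visits_in_t      # view time per visit
    reach = users_in_t / total_unique_users       # share of all unique users
    return frequency * duration * reach

# Hypothetical period: 300 visits, 1,500 minutes of view time,
# 100 users active in T out of 400 unique users overall.
s = stickiness(300, 1500, 100, 400)
print(s, 1500 / 400)  # the product equals total view time / total unique users
```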
16
More on Basic Site Metrics
Slipperiness
inverse of stickiness
used for portions of the site in which low stickiness is desired (e.g., customer
service or online support)
Focus
measures visit behavior within specific sections of the site
Focus = (Avg. no. of pages visited in section S) / (Total no. of pages in S)
High stickiness, narrow focus: Either consuming interest on the part of users, or users are stuck. Further investigation required.
Low stickiness, narrow focus: Either quick satisfaction or perhaps disinterest in this section. Further investigation required.
High stickiness, wide focus: Enjoyable browsing indicates a site “magnet area”.
Low stickiness, wide focus: Attempting to locate the correct information.
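Focus is a ratio over a single section, and pairing it with a stickiness value places the section in one of the four cells above. In this sketch the cut-off values separating high from low stickiness and narrow from wide focus are hypothetical; a site would have to choose them, for example from its own historical distribution.

```python
def focus(pages_visited_per_visit, pages_in_section):
    """Focus = (avg. no. of pages visited in section S) / (total no. of pages in S).

    pages_visited_per_visit: distinct pages of S viewed in each visit to S.
    """
    avg_visited = sum(pages_visited_per_visit) / len(pages_visited_per_visit)
    return avg_visited / pages_in_section

def quadrant(stickiness, focus_value, stickiness_cut, focus_cut):
    """Classify a section; the cut-off values are hypothetical, site-specific."""
    stick = "high stickiness" if stickiness >= stickiness_cut else "low stickiness"
    width = "wide focus" if focus_value >= focus_cut else "narrow focus"
    return stick, width

# Hypothetical section with 20 pages; three visits viewed 2, 3 and 5 of them.
f = focus([2, 3, 5], 20)
print(f, quadrant(12.0, f, stickiness_cut=10.0, focus_cut=0.25))
```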
17
Shopping Pipeline Analysis
[Figure: shopping pipeline as a state transition diagram: Enter store → Browse catalog → Select items → Complete purchase, annotated with ‘sticky’ states, cross-sell and up-sell promotions, and a ‘slippery’ state, i.e. 1-click buy]
Overall goal:
• Maximize probability of reaching final state
• Maximize expected sales from each visit
Shopping pipeline modeled as state transition diagram
Sensitivity analysis of state transition probabilities
Promotion opportunities identified
E-metrics and ROI used to measure effectiveness
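If the pipeline is treated as an absorbing Markov chain, the probability of reaching the final state can be computed and its sensitivity to individual transition probabilities tested. The states below follow the diagram, but the `exit` state and all transition probabilities are hypothetical; the solver uses plain fixed-point iteration rather than a matrix inverse to stay dependency-free.

```python
def purchase_probability(transitions, start="enter", goal="purchase", iters=1000):
    """Probability of eventually reaching `goal` from `start`.

    transitions: {state: {next_state: probability}}; absorbing states map to {}.
    Solves p(s) = sum_t P(s, t) * p(t) with p(goal) = 1 and p = 0 for other
    absorbing states, by repeated substitution (fixed-point iteration).
    """
    p = {state: 0.0 for state in transitions}
    p[goal] = 1.0
    for _ in range(iters):
        for state, row in transitions.items():
            if state != goal and row:
                p[state] = sum(prob * p[nxt] for nxt, prob in row.items())
    return p[start]

def sensitivity(transitions, state, to, take_from, delta=0.05):
    """Change in purchase probability when `delta` probability mass in
    `state`'s row is moved from `take_from` to `to` (e.g. by a promotion)."""
    changed = {s: dict(row) for s, row in transitions.items()}
    changed[state][to] += delta
    changed[state][take_from] -= delta
    return purchase_probability(changed) - purchase_probability(transitions)

# Hypothetical transition probabilities; "exit" stands for leaving the site.
PIPELINE = {
    "enter":    {"browse": 0.6, "exit": 0.4},
    "browse":   {"browse": 0.3, "select": 0.3, "exit": 0.4},
    "select":   {"browse": 0.2, "purchase": 0.5, "exit": 0.3},
    "purchase": {},
    "exit":     {},
}
print(purchase_probability(PIPELINE))
print(sensitivity(PIPELINE, "select", "purchase", "exit"))
```

Comparing `sensitivity` across candidate transitions is one way to rank where a promotion would raise the overall purchase probability most.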
18
Metrics for E-Customer Life Cycle
Describe the milestones at which we:
target new visitors
acquire new visitors
convert them into registered/paying users
keep them as customers
create loyalty
19
Elements of E-Customer Life Cycle
Reach
targeting new potential visitors
can be measured as a percentage of the total market or based on other measures
of new unique users visiting the site
Acquisition
transformation of targeting to active interaction with the site
e.g., how many new-user sessions have a banner ad as their referrer?
e.g., what percentage of targeted audience base is visiting the site?
Conversion
a conversion rate is the ratio of “completers” to total “starters” for any
predetermined activity that is more than one logical step in length
examples: percentage of site visitors who perform a particular action such as
registering for a newsletter, subscribing to an RSS feed, or making a purchase
We can get more fine-grained measures: micro-conversion rates
look-to-click rate; click-to-basket rate; basket-to-buy rate
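Micro-conversion rates are step-to-step ratios along a funnel. This sketch assumes each session is recorded as the set of funnel steps it reached, with hypothetical step names `look`, `click`, `basket`, and `buy`, and counts a session at a step only if it also reached every earlier step.

```python
FUNNEL = ["look", "click", "basket", "buy"]  # hypothetical step names

def micro_conversions(sessions, steps=FUNNEL):
    """Step-to-step (micro) conversion rates plus the overall rate.

    sessions: iterable of sets, the funnel steps reached in each session.
    A session counts at a step only if it also reached all earlier steps.
    """
    sessions = list(sessions)
    reached = [
        sum(1 for s in sessions if all(step in s for step in steps[: i + 1]))
        for i in range(len(steps))
    ]
    rates = {}
    for i in range(1, len(steps)):
        label = f"{steps[i - 1]}-to-{steps[i]}"
        rates[label] = reached[i] / reached[i - 1] if reached[i - 1] else 0.0
    rates["overall"] = reached[-1] / reached[0] if reached[0] else 0.0
    return rates

sessions = [
    {"look"}, {"look"}, {"look", "click"},
    {"look", "click", "basket"}, {"look", "click", "basket", "buy"},
]
for name, rate in micro_conversions(sessions).items():
    print(f"{name:<16} {rate:.0%}")
```

The overall rate is the product of the micro-conversion rates, so a low value can be traced to the step where most sessions drop out.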
20
Elements of E-Customer Life Cycle
Retention
difficult to measure and metrics may need to be time/domain dependent
usually measured in terms of visit/purchase frequency within a given time
period and in a given product/content category
time-based thresholds may need to be used to distinguish between retained
users and deactivated-reactivated users
Loyalty
loyalty is indicated by more than purchase/visit frequency; it also reflects
commitment to the site or company as a whole
special referral or “bonus” campaigns may be used to determine loyal
customers who refer products or the site to others
in the absence of other information, combinations of measures such as
frequency, recency, and monetary value could be used to distinguish loyal
users/customers
21
Elements of E-Customer Life Cycle
Interruptions in the Life Cycle
Abandonment
measures the degree to which users may abandon partial transactions (e.g.,
shopping cart abandonment, etc.)
the goal is to measure the abandonment of the conversion process
micro-conversion ratios are useful in measuring this type of event
Attrition
applies to users/customers that have already been converted
usually measures the % of converted users who have ceased/reduced their
activity within the site in a given period of time
Churn
is measured based on attrition rates within a given time period (ratio of
attritions to total number of customers)
goal is to measure “roll-overs” in the customer life cycle (e.g., percentage
loss/gain in subscribed users in a month, etc.)
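For a single period, attrition and churn can be sketched from customer snapshots. This assumes hypothetical sets of customer IDs and treats a customer as attrited when they had no activity at all in the period; detecting reduced (rather than ceased) activity would need a per-customer threshold.

```python
def churn_report(customers_at_start, active_in_period, new_in_period):
    """Attrition and churn for one period from sets of customer IDs.

    A customer counts as attrited if present at the start of the period
    but with no activity during it (a simplifying assumption).
    """
    attrited = customers_at_start - active_in_period
    n_start = len(customers_at_start)
    n_end = n_start - len(attrited) + len(new_in_period)
    return {
        "attritions": len(attrited),
        "churn_rate": len(attrited) / n_start,      # attritions / customers
        "net_change": (n_end - n_start) / n_start,  # roll-over gain or loss
    }

start = {f"c{i}" for i in range(1, 11)}    # 10 customers at start of month
active = {f"c{i}" for i in range(1, 9)}    # 8 of them were active
new = {"c11", "c12", "c13"}                # 3 newly subscribed
print(churn_report(start, active, new))
```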
22
Basic E-Customer Life cycle Metrics
[Figure: e-customer life cycle: W (Target Market) → S (Site Visitors) → P (Prospects / Active Investigators) → C (Customers), with further segments CB (Abandoned Cart), C1 (One-time Customers), CA (Attrited Customers), and CR (Repeat Customers); branch labels NS, NP, and NC]
Note: Each of W, S, P, C, and CR must be defined based on site characteristics and business objectives.
23
Micro-Conversion Rates
[Figure: micro-conversion steps: M1 (saw product impression) → M2 (performed product click-through) → M3 (placed product in shopping cart); branch labels NM1, NM2, NM3, and NC]
24
Micro-Conversion Rates
[Figure: full micro-conversion funnel: P → M1 (saw product impression) → M2 (performed product click-through) → M3 (placed product in shopping cart) → M4 = C (made purchase); branch labels NP, NM1, NM2, NM3, and NC]
25
Basic E-Customer Metrics - RFM
RFM (Recency, Frequency, Monetary Value)
each user/customer can be scored along 3 dimensions, each providing unique
insights into that customer’s behavior
Recency - inverse of the time duration in which the user has been inactive
Frequency - the number of visits/purchases divided by the length of a specific time period
Monetary Value - total $ amount of purchases (or profitability) within a given time period
[Figure: grid scoring customers 1 to 5 on Frequency and 1 to 5 on Monetary Value]
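A common way to turn the three RFM dimensions into 1 to 5 scores like those in the grid is rank-based quintiles. In this sketch the customer data is hypothetical, recency is measured as days since the last purchase (fewer days scores higher), frequency is the purchase count in the window, and ties are broken arbitrarily by sort order.

```python
from datetime import date

def quintile_scores(values, higher_is_better=True):
    """Rank-based 1-5 scores: 5 for the best fifth, 1 for the worst."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + rank * 5 // len(values)
    return scores

def rfm(customers, today):
    """customers: {id: {"last": date, "purchases": int, "spend": float}}.
    Returns {id: (R, F, M)} scores."""
    ids = list(customers)
    days_inactive = [(today - customers[c]["last"]).days for c in ids]
    r = quintile_scores(days_inactive, higher_is_better=False)
    f = quintile_scores([customers[c]["purchases"] for c in ids])
    m = quintile_scores([customers[c]["spend"] for c in ids])
    return {c: (r[k], f[k], m[k]) for k, c in enumerate(ids)}

customers = {  # hypothetical customers
    "a": {"last": date(2024, 6, 28), "purchases": 12, "spend": 900.0},
    "b": {"last": date(2024, 6, 1), "purchases": 3, "spend": 150.0},
    "c": {"last": date(2024, 3, 15), "purchases": 1, "spend": 40.0},
    "d": {"last": date(2024, 6, 20), "purchases": 6, "spend": 300.0},
    "e": {"last": date(2023, 12, 1), "purchases": 2, "spend": 80.0},
}
print(rfm(customers, today=date(2024, 6, 30)))
```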
26
Building The Customer Signature
Building a customer signature is a significant effort, but well worth it
A signature summarizes customer or visitor behavior across
hundreds of attributes, many of which are specific to the site
Once a signature is built, it can be used to answer many questions
The mining algorithms will pick the most important attributes for
each question
Example attributes computed:
Total Visits and Sales
Revenue by Product Family
Revenue by Month
Customer State and Country
Recency, Frequency, Monetary (RFM)
Latitude/Longitude from the Customer’s Postal Code
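A customer signature is essentially one wide row of derived attributes per customer. This minimal sketch, assuming hypothetical visit and order records, computes a few of the listed attributes (total visits, total sales, revenue by product family, and revenue by month); the attribute naming scheme is invented for illustration.

```python
from collections import defaultdict
from datetime import date

def build_signatures(visits, orders):
    """One flat attribute dict per customer.

    visits: iterable of (customer_id, visit_date) pairs.
    orders: iterable of (customer_id, order_date, product_family, amount).
    """
    sig = defaultdict(lambda: defaultdict(float))
    for cust, _ in visits:
        sig[cust]["total_visits"] += 1
    for cust, day, family, amount in orders:
        s = sig[cust]
        s["total_sales"] += amount
        s[f"revenue_family_{family}"] += amount
        s[f"revenue_month_{day:%Y-%m}"] += amount
    return {cust: dict(attrs) for cust, attrs in sig.items()}

visits = [("c1", date(2024, 1, 3)), ("c1", date(2024, 2, 9)), ("c2", date(2024, 2, 10))]
orders = [
    ("c1", date(2024, 1, 3), "toys", 25.0),
    ("c1", date(2024, 2, 9), "books", 10.0),
    ("c1", date(2024, 2, 9), "toys", 5.0),
]
print(build_signatures(visits, orders))
```

Customers with no orders simply lack the revenue attributes; a mining tool would usually expect those filled with zero before training.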
27