Chapter 1 - Zhangxi Lin's


Introduction to Online
Marketing Intelligence
Zhangxi Lin
ISQS 3358
Texas Tech University
1
Outline
Online Targeted Advertising
About Web mining
Data collection and preparation
Knowing your customer
Consumer segmentation
2
Online Targeted Advertising
3
Marketing Technology Adoption
• In December 2005, Forrester surveyed 371 marketing technology decision-makers and influencers to investigate trends in marketing technology adoption and spending.
• Respondents hail from six major industry groups, and two-thirds work for firms whose annual revenues in 2005 exceeded $1 billion.
Marketing technology adoption is widespread.
Marketers say they need a more comprehensive application suite.
Vendors aren't delivering yet.
4
Marketing Technology Spending
• Since 2003, budgets have crept steadily upward and, on average, 2006 budgets are up 7% over 2005. But spending varies significantly by company size and industry. Specifically:
  - The largest and smallest firms are scaling back slightly.
  - Technology followers are putting cash behind their intentions.
  - As a percentage of revenue, retailers spend the most on marketing technology.
  - B2B firms are growing marketing technology spend aggressively.
5
Marketing Technology Spending
6
Online Marketing Technology
7
Online Advertising Market Status
• In 2006, online advertising spending was $16.8 billion, an increase of 34% over 2005 (IAB 2007).
• According to DoubleClick (2005):
  - Online advertising publishing resources are limited, because online users have limited capacity to view the growing number of web pages (DoubleClick Research 2005).
  - Online targeted advertising is a seller's market.
• Online targeted advertising is emerging as a new trend.
• In March 2007, Focus Media Holding Ltd., China's largest advertising company by advertising revenue, agreed to buy Allyes Information Technology Co. Ltd., a leading Chinese online advertising firm, for $225 million.
• In April 2007, Google Inc. announced a definitive agreement to acquire DoubleClick for $3.1 billion.
8
Targeted Marketing
• Users know what they want
  - Users purchased certain items from certain websites
    - We can apply real-time customized marketing solutions (see the process map later)
  - Users did not purchase, but clicked through some links
    - Mine the customers' clickstreams to figure out their needs (behavioral targeting)
• Users do not know what they want (behavioral targeting)
  - Collect information online (for example, from blogs and discussion boards in a community)
  - Apply a segment/target/position strategy
    - We can potentially build a database profiling the online users
• How to design (create) ads that appeal to end users
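A minimal sketch of the behavioral-targeting idea above, assuming a user's clickstream is available as a list of URL paths. The path-to-category map, the ad inventory, and the function names (`interest_profile`, `choose_ad`) are hypothetical; a real system would weigh recency and use far richer signals than raw click counts.

```python
from collections import Counter

# Hypothetical mapping from URL path prefix to an interest category.
PAGE_CATEGORY = {
    "/hotels": "travel",
    "/flights": "travel",
    "/cameras": "electronics",
    "/laptops": "electronics",
    "/recipes": "food",
}
# Hypothetical ad inventory, one ad per category.
ADS = {"travel": "Hotel deals ad", "electronics": "Gadget sale ad", "food": "Cookware ad"}
DEFAULT_AD = "Generic brand ad"


def interest_profile(clickstream):
    """Count how many clicks in one user's clickstream fall into each category."""
    counts = Counter()
    for path in clickstream:
        for prefix, category in PAGE_CATEGORY.items():
            if path.startswith(prefix):
                counts[category] += 1
                break
    return counts


def choose_ad(clickstream):
    """Serve the ad for the user's most-clicked category, else a non-targeted ad."""
    profile = interest_profile(clickstream)
    if not profile:
        return DEFAULT_AD
    top_category, _ = profile.most_common(1)[0]
    return ADS[top_category]


clicks = ["/hotels/austin", "/cameras/dslr", "/flights/aus-sfo", "/hotels/dallas"]
print(choose_ad(clicks))  # Hotel deals ad
```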
9
Implications of Targeted Marketing
• For advertisers
  - Helps drive immediate responses (or increased sales) to their advertisements
  - Helps build the advertiser's brand
• For publishers
  - Maximizes the value of high-quality ad inventory space (differentiated services for different site sectors)
10
Effectiveness of Online Marketing
When executed properly, behavioral marketing is a highly effective means of reaching and converting your target audience.
Lift in conversion rate (CVR)
Advertiser | Network behavioral targeting vs. non-targeted advertising | Behavioral re-targeting vs. non-targeted advertising
A | 90% | 167%
B | 323% | 2,232%
C | 105% | 3,130%
Sources: Advertising.com, 2005 (network behavioral targeting); Advertising.com, 2004 (behavioral re-targeting)
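The slide reports lift in CVR without giving the formula. A common definition, assumed here, is relative lift: (targeted CVR - non-targeted CVR) / non-targeted CVR, so a 90% lift means the targeted CVR is 1.9 times the baseline. The function names and the counts below are hypothetical and only illustrate the arithmetic.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of exposed visitors who converted."""
    return conversions / visitors


def cvr_lift(targeted_cvr: float, baseline_cvr: float) -> float:
    """Relative lift of the targeted CVR over the non-targeted baseline (0.9 = +90%)."""
    return (targeted_cvr - baseline_cvr) / baseline_cvr


# Hypothetical counts, chosen only to illustrate the arithmetic.
baseline = conversion_rate(conversions=100, visitors=50_000)  # non-targeted: 0.20%
targeted = conversion_rate(conversions=190, visitors=50_000)  # targeted: 0.38%
print(f"Lift in CVR: {cvr_lift(targeted, baseline):.0%}")     # Lift in CVR: 90%
```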
11
Product Purchase
This travel advertiser targeted consumers who had previously visited its website, in order to drive actual reservations. Visitors who had not booked a reservation received custom ads highlighting guaranteed rates, seasonal discounts, new hotel perks, and free gifts with an online booking.
Campaign results (behavioral targeting):
Impressions | 99 million
Clicks | 92,223
Bookings | 52,936
Conversion rate | 57.4%
Roughly one hotel booking was generated for every 2,000 impressions served, and more than 1 out of every 2 people who clicked on the ad completed a booking.
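The campaign figures can be checked directly. This sketch recomputes the conversion rate as bookings divided by clicks, plus impressions per booking; the click-through rate (CTR) is an extra ratio not reported on the slide, and `campaign_metrics` is a hypothetical helper.

```python
def campaign_metrics(impressions: int, clicks: int, bookings: int) -> dict:
    """Funnel ratios for an ad campaign."""
    return {
        "ctr": clicks / impressions,                # click-through rate
        "conversion_rate": bookings / clicks,       # bookings per click
        "impressions_per_booking": impressions / bookings,
    }


m = campaign_metrics(impressions=99_000_000, clicks=92_223, bookings=52_936)
print(f"CTR {m['ctr']:.3%}, conversion rate {m['conversion_rate']:.1%}, "
      f"{m['impressions_per_booking']:,.0f} impressions per booking")
# CTR 0.093%, conversion rate 57.4%, 1,870 impressions per booking
```

The 57.4% figure is therefore bookings per click, and 99 million / 52,936 is about 1,870 impressions per booking, which the slide rounds to 2,000.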
12
About Web Mining
13
Web Mining
• When online users browse web pages, their activities can be recorded. Analyzing these activities with data mining techniques enables more accurate web-based online advertising.
• Possible web mining applications include:
  - Consumer profiling
  - Purchase propensity analysis
  - Web page effectiveness evaluation
  - Online recommendation
  - Real-time advertising
  - Others
14
Some Business Questions
Who is visiting my Web site?
Who is buying my product(s)?
Who are my repeat buyers?
Which customers are churning?
Which Web design produces the most
purchases?
What campaign strategies are most effective in
increasing Web site visits?
15
Business Questions
• What factors influence product purchases?
  - Time-of-day effects
  - Gender, age, income, and so forth
  - Latent factors: e-shopper, Web expert, and so forth
• Which sales channels produce the most profitable customers?
• Do any site-visit patterns correlate with outcomes that can be exploited for business advantage?
• How can I forecast peak usage and future usage to ensure I have the hardware and technology to keep my Web site running?
• How can I monitor my Web site to prevent inappropriate access and malicious activity?
• How can I manage purchases, returns, and exchanges to avoid fraud and reduce waste?
16
Web Mining for Profitability
Increase viewing, navigation, and transaction
efficiency.
Improve the customer experience.
Add services and features that promote cross-selling and up-selling opportunities.
Identify problem areas.
Improve security.
Attract more high quality customers.
17
Customer Relationship Management
(CRM)
Making the right offer to the right customer at the
right time.
One-to-one marketing. — Peppers and Rogers
TQM (Total Quality Management) with new buzz
words.
“The practice of annoying customers for short
term profits.” — Herb Edelstein
18
Examples of Web Site Services
Recommender systems
Stock quotes or financial services
News, weather, sports, traffic conditions
Celebrity or event photos and multimedia
Search engines
Web site hosting or e-mail
Games or contests
Beach cams, space cams, hot spot cams
19
Internet Commerce Challenges
• 24/7 operations
• International scope
• Non-standard media
  - Many browsers
  - Different display monitors and graphics adapters
  - Different window geometry
  - Different computers and operating systems
• Different customer concerns
  - Secure transactions
  - Privacy and confidentiality
  - Legitimacy
20
Data Collection and
Preparation
21
Data Collection Methods
Web logs
Cookies
Forms
Java applications
Other applications
22
Web Log Data
Fields
• User's IP address, also called
  - Remote host name
  - Client IP address
• User name, also called
  - Remote user log name (may be different)
  - Authenticated user name
• Date and time of request, with or without a UTC offset
• Request type, also called "method"
  - HTTP request with (CLF) or without (IIS) argument
• Status: HTTP three-digit status code
• Number of bytes sent to client
continued...
23
Web Log Fields
• The URL path requested, if the request type has no argument
• The port to which the request was served
• The name of the server
• The IP address of the server
• The time taken to serve the request
• Number of bytes in the request received from the client
• User agent: usually a text string with the name and version number of the Web browser used by the client and the operating system of the client machine
• The domain name or IP address of the referring URL
• Query information in a text string
• Cookie information in a text string
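Many of these fields appear in the NCSA Common Log Format and its widely used "combined" extension, which appends the referrer and user-agent strings. Below is a minimal parsing sketch using only the Python standard library; the regular expression is an assumption that covers well-formed lines only, and the sample lines are made up.

```python
import re
from datetime import datetime

# Common Log Format, optionally followed by the "combined" referrer and user-agent fields.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Parse one log line into a dict of fields, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])  # "-" means no body sent
    return rec


combined = ('192.168.1.20 - - [10/Oct/2006:13:55:36 -0500] "GET /index.htm HTTP/1.0" '
            '200 2326 "http://www.example.com/start.htm" "Mozilla/4.0 (compatible; MSIE 6.0)"')
common = '10.0.0.1 - frank [10/Oct/2006:13:55:40 -0500] "GET /logo.gif HTTP/1.0" 304 -'

for line in (combined, common):
    rec = parse_line(line)
    print(rec["host"], rec["path"], rec["status"], rec["bytes"], rec["referrer"])
```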
24
The User Session
Exchange between the browser and the Web server:
1. User requests index.htm.
2. Server sends a copy of index.htm.
3. Browser parses index.htm, finds references to image files, and requests the image files.
25
...
Three Popular Web Log Formats
NCSA Common Log Format
Microsoft IIS Format
W3C Extended Log File Format
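The W3C Extended Log File Format is self-describing: a `#Fields:` directive lists the fields that appear, in order, on each subsequent entry line. A minimal reader under that assumption (space-separated values with no embedded spaces; IIS, for example, writes spaces inside values as `+`). The function name and sample lines are hypothetical.

```python
def parse_w3c(lines):
    """Yield one dict per entry line, keyed by the most recent #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue  # other directives such as #Version or #Date carry no entries
        yield dict(zip(fields, line.split()))


sample = [
    "#Version: 1.0",
    "#Date: 2006-10-10 18:55:36",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status sc-bytes cs(User-Agent)",
    "2006-10-10 18:55:36 192.168.1.20 GET /index.htm 200 2326 Mozilla/4.0+(compatible;+MSIE+6.0)",
]
for entry in parse_w3c(sample):
    print(entry["c-ip"], entry["cs-uri-stem"], entry["sc-status"])
```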
26
Web Logs May Be Inadequate for
Data Mining
Limitations exist with respect to defining users,
sessions, and page hits.
User preferences must be inferred from limited
data: referring URL, page selections, browser.
Different users within a household may be
indistinguishable.
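To make these limitations concrete, here is one common heuristic for reconstructing sessions from raw hits, sketched under stated assumptions: requests for embedded files (images, style sheets, scripts) are dropped so that only page views remain, a "user" is approximated by the pair (IP address, user agent), and a session ends after 30 minutes of inactivity. Two household members sharing one computer and browser would still be merged into one user, which is exactly the weakness noted above. The extension list, timeout, and sample hits are hypothetical.

```python
from datetime import datetime, timedelta

# Assumptions, not taken from the slides:
EMBEDDED = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")  # files fetched by a page
TIMEOUT = timedelta(minutes=30)  # inactivity gap that ends a session


def sessionize(hits):
    """Group raw hits (ip, user_agent, timestamp, path) into sessions of page views.

    A "user" is approximated by (ip, user_agent); returns a list of sessions,
    each a time-ordered list of page paths."""
    last_seen = {}  # user key -> time of that user's previous page view
    current = {}    # user key -> index of that user's open session
    sessions = []
    for ip, agent, ts, path in sorted(hits, key=lambda hit: hit[2]):
        if path.lower().endswith(EMBEDDED):
            continue  # an image or style-sheet request, not a page view
        key = (ip, agent)
        if key not in last_seen or ts - last_seen[key] > TIMEOUT:
            current[key] = len(sessions)
            sessions.append([])
        sessions[current[key]].append(path)
        last_seen[key] = ts
    return sessions


t0 = datetime(2006, 10, 10, 13, 0)
hits = [
    ("10.0.0.1", "Mozilla/4.0", t0, "/index.htm"),
    ("10.0.0.1", "Mozilla/4.0", t0 + timedelta(seconds=1), "/logo.gif"),
    ("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=1), "/index.htm"),
    ("10.0.0.1", "Mozilla/4.0", t0 + timedelta(minutes=2), "/products.htm"),
    ("10.0.0.1", "Mozilla/4.0", t0 + timedelta(minutes=45), "/index.htm"),
]
print(sessionize(hits))
# [['/index.htm', '/products.htm'], ['/index.htm'], ['/index.htm']]
```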
27
What Is a Cookie?
Exchange between the Web browser (client) and the Web server:
1. Browser requests a Web page.
2. The Web page is delivered with instructions for creating a cookie.
3. Browser creates the cookie and writes it to the hard disk.
4. On later requests, the value of the cookie is sent to the server.
5. Custom content is returned.
28
The Anatomy of a Cookie
Name | Sequence of characters uniquely identifying the cookie
Value | Stored information
Domain | Domain name
Path | Path within a site; access is restricted to this path
Expires | Expiration date in UTC
Secure | Flag requiring that the cookie be sent only over an encrypted (HTTPS) connection
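Python's standard `http.cookies` module can illustrate both halves of the cookie exchange: the server formats a `Set-Cookie` header carrying the name, value, domain, path, expiration date, and secure flag, and on a later request parses the `Cookie` header the browser sends back. The cookie name and value are borrowed from the sample cookie in this chapter; the domain is a hypothetical placeholder.

```python
from http.cookies import SimpleCookie

# Server side: a Set-Cookie header carrying the attributes described above.
# The domain is a hypothetical placeholder.
outgoing = SimpleCookie()
outgoing["session-id"] = "103-0556164-3592039"
outgoing["session-id"]["domain"] = "www.example.com"
outgoing["session-id"]["path"] = "/"
outgoing["session-id"]["expires"] = "Tue, 28 Dec 2010 00:00:00 GMT"
outgoing["session-id"]["secure"] = True
print(outgoing.output())

# Server side, on a later request: read the value the browser sends back
# in its Cookie header and use it to look up the visitor.
incoming = SimpleCookie()
incoming.load("session-id=103-0556164-3592039")
print(incoming["session-id"].value)
```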
29
A Sample Cookie
session-id
103-0556164-3592039
www.megastock.com/
0
730710016
30123554
2742100288
29450847
*
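The sample appears to follow the text layout that Internet Explorer used for its cookie files (name, value, host/path, flags, expiration time as low and high 32-bit halves, creation time as low and high halves, and a `*` record terminator); that reading is an assumption, not something the slide states. Under it, each time pair is a Windows FILETIME, a count of 100-nanosecond intervals since January 1, 1601 (UTC). The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(low: int, high: int) -> datetime:
    """Combine the low and high 32-bit halves of a Windows FILETIME
    (100-nanosecond ticks since 1601-01-01 UTC) into an aware datetime."""
    ticks = (high << 32) | low
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# The sample cookie from the slide, one field per line (assumed IE layout).
record = """session-id
103-0556164-3592039
www.megastock.com/
0
730710016
30123554
2742100288
29450847
*""".splitlines()

name, value, host_path, flags = record[0], record[1], record[2], int(record[3])
expires = filetime_to_datetime(int(record[4]), int(record[5]))
created = filetime_to_datetime(int(record[6]), int(record[7]))
print(name, value, host_path, flags)
print("expires:", expires.isoformat())  # expires: 2010-12-28T00:00:00+00:00
print("created:", created.isoformat())
```

Under this interpretation the expiration time decodes to exactly midnight UTC on December 28, 2010, which is consistent with an "expiration date in UTC" attribute.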
30
Limitations of Cookies
Can only be accessed by the domain name that
created them (which is a GOOD thing)
Are restricted to a maximum number of cookies per Web site (20 per domain under the original Netscape, or version 0, cookie specification)
Are limited in size (4 KB per cookie under the same specification)
Have an expiration date
31
Microsoft Internet Explorer Cookie
Options
32
Client-Side Cookies for
Personalization
Deployed using JavaScript or VBScript
Implemented through the document.cookie
property
Can be maintained using frames or the
document object model
33
Server-Side Cookies
Can be used to restrict access
Support shopping cart applications
Help track user activity on the Web site
34
Server-Side Data Collection
Maintaining user information
Collecting and updating information
e-Commerce strategies
35
Evaluating Visitor Behavior
36
Some Common Web Log Statistics
Most popular pages
Frequency of referring sites
Page count statistics: means, percentiles, variation
Session count statistics
Frequency of Web browser usage
Frequency of operating systems
Frequency of error types
Check web log statistics: http://www.commerx.com/usage/
This website is the business site of IMW
(http://www.inetworks.com) headquartered in Austin, Texas.
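Here is a sketch of how a few of these statistics could be computed once log lines have been parsed into records and grouped into sessions. The record keys (`path`, `referrer`, `agent`, `status`), the function name, and the sample data are assumptions; browser and operating-system frequencies would normally require parsing the user-agent string rather than counting it verbatim.

```python
from collections import Counter
from statistics import mean, median


def log_statistics(records, sessions, top=3):
    """records: parsed hits as dicts with 'path', 'referrer', 'agent', 'status' keys;
    sessions: lists of page paths. Returns a few common web log statistics."""
    pages_per_session = [len(s) for s in sessions]
    return {
        "popular_pages": Counter(r["path"] for r in records).most_common(top),
        "referring_sites": Counter(r["referrer"] for r in records if r["referrer"]).most_common(top),
        "browsers": Counter(r["agent"] for r in records if r["agent"]).most_common(top),
        "error_types": Counter(r["status"] for r in records if r["status"] >= 400),
        "session_count": len(sessions),
        "mean_pages_per_session": mean(pages_per_session),
        "median_pages_per_session": median(pages_per_session),
    }


records = [
    {"path": "/index.htm", "referrer": "http://search.example.com/", "agent": "Mozilla/4.0", "status": 200},
    {"path": "/index.htm", "referrer": None, "agent": "Mozilla/5.0", "status": 200},
    {"path": "/old.htm", "referrer": None, "agent": "Mozilla/4.0", "status": 404},
    {"path": "/products.htm", "referrer": "http://www.example.com/index.htm", "agent": "Mozilla/4.0", "status": 200},
]
sessions = [["/index.htm", "/products.htm"], ["/index.htm"], ["/old.htm"]]
stats = log_statistics(records, sessions)
print(stats["popular_pages"][0], dict(stats["error_types"]))
```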
37
Baselines and Comparisons
Which statement is more informative?
Our Web server recorded 11,000 page views
yesterday.
Our Web server recorded an increase of 1,000 page views yesterday compared to the previous day.
Our Web server recorded a 10% increase in
page views yesterday compared to the previous
day.
continued...
38
Baselines and Comparisons: Good or
Bad?
“We converted 25% of our registered customers
to premium account status this month.”
“We converted 50% more of our registered
customers to premium account status
this month compared to last month.”
“Last month we converted 2 registered
customers to premium account status, and
this month we converted 3.”
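The three page-view statements describe the same data: if yesterday's 11,000 page views were 1,000 more than the previous day, the previous day had 10,000 views, and 1,000 / 10,000 = 10%. The premium-account statements hide a small base in the same way: going from 2 to 3 conversions is "50% more" but only one extra customer. A tiny helper (hypothetical name `change`) makes both comparisons explicit:

```python
def change(previous: float, current: float) -> tuple:
    """Absolute and relative change from one period to the next."""
    diff = current - previous
    return diff, diff / previous


print(change(10_000, 11_000))  # (1000, 0.1): page views, day over day
print(change(2, 3))            # (1, 0.5): premium conversions, month over month
```

Reporting the absolute change next to the relative one, as the third premium-account statement does, is what exposes the small base.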
39
Methods of Evaluating Visitor
Behavior
Web Stats
Path Analysis
Link Analysis
Stochastic Process Methods
Page transition probabilities
Probability of site abandonment
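Page transition probabilities can be estimated by counting moves between consecutive pages, treating the data as a first-order Markov chain (an assumption: the next page depends only on the current page). Ending a path is counted as a transition to EXIT, so P(page → EXIT) serves as an estimate of the probability of site abandonment at that page. The first path below is the example visitor path 1 6 7 1 3 8 from this chapter; the other paths are hypothetical.

```python
from collections import Counter, defaultdict

EXIT = "EXIT"


def transition_probabilities(paths):
    """Estimate first-order transition probabilities P(next page | current page).

    The end of each path is counted as a move to EXIT, so P(page -> EXIT)
    estimates the probability of abandoning the site from that page."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, nxt in zip(path, path[1:] + [EXIT]):
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for page, row in counts.items()
    }


paths = [[1, 6, 7, 1, 3, 8], [1, 3], [1], [1, 6]]
probs = transition_probabilities(paths)
print(probs[1])  # {6: 0.4, 3: 0.4, 'EXIT': 0.2}
```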
40
Path Analysis for an E-tailer
[Diagram: e-tailer purchase states Final Decision, Product Selection, Product Info, Customer Info, Shipping, and Billing/Credit Card Info]
41
A Visitor Path
Path: 1 6 7 1 3 8
[Figure: site map of numbered pages and an EXIT state, tracing this visitor's path]
42
Path Analysis Example Results
Sixty percent of site visitors leave after viewing the
home page.
Seventy-three percent of customers who purchase
product X do not access the product X information page.
The highest probability of abandonment occurs on the
shipping page.
Sixty-three percent of consumers who purchased product X viewed warranty information, while twenty-seven percent of consumers who abandoned a shopping cart containing product X viewed warranty information.
43
Path Analysis E-tailer Example
Data
Only sessions with shopping carts are included
All paths up to “checking out” are condensed into
a single “Product Selection” state
Each session records 1 to 7 states, the number of items selected, the value of all items in the shopping cart, and the time each state is entered.
Purpose: investigate the abandonment of
shopping carts and exiting the site without
making a purchase.
Analysis: group shopping carts by value,
perform a sequential association analysis,
and plot confidence as a function of state.
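The analysis step can be sketched as follows, under the assumption that the condensed states are strictly sequential, so a session whose furthest state is k has passed through states 1..k. The confidence of the rule "reached state k ⇒ reached state k+1" is then the number of sessions reaching k+1 divided by the number reaching k; computing it per cart-value group gives the values one would plot against state, and the step with the lowest confidence is where abandonment is highest. The value bins, function names, and sample sessions are hypothetical, and this is not the SAS procedure itself.

```python
from collections import defaultdict


def value_group(total: float) -> str:
    """Hypothetical shopping-cart value bins."""
    if total < 50:
        return "<$50"
    if total < 200:
        return "$50-$199"
    return "$200+"


def step_confidence(sessions, n_states=7):
    """sessions: iterable of (cart_value, furthest_state_reached), states 1..n_states.

    Returns {value_group: [conf(1->2), ..., conf(n_states-1 -> n_states)]}, where
    conf(k -> k+1) = sessions reaching state k+1 / sessions reaching state k."""
    reached = defaultdict(lambda: [0] * (n_states + 1))  # reached[g][k]; index 0 unused
    for value, furthest in sessions:
        counts = reached[value_group(value)]
        for k in range(1, furthest + 1):
            counts[k] += 1
    return {
        group: [c[k + 1] / c[k] if c[k] else None for k in range(1, n_states)]
        for group, c in reached.items()
    }


sessions = [(30, 7), (40, 3), (45, 1), (120, 7), (150, 7), (180, 5)]
for group, confs in step_confidence(sessions).items():
    print(group, [None if x is None else round(x, 2) for x in confs])
```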
44
SAS Code for Path Analysis
Produce Contents of the RLINKS Dataset
ods html
   path='C:\workshop\winsas\CCWEB'
   body='rlnkstat.html';
title1 "Path Analysis of E-tailer Data";
proc contents data=crssamp.rlinks;
run;
continued...
45
SAS Code for Path Analysis
Produce Frequencies for Class Variables
proc freq data=crssamp.rlinks;
   tables Category DollarCat NumItems PurchaseStep / list missing;
run;
continued...
46
SAS Code for Path Analysis
Basic Descriptive Statistics
proc univariate data=crssamp.rlinks;
   var TotalCost;
run;
title2 "Total Cost when a Purchase is Made";
proc univariate data=crssamp.rlinks(where=(PurchaseSequence=7));
   var TotalCost;
run;
continued...
47
Link Analysis
Link analysis is the examination of the linkages
between effects in a complex system. (SAS
Help screen)
Analysts try to discover the relationships
between states in a complex system.
A link analysis may employ a variety of
techniques including OLAP, associations,
sequences, clustering, and graphics.
The path analysis performed on the e-tailer
data may be viewed as a link analysis
performed on a rather simple retail system.
48
49
SAS Link Node
C1U -- the unweighted first-order Centrality measure.
C2U -- the unweighted second-order Centrality measure.
C1 -- the first-order Centrality measure.
C2 -- the second-order Centrality measure.
VALUE -- the value of the class variable, or the midpoint of the
bin of the interval variable that constitutes the node.
VAR -- the variable that constitutes the node.
ROLE -- the variable role.
COUNT -- the node count. The number of observations that are
represented by the level of the variable.
PERCENT -- the node count divided by the total number of
observations.
ID -- the node ID.
TEXT -- the text variable, represented as VAR=VALUE.
X -- the X-coordinate of the node in the Link Graph.
Y -- the Y-coordinate of the node in the Link Graph.
50
C1 and C2
• The values C1 and C2 are measures of node importance.
• C1 is the first-order undirected centrality measure, which attempts to measure the importance of the node in the network as a function of how often it directly links to other nodes in the network.
• C2 is the second-order undirected centrality measure, which attempts to measure the combined importance of all nodes that are directly linked to the node.
• In a social network, C1 would measure “How many people (nodes) are my friends?” C2 would measure “How many people are friends of my friends?”
• The centrality measures can be weighted or unweighted.
  - A weighted first-order centrality measure would be analogous to measuring “How many people with many friends are my friends?” Thus, a node with many direct links that is linked to the target node would receive a higher weight than a node with few direct links.
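The friend analogy suggests a simple way to compute unweighted versions of these measures, sketched here under an assumption: C1U is a node's number of direct links (its degree), and C2U is the sum of C1U over the nodes it links to. SAS's exact formulas may differ. In the Link Graph, nodes would be VAR=VALUE items; the letter-named edges and the `centrality` function are hypothetical.

```python
from collections import defaultdict


def centrality(edges):
    """Unweighted first- and second-order centrality for an undirected graph.

    Assumed definitions (SAS's exact formulas may differ):
      C1U(v) = number of distinct nodes linked directly to v  ("my friends")
      C2U(v) = sum of C1U over v's direct neighbours          ("friends of my friends")"""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbours[a].add(b)
            neighbours[b].add(a)
    c1 = {v: len(ns) for v, ns in neighbours.items()}
    c2 = {v: sum(c1[u] for u in ns) for v, ns in neighbours.items()}
    return c1, c2


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
c1, c2 = centrality(edges)
print(c1)  # {'A': 3, 'B': 2, 'C': 2, 'D': 2, 'E': 1}
print(c2)  # {'A': 6, 'B': 5, 'C': 5, 'D': 4, 'E': 2}
```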
51
The Web Stochastic Process
[Figure: a Web stochastic process whose states include the home page (point of entry), pages 1-5, and EXIT]
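Once transition probabilities are estimated, the probability that a visitor entering at the home page eventually purchases (rather than exits) is an absorption probability of the Markov chain. This sketch solves for it by fixed-point iteration; the states (including the Purchase state), the transition probabilities, and the function name are hypothetical.

```python
# Hypothetical transition probabilities among transient states (each row sums to 1).
P = {
    "Home":    {"Product": 0.4, "EXIT": 0.6},
    "Product": {"Home": 0.2, "Cart": 0.3, "EXIT": 0.5},
    "Cart":    {"Purchase": 0.5, "Product": 0.1, "EXIT": 0.4},
}
ABSORBING = {"Purchase", "EXIT"}


def absorption_probability(P, target, tol=1e-12, max_iter=10_000):
    """Probability of eventually reaching absorbing state `target` from each
    transient state, by iterating h(s) = sum_t P(s, t) * h(t) with
    h(target) = 1 and h(any other absorbing state) = 0."""
    h = {s: 0.0 for s in P}
    for _ in range(max_iter):
        new = {}
        for s, row in P.items():
            new[s] = sum(
                p * (1.0 if t == target else 0.0 if t in ABSORBING else h[t])
                for t, p in row.items()
            )
        delta = max(abs(new[s] - h[s]) for s in P)
        h = new
        if delta < tol:
            break
    return h


h = absorption_probability(P, "Purchase")
print({s: round(v, 4) for s, v in h.items()})
# {'Home': 0.0674, 'Product': 0.1685, 'Cart': 0.5169}
```

Calling `absorption_probability(P, "EXIT")` gives the complementary probability of abandoning the site from each state, since every path in this chain ends in either Purchase or EXIT.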
52
Consumer Segmentation
53
Discussion – How to segment
• Dataset: Commrex web log dataset
• Two levels of granularity to aggregate the transaction records
  - Per session
  - Per user
• Identify the pages of interest and extract the information to be mined
• Combining clustering and classification: how? Refer to the INSSUBRO case in Text Mining:
  - Step 1: Clustering
  - Step 2: Use the Data Set Attribute node to choose the target variable and change the status of the other variables
  - Step 3: Classification based on the target variable
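The two-level aggregation and the cluster-then-classify recipe can be sketched in plain Python: aggregate session records per user, cluster users with k-means (step 1), then treat the cluster label as the target and fit a simple classifier, here a one-feature decision stump, so each segment gets a readable rule (steps 2 and 3). The data, features, initial centroids, and the stump are all assumptions standing in for the SAS Enterprise Miner nodes the slide mentions.

```python
from collections import defaultdict

# Hypothetical session-level records: (user_id, pages_viewed, purchased 0/1).
SESSIONS = [("u1", 3, 0), ("u1", 2, 0), ("u2", 25, 1), ("u2", 30, 1),
            ("u3", 4, 0), ("u4", 28, 0), ("u4", 22, 1)]


def per_user(sessions):
    """Aggregate sessions to one row per user: (sessions, avg pages/session, purchase rate)."""
    rows = defaultdict(list)
    for user, pages, bought in sessions:
        rows[user].append((pages, bought))
    return {u: (len(r), sum(p for p, _ in r) / len(r), sum(b for _, b in r) / len(r))
            for u, r in rows.items()}


def kmeans(points, centroids, iterations=20):
    """Lloyd's k-means with caller-supplied initial centroids; returns a label per point."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = [min(range(len(centroids)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def fit_stump(xs, ys):
    """One-feature decision stump: predict 1 when x > threshold. Returns (threshold, accuracy)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


users = per_user(SESSIONS)
ids = list(users)
# Step 1: cluster users on (avg pages per session, purchase rate).
labels = kmeans([(users[u][1], users[u][2]) for u in ids], [(5.0, 0.0), (25.0, 1.0)])
# Steps 2-3: use the cluster label as the target and learn a readable rule for it.
threshold, accuracy = fit_stump([users[u][1] for u in ids], labels)
print(dict(zip(ids, labels)), threshold, accuracy)
```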
54