On Mining Group Patterns of Mobile Users

Download Report

Transcript On Mining Group Patterns of Mobile Users

Mining Group Patterns of
Mobile Users
黃三益
國立中山大學資管系
Outline






Motivation
Problem definition
Algorithms for mining mobile groups
Handling untraceable time intervals
Evaluation
Conclusions
Introduction

Facts

CRM is quite successful in online stores




Why?



Real-time recommendation
Faster checkouts
Price/features comparisons
e-CRM is able to collect detailed customer data.
e-CRM is able to identify customers
But there are limitations with e-CRM.
Searching vs Purchasing

Offline
Conversion
92%
Vast majority of search-to-purchase
conversion occurs offline
Latent Online
Conversion
7%
Same Session
Online
Conversion
1%
Where Did the Business Go?
One Month
Period
4% of Client’s
Shoppers were also
Buyers
Client Shoppers
10 Million+
69% of Client’s
VNBs
Avg $460 per Buyer
=
shopped at least one
96% of Client’s
competitive
~
$300MM in Lost
Sales merchant
Shoppers
became
Visitor Non-Buyers
(VNBs)
Annual revenue potential from
capturing 10% of these buyers’
category sales approaches
$36 million
8% of Client’s
Visitor/Non-Buyers
purchased at
a competitive
merchant
(652,200 buyers)
Connecting Attitudes and Behavior:
Why Didn’t VNBs Purchase?
Which of the following reasons describe why
you did not make a purchase (from major retailer “A”)?
Respondent base: Known Visitor/Non-buyers of retailer “A” based upon observed behavior
I could not find the product I was looking for
22%
Product price was too high
14%
I could not find the brand that I wanted
2%
Unsure how easy to return
2%
15%
15%
I do not like to input my credit card info
3%
Product information incomplete
3%
Product was out of stock
Shipping and handling costs too high
Did not want to wait to have item shipped
Not a large variety
18%
13%
13%
7%
4%
5%
3%
11%
11%
10%
10%
% VNB Respondents
Primary Reason
All Reasons
38%
Catalogs Make a Difference
Think Direct Mail is irrelevant in the Internet age? Think again.
As experts in the direct mail business, we teamed up with
comScore and conducted a Catalog Study to track the
purchasing habits of consumers who receive catalogs vs.
those who do not. Our research shows that…
• When you mail consumers a catalog, they:




could be TWICE as likely to make an online purchase
could spend more meaningful time at your web site, including a
75% likelihood to enter a secure session
could be even more likely to purchase online with more
frequent catalogs
What this all adds up to is a
revenue lift of $21.1 million
per million unique visitors
Can physical stores gain the
advantages of e-CRM without its
limitations?
We need to
•identify customers
•collect data about their buying behavior.
Introduction

Trends

The cost of location tracking/positioning
technologies have been dramatically reduced and
the use of mobile devices is increasingly popular




Wi-Fi/Cell id/sensor network
GPS
RFID
Identification and browsing/transaction behavior
of customers in physical stores is becoming a
reality in the near future.
Introduction

What data can you get by tracking
customers in a physical store?





Customer identification
Products placed in a shopping cart
Customer aisle behavior
Purchase profile of the customer
Reviews and ratings
Introduction

What can you do by tracking customers
in a physical store?


Requires
tags on
products




Intelligent in-store customer service
Fast checkout
Price and feature comparison
Physical location and map
Possible in
Price optimization
the near future
Identification of shopping pals.
Introduction—Mining Mobile
Groups

Many ways can determine the groups a user belongs to.



Grouping based on demographics
Grouping based on purchasing behavior
Groups formed by using spatial-temporal information are useful.


Objects within a mobile group tend to closely influence one another.
Potential applications:




Construction of social network
Animal behavior study
group-based pricing models or marketing strategies
This work gives a precise definition about mobile groups and
derives algorithms for efficiently identifying mobile groups.


Physical proximity between group members.
Temporal proximity between group members.
Work done so far




Y. Wang, EP. Lim, SY Hwang, “On Mining Group
Patterns of Mobile Users,” DEXA2003. Y. Wang, EP.
Lim, SY Hwang, “Efficient Group Pattern Mining Using
Data Summarization,” DASFAA2004, Korea
An extension of the two previous work will appear in
DKE journal.
SY. Hwang, YH. Liu, JK. Chiu, EP. Lim “Mining Mobile
Group Patterns: A Trajectory-based Approach,”
PAKDD2005.
Y. Wang, EP. Lim, SY Hwang, “Efficiently Mining
Maximal Valid Groups,” to appear in VLDB journal
Problem Definition
D: user movement database
M distinct users u1, u2 , u3 ,..., uM
M
D  i 1 Di
a time series containing triplets (t,x,y)
the interval between every t and t + 1 is fixed
Di is
Problem Definition
Problem Definition

Definition. Given a group of users G, a
maximum distance threshold max_dis, and a
minimal time duration threshold min_dur, a set
of consecutive time points [t,t+k] is called a
valid segment of G if
1.All users in G are not more than max_dis apart at time
t, t+1,…, and t+k;
2.Some users in G are more than max_dis apart at time
t-1:
3.Some users in G are more than max_dis apart at time
t+(k+1);
4.(k+1)>=min_dur
Problem Definition
max_dis=10, min_dur=3;
Problem Definition

Definition. Let P be a mobile group with valid
segments s1,…,sn, and N denotes the number
of time points in the database, the weight of P
is defined as:

weight ( P ) 
n
s
i 1 i
N
Problem Definition



If the weight of a mobile group exceeds a
threshold min_wei, we call it a valid group.
For example, if min_wei =50%, the mobile
group P={u2,u3,u4} is valid, since it has valid
segments{[1,3][6,8]} and weight 6/10>0.5.
The valid mobile group mining problem: Given
D, max_dis, min_dur, and min_wei, find all
valid groups.
AGP: Algorithm based on
Apriori Property

[The Apriori property of valid mobile
groups]: Given D, max_dis, min_dur,
and min_wei, if a mobile group is valid,
then any of its subset is valid.
AGP: Algorithm based on
Apriori Property
VG-Growth: Algorithm based on Valid
Group Graph Data Structures

Bottleneck of AGP



Candidate generation
Involving many iterations of database scanning
Definition. A valid group graph (or VG-graph) is a
directed graph (V,E), where



V is a set of vertices representing users in the set of
valid 2-groups.
E is a set of edges representing the set of valid 2groups.
Each edge is also associated with the valid segments
of the corresponding valid 2-group pattern.
VG-Growth: Algorithm based on Valid
Group Graph Data Structures
max_dis = 10, min_dur = 3 and min_wei = 60%
VG-Growth: Algorithm based on Valid
Group Graph Data Structures
u4
Evaluation



Movement database generated by IBM
City Simulator
Covering 1500m1000m 1000m area
of 48 roads and 72 buildings
Time unit is 10 minutes. M1kN1k means
1000 users moving for 1000 time units.
Evaluation
Evaluation
Data Summarization

The bottleneck with VG-Growth is on
the computation of 2-groups.


Distances computed: N (M2)
To reduce the time for identifying valid
2 groups, the location data of each user
is summarized.

Locations of a user within a time window w
is summarized as an instance of some
summarized model (SM)
Summarized models


Sphere
cuboid
Summarized Location Sphere


The resultant location database is called a
summarized database, where the number
of time points becomes N/w.
Basic ideas


Find a set of candidate user pairs based on
summarized database
Scan the summarized database and original
database (if necessary) to determine valid 2groups.
Finding candidate 2-groups
from summarized DB




Two users ui and uj are possibly close at a
time point t’ in the summarized database if
A maximal set of consecutive time points [t’a,
t’b] is called possibly close segment
For each pair of users ui and uj, we compute
the upper bound of its weight.
A pair of users is a candidate if the upper
bound of its weight is no less than min_wei.
CASE 1
CASE 2
CASE 3
Summarization models

Sphere location summarization method
(SLS)

Each sphere is represented as (pc, r)
Summarization models

Cuboid location summarization method (CLS)

Each cuboid is represented as two 3-D points
(vmin, vmax)
Vmin.x
Vmax.x
V’min.x
V’max.x
Summarization models
Performance Evaluation
Pitfalls of the location model



To maintain accurate location tracking, the
frequency of sampling users’ locations must
be high. (Tracking 1000 users every second
will result in 1GB per day)
In reality, moving objects may be
disconnected from time to time voluntarily or
involuntarily.
It is almost impossible to have perfectly
synchronized sampling of users’ locations in
reality.
Remedies



Use trajectories with untraceable
periods to model user locations
The mobile group mining problem has
to be modified.
The algorithms have to be modified.
Trajectory model

A trajectory T is a set of piecewise
linear functions, each of which maps
from a disjoint time interval to an ndimensional space.

E.g.
[( x  2t  40)  ( y  t  23)  (0  t  21)]
 [( x  2)  ( y  t  23)  (21  t  22)]
 [( x  0.5t  9)  ( y  1)  (22  t  30)]
Trajectory-based location DB
reference_point
o1
o2
o3
velocity
start_time
end_time
(1,1)
(3,1)
0
3
(7,-11)
(1,5)
3
5
(10,-3)
(4,3)
6
9
(2,2)
(2,1)
0
3
(2,-13)
(2,6)
3
5
(-4,5)
(3,2)
6
10
(2,4)
(3,1)
0
3
(17.-5)
(-2,4)
3
5
(12,35)
(-1,-4)
5
8
How to convert location data
into trajectories


Classified either batch or online algorithms
[N. Meratnia and R.A. de By, ETDB2004 ]
 Top-Down:Douglas-Peucker( batch
algorithm)
 Open window:from one window then is
growing its size. (Online algorithm)
How to convert location data
into trajectories

Top-Down
How to convert location data
into trajectories

Open Window
Determining the distance of 2
objects

For trajectories Object o :
of two objects o1 [( x[(x (3,(11),5t )t (1,(17)),11(0))t(3 3)]t  5)]
 [( x  untraceable)  (5  t  6)]
and o2
 [( x  (4,3)t  (10,3))  (6  t  9)]
1


Synchronize
linear pieces
Calculate the
distance for
each time
segment
 [( x  untraceable)  (9  t  10)]
Object o2:
[( x  (2,1)t  (2,2))  (0  t  3)]
 [( x  (2,6)t  (2,13))  (3  t  5)]
 [( x  untraceable)  (5  t  6)]
 [( x  (3,2)t  (4,5))  (6  t  9)]
 [( x  (3,2)t  (4,5))  (9  t  10)]
Determining the distance of 2
objects



Location of object o1 at time t: (1 + 3t , 1 + t)
Location of object o2 at time t: (2 + 2t, 2 + t)
Enclidean distance of o1 and o2 when 0t<3:
(1  t ) 2  (1) 2
distanceo1 ,o2 (t )  (1  t )  1 ,0  t  3
2
2
(5  t )  (2  t ) ,3  t  5
undecided ,5  t  6
2
2
(14  t )  (8  t ) ,6  t  9
undecided ,9  t  10
2
2
Determining close intervals


Given a distance function dist(t) of two
objects o1 and o2 within an interval I,
we would like to identify the
subintervals I’ in I such that
dist(t)max_dis, tI’.
E.g.

Let 3=max_dis=

32
32
,1 
[1 
2
2
(1  t ) 2  12
][0, 3)= [0, 3)
Determining the weight


Accordingly, the far segments and
undecided segments can be determined.
The weight of a mobile group is defined
based on the lengths of its close, far,
and undecided segments.
Definitions

For a user group P



Geographically close, far, or undecided at a
time point t.
The valid close segments and valid far
segments of P can be accordingly defined.
The weight of P is defined as
Lclose  Lundecided 
weight( P) 
Lclose
Lclose  L far
Lclose  L far  Lundecided
Lclose

Lclose  L far
The problem

The problem is to find all valid mobile
groups under such a model

Apriori property still holds

if a moble group is valid, all of its subgroup will
also be valid.
weight( P) 
Lclose
Lclose  Lundecided 
Lclose  L far
Lclose  L far  Lundecided

Lclose
Lclose  L far
Apriori Trajectory-based Group
Pattern Mining
Trajectory VG-Growth


It behaves the same as VG-Growth except
that each edge in TVG graph is associated
with far segments and close segments.
The close and far segments of a conditional
TVG graph have to be properly updated.


c(o1 , o2 | o3) = c(o1 , o2)∩c(o1 , o3)∩c(o2 , o3)
f(o1 , o2 | o3) = f(o1 , o2)∪f(o1 , o3)∪f(o2 , o3)
Performance evaluation

We compare the other two methods for
handling untraceable intervals



Pessimistic
Linear
Performance metrics
precision ( DBi ) 
recall ( DBi ) 
the num of valid groups identified from both DB and DBi
the num of valid group identified from DBi
the num of valid groups identified from both DB and DBi
the num of valid group identified from DB
Performance evaluation



Precision of pessimistic method is
always 1.
Recall of pessimistic method is low
Proportional method has comparable
Precision with and higher recall than
linear method.
Ongoing/Future work

Calendar-based mobile group mining

Considering calendar patterns in mining mobile groups, e.g.,



Find mobile groups that are valid on every Friday or
Wednesday
We need a calendar schema, e.g. (year, month, day), a set
of calendar patterns, e.g., (*,12, 1)
Given a set of calendar patterns C based on a common
calendar schema, a set of timestamped objects’ movement
data D, and the user-specified thresholds max_dis, min_dur,
min_wei, and min_match_ratio, the calendar-based mobile
group mining problem is to find out all mobile group and
calendar pattern pairs (g, c) such that g is valid with c,
where cC.
Ongoing/Future work

Employing data correction techniques
before mining mobile groups.
Kalman Filter
Kalman Filter
Correction
Prediction
Project the state ahead
~
pk  A~
pk 1
Project the error covariance ahead
Pk  APk 1 AT  Q
Compute the Kalman gain

k


k
Kk  P H HP H  R
T
T

1
Update estimate with measurement zk

~
pk  ~
pk  Kk mk  H~
pk
Update the error covariance
Pk  1 Kk H Pk

Ongoing/Future work




Maximal/closed mobile group mining
Guidelines for parameter settings
Mobile group clustering
Other p-CRM applications and
technologies