Profit Mining: From Patterns to Action

Ke Wang, Senqiang Zhou, Jiawei Han
Simon Fraser University
1
Why Profit Mining?


A major obstacle in data mining applications is the gap between:
– statistic-based pattern extraction and
– value-based decision making
Profit mining:
– value-based data mining
2
An Example

Suppose we want to maximize profit. Association rules [AIS93] such as
{Perfume} -> Lipstick (occurs more often)
{Perfume} -> Diamond (brings more profit)
do not suggest which items (and prices) to recommend to a customer who bought Perfume.

Similar problems arise with correlation, classification, etc.
3
The Problem


Given: several transactions of the form:
– {<I,P,Q>,…, <I,P,Q> | <I,P,Q>}, where I is an Item, P a Promotion code, and Q a Quantity. The "|" separates non-target items from target items.
– Example: {<FlakedChick., $3,2> | <Sunchip,$1,1>}
Recommend a target <I,P> to customers who buy non-target items, so as to maximize profit.
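The transaction format above can be pictured with a small data structure. This is only a sketch: the names `Sale` and `Transaction` are illustrative and do not come from the talk, and the promotion code P is simplified to a price.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sale:
    item: str       # I: the item sold
    promo: float    # P: promotion code, simplified here to a price (assumption)
    quantity: int   # Q: number of units bought


@dataclass(frozen=True)
class Transaction:
    non_target: Tuple[Sale, ...]  # sales to the left of "|"
    target: Tuple[Sale, ...]      # sales to the right of "|"


# The slide's example: {<FlakedChick., $3,2> | <Sunchip,$1,1>}
t = Transaction(
    non_target=(Sale("FlakedChick.", 3.0, 2),),
    target=(Sale("Sunchip", 1.0, 1),),
)
```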
4
Not a Prediction Problem

An example (the arithmetic implies a cost of $0.5 per pack):
– 100 customers each bought 1 pack at $1/pack. Profit = 100(1-0.5) = $50.
– 100 customers each bought a 4-pack at $3.2/4-pack. Profit = 100(3.2-2) = $120.

Prediction repeats the history.

Profit mining gets smarter from the history, by recommending the “right items” at the “right prices”.
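The two profit figures in this example can be checked with a few lines of Python. The unit cost of $0.5 per pack is not stated on the slide; it is inferred from the slide's own arithmetic, and the function name `total_profit` is illustrative.

```python
def total_profit(customers, price, cost):
    """Profit when every customer buys one unit of the offer at the given price."""
    return customers * (price - cost)


UNIT_COST = 0.5  # assumed: implied by 100(1-0.5) and 100(3.2-2) on the slide

single_pack = total_profit(100, 1.0, UNIT_COST)      # one pack at $1
four_pack = total_profit(100, 3.2, 4 * UNIT_COST)    # one 4-pack at $3.2

print(round(single_pack, 2), round(four_pack, 2))
```

The same 100 customers yield more than twice the profit under the 4-pack offer, which is the point of the slide: repeating the most frequent past behavior is not the same as maximizing profit.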
5
Challenge I - notion of profit

A purely statistical approach favors
– {Perfume} -> Lipstick

A purely profit-based approach favors
– {Perfume} -> Diamond

Profit mining considers:
– both statistical significance and profit significance.
6
Challenge II - customer intention

Mining On Availability (MOA):
– Paying a higher price implies the willingness to pay a lower price.

{<FC,$3>} -> <Sunchip,$1> can be extracted from the transaction {<FC,$5> | <Sunchip,$1.5>}.

Recognizing this behavior brings new sales opportunities (at a lower price).
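The MOA assumption can be sketched as a matching test: an offer <Item, Price> is supported by a recorded sale of the same item at that price or higher. The function names and the tuple layout below are illustrative assumptions, not part of the talk.

```python
def moa_matches(offer, sale):
    """True if the recorded sale supports the offer under MOA.

    offer and sale are (item, price) pairs; a customer who paid sale[1]
    is assumed to be willing to pay any price <= sale[1] for the same item.
    """
    return offer[0] == sale[0] and sale[1] >= offer[1]


def rule_support(body, head, transaction):
    """Return (body_matches, head_matches) for one transaction.

    transaction is (non_target_sales, target_sales), each a list of
    (item, price) pairs.
    """
    non_target, target = transaction
    body_ok = all(any(moa_matches(g, s) for s in non_target) for g in body)
    head_ok = any(moa_matches(head, s) for s in target)
    return body_ok, head_ok


# The slide's example: {<FC,$3>} -> <Sunchip,$1> on {<FC,$5> | <Sunchip,$1.5>}
t = ([("FC", 5.0)], [("Sunchip", 1.5)])
print(rule_support([("FC", 3.0)], ("Sunchip", 1.0), t))  # (True, True)
```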
7
Challenge III - search space

Thousands of items, and many more sales.
Any combination can trigger a recommendation.

Searching over alternative concepts (food, meat, etc.) and prices makes it worse.
8
Step 1: generating rules

Association rules:
– {Diaper} -> Beer, supp=10%, conf=80%

Recommendation rules:
– {g1,…,gk} -> <I,P>, where each gi is an <Item,Price> pair, an Item, or a Concept.
– {<FlakedChick., $3.8>} -> <Sunchip,$4.5>
– {FlakedChick.} -> <Sunchip,$4.5>
– {Meat} -> <Sunchip,$4.5>
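The three example rules differ only in how general the left-hand side is. A minimal sketch of enumerating those generalizations for one sale follows; the `parent_of` mapping is a hypothetical concept hierarchy (only the FlakedChick.-to-Meat link is suggested by the slide), and price generalization is left out for brevity.

```python
def generalizations(item, price, parent_of):
    """List the conditions a single sale can contribute to a rule body:
    the <Item,Price> pair, the bare Item, then each ancestor Concept."""
    conditions = [(item, price), item]
    concept = parent_of.get(item)
    while concept is not None:
        conditions.append(concept)
        concept = parent_of.get(concept)
    return conditions


# Hypothetical hierarchy, for illustration only.
parent_of = {"FlakedChick.": "Meat", "Meat": "Food"}

print(generalizations("FlakedChick.", 3.8, parent_of))
# [('FlakedChick.', 3.8), 'FlakedChick.', 'Meat', 'Food']
```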
9
Handling alternative concepts and prices
10
Step 2: building the model

We rank rules by the “average profit” made by the recommendation of a rule.
– {<FC,$3.5>} -> <Sunchip,$1> matches
  t1: {<FC,$4.0> | <Sunchip,$2>} (a hit)
  t2: {<FC,$4.5> | <Milk,$3.5>} (a miss)
– If the cost of Sunchip is $0.7, the average profit is $0.15.

To recommend, we select the highest-ranked matching rule.
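The $0.15 figure follows from one hit earning $1 - $0.7 = $0.3 and one miss earning nothing, averaged over the two matched transactions. A small sketch of that calculation is below; it assumes one unit sold per hit and uses the illustrative name `average_profit`, neither of which is spelled out on the slide.

```python
def average_profit(head, cost, matched_targets):
    """Average profit of recommending head = (item, price).

    matched_targets holds, for each transaction matched by the rule body,
    its list of (item, price) target sales. A transaction is a hit when it
    contains the recommended item at the recommended price or higher.
    Assumes one unit is sold per hit.
    """
    item, price = head
    hits = sum(
        1
        for targets in matched_targets
        if any(i == item and p >= price for i, p in targets)
    )
    return hits * (price - cost) / len(matched_targets)


matched = [
    [("Sunchip", 2.0)],  # t1: a hit
    [("Milk", 3.5)],     # t2: a miss
]
print(round(average_profit(("Sunchip", 1.0), 0.7, matched), 2))  # 0.15
```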
11
Step 3: Pruning the model

The model favors “high average profit” rules.

Such rules may bring a large profit.

Such rules may also be random noise.

They cannot be pruned simply on the basis of statistical frequency.
12
Pruning the model

We prune rules to increase the estimated profit on the whole population.

We organize rules into a specificity tree: the parent of a rule is the highest-ranked rule that is more general than it.

We cut off subtrees so as to maximize the estimated profit.
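One way to picture cutting the tree is a bottom-up pass that collapses a subtree whenever its root alone is estimated to earn at least as much as the root plus its children. The sketch below assumes the profit estimates are already available as numbers; the talk does not say how they are computed, and the fields `leaf_estimate` and `own_estimate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    rule: str
    leaf_estimate: float  # assumed: estimated profit if this rule replaces its whole subtree
    own_estimate: float   # assumed: estimated profit on cases no child rule handles
    children: List["Node"] = field(default_factory=list)


def prune(node):
    """Prune the subtree in place, bottom-up; return its best estimated profit."""
    kept = node.own_estimate + sum(prune(child) for child in node.children)
    if node.leaf_estimate >= kept:
        node.children = []          # collapsing the subtree is at least as good
        return node.leaf_estimate
    return kept


# Toy tree with made-up estimates.
c = Node("specific rule c", leaf_estimate=1, own_estimate=1)
b = Node("rule b", leaf_estimate=3, own_estimate=1, children=[c])
a = Node("rule a", leaf_estimate=5, own_estimate=5)
root = Node("general rule", leaf_estimate=10, own_estimate=4, children=[a, b])

best = prune(root)  # b loses its child c; root keeps a and b
```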
13
14
Evaluation

Synthetic datasets: IBM synthetic data generator, modified to have price and cost.

1000 items and 1000K transactions

For non-target item i:
– cost(i) = c/i
– price j = (1 + j*10%) cost(i), for j = 1, 2, 3, 4.

For target items:
– Dataset I has 2 target items
– Dataset II has 10 target items
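The cost and price formulas for non-target items can be written out directly. The value of the constant c is not given in the slides, so the `c = 100.0` used below is purely an illustrative assumption.

```python
def cost(i, c):
    """Cost of non-target item i (items numbered from 1): c / i."""
    return c / i


def prices(i, c):
    """The four price levels of item i: a markup of 10%, 20%, 30%, 40% over cost."""
    return [(1 + j * 0.10) * cost(i, c) for j in (1, 2, 3, 4)]


c = 100.0  # hypothetical constant; not specified in the slides
for i in (1, 2, 10):
    print(i, [round(p, 2) for p in prices(i, c)])
```

With this setup, higher-numbered items are cheaper, and every item is offered at four price levels above its cost.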
15
[Figure slides 16-21, charts not reproduced: Profit Gain on Dataset I; Hit Ratio on Dataset I (two slides); Profit Gain on Dataset II; Hit Ratio on Dataset II (two slides)]
21
Conclusion

Proposed a new direction of data mining: mining for profit.

Directly factor the business goal into data mining.

Related work: microeconomic view of data mining [KPR98]
22