International Journal of Electronic Commerce

Download Report

Transcript International Journal of Electronic Commerce

Application of Decision-Tree
Induction Techniques to
Personalized Advertisements on
Internet Storefronts
Paper By:
Jong Woo Kim, Byung Hun Lee, Michael J. Shaw, Hsin-Lu
Chang, Matthew Nelson
Source:
International Journal of Electronic Commerce / Spring 2001, VoL
5, No.3, pp. 45-62. Copyright © 2001 M.E. Sharpe, Inc. All
rights reserved.
Paper Authors:
 JONG WOO KIM is an assistant professor in the
department of statistics at Chungnam National
University in Teajon, Korea.
 BYUNG HUN LEE is a researcher in marketing at MPC
LTD. in Korea High-level timing goals.
 MICHAEL J. SHAW is director of the Center for
Information Systems and Technology Management at
the University of Illinois at UrbanaChampaign.
 HSIN-LU CHANG is a Ph.D. student in information
systems at the School of Commerce, University of
Illinois at Urbana-Champaign.
 MATTHEW NELSON currently is a second-year Ph.D.
student in information systems at the University of
Illinois.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
2
Paper References:
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Agrawal, R.; Aming, A.; Bollinger, T.; Mehta, M.; Schafer, J.; and Srikant, R. The Quest data
mining system. In Proceedings of the 2nd International Conference on Knowledge Discovery in
Databases and Data Mining. Portland, OR, 1996, pp. 244-249.
Allen, c.; Kania, D.; and Yaeckel, B. Internet World Guide to One-to-One Web Marketing. New
York: John Wiley, 1998.
Berry, M.J .A., and Linoff, G. Data Mining Techniques for Marketing, Sales, and Customer
Support. New York: Wiley Computer Publishing, 1997.
BroadVision. A BroadVision one-ta-one white paper. White paper, BroadVision (1996).
BroadVision. http://broadvision.com (2000).
Business Week Graphic: On-line sales are soaring. Business Week (June 22, 1998) (based on
data from Forrester Research).
Chaturvedi, A.R; Hutchinson, G.K.; and Nazareth, D.L. Supporting complex real-time decision
making through machine learning. Decision Support Systems, 10, 2 (September 1993),213233.
CLIPS. CLIPS Riference Manual (Version 6.05). CLIPS, 1997.
Garbonell, J.G., and Michalski, R.S. Machine learning: A historical and methodological analysis.
Al Magazine (fall 1983), 69-79.
Gupta, 0.;, Digiovanni, M.; Norita, H.; and Goldberg, K. Jester 2.0: Evaluation of a new linear
time collaborative filtering algorithm. 22nd International ACM SIGIR Conference on Research
and Development in Information Retrieval (August 1999), pp. 291-292.
Johnson, R.A., and Wichern, D.W. Applied Multivariate Statistical Analysis. Englewood-Cliffs, NJ:
Prentice Hall International, 1992.
Kim, ]., Lee, K., Shaw, M. J., Chang, H., and Nelson, M. A preference scoring tedmique to
personalized advertisements on Internet storefront. Working paper, University of lllinois at
Urbana-Champaign (2000).
Lee, K. Personalized advertisement techniques for one-to-one marketing on Internet stores.
Master's thesis, Chungnam National University, 2000.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
3
Paper References:
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
Cont.
LikeMinds. LikeMinds'·personalization server 3.0 technology backgrounder. White paper,
Andromedia (1999).
Mehta, M.; Agrawal, R,: and Rissanen, J. SLIQ: A fast scalable classifier for data mining. In
Proceeding of the Fifth International Conference on Extending Database Technology. Avignon
(March 1996).
Net Perceptions. www.netperceptions.com (2000).
Peppers, D., and Rogers, M. The One to One Future: Building Relationships One Customer at a
Time. New York: Currency and Doubleday, 1993.
Quinlan, J.R. Induction of decision-trees. In J .W. Chavlik, W. Jude, and T.G. Dietterich (eds.),
Readings in Machine Learning. San Mateo, CA: Morgan Kaufmann, 1990, pp. 57-69.
Quinlan, J.R. Leaming decision-tree classifiers. ACM Computing Surveys, 28, 1 (March 1996),
71-72.
Schafer, J.B.; Konstan, J.;: and Riedl, J. Recommendation system in ecommerce. In Proceedings
of the ACM Conference on Electronic Commerce, Denver (November 3-5,1999), pp. 158-166.
Shaw, M.J Machine leaming methods for intelligent decision support. Decision Support Systems,
10,2 (September 1993), 79-83.
Shaw, M.J.; Blanning, R; Strader, T.; and Whinston, A. Handbook on Electronic Commerce.
Berlin and Heidelberg: Springer-Verlag, 2000.
Tsai, L.-H., and Koehler, GJ The accuracy of concepts learned from induction. Decision Support
Systems, 10,2 (September 1993), 161-172.
Weiss, S.M., and Indurkhya, Nitin. Predictive Data Mining: A Practical Guide
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
4
Reference for the paper:

http://an5qy7ag4q.scholar.serialssolutions.com/?sid=google&auinit=J
W&aulast=Kim&atitle=Application+of+DecisionTree+Induction+Techniques+to+Personalized+Advertisements+on+Int
ernet+Storefronts&title=International+journal+of+electronic+commer
ce&volume=5&issue=3&date=2001&spage=45&issn=1086-4415
FOR MORE INFO...
STONY BROOK UNIVERSITY LIBRARIES
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
5
Introduction
 This paper studies personalized recommendation
techniques that suggest products or services to the
customers of Internet storefronts based on their
demographics or past purchasing behavior
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
6
Currently available techniques
 Currently available recommendation
techniques are primarily based:
 Collaborative filtering
 Preference scoring
 Rule-based approaches
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
7
Currently available techniques
cont.
 Collaborative filtering selects advertisements for
customers based on the opinions of other customers
with similar past preferences.
 The preference-scoring approach to personalized
recommendation uses a preference-score concept to
select personalized advertisements based on initial
customer profile, purchase history, and behavior in
Internet stores
 In the rule-based approach, marketing rules from
marketing experts are a core component in providing
personalized advertisements.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
8
Rule based approach
 Based on the marketing rule that
“undergraduates in Indiana whose
income is greater than $80000 prefer
to own Luxury cars”, car
advertisements are displayed
whenever customers who live in
Indiana access the Internet
storefront.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
9
Rule based approach
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
10
Issues in Rule based approach
 The marketing rules for personalized
recommendation usually come from
marketing experts in that domain.
 The acquisition of marketing rules, however, is
unsystematic, time-consuming, and dependent
on the intuition of marketing experts.
 The marketing rules in a knowledge base need
to be continuously updated and changed, but
manual change management is a difficult task.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
11
Proposed Solution
 To overcome the defects of the rulebased approach in personalized
advertisements, the authors propose
decision-tree induction techniques
that extract marketing rules from
databases in Internet stores.
 Rule-based personalization has two
phases:
 marketing-rule extraction, and
 real-time advertisement selection
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
12
Proposed Solution
 In order to obtain valuable marketing rules,
decision tree induction technique is used to
analyze purchase-transaction histories,
customer profiles, and product information.
 The extracted marketing rules are stored
in a marketing-rule base and are used for
real-time personalized-advertisement
selection when customers visit the Internet
store
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
13
Proposed Solution Schematic
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
14
Marketing Rule Extraction
 The extraction of
MP3 Files
useful marketing rules
Sporting goods
using decision-tree
Pop Music
induction techniques
Classical Music
begins by defining a
hierarchy tree of
Rap
Symphony Concert
product categories to
extract marketing
Dance
Metal
rules at various
abstraction levels of
Shakira
Britany
product categories
Kim
Rolla
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
15
Marketing Rule Extraction
 The rule-extraction phase has four
steps:




(1)
(2)
(3)
(4)
selecting learning data,
generating target variables,
constructing a decision tree, and
selecting a marketing rule
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
16
Marketing Rule Ext. – Phase 1
 Step 1
 Training data sets and test data sets are
selected from customer records, making
it possible to begin with a data set of
manageable size. The size of a data set
is determined by such factors as number
of customer-profile records, the volume
of disk storage, and the computing
capacity of the inductive learning tools.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
17
Marketing Rule Ext. – Phase 1
 Step 2
 In the next step, values of target
variables are generated for the selected
data records. In the present approach,
target variables are whether or not a
customer prefers a specific product
category in the hierarchy tree. Since the
target variables do not exist in the
databases, they must be generated. This
is done using the purchase-transaction
database.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
18
Marketing Rule Ext. – Phase 1

For example, customer profile records can have the following
attributes:
CUSTOMER = <ID, age, gender, hobby, location, married, job, ...,
education>

After the target variables are generated, CUSTOMER relation is
extended as follows.
CUSTOMER' = <ID, age, gender, hobby, location, married, job, ...,
education, t_MP3_file, t_Sporting_Goods/Leisure, t_Pop_Music,
t_Classical_Music, ....., t_Fitness_Accessories>
where t_MP3_file, Etc. are binary variables to specify a customer's
preference for the specific product categories. That is, the binary
variable is assigned to 1 when the customer prefers the
corresponding product category.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
19
Marketing Rule Ext. – Phase 1
 Step 3
 The third step is decision-tree construction.
Using inductive learning tools and the training
data set, decision trees are inducted for all the
target variables.
 Step 4
 After the decision tree is constructed, useful
marketing rules are filtered from constructed
decision-trees based on the validation results
from test data sets and accuracy of the rules.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
20
Marketing Rule Ext. – Phase 1
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
21
TARGET-VARIABLE GENERATION
 There are several possible ways to
generate target variables based on
the purchase-transaction database:




(1)
(2)
(3)
(4)
the
the
the
the
counting-based method,
expected-value-based method,
statistics-based method, and
subcategory-based method.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
22
Counting-based method
 The counting-based method, based on the number of
purchases in a specific product category, makes it
possible to decide whether or not the customer
prefers the product category.
 That is,
PRE[subij] = 1 if NP[subij] >= Omega
where, NP[subij] = the number of purchase
of customer i for product category j
Omega is a minimum threshold,
which is determined by analysts.
PRE[subij] = 0 otherwise
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
23
Expected-value-based method
N[subj] is the number of products in product category j.
Alpha is a multiplier for the expected value. Usually alpha is
equal to or greater than 1. When alpha is 1, if customer i
purchased more items in product category j than the
expected number of purchases of product category j, this
means that customer i prefers product category j
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
24
Statistics-based method
Statistical values like the mean, the median, the first
quartile, and the third quartile are used to generate
target variables
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
25
Subcategory based method
 In the case of non-leaf-node product
categories, the corresponding target
variables can be generated based on
subcategory target variable values (when
the target variables for subcategories are
already generated using one of the above
methods).
 PRE[subij] = 1 if PRE[subik] = 1 for some
subcategory k of j
PRE[subij] = 0 otherwise
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
26
MARKETING-RULE SELECTION

A decision tree includes marketing rules that link customer
demographics with preferences for product categories.
Since marketing rules may have low predictability or may
not be accurate, the valuable marketing rules have to be
selected from constructed decision trees.
Two phase heuristics are used for marketing-rule selection.
1. Choose decision trees whose predictability (1 misclassification rate), with respect to a training data set, is
greater than a certain threshold.
2. Select nodes from the filtered decision trees whose
accuracy (i.e., purity) is greater than a certain threshold.

The selected nodes in the decision trees are translated in
the rule structure. Here is an example of a translated rule.

If age < 30 and gender = male then Ballad, with accuracy
= .9 and level = 3.

Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
27
ADVERTISEMENT SELECTION
 Personalized advertisements are selected
on a real-time basis.
 Let M be the number of advertisements to
be displayed in an Internet storefront, and
L the depth of the product-category
hierarchy tree.
 The advertisement-selection algorithm is as
follows.
 STEP 1 m: = 0,l: = L, generate fact set using
the specific customer's profile.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
28
Advertisement Selection Algorithm Continued.
 STEP 2 Until m < M and 1 > 0
 2.1 Fire marketing rules of level 1 with
customer's facts.
 2.2 If there are matching rules, matched rules
are fired by accuracy order. The product
categories appeared in consequent parts of the
rules are selected as preferred product
categories for the specific customer. Choose
advertisements for that category from the
advertisement database. The number of selected
advertisements should be less than M - m.
 2.3 If m = M, exit.
 2.4 Update m to reflect the additionally selected
advertisements, and update l to l-1.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
29
Advertisement Selection Algorithm Continued.
 STEP 3
 If m < M, randomly select M - m advertisements
from the advertisement database
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
30
EXPERIMENT

EXPERIMENTAL DATA COLLECTION
Data from 330 respondents were gathered. At first, the respondents
were asked to select preferred product categories in MP3 music files
and sporting goods or leisure equipment. Then they were asked to
choose five MP3 music files among the 32 displayed products. The
displayed products were equally selected from all MP3 leaf nodes. The
respondents were also asked to purchase five sporting goods or leisure
equipment items among the 24 displayed products. The displayed
products were equally selected from all sporting goods or leisure
equipment leaf nodes (three products from each of the eight leaf
nodes). On the next two pages, respondents were asked to rank MP3
music files and sporting goods/leisure equipment items on a five-point
personal-interest scale (1 = low interest to 5 = high interest). 16
advertisements for MP3 music files were displayed to rank interests
about the advertisements, which were selected one item from each leaf
node. In addition, 16 advertisements for sporting goods or leisure
equipment items were displayed to rank interests about those, which
were two selected items from each leaf node. Finally, the respondents
were asked to specify their profiles, including age, gender, job, and so
on.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
31
MARKETING-RULE EXTRACTION
The 330 respondents were divided into two data sets,
one for marketing-rule generation, and the other for
testing the effectiveness of the proposed approach.
For marketing-rule generation, 198 responses (60%
of the total data set) were used. Among the 198 data
items, 70 percent were used to construct decision
trees, and 30 percent to validate constructed decision
trees. For target-variable generation, an expected
value-based approach of alpha= 1 was used. All told,
37 decision trees were constructed (26 for MP3 music
files and 11 for sporting goods or leisure equipment).
Decision trees with predictability greater than 65
percent were selected in the marketing-rule selection
step. Decision rules with accuracy greater than 65
percent were selected as valuable marketing rules.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
32
EFFECTIVENESS TEST

To test the effectiveness of the proposed decision-tree
induction approach, the results were compared to the
preference-scoring approach and a random selection. The
data of 132 respondents (40% of the total data set) were
isolated for effectiveness testing. Three advertisements
were selected for each respondent: that is, one
advertisement using each of the three approaches for each
respondent. In the case of the decision-tree induction
approach, the marketing rules generated earlier were used
to select personalized advertisements. In the case of the
preference-scoring approach, advertisements were selected
based on the respondents' own interest-manifestation
behavior (purchase history, profile, preferred product
categories). In the case of random selection,
advertisements were selected on a simple random basis.
The means of the five-point-scale personal-interest scores
for the selected advertisements were compared statistically
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
33
RESULT
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
34
Conclusion



Statistically, there were differences among the five-point-scale
scored means of customer' interest in advertisements from the
three different approaches: decision-tree induction, preference
scoring, and random selection. Thus, the effectiveness of
personalized-recommendation techniques is affected by the
decision algorithm utilized in the personalized- advertisement
selection process.
The proposed decision-tree induction approach is effective in
generating marketing rules as a means to assist rule-based
personalized recommendations.
In the case of sporting goods or leisure equipment, decision-tree
induction gave the best results. But in the case of MP3 music files,
preference scoring gave the best results. This indicates that the
appropriateness of personalized-advertisement techniques
depends on the characteristics of product categories. Thus it is
necessary to study how various personalized-advertisement
techniques can be used cooperatively.
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
35
ThankYou
 Next Presentation – 5/2/2006
 Topic - New Advances in Data Mining
 By:
 Group 14
 Madhavarapu,Chidroop
 Sandhuria,Deepanshu
 DON’T FORGET TO COME TO CLASS!
Source: International Journal of
Electronic Commerce 5 no3 45-62
Spr 2001
36