Data Mining and Management


Data Mining for Management
and E-commerce
By Johnny Lee
Department of Accounting and
Information Systems
University of Utah
Agenda
1. Microeconomic view of Data Mining
2. A Survey of recommendation systems in
E-commerce
3. Turning Data Mining into a management
science tool
A Microeconomic View of Data Mining
• Kleinberg et al. 1998
• Research Question:
What is the economic utility of data mining?
How can we determine whether a DM result is
interesting?
A Microeconomic View of Data Mining
• “Interesting Pattern”
– Confidence and support
• (High balance → High income)
– Information content
• ?
– Unexpectedness
• (Super Bowl result → stock price)
– Actionability
• $,$,$….
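Confidence and support for a rule such as (High balance → High income) can be computed as below. This is a minimal sketch using the usual association-rule definitions (support = share of records containing the items; confidence = support of the whole rule divided by support of its left-hand side); the attribute names and the five records are made up for illustration.

```python
# Hypothetical toy data: each record is the set of attributes one customer has.
records = [
    {"high_balance", "high_income"},
    {"high_balance", "high_income"},
    {"high_balance"},
    {"high_income"},
    set(),
]


def support(itemset, records):
    """Fraction of records that contain every attribute in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, records) / support(antecedent, records)


s = support({"high_balance", "high_income"}, records)        # 2 of 5 records -> 0.4
c = confidence({"high_balance"}, {"high_income"}, records)   # 2 of 3 -> about 0.667
print(f"support={s:.3f} confidence={c:.3f}")
```

High support and confidence alone do not say whether the rule is unexpected or actionable, which is why the slide lists those criteria separately.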
A Microeconomic View of Data Mining
• Value of data mining
– Computing power and data make un-aggregated (per-customer) optimization possible
– Study of the intricate ways (correlations and clusters) in which the data affect the enterprise's optimal
DECISION
A Microeconomic View of Data Mining
Value of DM
Firm: \max_{x \in D} f(x)
f(x) = \sum_{i \in C} f_i(x)
f_i(x) = g(x, y_i), where y_i = data on customer i
Hence: \max_{x \in D} \sum_{i \in C} g(x, y_i)
Example 1
If demand for beer is not related to demand for
diapers, then NO DM is needed.
If (demand for beer + demand for diapers)
= (supply of beer − demand for beer)^+ * (supply of
diapers − demand for diapers)^+,
then DM is needed.
Example 2
Phone rates and users
Without data mining:
experiment with arbitrary clusters
With data mining:
optimize profit by best matching
customers to strategies
\max_{(x_1, x_2) \in D^2} \sum_{i \in C} \max\{c_i \cdot x_1, \; c_i \cdot x_2\}
where c_i is the data vector of customer i
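The two-strategy objective can be checked by brute force when D is small. This sketch assumes each strategy x and each customer c_i are vectors, so c_i · x is the profit from customer i under strategy x, and each customer ends up on whichever of the two strategies is better for the firm. The plan vectors and customer vectors are made up for illustration.

```python
from itertools import product


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def best_two_strategies(D, C):
    """max over (x1, x2) in D^2 of sum_i max(c_i . x1, c_i . x2), by enumeration."""
    best_pair, best_value = None, float("-inf")
    for x1, x2 in product(D, repeat=2):
        value = sum(max(dot(c, x1), dot(c, x2)) for c in C)
        if value > best_value:
            best_pair, best_value = (x1, x2), value
    return best_pair, best_value


# Hypothetical rate plans (strategies) and customer data vectors.
D = [(1, 0), (0, 1), (0.6, 0.6)]
C = [(3, 0), (2, 1), (0, 3), (1, 2)]

pair, value = best_two_strategies(D, C)
single = max(sum(dot(c, x) for c in C) for x in D)   # one plan for everyone
print(pair, value, round(single, 3))
```

With one plan for all customers the best profit is about 7.2; matching customers to two plans raises it to 10, which is the gain the slide attributes to data mining.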
Example 3
• Beer and diapers again
• Mining to decide how to jointly promote
items
• Mining data in rows or columns
• Goal oriented: what is the goal? Generated revenue
• Conflict in action space: what to do?
Contribution
• Automatic pattern filtering system based on
economic value
• Rules for manual pattern filtering system
• Rules for determining the trigger point of Data
Mining
A survey of recommendation
systems in electronic commerce
• Wei et al. 2001
• Research question:
What are the types of E-commerce
recommendation systems and how do they
work?
E-commerce recommendation
Systems
• Suggest items that are likely to interest users,
based on:
– Customer characteristics (demographics)
– Features of items
– User preferences: rating/purchasing history
Framework for Recommendation
[Figure: features of items, user demographics, and the user's preferences feed the recommendation system, which outputs recommendations]
Types of Recommendation
• Prediction of customers' preferences
Personalized and non-personalized
• Top-N recommended items for a customer
Personalized and non-personalized
• Top-M users who are most likely to
purchase an item
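The Top-N and Top-M outputs are two views of the same user × item score matrix: Top-N ranks one user's row, Top-M ranks one item's column. A minimal sketch, assuming predicted scores are already available; the users, items and scores are hypothetical.

```python
import heapq

# Hypothetical predicted preference scores: scores[user][item].
scores = {
    "ann": {"beer": 4.5, "diapers": 2.0, "chips": 3.5},
    "bob": {"beer": 1.0, "diapers": 4.8, "chips": 2.5},
    "cat": {"beer": 3.9, "diapers": 4.1, "chips": 0.5},
}


def top_n_items(scores, user, n):
    """Top-N: the n items with the highest predicted score for one user."""
    return heapq.nlargest(n, scores[user], key=scores[user].get)


def top_m_users(scores, item, m):
    """Top-M: the m users with the highest predicted score for one item."""
    return heapq.nlargest(m, scores, key=lambda u: scores[u].get(item, float("-inf")))


print(top_n_items(scores, "ann", 2))       # ['beer', 'chips']
print(top_m_users(scores, "diapers", 2))   # ['bob', 'cat']
```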
Classification of
Recommendation Systems
• Popularity-based: best sellers
• Content-based: similar item features
• Collaborative filtering: similar users' tastes
• Association-based: related items
• Demographics-based: user's age, gender…
• Reputation-based: represent individual
• Hybrid
Popularity-based
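A popularity-based recommender needs only aggregate purchase counts and returns the same best-seller list to every user, which is why it is non-personalized. A minimal sketch, assuming popularity means number of purchases; the purchase log is hypothetical.

```python
from collections import Counter

# Hypothetical purchase log: (user, item) pairs.
purchases = [("ann", "beer"), ("bob", "beer"), ("cat", "chips"),
             ("ann", "diapers"), ("bob", "diapers"), ("dan", "beer")]


def best_sellers(purchases, n):
    """Top-N items by purchase count; identical for every user."""
    counts = Counter(item for _, item in purchases)
    return [item for item, _ in counts.most_common(n)]


print(best_sellers(purchases, 2))   # ['beer', 'diapers']
```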
Procedures of Content-based
1. Feature extraction and Selection
2. Representation of the item pool by the selected
features
3. User profile learning
4. Recommendation
Content-based
User Profile Learning
p_{im} = \sum_{j=1}^{k} w_j f_{mj} + b
• p_{im} = preference score of user i on item m
• w_j = coefficient associated with feature j
• f_{mj} = value of the j-th feature for item m
• b = bias
• k = number of features
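The linear profile p_{im} = \sum_j w_j f_{mj} + b can be learned from the scores a user has already given. This sketch assumes a least-squares fit by batch gradient descent, which is one common choice and not necessarily the learner used in the survey; the item features and scores are hypothetical (generated from w = (2, -1), b = 0.5 so the fit can be checked).

```python
def predict(w, b, f):
    """p_im = sum_j w_j * f_mj + b for one item with feature vector f."""
    return sum(wj * fj for wj, fj in zip(w, f)) + b


def learn_profile(features, scores, lr=0.1, epochs=5000):
    """Fit w and b to one user's scores by batch gradient descent on
    (1 / 2n) * sum of squared errors (an assumed learner)."""
    k, n = len(features[0]), len(features)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for f, p in zip(features, scores):
            err = predict(w, b, f) - p
            for j in range(k):
                grad_w[j] += err * f[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical items described by k = 2 features, and one user's scores on them.
items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
scores = [2.5, -0.5, 1.5, 1.0]

w, b = learn_profile(items, scores)
new_item = [0.2, 0.9]
print([round(x, 3) for x in w], round(b, 3), round(predict(w, b, new_item), 3))
```

Once w and b are learned, step 4 (recommendation) amounts to scoring unseen items with `predict` and ranking them.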
Collaborative Filtering
• Recommend items based on the opinions of
other, similar users
1. Dimension reduction by trimming the preference
matrix
2. Neighborhood formation: find the most similar
user(s)
3. Recommendation generation
Collaborative filtering
Neighborhood Formation
• Pearson correlation coefficient
• Constrained Pearson correlation coefficient
• Spearman rank correlation coefficient
• Cosine similarity
• Mean-squared difference
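Several of the similarity measures listed above can be sketched over the items two users have both rated. Assumptions: ratings are dicts item → rating, every measure is computed on co-rated items only (cosine is sometimes computed over all items with missing ratings as 0 instead), and the constrained Pearson uses a caller-chosen scale midpoint in place of each user's mean. Spearman is Pearson applied to rank-transformed ratings and is not shown. The users and ratings are hypothetical.

```python
from math import sqrt


def co_rated(a, b):
    """Items rated by both users (a, b: dicts item -> rating)."""
    return sorted(set(a) & set(b))


def pearson(a, b):
    items = co_rated(a, b)
    if len(items) < 2:
        return 0.0
    ma = sum(a[i] for i in items) / len(items)
    mb = sum(b[i] for i in items) / len(items)
    num = sum((a[i] - ma) * (b[i] - mb) for i in items)
    den = sqrt(sum((a[i] - ma) ** 2 for i in items)) * sqrt(sum((b[i] - mb) ** 2 for i in items))
    return num / den if den else 0.0


def constrained_pearson(a, b, midpoint):
    """Pearson-style correlation with deviations taken from the scale midpoint."""
    items = co_rated(a, b)
    num = sum((a[i] - midpoint) * (b[i] - midpoint) for i in items)
    den = (sqrt(sum((a[i] - midpoint) ** 2 for i in items))
           * sqrt(sum((b[i] - midpoint) ** 2 for i in items)))
    return num / den if den else 0.0


def cosine(a, b):
    items = co_rated(a, b)
    num = sum(a[i] * b[i] for i in items)
    den = sqrt(sum(a[i] ** 2 for i in items)) * sqrt(sum(b[i] ** 2 for i in items))
    return num / den if den else 0.0


def mean_squared_difference(a, b):
    """Lower means more similar; infinite if the users share no items."""
    items = co_rated(a, b)
    if not items:
        return float("inf")
    return sum((a[i] - b[i]) ** 2 for i in items) / len(items)


# Hypothetical ratings on a 1-5 scale.
ann = {"x": 1, "y": 2, "z": 3}
bob = {"x": 2, "y": 4, "z": 6, "w": 5}
cat = {"x": 3, "y": 2, "z": 1}
print(pearson(ann, bob), pearson(ann, cat), cosine(ann, bob))
```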
Neighborhood Selection
• Weight threshold
• Center-based best-k neighbors
• Aggregate-based best-k neighbors
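The first two selection rules can be sketched directly on a table of similarity weights between the active user and each candidate. Aggregate-based best-k needs the users' rating vectors (in one common description, the neighborhood grows by adding the candidate closest to the centroid of the neighbors chosen so far), so it is not shown. The weights and threshold below are hypothetical.

```python
# Hypothetical similarity weights between the active user and candidate neighbors.
sims = {"bob": 0.9, "cat": 0.2, "dan": 0.7, "eve": -0.4}


def weight_threshold(sims, threshold):
    """Keep every candidate whose similarity weight reaches `threshold`."""
    return [u for u, w in sims.items() if w >= threshold]


def center_based_best_k(sims, k):
    """Keep the k candidates most similar to the active user (the 'center')."""
    return sorted(sims, key=sims.get, reverse=True)[:k]


print(weight_threshold(sims, 0.1))    # ['bob', 'cat', 'dan']
print(center_based_best_k(sims, 2))   # ['bob', 'dan']
```

A threshold gives a variable-size neighborhood (possibly empty), while best-k always returns k neighbors even when some are weakly similar.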
Recommendation Generation
• Weighted average
• Deviation-from-mean
• Z-score average
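The three generation rules differ in how they normalize the neighbors' ratings before averaging. A minimal sketch using the standard forms from the collaborative-filtering literature (the survey's exact normalization may differ); each neighbor is a hypothetical tuple of similarity weight, rating of the target item, the neighbor's mean rating and rating standard deviation.

```python
# Each neighbor u: (w, r_ui, mean_u, std_u), all values hypothetical.
neighbors = [(0.9, 5.0, 4.0, 1.0), (0.5, 3.0, 2.0, 0.5)]
active_mean, active_std = 3.0, 1.0


def weighted_average(neighbors):
    """Similarity-weighted average of the neighbors' raw ratings."""
    num = sum(w * r for w, r, _, _ in neighbors)
    den = sum(abs(w) for w, _, _, _ in neighbors)
    return num / den


def deviation_from_mean(active_mean, neighbors):
    """Active user's mean plus the weighted average of the neighbors'
    deviations from their own means."""
    num = sum(w * (r - m) for w, r, m, _ in neighbors)
    den = sum(abs(w) for w, _, _, _ in neighbors)
    return active_mean + num / den


def z_score_average(active_mean, active_std, neighbors):
    """Deviation-from-mean with each deviation divided by the neighbor's std,
    then rescaled by the active user's std."""
    num = sum(w * (r - m) / s for w, r, m, s in neighbors)
    den = sum(abs(w) for w, _, _, _ in neighbors)
    return active_mean + active_std * num / den


print(round(weighted_average(neighbors), 3),
      round(deviation_from_mean(active_mean, neighbors), 3),
      round(z_score_average(active_mean, active_std, neighbors), 3))
```

With these numbers the raw weighted average is about 4.286, deviation-from-mean gives 4.0 and the z-score average about 4.357: correcting for each user's rating habits changes the prediction even with the same neighbors and weights.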
Association-based
• Item correlation for individual users
1. Similarity computing
2. Recommendation generation
• Association rules
– Guns and ammunition
– Cigarette and lighter
– Paper plate and soda
Theory: Complementary goods?
No theory: Co-occurrence?
Association-based
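sim(i, j) = \frac{\sum_{u \in U} (p_{ui} - \bar{p}_i)(p_{uj} - \bar{p}_j)}{\sqrt{\sum_{u \in U} (p_{ui} - \bar{p}_u)^2} \, \sqrt{\sum_{u \in U} (p_{uj} - \bar{p}_u)^2}}
p_{ui} = preference score of user u on item i
\bar{p}_i (\bar{p}_j) = average preference score of the i-th (j-th) item over the set U of co-rating users
\bar{p}_u = average of the u-th user's preference scores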
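The formula above subtracts item means in the numerator and user means in the denominator. As a hedged sketch, the code below swaps in the standard adjusted-cosine item similarity, which subtracts the user mean \bar{p}_u in every term; it is not necessarily the survey's exact formula. U is the set of users who rated both items, \bar{p}_u is taken over all of user u's ratings, and the ratings are hypothetical.

```python
from math import sqrt


def adjusted_cosine(ratings, i, j):
    """Adjusted-cosine similarity of items i and j (ratings: user -> {item: score})."""
    users = [u for u, r in ratings.items() if i in r and j in r]   # co-rating users U
    means = {u: sum(ratings[u].values()) / len(ratings[u]) for u in users}
    num = sum((ratings[u][i] - means[u]) * (ratings[u][j] - means[u]) for u in users)
    den = (sqrt(sum((ratings[u][i] - means[u]) ** 2 for u in users))
           * sqrt(sum((ratings[u][j] - means[u]) ** 2 for u in users)))
    return num / den if den else 0.0


# Hypothetical preference scores.
ratings = {
    "ann": {"beer": 5, "diapers": 4, "milk": 1},
    "bob": {"beer": 2, "diapers": 1, "milk": 3},
    "cat": {"beer": 4, "milk": 2},
}
print(round(adjusted_cosine(ratings, "beer", "diapers"), 4))
```

Only ann and bob rated both beer and diapers, so cat is ignored for this pair; the result here is 2/\sqrt{13} ≈ 0.5547.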
Association Based
Demographics-based
• Items that customers with similar
demographic characteristics have bought
– Teen marketing
1. Data transformation: counting-based, expected-value-based (Exp(# of
items)), or statistics-based
2. Category preference model learning
3. Recommendation generation
Demographics-based
Methods:
1. Counting-based (frequency threshold)
2. Expected-value-based method
3. Statistics-based method
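The counting-based method can be sketched as a frequency threshold per demographic group: a group is taken to prefer a product category when that category's share of the group's purchases reaches the threshold, and a new customer is recommended the preferred categories of their group. The groups, purchases and the 0.5 threshold are hypothetical, and reading "frequency threshold" as a share of purchases is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (demographic group, product category).
purchases = [("teen", "games"), ("teen", "games"), ("teen", "snacks"),
             ("teen", "games"), ("adult", "books"), ("adult", "snacks"),
             ("adult", "books")]


def preferred_categories(purchases, min_share):
    """Counting-based category preference model: group -> sorted list of
    categories whose share of the group's purchases is at least `min_share`."""
    by_group = defaultdict(Counter)
    for group, category in purchases:
        by_group[group][category] += 1
    model = {}
    for group, counts in by_group.items():
        total = sum(counts.values())
        model[group] = sorted(c for c, k in counts.items() if k / total >= min_share)
    return model


model = preferred_categories(purchases, 0.5)
print(model)   # {'teen': ['games'], 'adult': ['books']}
```

The expected-value-based and statistics-based variants would replace the raw share with, respectively, a comparison against an expected count and a significance test; they are not sketched here.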
Comparison of recommendation
approaches
Approach | Input info | Types of recommendation | Degree of personalization
Popularity-based | User preferences | Top-N | Non-personalized
Content-based | Features of items and individual user preferences | Prediction, top-N and top-M users | Personalized
Collaborative filtering | User preferences | Prediction, top-N recommendation | Personalized
Association-based | User preferences | Prediction, top-N recommendation | Personalized
Demographics-based | User demographics & preferences, features of items | Prediction, top-N & top-M | Personalized
Reputation-based | User preferences & reputation matrix | Top-N & possibly prediction | Personalized
Contribution
• Provides practitioners with a systematic way to choose among e-commerce recommendation systems
• Lays out existing approaches
BREAK
Turning Datamining into a
Management Science Tool: New
Algorithms and Empirical Results
• Cooper & Giuffrida 2000
• Research question:
How can we improve the performance of the
PromoCast forecasting system (or other market forecasting systems)
by adding local adjustment parameters?
Terminology
• SKU: Stock keeping unit
• KDS: knowledge discovery using SQL
• Management science: ??????????????
KDS
[Figure: rule generation phase]
Location data + sales records (errors from the sales forecast)
→ Bottom-up rule generation
→ Entropy-based rule ranking
→ Rule filtering: do entropy and confidence satisfy the decided levels?
If yes → Corrective action
Rule network example
Activated Nodes example
Corrective Action
U_12 = 0
U_4-11 = 58
U_3 = 221
U_2 = 1149
U_1 = 3583
Ok = 1115
O_1 = 7
O_2 = 1
O_3 = 0
O_4-11 = 0
O_12 = 0
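The KDS flow ranks rules by entropy and keeps those whose entropy and confidence satisfy the decided levels. A minimal sketch, assuming a rule's outcome is a histogram over the error buckets, entropy is the Shannon entropy of that histogram in bits (lower means more concentrated), and confidence is the share of the most frequent bucket; all three are assumptions, not Cooper & Giuffrida's published definitions. The counts are the corrective-action counts listed above, treated as if they came from one rule, and the filter thresholds are hypothetical.

```python
from math import log2

# Corrective-action bucket counts from the slide, treated as one rule's histogram.
counts = {"U_12": 0, "U_4-11": 58, "U_3": 221, "U_2": 1149, "U_1": 3583,
          "Ok": 1115, "O_1": 7, "O_2": 1, "O_3": 0, "O_4-11": 0, "O_12": 0}


def entropy(counts):
    """Shannon entropy (bits) of the bucket distribution; empty buckets add 0."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def confidence(counts):
    """Assumed definition: share of cases falling in the most frequent bucket."""
    return max(counts.values()) / sum(counts.values())


def passes_filter(counts, max_entropy, min_confidence):
    """Hypothetical filter: keep the rule only if both levels are satisfied."""
    return entropy(counts) <= max_entropy and confidence(counts) >= min_confidence


print(round(entropy(counts), 3), round(confidence(counts), 3))
print(passes_filter(counts, max_entropy=2.0, min_confidence=0.5))
```

Here the U_1 bucket holds 3583 of 6134 cases (confidence about 0.584) and the entropy is about 1.60 bits, so the rule passes a 2.0-bit / 0.5 filter but would fail a stricter 1.5-bit limit.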
KDS
• Bottom-up: starts from the input database
• No memory-bound processing
• Minimal data preprocessing
• Separates the learning phase from the action
phase
• Evaluation: for 10,117 cases, 8.9% ($?)
KDS
• Is this research? Is this a case study?
• Is this a management research?
• Why should I know about it as a
researcher/manager/engineer?
Acknowledgment
• All rights to trademarks and website
contents belong to their lawful owners