Using Text Mining to Infer Semantic Attributes for Retail Data Mining


Using Text Mining to Infer Semantic
Attributes for Retail Data Mining
Authors: Rayid Ghani & Andrew E. Fano
Presenter: Vishal Mahajan
INFS795
Agenda

- Drawbacks in Current Data Mining Techniques
- Purpose
- Assumptions and Constraints
- Methodology / Approach
- Extraction of Feature Set
- Labeling
- Classification Techniques
  - Naïve Bayes
  - EM
- Experimental Results
- Recommender System
Drawbacks in Current Data Mining Techniques

- Semantic features are not automatically considered.
- Transactional data is analyzed without analyzing the customer.
- Trend analysis is only partial.
- Retail items are treated as objects with no associated semantics.
- Common data mining techniques (association rules, decision trees, neural networks) ignore the meaning of items and the semantics associated with them.
Purpose of the Presentation

- Describe a system that extracts semantic features.
- Populate a knowledge base with the semantic features.
- Show how text mining can be used in retailing to extract semantic features from retailers' websites.
- Show how profiles of customers, or groups of customers, can be built using text mining.
Assumptions & Constraints

- Focus on the apparel retail segment only.
- Results focus on extracting the semantic features deemed important by CRM and retail experts.
- Data is extracted from retailers' websites.
- The models generated can be extended beyond the apparel retail segment.
Approach

- Collect information about products.
- Define the set of features to be extracted.
- Label the data with values of the features.
- Train a classifier/extractor on the labeled training data to extract features from unseen data.
- Extract semantic features from new products using the trained classifier.
- Populate a knowledge base with the products and their corresponding features.
Data Collection Methodology

- A web crawler extracts the following from a large retailer's website:
  - Names
  - URLs
  - Descriptions
  - Prices
  - Categories of all products available
- Wrappers are used to parse the pages.
- The extracted information is stored in a database, and a subset is chosen.
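The crawler-plus-wrapper step can be sketched as below. The HTML template, field names, and the sample product are hypothetical assumptions for illustration; real retailer markup differs per site, which is exactly why per-site wrappers are needed.

```python
import re

# Hypothetical product-page snippet; real retailer markup will differ.
SAMPLE_HTML = """
<div class="product">
  <span class="name">Silk Chemise</span>
  <span class="price">$49.00</span>
  <span class="category">Loungewear</span>
  <p class="description">Flirty silk chemise with thin straps.</p>
</div>
"""

def extract_products(html):
    """A toy 'wrapper': pull (name, price, category, description)
    out of pages that follow one known template."""
    pattern = re.compile(
        r'<span class="name">(.*?)</span>.*?'
        r'<span class="price">\$([\d.]+)</span>.*?'
        r'<span class="category">(.*?)</span>.*?'
        r'<p class="description">(.*?)</p>',
        re.DOTALL,
    )
    return [
        {"name": n, "price": float(p), "category": c, "description": d.strip()}
        for n, p, c, d in pattern.findall(html)
    ]

products = extract_products(SAMPLE_HTML)
print(products[0]["name"])  # -> Silk Chemise
```

Each extracted record would then be stored in the database for labeling.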
Extraction of Feature Set

- Feature selection is based on expert systems and extensive domain knowledge.
- Features were selected with the retail apparel segment in mind.
- Features selected for the project:
  - Age Group
  - Functionality
  - Price
  - Formality
  - Degree of Conservativeness
  - Degree of Sportiness
  - Degree of Trendiness
  - Degree of Brand Appeal
Labeling Training Data

- A database was created with the data collected from the retailer's website.
- A subset of 600 products was chosen and labeled.
- Labeling guidelines were provided.
Details of Features Extracted from each Product Description

- Age Group: Juniors, Teens, GenX, Mature, All Ages (For what ages is this item most appropriate?)
- Functionality: Loungewear, Sportswear, Eveningwear, Business Casual, Business Formal (How will the item be used?)
- Price Point: Discount, Average, Luxury (Compared to other items of this kind, is this item cheap or expensive?)
- Formality: Informal, Somewhat Formal, Very Formal (How formal is this item?)
- Conservativeness: 1 (gray suits) to 5 (loud, flashy clothes) (Does this suggest the person is conservative or flashy?)
- Sportiness: 1 to 5
- Trendiness: 1 (timeless classic) to 5 (current favorite) (Is this item popular now but likely to go out of style, or is it more timeless?)
- Brand Appeal: 1 (brand makes the product unappealing) to 5 (high brand appeal) (Is the brand known, and does it make the item appealing?)
Verifying Training Data

- Labeling was done by different individuals on disjoint subsets of the data.
- Association rules (between features) were used to check the consistency of the labeled data.
  - The Apriori algorithm was implemented with one- and two-feature antecedents and consequents.
- The desired consistency in labeling was achieved by applying the association rules.
Apriori Algorithm

- Find the frequent itemsets: the sets of items that have minimum support.
  - Every subset of a frequent itemset must itself be a frequent itemset; i.e., if {A B} is a frequent itemset, both {A} and {B} must be frequent itemsets.
- Use the frequent itemsets to generate association rules.
The Apriori Algorithm — Example

Database D:
  TID  Items
  100  1 3 4
  200  2 3 5
  300  1 2 3 5
  400  2 5

Scan D -> C1 (candidate 1-itemsets with support counts):
  {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3

L1 (frequent 1-itemsets, minimum support = 2):
  {1}: 2, {2}: 3, {3}: 3, {5}: 3

C2 (candidates generated from L1), scan D:
  {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2

L2 (frequent 2-itemsets):
  {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

C3 (candidates generated from L2), scan D:
  {2 3 5}

L3 (frequent 3-itemsets):
  {2 3 5}: 2
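The level-wise search in the example above can be sketched as a short function; this is a minimal sketch of Apriori, not a production implementation, run here on the example database D with minimum support 2.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining: count candidates C(k),
    keep the frequent ones L(k), join L(k) to form C(k+1)."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        # Scan D: count the support of each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step plus prune step: every k-subset of a candidate
        # (k+1)-itemset must already be frequent.
        candidates = []
        for a, b in combinations(list(level), 2):
            u = a | b
            if (len(u) == k + 1 and u not in candidates
                    and all(frozenset(s) in level for s in combinations(u, k))):
                candidates.append(u)
        k += 1
    return frequent

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori(D, min_support=2)
print(freq[frozenset({2, 3, 5})])  # -> 2
```

Running it on D reproduces L1, L2, and L3 from the walkthrough, including the single frequent 3-itemset {2 3 5} with support 2.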
Training from Labeled Data

- The learning problem is treated as a text classification problem.
- One text classifier is trained for each semantic feature.
  - e.g., the price of a product is classified as discount, average, or luxury.
  - The age group is classified as Juniors, Teens, GenX, Mature, or All Ages.
- Classification was performed using naïve Bayes.
Sample Association Rules

  Rule                                  Support  Confidence
  Informal <- Sportswear                24.5%    93.6%
  Informal <- Loungewear                16.1%    82.3%
  Informal <- Juniors                   12.1%    89.4%
  PricePoint=Average <- BrandAppeal=2    8.8%    79.0%
  BrandAppeal=5 <- Trendy=5             16.3%    91.2%
  Sportswear <- Sporty=4                 9.0%    85.7%
  AgeGroup=Mature <- Trendy=1            9.4%    78.8%
Naïve Bayes

- A simple but effective text classification method.
- A class is selected by combining the class prior probabilities with the word likelihoods.
- The model assumes each word in a document is generated independently of the others, given the class.

Word probabilities are estimated with Laplace smoothing (Equation 1):

  Pr(w_t | c_j) = (1 + Σ_{i=1..|D|} N(w_t, d_i) Pr(c_j | d_i))
                  / (|V| + Σ_{s=1..|V|} Σ_{i=1..|D|} N(w_s, d_i) Pr(c_j | d_i))

where N(w_t, d_i) is the number of times word w_t occurs in document d_i, and Pr(c_j | d_i) ∈ {0, 1} for labeled documents.
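Equation 1 with hard labels reduces to smoothed word counting, as in this minimal sketch; the example documents, labels, and whitespace tokenization are illustrative assumptions.

```python
from collections import Counter

def estimate_word_probs(docs, labels, classes):
    """Laplace-smoothed estimate of Pr(w_t | c_j) per Equation 1,
    with hard labels so Pr(c_j | d_i) is either 0 or 1."""
    vocab = sorted({w for d in docs for w in d.split()})
    V = len(vocab)
    probs = {}
    for c in classes:
        counts = Counter()
        for d, y in zip(docs, labels):
            if y == c:
                counts.update(d.split())  # N(w_t, d_i) summed over class c
        total = sum(counts.values())
        probs[c] = {w: (1 + counts[w]) / (V + total) for w in vocab}
    return probs

docs = ["silk chemise flirty", "gray suit blazer", "silk gown"]
labels = ["low", "high", "low"]
p = estimate_word_probs(docs, labels, ["low", "high"])
# The estimates form a distribution over the vocabulary for each class.
print(round(sum(p["low"].values()), 6))  # -> 1.0
```

Classification then picks the class maximizing the prior times the product of these word probabilities over the document.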
Incorporating Unlabeled Data

- The initial labeled sample covered only 600 products.
- Unlabeled products must be exploited to make meaningful predictions.
- Semi-supervised learning algorithms are used; they have been shown to reduce classification error considerably.
- The Expectation-Maximization (EM) algorithm is used as the semi-supervised technique.
Expectation-Maximization (EM) Method

- EM is an iterative statistical technique for maximum likelihood estimation with incomplete data.
- In the retail classification problem, the unlabeled data is treated as incomplete data.
- EM:
  - Locally maximizes the likelihood of the parameters.
  - Gives estimates for the missing values.
Expectation-Maximization (EM) Method (cont.)

- EM alternates between two steps.
- Initial parameters are set using naïve Bayes on just the labeled documents.
- Subsequent iterations alternate the E- and M-steps:
  - E-step: calculates a probabilistically weighted class label, Pr(c_j | d_i), for every unlabeled document.
  - M-step: estimates new classifier parameters using all documents (Equation 1).
- The E- and M-steps are iterated until the classifier converges.
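The E/M loop can be sketched compactly as follows. This is a minimal sketch, not the paper's exact procedure: the example documents are hypothetical, unlabeled documents start from uniform soft labels rather than an explicit naïve Bayes initialization, and it runs a fixed number of iterations instead of testing convergence.

```python
import math
from collections import Counter

def train_em(labeled, unlabeled, classes, iters=10):
    """Semi-supervised naive Bayes via EM: labeled docs keep hard
    labels; unlabeled docs get soft labels Pr(c_j | d_i) that are
    re-estimated each E-step."""
    docs = [d for d, _ in labeled] + unlabeled
    soft = [[1.0 if c == y else 0.0 for c in classes] for _, y in labeled]
    soft += [[1.0 / len(classes)] * len(classes) for _ in unlabeled]
    vocab = sorted({w for d in docs for w in d.split()})
    V = len(vocab)
    word_p, prior = {}, {}
    for _ in range(iters):
        # M-step: Equation 1 with soft counts, plus class priors.
        for j, c in enumerate(classes):
            counts = Counter()
            for d, sl in zip(docs, soft):
                for w in d.split():
                    counts[w] += sl[j]
            total = sum(counts.values())
            word_p[c] = {w: (1 + counts[w]) / (V + total) for w in vocab}
            prior[c] = sum(sl[j] for sl in soft) / len(docs)
        # E-step: recompute soft labels for the unlabeled docs only.
        for i in range(len(labeled), len(docs)):
            logs = [math.log(prior[c])
                    + sum(math.log(word_p[c][w]) for w in docs[i].split())
                    for c in classes]
            m = max(logs)                       # log-sum-exp for stability
            exps = [math.exp(v - m) for v in logs]
            z = sum(exps)
            soft[i] = [e / z for e in exps]
    return word_p, prior, soft

labeled = [("silk chemise", "low"), ("gray suit", "high")]
unlabeled = ["silk gown", "suit blazer"]
wp, pr, soft = train_em(labeled, unlabeled, ["low", "high"])
print(soft[2][0] > soft[2][1])  # "silk gown" leans toward "low" -> True
```

The unlabeled documents thus contribute fractional counts to the M-step, which is how they reduce classification error.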
Experimental Results

  Algorithm    Age Group  Functionality  Formality  Conservation  Sportiness  Trendiness  Brand Appeal
  Baseline     29%        24%            68%        39%           49%         29%         36%
  Naïve Bayes  66%        57%            76%        80%           70%         69%         82%
  EM           78%        70%            82%        84%           78%         80%         91%
Experimental Results: Most Predictive Words per Feature Value

  Brand Appeal=5 (high): Lauren, Ralph, DKNY, Kenneth, Cole, imported
  Conservative=5 (high): Lauren, Ralph, Breasted, Seasonless, Trouser, Jones, Sport, Classic, blazer
  Conservative=1 (low): Rose, Special, Leopard, Chemise, Straps, Flirty, Spray, Silk, platform
  Formality=Informal: Jean, Tommy, Jeans, Denim, Sweater, Pocket, Neck, Tee, Hilfiger
  Formality=Somewhat Formal: Jacket, Fully, Button, Skirt, Lines, York, Seam, Crepe, leather
  AgeGroup=Junior: Jrs, Dkny, Jeans, Tee, Colligate, Logo, Tommy, Polo, Short, sneaker
  Functionality=Loungewear: Chemise, Silk, Kimono, Calvin, Klein, August, Lounge, Hilfiger, Robe, gown
  Functionality=Partywear: Rock, Dress, Sateen, Length, Skirt, Shirtdress, Open, Platform, Plaid, flower
  Sportiness=5 (high): Sneaker, Camp, Base, Rubber, Sole, White, Miraclesuite, Athletic, Nylon, Mesh
  Trendiness=1 (low): Lauren, Seasonless, Breasted, Trouser, Pocket, Carefree, Ralph, Blazer, button
Results on New Data Set

- The subset of data used earlier came from a single retailer.
- Another sample of data was collected from a variety of retailers, with the following results:

  Algorithm    Age Group  Functionality  Formality  Conservation  Sportiness  Trendiness  Brand Appeal
  Naïve Bayes  83%        45%            61%        70%           81%         80%         87%

- Results are consistently better.
Recommender System

- Customer profiles can be created in real time by analyzing the text associated with products and mapping it to the pre-defined semantic features.
- The identity of the customer, and any prior transaction history, is unknown.
- Semantic features are inferred from the customer's browsing pattern.
- This helps in suggesting new products to customers.
Recommender System (cont.)

Mathematically:

- P(A_ij | Product) is calculated for all i, j, where A_ij is the j-th value of the i-th attribute (i ranges over semantic attributes, j over possible values).
- The user profile is constructed as follows:

  Pr(U_ij | past N items) = (1/N) Σ_{k=1..N} Pr(A_ij | Item_k)
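The profile formula is just an average of per-item attribute posteriors, as in this minimal sketch; the two browsed items and their posterior numbers are hypothetical.

```python
def update_profile(item_posteriors):
    """Build Pr(U_ij | past N items) as the average of per-item
    posteriors. Each element of item_posteriors maps
    attribute -> {value: Pr(A_ij | item)}."""
    n = len(item_posteriors)
    profile = {}
    for post in item_posteriors:
        for attr, dist in post.items():
            bucket = profile.setdefault(attr, {})
            for value, p in dist.items():
                bucket[value] = bucket.get(value, 0.0) + p / n
    return profile

# Hypothetical posteriors inferred from two browsed items.
items = [
    {"Formality": {"Informal": 0.8, "Somewhat Formal": 0.2}},
    {"Formality": {"Informal": 0.6, "Somewhat Formal": 0.4}},
]
profile = update_profile(items)
print(round(profile["Formality"]["Informal"], 6))  # -> 0.7
```

Because it only needs the posteriors of the last N browsed items, the profile can be maintained in real time without knowing the customer's identity.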
Types of Recommender Systems

- There are two types of recommender systems.
- Collaborative filtering:
  - Collects user feedback in the form of ratings.
  - Exploits similarities and differences between customers to recommend items.
  - Issues: the sparsity problem; new items.
- Content filtering:
  - Compares the contents of items.
  - Issues: narrow in scope; recommends similar products only.
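The "compares the contents" step of content filtering can be sketched as bag-of-words cosine similarity; the browsed item and the two catalog descriptions here are hypothetical. The sketch also illustrates the narrowness issue: it can only surface items whose text resembles what was already browsed.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words descriptions."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

browsed = "silk chemise thin straps"
catalog = ["silk kimono robe", "athletic mesh sneaker"]
best = max(catalog, key=lambda d: cosine(browsed, d))
print(best)  # -> silk kimono robe
```

Mapping descriptions to semantic features first, as the paper proposes, lets the recommender generalize beyond such surface word overlap.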
Conclusions

- The system learns through a combination of supervised and semi-supervised techniques.
- Major assumption: product descriptions accurately convey the semantic attributes, which is questionable.
- A small sample of data was used to infer the results.
- Practical applications have not been verified.
- The system is bootstrapped from a small number of labeled training examples.
- An interesting application that could be evolved to generate trends for retail marketers.