
Using Text Mining to Infer Semantic
Attributes for Retail Data Mining
Advisor: Dr. Hsu
Graduate: Chien-Shing Chen
Authors: Rayid Ghani, Andrew E. Fano
Outline
Motivation
Objective
Introduction
Overview of our approach
Naïve Bayes
Expectation-Maximization
Conclusions
Personal Opinion
Motivation
This work was motivated by discussions with CRM experts and retailers
who currently analyze large amounts of transactional data but are
unable to systematically understand the semantics of the items they sell.
Objective
We show that semantic features of these items can be successfully
extracted by applying text learning techniques to the product
descriptions obtained from retailers' websites.
1.Introduction
We treat the learning problem as a traditional text classification
problem and create one text classifier for each semantic feature.
The initial algorithm used to perform this classification was naïve
Bayes; a description is given below.
We then apply learning algorithms that combine information from
labeled and unlabeled data.
2.Overview of our approach
1.Collect information about products
We constructed a web crawler to visit the websites of several large
apparel retail stores and extract the names, URLs, descriptions,
prices, and categories of all products available.
The extracted items and features were placed in a database and a
random subset was chosen to be labeled.
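A minimal sketch of the extraction step described above, assuming a hypothetical page layout (the actual crawler and the retailers' markup are not shown in the paper; the `span` classes and sample products here are invented for illustration):

```python
import random
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collects (name, desc, price) records from a hypothetical layout
    where each field sits in a <span class="name|desc|price"> tag."""
    def __init__(self):
        super().__init__()
        self.field = None      # which field the next text chunk belongs to
        self.current = {}      # fields of the product being assembled
        self.products = []     # finished product records

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "desc", "price"):
            self.field = cls

    def handle_data(self, data):
        if self.field:
            self.current[self.field] = data.strip()
            self.field = None
            if len(self.current) == 3:   # all three fields seen
                self.products.append(self.current)
                self.current = {}

html = """
<span class="name">Wool Blazer</span>
<span class="desc">Classic tailored blazer</span>
<span class="price">$120</span>
<span class="name">Denim Skirt</span>
<span class="desc">Casual knee-length skirt</span>
<span class="price">$45</span>
"""
parser = ProductParser()
parser.feed(html)
# choose a random subset to send out for labeling
to_label = random.sample(parser.products, 1)
print(len(parser.products))  # 2
```

In the paper the extracted records go into a database; here a plain list stands in for that store.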
2.Overview of our approach
2.Defining the set of features to extract
2.Overview of our approach
3.Labeling Training Data
The data (product name, descriptions,
categories, price) collected by crawling
websites of apparel retailers was placed into a
database and a small subset was given to a
group of fashion-aware people to label with
respect to each of the features described in the
previous section.
4.1 Naïve Bayes
1. First, a class is selected according to the class prior
probabilities.
2. Each word in the document is then generated independently of the
others, given the class.
4.2 Naïve Bayes
[Figure: labeled documents d1–d3 with class labels cj (e.g. "Juniors")
and an unlabeled document d4, denoted di(u); the toy documents contain
words such as fried chicken, fries, hamburger, ice cream, soda, and
lollipop.]
4.3 Naïve Bayes
D: the labeled training data
V: the vocabulary from the labeled training data
N(wt, di): the number of times word wt occurs in document di
4.4 Naïve Bayes
wdi,k: the word at position k of document di, so that di is the word
sequence (wdi,1, wdi,2, wdi,3, ...)
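Using the counts defined above, the naïve Bayes word probabilities are estimated from the labeled data with Laplace smoothing, theta[c][w] = (1 + N(w, c)) / (|V| + total word count in c). A minimal sketch on toy documents (not the paper's apparel data):

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """labeled_docs: list of (word_list, class_label) pairs.
    Returns class priors and Laplace-smoothed word probabilities."""
    vocab = set()
    class_docs = defaultdict(int)      # number of docs per class
    word_counts = defaultdict(Counter) # N(w, c) summed over docs
    for words, c in labeled_docs:
        class_docs[c] += 1
        word_counts[c].update(words)
        vocab.update(words)
    n = len(labeled_docs)
    priors = {c: class_docs[c] / n for c in class_docs}
    theta = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        theta[c] = {w: (1 + word_counts[c][w]) / (len(vocab) + total)
                    for w in vocab}
    return priors, theta

def classify(words, priors, theta):
    """argmax over classes of log P(c) + sum_k log P(w_k | c);
    words outside the vocabulary are skipped."""
    def score(c):
        s = math.log(priors[c])
        for w in words:
            if w in theta[c]:
                s += math.log(theta[c][w])
        return s
    return max(priors, key=score)

docs = [(["fried", "chicken", "fries"], "fast food"),
        (["hamburger", "fries", "soda"], "fast food"),
        (["ice", "cream", "lollipop"], "sweets")]
priors, theta = train_nb(docs)
print(classify(["fries", "hamburger"], priors, theta))  # fast food
```

The toy classes mirror the food-word figure earlier in the deck; in the paper each semantic feature gets its own such classifier.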
5.1 Incorporating Unlabeled Data using EM
1. We collected names and descriptions of thousands of women's
apparel items from websites.
2. We labeled only about 600 of those, leaving the rest as unlabeled.
3. The naïve Bayes equations presented above are no longer adequate to
find maximum a posteriori parameter estimates.
4. EM combines information from the labeled and unlabeled data.
5.2 Expectation-Maximization
1.EM is an iterative statistical technique for
maximum likelihood estimation in problems with
incomplete data.
2.The EM technique can be used to find locally
maximum parameter estimates.
5.3 Expectation-Maximization
EM: an iterative two-step process
1. The E-step calculates probabilistically-weighted class labels,
P(cj | di; θ), for every unlabeled document.
2. The M-step estimates new classifier parameters using all the
documents.
5.4 Expectation-Maximization
The classifier is used to assign probabilistically-weighted class
labels to each unlabeled document by calculating the expectations of
the missing class labels, E[zij] = P(cj | di; θ).
5.5 Expectation-Maximization
Calculating a maximum a posteriori estimate of θ:
θ̂ = argmax_θ P(θ | D)
5.6 Expectation-Maximization
Instead of trying to maximize P(θ | D) directly, we work with
log P(θ | D) instead, as a step towards making maximization tractable.
Let zij be the missing class-membership indicators, with zij = 1 iff
document di belongs to class cj.
5.8 Expectation-Maximization
Express the complete log likelihood of the parameters:
lc(θ | D; z) = log P(θ) + Σi Σj zij log [ P(cj | θ) P(di | cj; θ) ]
[Table: indicator matrix zij over documents di and classes cj; each
row contains a single 1 marking the class of di (cells z12, z42, and
z43 are highlighted in the slide).]
5.9-1 Expectation-Maximization
Let ẑ(k) and θ̂(k) denote the estimates for z and θ at iteration k.
Then, the algorithm finds a local maximum of l(θ | D) by iterating the
following two steps:
5.9-2 Expectation-Maximization
E-step: set ẑ(k+1) = E[z | D; θ̂(k)]
M-step: set θ̂(k+1) = argmax_θ P(θ | D; ẑ(k+1))
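The EM iteration above can be sketched as a soft-count version of the naïve Bayes estimates, where fractional labels P(cj | di) replace the hard labels for the unlabeled documents. A toy sketch (the data and smoothing details are illustrative, not the paper's):

```python
import math
from collections import defaultdict

def em_nb(labeled, unlabeled, classes, iters=5):
    """labeled: list of (word_list, class); unlabeled: list of word_lists.
    Alternates an M-step (re-estimate priors/word probabilities from soft
    counts over ALL docs) and an E-step (recompute P(c|d) for the
    unlabeled docs)."""
    vocab = ({w for d, _ in labeled for w in d}
             | {w for d in unlabeled for w in d})
    # responsibilities: hard 0/1 for labeled docs, uniform for unlabeled
    resp = [{c: 1.0 if c == lab else 0.0 for c in classes}
            for _, lab in labeled]
    resp += [{c: 1.0 / len(classes) for c in classes} for _ in unlabeled]
    docs = [d for d, _ in labeled] + unlabeled
    n_lab = len(labeled)
    for _ in range(iters):
        # M-step: Laplace-smoothed estimates from probabilistic counts
        priors, theta = {}, {}
        for c in classes:
            mass = sum(r[c] for r in resp)
            priors[c] = (1 + mass) / (len(classes) + len(docs))
            counts = defaultdict(float)
            for d, r in zip(docs, resp):
                for w in d:
                    counts[w] += r[c]
            total = sum(counts.values())
            theta[c] = {w: (1 + counts[w]) / (len(vocab) + total)
                        for w in vocab}
        # E-step: expected class labels for the unlabeled docs only
        for i in range(n_lab, len(docs)):
            logp = {c: math.log(priors[c])
                       + sum(math.log(theta[c][w]) for w in docs[i])
                    for c in classes}
            m = max(logp.values())
            z = sum(math.exp(v - m) for v in logp.values())
            resp[i] = {c: math.exp(logp[c] - m) / z for c in classes}
    return priors, theta, resp

labeled = [(["fries", "soda"], "fast food"), (["lollipop"], "sweets")]
unlabeled = [["fries", "hamburger"], ["ice", "cream", "lollipop"]]
priors, theta, resp = em_nb(labeled, unlabeled, ["fast food", "sweets"])
print(resp[2])  # soft labels for ["fries", "hamburger"]
```

The labeled documents keep their hard labels throughout; only the unlabeled documents' responsibilities move between iterations, which is the locally-maximizing behavior described on the previous slides.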
6.1 Experimental Results
We list words which had high weights for some of the features that we
used the naïve Bayes classifier to extract.
6.2 Experimental Results
Using unlabeled data and combining it with the initially labeled
product descriptions via EM helps improve the accuracy even further.
7.1 Applications
These results enable us to hypothesize that our system can be applied
to a wide variety of data and can adapt to different distributions of
test sets using the unlabeled data.
7.2 Applications
Our system monitors the browsing behavior of a user on a retailer's
website and, in real time, extracts the names and descriptions of the
products they browse.
For each product browsed, our system calculates the probability of
each attribute value Ai,j, where Ai,j is the jth value of the ith
attribute.
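One hedged reading of this step (the slides do not reproduce the paper's exact aggregation formula): average the classifiers' predicted distributions over each attribute's values across the products a visitor browses, yielding a session-level profile. The attribute names and numbers below are invented for illustration:

```python
from collections import defaultdict

def session_profile(predictions):
    """predictions: one dict per browsed product, mapping
    (attribute, value) -> probability from the per-attribute
    classifiers. Returns the per-value average over the session
    (an assumed aggregation, used here for illustration)."""
    sums = defaultdict(float)
    for p in predictions:
        for key, prob in p.items():
            sums[key] += prob
    return {key: s / len(predictions) for key, s in sums.items()}

# two browsed products, scored on a hypothetical "price point" attribute
browsed = [
    {("price point", "luxury"): 0.8, ("price point", "discount"): 0.2},
    {("price point", "luxury"): 0.6, ("price point", "discount"): 0.4},
]
profile = session_profile(browsed)
print(profile[("price point", "luxury")])  # ~0.7
```

A profile like this is what the next slide's retailer comparisons and the concluding recommendation use-case would consume.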
7.3 Applications
This ability to profile retailers enables strategic applications such
as competitive comparisons, monitoring brand positioning, tracking
trends over time, etc.
8. Conclusions
We use the inferred semantic attributes to create profiles of
individuals that can be used in recommendation systems that improve on
traditional collaborative filtering approaches.
Personal Opinion
The full complexity of real-world text data can’t be
completely captured by known statistical models.