Learning Dissimilarities for Categorical Symbols
Jierui Xie, Boleslaw Szymanski, Mohammed J. Zaki
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180, USA
{xiej2, szymansk, zaki}@cs.rpi.edu
Presentation Outline
• Introduction
• Related Work
• Learning Dissimilarity (LD) Algorithm
• Experimental Results
• Conclusion
Introduction
• Distance plays an important role in many data mining tasks
• Distance is rarely defined precisely for categorical data
– nominal and ordinal
– e.g., rating of a movie {very bad, bad, fair, good, very good}
• Goal: derive dissimilarities between categorical symbols
– To enable the full power of distance-based methods.
– Hopefully easier for interpretation as well.
Notation
• A dataset X = {x1, x2, …, xt} of t data points. Each point xi has m
attribute values, xi = (x1i, …, xmi).
• Each attribute Ai takes one of ni discrete values {ai1, …, aini}.
Each aij is also called a symbol.
• The similarity between symbols aik and ail :
• The dissimilarity
or
• The distance between two data points xi and xj is defined in
terms of the distance between symbols
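A minimal sketch of how a point distance can be built from symbol dissimilarities. It assumes the point distance is the plain sum of per-attribute symbol dissimilarities, which the slide does not state; the tables, the default mismatch cost of 1, and the names `symbol_dissimilarity` and `point_distance` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# One table per attribute: (symbol, symbol) -> dissimilarity.
SymbolTable = Dict[Tuple[str, str], float]


def symbol_dissimilarity(table: SymbolTable, a: str, b: str) -> float:
    """Look up d(a, b); the table is treated as symmetric and d(a, a) = 0."""
    if a == b:
        return 0.0
    # Assumption: an unlisted pair gets a default mismatch cost of 1.
    return table.get((a, b), table.get((b, a), 1.0))


def point_distance(tables: Sequence[SymbolTable],
                   x: Sequence[str], y: Sequence[str]) -> float:
    """Sum the symbol dissimilarities attribute by attribute (assumed form)."""
    return sum(symbol_dissimilarity(t, a, b) for t, a, b in zip(tables, x, y))


# Hypothetical tables for two attributes: a movie rating and a genre.
rating_table = {("bad", "fair"): 1.0, ("bad", "good"): 2.0, ("fair", "good"): 1.0}
genre_table = {("drama", "comedy"): 0.5}

distance = point_distance([rating_table, genre_table],
                          ("bad", "drama"), ("good", "comedy"))
print(distance)  # 2.0 (rating) + 0.5 (genre) = 2.5
```

Other aggregations, such as a Euclidean combination of the per-attribute terms, would fit the same interface.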
Notation (cont.)
• Let the frequency of symbol ai in the dataset be
then the probability
• Class label
• Output of the classifier on point xi :
• The error of misclassifying point xi:
• Total classification error:
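A small sketch of the frequency and probability of symbols for one attribute. It assumes the probability is the empirical estimate, frequency divided by the number of points t; the slide's own formula is not shown, and the function name `symbol_probabilities` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def symbol_probabilities(column: Sequence[str]) -> Dict[str, float]:
    """Return p(a) = f(a) / t for every symbol a in one attribute column."""
    counts = Counter(column)   # f(a): number of occurrences of each symbol
    total = len(column)        # t: number of data points
    return {symbol: count / total for symbol, count in counts.items()}


# Hypothetical column of movie ratings.
ratings = ["good", "bad", "good", "fair"]
probabilities = symbol_probabilities(ratings)
print(probabilities)
```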
Related Work
• Unsupervised methods:
– Assign similarity values based on frequency; emphasize mismatches or
matches on frequent or rare symbols, from a probabilistic or
information-theoretic point of view.
– Examples: Lin, Burnaby, Smirnov, Goodall, Gambaryan, Eskin,
Occurrence Frequency (OF), Inverse Occurrence Frequency (IOF)
• Supervised methods:
– Take the class information into account
– Examples: Value Difference Metric (VDM), Cheng et al.
Unsupervised Method Examples
• Goodall: on a match, less frequent attribute values contribute more to the
overall similarity than frequent attribute values. That is, the similarity
depends on the symbol's frequency if ai = aj; otherwise it is 0.
• Inverse Occurrence Frequency (IOF): assigns higher
weight to mismatches on less frequent symbols. That is, the similarity
depends on the symbols' frequencies if ai ≠ aj; otherwise it is 1.
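A hedged sketch of IOF. It assumes the form commonly given for this measure in the categorical-similarity literature, 1 / (1 + log f(ai) · log f(aj)) on a mismatch and 1 on a match, with f the raw occurrence count; the slide's own formula is not shown, and the function name `iof_similarity` is hypothetical.

```python
import math
from collections import Counter
from typing import Mapping


def iof_similarity(counts: Mapping[str, int], a: str, b: str) -> float:
    """IOF similarity between two symbols of the same attribute (assumed form)."""
    if a == b:
        return 1.0  # a match always gets the maximum similarity
    # On a mismatch, frequent symbols produce a larger log product,
    # hence a lower similarity; rare symbols are penalized less.
    return 1.0 / (1.0 + math.log(counts[a]) * math.log(counts[b]))


# Hypothetical column: "x" is frequent, "y" and "z" are rare.
column = ["x"] * 8 + ["y"] * 2 + ["z"] * 2
counts = Counter(column)

rare_mismatch = iof_similarity(counts, "y", "z")
frequent_mismatch = iof_similarity(counts, "x", "y")
print(rare_mismatch > frequent_mismatch)  # True
```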
Supervised Method Examples
• VDM:
– Symbols are similar if they occur with a similar relative frequency for all
the classes.
where Cai,c is the number of times symbol ai occurs in class c. Cai is the total
number of times ai occurs in the whole dataset. h is a constant.
• Cheng:
– Based on an RBF classifier
– They attempt to evaluate all pairwise distances between symbols
and optimize the error function by gradient descent
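The VDM description above can be sketched as follows. The sketch assumes the standard form of the metric, the sum over classes c of |Cai,c / Cai − Caj,c / Caj|^h, since the slide's formula is not shown; the function name `vdm` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def vdm(values: Sequence[str], labels: Sequence[Hashable],
        a: str, b: str, h: float = 1.0) -> float:
    """Value Difference Metric between symbols a and b of one attribute."""
    per_class = defaultdict(Counter)  # per_class[symbol][c] = C_{symbol,c}
    for value, label in zip(values, labels):
        per_class[value][label] += 1

    total_a = sum(per_class[a].values())  # C_a: occurrences of a overall
    total_b = sum(per_class[b].values())  # C_b: occurrences of b overall
    if total_a == 0 or total_b == 0:
        raise ValueError("both symbols must occur in the data")

    # Compare the relative class frequencies of the two symbols.
    return sum(
        abs(per_class[a][c] / total_a - per_class[b][c] / total_b) ** h
        for c in set(labels)
    )


# Hypothetical attribute column with binary class labels.
values = ["r", "r", "g", "g", "b", "b"]
labels = [1, 1, 1, 0, 0, 0]

print(vdm(values, labels, "r", "g"))  # 1.0: class profiles partly overlap
print(vdm(values, labels, "r", "b"))  # 2.0: class profiles are disjoint
```

Symbols with identical class profiles get distance 0 regardless of how often they occur, which is the sense in which VDM uses class information rather than raw frequency.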
Learning Dissimilarity Algorithm
• Motivation:
– Learning a mapping from the symbols of each categorical attribute Ai onto a
real-number interval, guided by the class information, is feasible and may
facilitate the classification task.
Learning Dissimilarity Algorithm (cont.)
• Based on the nearest neighbor classifier and the difference between the
distances to the two classes
• Iterative learning
• Guided by gradient descent to minimize the total classification
error
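A rough sketch of the distance difference that the bullets above refer to, for two classes. It assumes each symbol is mapped to a real value, the symbol distance is the absolute difference of the mapped values, and the point distance is their sum over attributes; none of these forms is confirmed by the slides, the gradient update itself is not reproduced here, and all names and values are hypothetical.

```python
from typing import Dict, Hashable, Sequence

Mapping = Sequence[Dict[str, float]]  # one symbol -> real value map per attribute


def point_distance(mapping: Mapping, x: Sequence[str], y: Sequence[str]) -> float:
    """Assumed form: sum over attributes of |r(a) - r(b)|."""
    return sum(abs(m[a] - m[b]) for m, a, b in zip(mapping, x, y))


def distance_difference(mapping: Mapping, train: Sequence[Sequence[str]],
                        labels: Sequence[Hashable], x: Sequence[str],
                        class_a: Hashable, class_b: Hashable) -> float:
    """Distance to the nearest class_a point minus distance to the nearest class_b point."""
    d_a = min(point_distance(mapping, x, p)
              for p, label in zip(train, labels) if label == class_a)
    d_b = min(point_distance(mapping, x, p)
              for p, label in zip(train, labels) if label == class_b)
    return d_a - d_b


# Hypothetical one-attribute example with a candidate real-valued mapping.
mapping = [{"low": 0.0, "mid": 0.4, "high": 1.0}]
train = [("low",), ("high",)]
labels = ["neg", "pos"]

delta = distance_difference(mapping, train, labels, ("mid",), "neg", "pos")
print(delta < 0)  # True: the query is nearer to the "neg" neighbor
```

A negative difference means a nearest neighbor classifier would pick `class_a`; an iterative learner would adjust the mapped values so that this sign agrees with the true label more often.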
Learning Dissimilarity Algorithm (cont.)
• Objective Function and Update Equation
Learning Dissimilarity Algorithm (cont.)
• The Derivative of ∆d
• The full update equation
Learning Dissimilarity Algorithm (cont.)
• Intuitive meaning of assignment update
Experimental Result
• Datasets
Experimental Result (cont.)
• Redundancy among symbols
Experimental Result (cont.)
• Comparison with Various Data-Driven Methods
– On average, LD and VDM achieve the best accuracy, indicating that supervised
dissimilarities attain better results than the unsupervised ones. Among the unsupervised
measures, IOF and Lin are slightly superior to the others.
Experimental Result (cont.)
• Analysis with confidence interval (accuracy +/- standard
deviation)
– LD performed statistically worse than Lin on datasets Splice and Tic-tac-toe but
better than Lin on datasets Connection-4, Hayes and Balance Scale.
– LD performed statistically worse than VDM only on one dataset (Splice) but
better on two datasets (Connection-4 and Tic-tac-toe).
– Finally, LD performed statistically at least as well as (and on some datasets, e.g.
Connection-4, better than) the remaining methods.
Experimental Result (cont.)
• Comparison with Various Classifiers
– LD performed statistically worse than the other methods on only one dataset
(Splice) but performed better on at least three other datasets than each of the
other methods.
Conclusion
• A task-oriented or supervised iterative learning approach to
learn a distance function for categorical data.
– Explores the relationships between categorical symbols by utilizing the
classification error as guidance.
– The real-valued mappings found by our algorithm provide discriminative
information, which can be used to refine features and improve classification
accuracy.
Thank you!