Max-margin Clustering Algorithm

Download Report

Transcript Max-margin Clustering Algorithm

Max-margin Clustering: Detecting Margins from Projections of Points on Lines
1
Gopalan ,
2
Sankaranarayanan
Raghuraman
and Jagan
1Center for Automation Research, University of Maryland, College Park, MD USA; 2NEC Labs, Cupertino, CA USA
Multi-cluster Problem
Max-margin Clustering Algorithm
Location information of
projected points (SI)
alone is insufficient to
detect margins
 Draw lines between all pairs of points
 Estimate the probability of presence of margins between a
pair of points xi and xj by computing f(xi,xj)
 Perform global clustering using f between all point-pairs
Results
The Role of Distance of Projection
Proposition 2
For line intervals in margin region,
perpendicular to the separating
hyperplane
min Dmin  min  i
Problem Statement
Given an unlabelled set of points forming k clusters, find a
grouping with maximum separating margin among the clusters
Int *
 Prior work: (Mostly) Establish feedback between different
label proposals, and run a supervised classifier on it
 Goal: To understand the relation between data points and
margin regions by analyzing projections of data on lines
Two-cluster Problem
Assumptions
Linearly separable clusters
 Kernel trick for non-linear case
No outliers in data (max margin
exist only between clusters)
 Enforce global cluster balance
Proposition 1
SI* exists ONLY on line segments in
margin region that are perpendicular
to the separating hyperplane
 Such line segments directly provide
cluster groupings
Defn: Dmin of a line interval is the
minimum distance of projection
of points in that interval.
No outlier assumption: Max
margin between points within a
cluster M m  min  i
i
i
Proposition 3
For line intervals inside a cluster of
length more than Mm
max
D

M
/
2
min
m
CL
Int
Proposition 4
An interval with SI having no projected
points with distance of projection less
than Dmin*, [ SI ]Dmin*  min  i can lie
i
only outside a cluster; where
Dmin*  min  i
i
A Pair-wise Similarity Measure for Clustering
f ( xi , x j )  exp( max D[SI ]D )
D:Intij
 f(xi,xj)=1, iff xi=xj
 f(xi,xj)<<1, iff xi and xj are from different clusters, and Intij is
perpendicular to their separating hyperplane
Summary
Clustering
Detecting margin regions
 Obtaining statistics of location and distance of projection of
points that are specific to line segments in margin regions
(Prop. 1 to 4)
 A pair-wise similarity measure to perform clustering, which
avoids some optimization-related challenges prevalent in
most existing methods
References
1. F. De la Torre, and T. Kanade, “Discriminative cluster analysis”, ICML, pp. 241-248, 2006. ([8] in table)
2. K. Zhang, I.W. Tsang, and J.T. Kwok, “Maximum margin clustering made practical”, IEEE Trans.
Neural Networks, 20(4), pp. 583-596, 2009. ([31] in table)