
Data Mining Course 2007
Clustering
Eric Postma
Overview
Three approaches to clustering
1. Minimization of reconstruction error
• PCA, nlPCA, k-means clustering
2. Distance preservation
• Sammon mapping, Isomap, SPE
3. Maximum likelihood density estimation
• Gaussian Mixtures
• These datasets have identical statistics up to 2nd order
1. Minimization of reconstruction error
Illustration of PCA (1)
• Face dataset (Rice database)
Illustration of PCA (2)
• Average face
Illustration of PCA (3)
• Top 10 Eigenfaces
2D PCA projection
Each 39-dimensional data item describes different aspects of
the welfare and poverty of one country.
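PCA picks the orthogonal directions that minimise the squared reconstruction error (equivalently, retain maximal variance), and projecting onto the top two components gives a 2D map like the one above. A minimal numpy sketch, assuming a generic data matrix X; the face images and the 39-dimensional country data themselves are not included here:

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X_centered = X - X.mean(axis=0)               # subtract the mean item (the "average face")
    # SVD of the centred data: rows of Vt are the principal directions (the eigenfaces)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:2]                           # two directions of maximal variance
    return X_centered @ components.T              # 2D coordinates with minimal reconstruction error

# toy usage: random data standing in for the real 39-dimensional country dataset
X = np.random.rand(100, 39)
print(pca_2d(X).shape)                            # (100, 2)
```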
Non-linear PCA
• Using neural networks (to be discussed
tomorrow)
2. Distance preservation
Sammon mapping
• Given a data set X, the distance between any two samples i and j is defined as Dij
• Consider the projection onto a two-dimensional plane in which the projected points are separated by dij
• Define an error function:
E = \frac{1}{\sum_{i<j} D_{ij}} \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}}
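A minimal gradient-descent sketch of this stress function; the learning rate, the random initialisation, and the vectorised gradient below are assumptions rather than the exact procedure from the lecture:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def sammon(X, n_iter=500, lr=0.3, dim=2, eps=1e-12):
    """Map X to `dim` dimensions by gradient descent on the Sammon stress E."""
    n = len(X)
    D_full = squareform(pdist(X))                 # input-space distances D_ij
    c = D_full.sum() / 2.0                        # normalisation constant sum_{i<j} D_ij
    D = D_full + np.eye(n)                        # pad the diagonal to avoid division by zero
    Y = np.random.default_rng(0).normal(scale=1e-2, size=(n, dim))
    for _ in range(n_iter):
        d = squareform(pdist(Y)) + np.eye(n)      # output-space distances d_ij
        W = (D - d) / (D * d + eps)               # per-pair weight (D_ij - d_ij) / (D_ij d_ij)
        np.fill_diagonal(W, 0.0)
        # gradient of E w.r.t. each y_i is (-2/c) * sum_j W_ij (y_i - y_j)
        grad = (-2.0 / c) * (W.sum(axis=1, keepdims=True) * Y - W @ Y)
        Y -= lr * grad
    return Y
```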
Sammon mapping
Main limitations of Sammon
• The Sammon mapping procedure is a
gradient descent method
• Main limitation: local minima
• Classical MDS may be preferred because it finds a global minimum (it is based on an eigendecomposition, like PCA)
• Both methods have difficulty with “curved
or curly subspaces”
Isomap
• Proposed by Tenenbaum et al. (2000)
• Build a graph in which each node represents a data point and edges connect neighbouring points
• Compute shortest distances along the
graph (e.g., Dijkstra’s algorithm)
• Store all distances in a matrix D
• Perform MDS on the matrix D
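A sketch of these steps, assuming scipy's shortest-path routine for the geodesic distances and classical MDS implemented via the double-centred squared distance matrix; k = 7 mirrors the later illustration, and this is not the authors' original code:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def isomap(X, k=7, dim=2):
    """Isomap sketch: kNN graph -> shortest paths -> classical MDS (assumes a connected graph)."""
    n = len(X)
    E = squareform(pdist(X))                      # Euclidean distances between data points
    G = np.full((n, n), np.inf)                   # graph with non-edges marked as infinity
    idx = np.argsort(E, axis=1)[:, 1:k + 1]       # each point's k nearest neighbours
    for i in range(n):
        G[i, idx[i]] = E[i, idx[i]]
    G = np.minimum(G, G.T)                        # symmetrise the neighbourhood graph
    D = shortest_path(G, method="D", directed=False)   # geodesic distances via Dijkstra
    # classical MDS on the geodesic distance matrix D
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]          # keep the largest eigenvalues
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```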
Illustration of Isomap (1)
• For two arbitrary points on the manifold, the Euclidean distance does not always reflect similarity (cf. dashed blue line)
Illustration of Isomap (2)
• Isomap finds the appropriate shortest path along the graph
(red curve, for K=7, N=1000)
Illustration of Isomap (3)
• Two-dimensional embedding (red line is the shortest path along the graph, blue line is the true distance in the embedding)
Illustration of Isomap (4)
• Isomap's (●) ability to find the intrinsic dimensionality, compared to PCA and MDS (∆ and o)
Illustration of Isomap (5)
Illustration of Isomap (6)
Illustration of Isomap (7)
• Interpolation along a straight line
Stochastic Proximity Embedding
• SPE algorithm
• Agrafiotis, D.K. and Xu, H. (2002). A self-organizing principle for learning nonlinear manifolds. Proceedings of the National Academy of Sciences U.S.A.
Stress function, with dij the output proximity between points i and j and rij the input proximity between points i and j:

f(d_{ij}, r_{ij}) = (d_{ij} - r_{ij})^2
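A minimal sketch of SPE under this stress: pairs of points are drawn at random and nudged so that the output proximity dij moves towards the input proximity rij. The pair-update rule, learning-rate schedule, and cycle counts below are assumptions rather than the exact settings of Agrafiotis and Xu:

```python
import numpy as np

def spe(X, dim=2, n_cycles=100, n_updates_per_cycle=None, lam=1.0, eps=1e-10, seed=0):
    """Stochastic Proximity Embedding sketch: random pair updates shrinking (d_ij - r_ij)^2."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_updates = n_updates_per_cycle or 10 * n
    Y = rng.random((n, dim))                      # random initial embedding
    decay = lam / n_cycles                        # linear annealing of the learning rate
    for _ in range(n_cycles):
        for _ in range(n_updates):
            i, j = rng.choice(n, size=2, replace=False)
            r = np.linalg.norm(X[i] - X[j])       # input proximity r_ij
            d = np.linalg.norm(Y[i] - Y[j])       # output proximity d_ij
            # move the pair apart if d < r, together if d > r
            delta = lam * 0.5 * (r - d) / (d + eps) * (Y[i] - Y[j])
            Y[i] += delta
            Y[j] -= delta
        lam -= decay
    return Y
```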
Swiss roll data set
Original 3D set
2D embedding obtained by SPE
Stress as a function of embedding dimension
(averaged over 30 runs)
Scalability (# steps for four set sizes)
Linear scaling
Conformations of methylpropylether
C1C2C3O4C5
Diamine combinatorial library
Clustering
• Minimize the total within-cluster variance
(reconstruction error)
E = \sum_{c=1}^{C} \sum_{i=1}^{N} k_{ic} \, \| x_i - w_c \|^2

• kic = 1 if data point i belongs to cluster c (and 0 otherwise)
K-means clustering
1. Random selection of C cluster centres
2. Partition the data by assigning them to the clusters
3. The mean of each partitioning is the new cluster centre
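A minimal numpy sketch of these three steps; the convergence test and the handling of empty clusters are assumptions:

```python
import numpy as np

def kmeans(X, C, n_iter=100, seed=0):
    """Plain k-means: returns the cluster centres w_c and the assignment of each point."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=C, replace=False)].copy()    # 1. random centres
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # 2. partition: assign every point to its nearest centre
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. the mean of each partition becomes the new centre (keep the old centre if empty)
        new_centres = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centres[c] for c in range(C)])
        if np.allclose(new_centres, centres):                        # centres stopped moving
            break
        centres = new_centres
    return centres, labels
```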
A distance threshold may be used…
• Effect of distance threshold on the number of clusters
Main limitation of k-means clustering
• Final partitioning and cluster centres
depend on initial configuration
• Discrete partitioning may introduce errors
• Instead of minimizing the reconstruction
error, we may maximize the likelihood of
the data (given some probabilistic model)
Neural algorithms related to k-means
• Competitive learning networks
• Kohonen self-organizing feature maps
3. Maximum likelihood
Gaussian Mixtures
p(x) = \sum_{i=1}^{K} p(x \mid i) \, P(i)
• Model the pdf of the data using a mixture of
distributions
• K is the number of kernels (<< # data points)
• Common choice for the component densities p(x|i):

p(x \mid i) = \frac{1}{(2\pi \sigma_i^2)^{d/2}} \exp\left( -\frac{\| x - \mu_i \|^2}{2\sigma_i^2} \right)
Illustration of EM applied to GM model
The solid line gives the initialization of the EM algorithm: two kernels,
P(1) = P(2) = 0.5, μ1 = 0.0752, μ2 = 1.0176, σ1 = σ2 = 0.2356.
Convergence after 10 EM steps.
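A minimal EM sketch for the isotropic Gaussian mixture defined above; the initialisation, the fixed number of iterations, and the variance floor are assumptions:

```python
import numpy as np

def em_gmm(X, K, n_iter=50, seed=0):
    """Fit a mixture of K isotropic Gaussians p(x) = sum_i p(x|i) P(i) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)].copy()   # component means mu_i
    sigma2 = np.full(K, X.var())                           # component variances sigma_i^2
    prior = np.full(K, 1.0 / K)                            # mixing coefficients P(i)
    for _ in range(n_iter):
        # E-step: responsibilities P(i | x_k), computed in log space for stability
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)     # ||x_k - mu_i||^2
        log_p = -0.5 * sq / sigma2 - 0.5 * d * np.log(2 * np.pi * sigma2) + np.log(prior)
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate P(i), mu_i and sigma_i^2 from the responsibility-weighted data
        Nk = gamma.sum(axis=0)
        prior = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        sigma2 = np.maximum((gamma * sq).sum(axis=0) / (d * Nk), 1e-12)
    return prior, mu, sigma2
```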
Relevant literature
• L.J.P. van der Maaten, E.O. Postma, and H.J. van den
Herik (submitted). Dimensionality Reduction: A
Comparative Review.
• http://www.cs.unimaas.nl/l.vandermaaten