#### Clustering Theory


Clustering Theory
John Nicholas Owen
Sarah Smith
What is clustering?
The activity of grouping similar objects.
Clustering methods are useful for data
reduction, for developing classification
schemes and for suggesting or supporting
hypotheses about the structure of the data
Steps to Clustering
Pattern representation. The analyst identifies
the number, type, and scale of features
available to the clustering algorithm.
Define a pattern proximity measure suited to the
data domain. This is usually the Euclidean
distance between data points.
Grouping or Clustering of the data.
Data abstraction.
Assessment of output.
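As a rough illustration of the proximity step, the sketch below computes Euclidean distances between feature vectors and collects them into a pairwise matrix. The function names `euclidean_distance` and `proximity_matrix` and the sample points are assumptions made for this example, not part of the original slides.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def proximity_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[euclidean_distance(p, q) for q in points] for p in points]


# Hypothetical sample data: three patterns with two features each.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = proximity_matrix(points)
```

Other proximity measures could be swapped in by replacing `euclidean_distance`; the rest of the clustering pipeline only needs the resulting distances.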
Creating Clusters
There are two basic approaches for
creating the clusters:
Partitional
Hierarchical
Partitional Theory
The analyst evaluates and groups the data
using statistical algorithms
The most popular clustering methods
include
k-means
Hierarchical agglomerative clustering (a hierarchical rather than a partitional method)
Unsupervised Bayes
Mode finding, or density based
k-means Clustering
Clusters are defined by measuring the
Euclidean distances between data points
Requires the analyst to know something
about the underlying data
The analyst needs to provide the number
of clusters to be formed. Then the
software performs a four-step iterative
process to cluster the data.
Step 1
Randomly assign each cluster center’s
initial position.
Step 2
Assign each data point to its nearest
“center point”
Step 3
Find the actual center of each of the new
clusters
Step 4
Place the centroid in the new position
End State
Repeat steps 2 through 4 until the
cluster assignments stop changing
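The iterative process above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the choice to draw the initial centers from the data points themselves are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; return (centers, assignments)."""
    rng = random.Random(seed)
    # Step 1: random initial centers (assumption: sampled from the data).
    centers = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each data point to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # End state: stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Steps 3 and 4: compute each cluster's mean and move its center there.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignments


# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignments = kmeans(points, k=2)
```

Because the starting centers are random, different seeds can give different results on less cleanly separated data, which is one reason the analyst needs some knowledge of the underlying data when choosing the number of clusters.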
Hierarchical Theory
Does not generate a single set of disjoint
clusters
Top-down (divisive) or bottom-up
(agglomerative) approach
The bottom-up approach is more
common
Divisive Approach
Generates a hierarchy of nested clusters that
can be represented by a tree, called a
dendrogram
A dendrogram consists of many upside-down U-shaped lines connecting data points in a
hierarchical tree
This method is favored by biologists because it
may give more insights into the structure of the
clusters than other methods
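One way to picture the hierarchy of nested clusters is as a binary tree in code. The sketch below assumes a dendrogram stored as nested 2-tuples with data point labels at the leaves; that representation, the labels, and the function names `leaves` and `nested_clusters` are hypothetical choices for illustration only.

```python
def leaves(node):
    """Return the data points under a dendrogram node, left to right."""
    if isinstance(node, tuple):
        left, right = node
        return leaves(left) + leaves(right)
    return [node]


def nested_clusters(node):
    """List every cluster in the hierarchy, starting from the root."""
    clusters = [leaves(node)]
    if isinstance(node, tuple):
        for child in node:
            clusters.extend(nested_clusters(child))
    return clusters


# Hypothetical dendrogram: each tuple is one merge of two sub-clusters.
tree = (("a", "b"), ("c", ("d", "e")))
for cluster in nested_clusters(tree):
    print(cluster)
```

Every cluster printed is either a single data point or the union of exactly two smaller clusters from the list, which is the nesting a dendrogram draws.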
Dendrogram
Agglomerative Approach
Each individual data point starts by being
alone in its own group
The groups closest to each other are merged
with one another
This continues until all individual data
points are in one single group
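The bottom-up procedure can be sketched as follows. The slides do not say how the distance between two groups is measured, so this example assumes single linkage (the distance between the closest pair of members) with Euclidean distance; the function name `agglomerative` and the sample points are also assumptions.

```python
import math


def agglomerative(points):
    """Merge clusters bottom-up; return the list of merges performed.

    Each merge is (cluster_a, cluster_b, distance), where clusters are
    lists of indices into points. Single linkage is an assumption here.
    """
    # Each data point starts alone in its own group.
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of groups that are closest to each other.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged groups with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical sample data: two nearby pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
merges = agglomerative(points)
```

The recorded merge order and distances are the information a dendrogram plots. This brute-force search recomputes all pairwise group distances on every pass, so it is only suitable for small data sets.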
Agglomerative Clustering
Step 1
Step 2
Questions?