Lloyd’s Algorithm
K-Means Clustering
Gene Expression
• Susumu Ohno: whole
genome duplications
• The expression of genes
can be measured over
time.
• Identifying which genes
are expressed at a given
moment can help
determine function.
Grouping
• Grouping genes by derivative.
• Data must be clustered by derivative.
Clustering Problems
• Cluster d data points
into k clusters, such
that each point is
closer to the points in
its own cluster than to
those in any other.
• Data is usually not
that clearly
organized.
Lloyd’s Algorithm
• Assign each point to the cluster whose center is
nearest.
• Take each cluster’s center of gravity as its new center.
• Repeat until the centers no longer change; the aim is
to minimize the squared error distortion.
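The squared error distortion is not defined on the slide. A common textbook form is sketched below, reusing d (number of points) and k (number of clusters) from the slides. The symbols v_i for the data points and x_j for the cluster centers are assumptions made here for illustration.

```latex
\[
\mathrm{Distortion}(V, X)
  = \frac{1}{d} \sum_{i=1}^{d} \min_{1 \le j \le k} \lVert v_i - x_j \rVert^{2},
\qquad V = \{v_1, \dots, v_d\}, \quad X = \{x_1, \dots, x_k\}
\]
```

Each point contributes its squared Euclidean distance to the nearest center, so tighter clusters give a lower value.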
The Computational Problem
• Input: A matrix of points, each with m dimensions,
and the desired number of clusters k.
• Output: The points organized into k clusters,
minimizing each point’s distance from its cluster
center, and a visual representation of the data.
Pseudo-pseudocode
• Arbitrarily assign k centers.
• Assign each point to one of the k clusters, minimizing
Euclidean distance from its center.
• Assign cluster center of gravity as new center.
• Repeat until the algorithm converges.
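The four steps above can be sketched in plain Python. This is a minimal illustration, not the presenters' implementation. The function names (`lloyd`, `squared_distance`, `center_of_gravity`), the `seed` and `max_iter` parameters, and the choice to draw the initial centers from the data points are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def center_of_gravity(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd(points, k, seed=0, max_iter=100):
    """Cluster a list of points into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: arbitrarily pick k of the data points as initial centers
    # (assumption: any arbitrary choice would satisfy the slide).
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest center. Minimizing the
        # squared distance picks the same center as the Euclidean distance.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 3: move each center to the center of gravity of its cluster.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(center_of_gravity(members) if members else centers[j])
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = lloyd(data, k=2)
    print(centers, labels)
```

Which cluster gets label 0 depends on the arbitrary initial centers, so compare labels for equality rather than relying on specific numbers. Lloyd's algorithm can also settle in a local minimum, so it is commonly rerun from several starting points.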
Plotting