Transcript Slide 1

College of Science & Technology
Dept. of Computer Science & IT
BSc in Information Technology
Data Mining
Chapter 6: Clustering Methods
2013
Prepared by: Mahmoud Rafeek Al-Farra
www.cst.ps/staff/mfarra
Course Outline
2
Introduction
Data Preparation and Preprocessing
Data Representation
Classification Methods
Evaluation
Clustering Methods
Mid Exam
Association Rules
Knowledge Representation
Special case study: Document clustering
Discussion of Case studies by students
Outline
3
Definition of Clustering
Why clustering?
Where to use clustering?
Next: Types of Data in Cluster Analysis
Next: A Categorization of Major Clustering Methods
Definition of Clustering
4
Clustering is often considered one of the most important
unsupervised learning techniques; like every other
problem of this kind, it deals with finding
structure in a collection of unlabeled data.
Clustering is “the process of organizing objects into
groups whose members are similar in some way”.
A cluster is therefore a collection of objects that
are “similar” to one another and “dissimilar” to
the objects belonging to other clusters.
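The idea of "similar within, dissimilar between" can be made concrete with distances. The sketch below is only an illustration: the two groups of points are hypothetical toy data, Euclidean distance is an assumed similarity measure (the slides do not fix one), and the helper names are invented for this example.

```python
from itertools import combinations, product
from math import dist

# Hypothetical toy data: two hand-made groups of 2-D points.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

def mean_intra_distance(cluster):
    """Average distance between every pair of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(first, second):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(first, second))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)

# Assumption: Euclidean distance stands in for "dissimilarity".
print(f"within A: {intra_a:.2f}, within B: {intra_b:.2f}, between: {inter_ab:.2f}")
```

For a grouping that matches the definition, the within-cluster averages should come out clearly smaller than the between-cluster average.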
Definition of Clustering
5
Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
Cluster analysis
  - Grouping a set of data objects into clusters
Clustering is unsupervised classification: no predefined classes
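To show what "no predefined classes" means in practice, here is a minimal sketch that groups unlabeled points using k-means. k-means is an assumption made for illustration only (this slide does not name a method), and the data, the starting centroids, and the function name are all hypothetical.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Tiny k-means sketch: returns a cluster index per point and the final centroids."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids

# Hypothetical unlabeled data: coordinates only, no class column.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (2.0, 1.0), (8.5, 9.0)]

# Assumption: the first two points serve as starting centroids.
assignment, centroids = kmeans(points, centroids=points[:2])
print(assignment)
```

The cluster indices that come out are arbitrary group identifiers discovered from the data, not class labels supplied in advance.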
Learning
6
Why clustering?
7
Simplification
Pattern detection
Useful in data concept construction
Unsupervised learning process
Where to use clustering?
8
Data mining
Information retrieval
Text mining
Web analysis
Marketing
Medical diagnosis
Which method should I use?
9
Type of attributes in the data
Scalability to larger datasets
Ability to work with irregular data
Time cost
Complexity
Data order dependency
Result presentation
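Of the criteria above, data order dependency is perhaps the least obvious, so a small sketch may help. It uses a single-pass "leader" style procedure, chosen here purely as an assumed example of an order-sensitive method; the function name, the threshold, and the data are hypothetical.

```python
from math import dist

def leader_clustering(points, threshold):
    """Single-pass sketch: a point joins the first leader within `threshold`,
    otherwise it becomes the leader of a new cluster."""
    leaders, clusters = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if dist(p, leader) <= threshold:
                clusters[i].append(p)
                break
        else:
            # No existing leader was close enough: start a new cluster.
            leaders.append(p)
            clusters.append([p])
    return clusters

# Hypothetical 1-D data written as 1-tuples; the threshold is an assumption.
threshold = 1.5
in_order = leader_clustering([(0.0,), (1.0,), (2.0,)], threshold)
reordered = leader_clustering([(1.0,), (0.0,), (2.0,)], threshold)

print(len(in_order), "clusters when the data arrives as 0, 1, 2")
print(len(reordered), "clusters when the data arrives as 1, 0, 2")
```

The same three points yield a different number of clusters depending only on the order in which they are presented, which is the kind of behaviour this criterion asks about.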
Thanks
10