Transcript Slide 1
College of Science & Technology
Dept. of Computer Science & IT
BSc in Information Technology
Data Mining
Chapter 6: Clustering Methods
2013
Prepared by: Mahmoud Rafeek Al-Farra
www.cst.ps/staff/mfarra
Course Outline
2
Introduction
Data Preparation and Preprocessing
Data Representation
Classification Methods
Evaluation
Clustering Methods
Mid Exam
Association Rules
Knowledge Representation
Special Case Study: Document Clustering
Discussion of Case studies by students
Outline
3
Definition of Clustering
Why clustering?
Where to use clustering?
Next: Types of Data in Cluster Analysis
Next: A Categorization of Major Clustering Methods
Definition of Clustering
4
Clustering is often considered the most important
unsupervised learning technique. Like every other
problem of this kind, it deals with finding
structure in a collection of unlabeled data.
Clustering is “the process of organizing objects into
groups whose members are similar in some way”.
A cluster is therefore a collection of objects that
are “similar” to one another and “dissimilar” to
the objects belonging to other clusters.
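The idea of "similar within, dissimilar between" can be made concrete with a small sketch. The points and the hand-made grouping below are hypothetical, and Euclidean distance is only one possible choice of similarity measure; the slides do not prescribe one.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical 2-D points, grouped by hand into two clusters.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

def mean_within(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(c1, c2):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(c1, c2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

within_a = mean_within(cluster_a)
within_b = mean_within(cluster_b)
between = mean_between(cluster_a, cluster_b)

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

For a reasonable grouping, the within-cluster averages should be clearly smaller than the between-cluster average.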
Definition of Clustering
5
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Cluster analysis
Grouping a set of data objects into clusters
Clustering is unsupervised classification:
no predefined classes
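To illustrate what "no predefined classes" means in practice, here is a minimal sketch of k-means on one-dimensional data. k-means is used here only as a familiar example and is not named in these slides; the values, the starting centers, and the function name kmeans_1d are all assumptions made for illustration. Note that the input carries no labels: the groups come out of the data alone.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means sketch for 1-D data (illustrative only)."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers

# Unlabeled, hypothetical measurements.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
groups, centers = kmeans_1d(values, centers=[0.0, 5.0])
print(groups)
print(centers)
```

In a classification setting the same values would arrive with class labels attached; here the two groups are discovered rather than given.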
Learning
6
Why clustering?
7
Simplification
Pattern detection
Useful in data concept construction
Unsupervised learning process
Where to use clustering?
8
Data mining
Information retrieval
Text mining
Web analysis
Marketing
Medical diagnosis
Which method should I use?
9
Type of attributes in data
Scalability to larger datasets
Ability to work with irregular data
Time cost
complexity
Data order dependency
Result presentation
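Of the criteria above, data order dependency is perhaps the least obvious, so a small sketch may help. It uses a simple sequential "leader"-style scheme, chosen here only because it is easy to follow; the slides do not name this method, and the function name, values, and threshold are hypothetical.

```python
def leader_clustering(values, threshold):
    """Sequential sketch: each value joins the first cluster whose leader
    (its first member) is within `threshold`, else it starts a new cluster."""
    clusters = []
    for v in values:
        for cluster in clusters:
            if abs(v - cluster[0]) <= threshold:
                cluster.append(v)
                break
        else:
            # No existing leader is close enough: v becomes a new leader.
            clusters.append([v])
    return clusters

# Same data, two different presentation orders.
result_1 = leader_clustering([1.0, 2.0, 3.0], threshold=1.5)
result_2 = leader_clustering([2.0, 1.0, 3.0], threshold=1.5)
print(result_1)
print(result_2)
```

The same three values end up in two clusters in one order and in a single cluster in the other, which is the kind of behavior to check for when choosing a method.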
Thanks
10