Comparing Classification Methods


College of Science & Technology
Dept. of Computer Science & IT
BSc in Information Technology
Data Mining
Chapter 4_2: Classification Methods
(Examples)
2013
Prepared by: Mahmoud Rafeek Al-Farra
www.cst.ps/staff/mfarra
Course’s Outlines
2
 Introduction
 Data Preparation and Preprocessing
 Data Representation
 Classification Methods
 Evaluation
 Clustering Methods
 Mid Exam
 Association Rules
 Knowledge Representation
 Special Case study: Document clustering
 Discussion of Case studies by students
Outlines
3
 Comparing Classification Methods
 Machine learning techniques
 Decision Trees
 k-Nearest Neighbors
 Naïve Bayesian Classifiers
 Neural Networks
Comparing Classification Methods
4
 Predictive Accuracy: Ability to correctly predict the class label.
 Speed: The computational cost involved in generating and using the model.
 Robustness: Ability to make correct predictions given noisy and/or missing values.
Comparing Classification Methods
5
 Scalability: Ability to construct the model efficiently given large amounts of data.
 Interpretability: The level of understanding and insight that the model provides.
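
The sketch below is not part of the slides; it is a minimal illustration, assuming scikit-learn is available, of how two of these criteria (predictive accuracy and speed) can be measured for three of the classifiers discussed in this chapter. The Iris dataset and the 70/30 split are arbitrary illustrative choices.

# A minimal sketch, assuming scikit-learn: compare three classifiers from this
# chapter on two of the criteria above, predictive accuracy and speed.
import time
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": GaussianNB(),
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)              # speed: cost of generating the model
    y_pred = model.predict(X_test)           # speed: cost of using the model
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_test, y_pred)     # predictive accuracy on unseen data
    print(f"{name}: accuracy={acc:.3f}, time={elapsed * 1000:.1f} ms")

Robustness and scalability can be probed the same way by adding noise or growing the training set, while interpretability is judged by inspecting the fitted model itself.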
Machine learning techniques
6
 Learning: Things learn when they change their behavior in a way that makes them perform better in the future.
 Machine learning is the subfield of artificial intelligence concerned with the design and development of algorithms that allow computers (machines) to improve their performance over time (to learn) based on data, such as sensor data or databases.
Machine learning techniques
7
 Examples of machine learning techniques:
 Decision Trees
 k-Nearest Neighbors
 Naïve Bayesian Classifiers
Decision Trees
8
 Decision tree learning is a common method used in data mining.
 It is an efficient method for producing classifiers from data.
 A Decision Tree is a tree-structured plan of a set of attributes to test in order to predict the output.
Decision Trees
9
Decision tree consists of:
10
 An internal node is a test on an attribute, e.g. Body temperature.
 A branch represents an outcome of the test, e.g. Warm.
 A leaf node represents a class label, e.g. Mammals.
 At each node, one attribute is chosen to split the training examples into classes that are as distinct as possible.
 A new case is classified by following a matching path to a leaf node, as in the sketch after this list.
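
As a minimal sketch of this terminology, the toy classifier below encodes the slide's own example: an internal node testing Body temperature, a branch for the outcome Warm, and a leaf with the class label Mammals. The Cold branch and the Non-mammals label are assumptions added only to make the tree complete.

# Toy decision tree matching the slide's example: one internal node (a test on
# Body temperature), branches for its outcomes, and leaves holding class labels.
# The "Cold" branch and the "Non-mammals" label are assumed here for completeness.
def classify(animal):
    if animal["body_temperature"] == "Warm":    # internal node + branch "Warm"
        return "Mammals"                        # leaf node: class label
    return "Non-mammals"                        # assumed leaf for the other branch

# A new case is classified by following the matching path to a leaf:
print(classify({"body_temperature": "Warm"}))   # -> Mammals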
Decision tree consists of:
11
Weather Data: Play or not Play?
12
Outlook   Temperature  Humidity  Windy  Play?
sunny     hot          high      false  No
sunny     hot          high      true   No
overcast  hot          high      false  Yes
rain      mild         high      false  Yes
rain      cool         normal    false  Yes
rain      cool         normal    true   No
overcast  cool         normal    true   Yes
sunny     mild         high      false  No
sunny     cool         normal    false  Yes
rain      mild         normal    false  Yes
sunny     mild         normal    true   Yes
overcast  mild         high      true   Yes
overcast  hot          normal    false  Yes
rain      mild         high      true   No
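
A decision tree can be fitted to this table automatically. The sketch below shows one possible way to do it, assuming pandas and scikit-learn are available (neither is named in the slides); one-hot encoding the nominal attributes and using the entropy split criterion are implementation choices of this sketch.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14 training examples from the table above.
rows = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]
df = pd.DataFrame(rows, columns=["Outlook", "Temperature", "Humidity", "Windy", "Play"])

X = pd.get_dummies(df.drop(columns="Play"))    # one-hot encode the nominal attributes
y = df["Play"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))   # textual view of the learned tree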
Weather Data: Play or not Play?
13
Case Study: How To Build a tree?
[Decision tree figure: the root node tests Outlook. Outlook = sunny -> test Humidity (high -> No, normal -> Yes); Outlook = overcast -> Yes; Outlook = rain -> test Windy (true -> No, false -> Yes).]
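
Read as code, the figure above classifies a new day by following one path from the root to a leaf. The function below is a direct transcription of that tree; the lower-case dictionary keys are a naming choice of this sketch.

# Direct transcription of the tree in the figure: root test on Outlook, then
# Humidity on the sunny branch and Windy on the rain branch.
def play(day):
    if day["outlook"] == "sunny":
        return "No" if day["humidity"] == "high" else "Yes"
    if day["outlook"] == "overcast":
        return "Yes"
    # remaining branch: outlook == "rain"
    return "No" if day["windy"] == "true" else "Yes"

print(play({"outlook": "sunny", "humidity": "normal", "windy": "false"}))   # -> Yes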
How To Build a tree?
14
 Top-down tree construction
 Which is the best attribute? (one common answer is sketched below)
 …
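
The slide leaves "Which is the best attribute?" open. One common answer, the ID3-style information-gain heuristic (an assumption of this sketch, not something stated on the slide), is to pick the attribute whose split reduces the class entropy the most; the code below computes that for the Weather data.

from collections import Counter
from math import log2

# The 14 rows of the Weather table: (Outlook, Temperature, Humidity, Windy, Play).
DATA = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    # Class entropy, in bits, of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    # Drop in class entropy obtained by splitting the rows on attribute index attr.
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

for i, name in enumerate(ATTRIBUTES):
    print(f"{name}: gain = {information_gain(DATA, i):.3f}")
# Outlook has the highest gain (about 0.247 bits), so it becomes the root,
# matching the tree on slide 13.

The same computation is then repeated recursively inside each branch, which is the top-down tree construction mentioned above.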
Next …
15
 k-Nearest Neighbors
 Naïve Bayesian Classifiers
Thanks
16