
Classification
Heejune Ahn
SeoulTech
Last updated: May 3, 2015
Outline

Introduction
• Purpose, type, and an example
• Design flow
Classification design
• Simple classifier
• Linear discriminant functions
• Mahalanobis distance
• Bayesian classification
• K-means clustering: unsupervised learning
1. Purpose

Purpose
• For decision making
• A topic of pattern recognition (in artificial intelligence)

Model: Images → Features (patterns, structures) → Classifier (classification rules) → Classes

Automation and human intervention
• Task specification: what classes, what features
• Algorithm to be used
• Training: tuning algorithm parameters
2. Supervised vs unsupervised

Supervised (classification)
• Trained by labeled examples (provided by humans)
Unsupervised (clustering)
• Uses only the feature data
• Uses the mathematical properties (statistics) of the data set
3. An example

Classifying nuts
• Features: (circularity, line-fit error)
• Classifier (classification rules)
• Classes: pine nut, lentil, pumpkin seed

Observations
• What if only a single feature is used?
• What about the singular (outlier) points?

Classification
• Draw boundaries between the classes

Terminology
4. Design Flow
5. Prototypes & min-distance classifier

Prototypes
• The mean of the training samples in each class
• A new sample is assigned to the class with the nearest prototype (a minimal sketch follows)
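A minimal MATLAB sketch of such a minimum-distance classifier, assuming a labelled 2-D feature set; the data values and variable names (trainData, trainLabels, x) are illustrative, not the book's example.

% Toy training set: two 2-D feature classes (illustrative values).
trainData   = [0.9 0.1; 0.8 0.2; 0.3 0.7; 0.2 0.8];   % N-by-2 feature matrix
trainLabels = [1; 1; 2; 2];                            % class index per row
x = [0.85, 0.15];                                      % feature vector to classify

% Prototypes: the mean of the training samples in each class.
K = max(trainLabels);
prototypes = zeros(K, size(trainData, 2));
for k = 1:K
    prototypes(k, :) = mean(trainData(trainLabels == k, :), 1);
end

% Min-distance rule: assign x to the class with the nearest prototype.
d = sum((prototypes - repmat(x, K, 1)).^2, 2);         % squared Euclidean distances
[~, predictedClass] = min(d);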
6. Linear discriminant

Linear discriminant function
• g(x1, x2) = a*x1 + b*x2 + c
• The decision boundary is the line g(x1, x2) = 0; the sign of g decides the class (see the sketch below)

Ex 11.1 & Fig 11.6
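A small sketch of using the linear discriminant as a two-class decision rule; the coefficients a, b, c are placeholder values, assumed to have been obtained already.

% Linear discriminant g(x1,x2) = a*x1 + b*x2 + c; the boundary is g = 0.
a = 1.5; b = -2.0; c = 0.3;            % placeholder coefficients
g = @(x1, x2) a*x1 + b*x2 + c;

x = [0.8, 0.6];                        % feature vector (e.g. circularity, line-fit error)
if g(x(1), x(2)) > 0
    predictedClass = 1;                % positive side of the boundary
else
    predictedClass = 2;                % negative side (or exactly on the boundary)
end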
8. Mahalanobis distance

Problems with the min-distance classifier
• Only the mean value is used; the distribution (spread) of each class is not considered
• e.g. (right figure): std(class 1) << std(class 2)

Mahalanobis distance
• Takes the class variance into account: the larger the variance, the smaller the distance (see the sketch below)
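A brief sketch of classification by Mahalanobis distance; the toy data mimic the situation in the figure (class 1 tightly clustered, class 2 widely spread), and all names and values are illustrative.

% Toy data: class 1 tightly clustered, class 2 widely spread.
X1 = [0.0 0.0; 0.1 0.0; 0.0 0.1; 0.1 0.1];
X2 = [2.0 2.0; 4.0 2.0; 2.0 4.0; 4.0 4.0];
trainData   = [X1; X2];
trainLabels = [1; 1; 1; 1; 2; 2; 2; 2];
x = [1.2, 1.2];                        % point to classify

% Squared Mahalanobis distance to each class: (x - m_k) * inv(C_k) * (x - m_k)'
K = max(trainLabels);
dM = zeros(K, 1);
for k = 1:K
    Xk = trainData(trainLabels == k, :);
    dM(k) = (x - mean(Xk, 1)) / cov(Xk) * (x - mean(Xk, 1))';
end
[~, predictedClass] = min(dM);         % larger class variance => smaller distance
% Here x is assigned to class 2, even though the class-1 mean is closer in Euclidean distance.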
9. Bayesian classification

Idea
• Assign each data point to the "most probable" class, based on a-priori known probabilities

Assumption
• The priors (class probabilities) are known

Bayes theorem
• P(w_i | x) = p(x | w_i) * P(w_i) / p(x)
10. Bayes decision rule

Classification rule
• Assign x to the class w_i with the largest posterior P(w_i | x)
• Intuitively: pick the most probable class given the observed features
• By Bayes theorem, P(w_i | x) = p(x | w_i) * P(w_i) / p(x), where
  p(x | w_i) is the class-conditional probability density function,
  P(w_i) is the prior probability, and
  p(x) is the total probability (the same for every class, so not used in the classification decision)

Interpretation
• Need to know the priors and the class-conditional pdfs: often not available
• MVN (multivariate normal) distribution model: practically quite a good approximation

MVN
• N-D normal distribution with mean vector m and covariance matrix C:
  p(x) = (2*pi)^(-N/2) * |C|^(-1/2) * exp(-0.5 * (x - m)' * inv(C) * (x - m))
  (a sketch using this model follows)
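A sketch of the Bayes rule with MVN class-conditional models, using the Statistics Toolbox function mvnpdf; the toy data and prior values are assumptions for illustration only.

% Toy 2-class data; the priors are assumed example values.
X1 = [0.0 0.0; 0.1 0.0; 0.0 0.1; 0.1 0.1];
X2 = [2.0 2.0; 4.0 2.0; 2.0 4.0; 4.0 4.0];
classData = {X1, X2};
priors = [0.7, 0.3];
x = [1.2, 1.2];

% Posterior P(w_k | x) = p(x | w_k) * P(w_k) / p(x), with MVN class-conditional pdfs.
post = zeros(1, numel(classData));
for k = 1:numel(classData)
    Xk = classData{k};
    post(k) = mvnpdf(x, mean(Xk, 1), cov(Xk)) * priors(k);
end
post = post / sum(post);               % p(x), the total probability, only normalizes
[~, predictedClass] = max(post);       % most probable class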
12. Bayesian classifier for M variates

Taking log(·) of the posterior
• log is a monotonically increasing function, so it does not change which class maximizes the posterior

Case 1: identical, independent covariances (C_i = s^2 * I)
• Linear machine: the decision regions are separated by hyper-planes (linear boundaries)
• Note: with equal priors P(w_i), this reduces to the minimum-distance criterion

Case 2: all classes share the same covariance matrix C (a sketch follows)
Matlab

[class, err] = classify(test, training, group [, type, prior])
• training and test: feature matrices (one row per sample); group: class labels of the training rows
• type 'DiagLinear' gives a naïve Bayesian (diagonal-covariance) classifier (a usage example follows)
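A hedged usage example of the classify call above; the training, group, test, and prior values are made up for illustration.

% Illustrative training set, labels, test points, and priors.
training = [0 0; 0.1 0; 0 0.1; 3 3; 3.1 3; 3 3.1];
group    = [1; 1; 1; 2; 2; 2];
test     = [0.2 0.1; 2.9 3.2];
priors   = [0.6; 0.4];

% 'diaglinear' uses per-feature (diagonal) variances, i.e. a naive-Bayes-style model.
[predicted, err] = classify(test, training, group, 'diaglinear', priors);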

Ex11.3
[Figure: scatter plots of the training data and test data, comparing classification results with wrong priors and with correct priors]
13. Ensemble classifier

Combining multiple classifiers
• Utilizes diversity, similar to asking multiple experts for a decision

AdaBoost
• Weak classifier: chance (1/2) < accuracy << 1.0
• Mis-classified training data are weighted more heavily for the next classifier
[Diagram: weight distributions D1(x) (uniform), D2(x), ..., DT(x), the weak classifiers H1(x), H2(x), ..., Ht(x), and the weights a_t, combined into the final classifier HT(x)]

AdaBoost in detail

Given: training samples (x_1, y_1), ..., (x_m, y_m) with labels y_i in {-1, +1}
Initialize weights: D_1(i) = 1/m
For t = 1, ..., T:
  1. Call WeakLearn, which returns the weak classifier h_t with minimum error e_t w.r.t. the distribution D_t
  2. Choose a_t = (1/2) * ln((1 - e_t) / e_t)
  3. Update D_{t+1}(i) = D_t(i) * exp(-a_t * y_i * h_t(x_i)) / Z_t,
     where Z_t is a normalization factor chosen so that D_{t+1} is a distribution
Output the strong classifier: H(x) = sign(sum over t of a_t * h_t(x))
(a sketch with decision stumps follows)
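A compact MATLAB sketch of the boosting loop above, using decision stumps as the weak learner (the slides do not specify a particular weak learner); the toy data and all variable names are illustrative.

% Toy two-class data with labels in {-1, +1}; all names and values are illustrative.
X = [1 2; 2 1; 2 3; 3 2; 6 7; 7 6; 7 8; 8 7];
y = [-1; -1; -1; -1; 1; 1; 1; 1];
[N, d] = size(X);
T = 10;                                 % number of boosting rounds

D = ones(N, 1) / N;                     % D1: uniform weight distribution
alpha = zeros(T, 1);
stumpDim = zeros(T, 1); stumpThr = zeros(T, 1); stumpSgn = zeros(T, 1);

for t = 1:T
    % WeakLearn: pick the decision stump with minimum weighted error w.r.t. D.
    bestErr = inf;
    for j = 1:d
        for thr = X(:, j)'
            for s = [-1, 1]
                h = s * sign(X(:, j) - thr);  h(h == 0) = s;
                e = sum(D(h ~= y));           % weighted training error
                if e < bestErr
                    bestErr = e; stumpDim(t) = j; stumpThr(t) = thr; stumpSgn(t) = s;
                end
            end
        end
    end
    alpha(t) = 0.5 * log((1 - bestErr) / max(bestErr, eps));     % a_t
    h = stumpSgn(t) * sign(X(:, stumpDim(t)) - stumpThr(t));  h(h == 0) = stumpSgn(t);
    D = D .* exp(-alpha(t) * y .* h);
    D = D / sum(D);                     % Z_t: keep D_{t+1} a probability distribution
end

% Strong classifier H(x) = sign(sum_t a_t * h_t(x)), evaluated here on the training set.
F = zeros(N, 1);
for t = 1:T
    h = stumpSgn(t) * sign(X(:, stumpDim(t)) - stumpThr(t));
    h(h == 0) = stumpSgn(t);
    F = F + alpha(t) * h;
end
H = sign(F);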
14. K-means clustering

K-means
• Unsupervised classification
• Groups the data to minimize the sum of squared distances from each point to its cluster centroid

Iterative algorithm (a sketch follows)
• (re-)assign each x_i to the class with the nearest centroid c_i
• (re-)calculate each centroid c_i as the mean of its assigned points
• repeat until the assignments stop changing
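A plain MATLAB sketch of this assign/recalculate loop; the toy data, the fixed iteration cap, and the variable names are illustrative assumptions, not the book's implementation.

% Toy data: two Gaussian blobs.
X = [randn(50, 2); randn(50, 2) + 4];
k = 2;
N = size(X, 1);

c = X(randperm(N, k), :);              % initial centroids: k distinct data points
idx = zeros(N, 1);
for iter = 1:100
    % (re-)assign each x_i to the nearest centroid.
    for i = 1:N
        [~, idx(i)] = min(sum((c - repmat(X(i, :), k, 1)).^2, 2));
    end
    % (re-)calculate each centroid as the mean of its assigned points.
    newc = c;
    for j = 1:k
        if any(idx == j)
            newc(j, :) = mean(X(idx == j, :), 1);
        end
    end
    if isequal(newc, c), break; end    % converged: centroids no longer move
    c = newc;
end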
Demo

http://shabal.in/visuals/kmeans/3.html

Issues
• Sensitive to the "initial" centroid values
  • Multiple trials are needed => choose the best run
• 'K' (the number of clusters) must be given
  • Trade-off: a larger K gives a smaller objective-function value
  • No optimal algorithm to determine it

Nevertheless
• K-means is used in most unsupervised clustering tasks today
Ex11.4 & F11.10

kmeans function (a usage example follows)
[classIndexes, centers] = kmeans(data, k, options)
• k: the number of clusters
• Options: 'Replicates', 'Display'
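A hedged example call; the data and the number of replicates are illustrative.

% 'Replicates' reruns k-means from several random starts and keeps the best result,
% which addresses the sensitivity to initial centroids noted above.
data = [randn(50, 2); randn(50, 2) + 4];
k = 2;
[classIndexes, centers] = kmeans(data, k, 'Replicates', 5, 'Display', 'final');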