Classification
Heejune Ahn
SeoulTech
Last updated: May 3, 2015
Outline
Introduction
Classification design
Purpose, type, and an example
Design flow
Simple classifier
Linear discriminant functions
Mahalanobis distance
Bayesian classification
K-means clustering: unsupervised learning
1. Purpose
Purpose
For decision making
A topic of pattern recognition (in artificial intelligence)
Model: images -> features (patterns, structures) -> classifier (classification rules) -> classes
Automation and Human intervention
Task specification: what classes, what features
Algorithm to be used
Training: tuning algorithm parameters
2. Supervised vs unsupervised
Supervised (classification)
trained by labeled examples (provided by humans)
Unsupervised (clustering)
only by feature data
using the mathematical properties (statistics) of the data set
3. An example
Classifying nuts
Features (circularity, line-fit error) -> Classifier (classification rules)
Classes: pine nuts, lentils, pumpkin seeds
Observations
What if only a single feature is used?
What about the singular (outlier) points?
Classification: draw decision boundaries in the feature space
Terminology
4. Design Flow
5. Prototypes & min-distance classifier
Prototypes
mean of the training samples in each class; a test sample is assigned to the class of the nearest prototype
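A minimal Matlab sketch of this minimum-distance rule (the data and variable names below are illustrative, not the textbook example):

% Minimum-distance classifier sketch: prototypes are the class means.
X1 = randn(20, 2);                         % training samples of class 1 (synthetic)
X2 = randn(20, 2) + 3;                     % training samples of class 2 (synthetic)
proto = [mean(X1, 1); mean(X2, 1)];        % one prototype (mean) per class
x = [2.5, 2.0];                            % test feature vector
d = sum((proto - repmat(x, 2, 1)).^2, 2);  % squared distances to the prototypes
[~, predictedClass] = min(d);              % assign to the nearest prototype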
6. Linear discriminant
Linear discriminant function
g(x1, x2) = a*x1 + b*x2 + c, with decision boundary g(x1, x2) = 0
Ex 11.1 & Fig11.6
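For illustration, a short Matlab snippet applying such a linear discriminant; the coefficients a, b, c here are hypothetical, not those of Ex 11.1:

% Classify a 2-D feature vector by the sign of g(x1, x2) = a*x1 + b*x2 + c.
a = 1.5; b = -0.8; c = 0.2;   % hypothetical coefficients
x = [0.6, 0.9];               % feature vector [x1 x2]
g = a*x(1) + b*x(2) + c;
if g > 0
    classLabel = 1;           % one side of the boundary g = 0
else
    classLabel = 2;           % the other side
end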
8. Mahalanobis distance
Problems in the min-distance classifier
only the mean value is used; the distribution (spread) of each class is not considered
e.g. (figure): std(class 1) << std(class 2)
Mahalanobis distance
variance is taken into account
(the larger the variance, the smaller the distance)
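The standard definition behind this slide (\mu_i and \Sigma_i are the mean and covariance of class i):

d_M^2(x, \mu_i) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)

With \Sigma_i = I this reduces to the squared Euclidean distance; a direction with larger variance contributes less to the distance.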
9. Bayesian classification
Idea
To assign each data point to the most probable class,
based on a priori known probabilities
Assumption
Priors (the probability of each class) are known.
Bayes theorem
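The theorem in its standard form (\omega_i denotes a class, x a feature vector):

P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)}, \qquad p(x) = \sum_j p(x \mid \omega_j)\, P(\omega_j)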
10. Bayes decision rule
Classification rule: assign x to the class w_i with the largest posterior probability P(w_i | x)
Intuitively, expand the posterior with the Bayes theorem; its terms are:
Class-conditional probability density function p(x | w_i)
Prior probability P(w_i)
Total probability p(x): not used in the classification decision (it is common to all classes)
Interpretation
need to know the priors and the class-conditional pdfs, which are often not available
MVN (multivariate normal) distribution model: practically a quite good approximation
MVN
the N-dimensional normal distribution with mean vector mu and covariance matrix Sigma
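The standard N-dimensional normal density referred to here:

p(x) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)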
12. Bayesian classifier for M variates
Taking log(·) of the posterior: it is a monotonically increasing function, so the decision is unchanged
Case 1: identical, independent covariances (Sigma_i = sigma^2 * I)
• Linear machine: the decision boundaries are hyperplanes (linear)
• Note: with equal priors P(w_i), this reduces to the minimum-distance criterion
Case 2: all covariance matrices are the same (Sigma_i = Sigma)
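The standard log-discriminant these two cases refer to (terms common to all classes can be dropped):

g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)

Case 1 (\Sigma_i = \sigma^2 I): g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i), giving hyperplane boundaries.
Case 2 (\Sigma_i = \Sigma): g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i), still linear boundaries.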
Matlab
[class, err] = classify(test, training, group [, type, prior])
training and test: feature matrices (must have the same number of columns)
type 'DiagLinear' for naïve Bayes
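A small usage sketch for classify; the synthetic data, the 'linear' type, and the explicit prior are only to illustrate the arguments:

% Two Gaussian classes, labeled 1 and 2 (synthetic data).
training = [randn(50, 2); randn(50, 2) + 2];
group    = [ones(50, 1); 2 * ones(50, 1)];
test     = [randn(10, 2); randn(10, 2) + 2];
% Linear discriminant analysis with explicit (correct) priors.
[class, err] = classify(test, training, group, 'linear', [0.5 0.5]);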
Ex11.3
[Figure: TRAINING DATA and TEST DATA scatter plots; classification results with wrong priors vs. correct priors]
13. Ensemble classifier
Combining multiple classifiers
Utilizing diversity, similar to asking multiple experts for a decision.
AdaBoost
Weak classifier: chance (1/2) < accuracy << 1.0
misclassified training data are weighted more heavily for the next classifier
[Diagram: weight distributions D1(x) (uniform), D2(x), ..., DT(x); weak classifiers H1(x), H2(x), ..., Ht(x), ..., HT(x), combined with weights alpha_t into the final classifier]
AdaBoost in details
Given: m training samples (x_1, y_1), ..., (x_m, y_m) with labels y_i in {-1, +1}
Initialize weights: D_1(i) = 1/m
For t = 1, ..., T:
1. Call WeakLearn, which returns the weak classifier h_t with minimum error e_t w.r.t. distribution D_t
2. Choose alpha_t = (1/2) * ln((1 - e_t) / e_t)
3. Update D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t,
where Z_t is a normalization factor chosen so that D_{t+1} is a distribution
Output the strong classifier: H(x) = sign(sum over t of alpha_t * h_t(x))
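As a quick sketch (not the lecture's own code), Matlab's Statistics Toolbox provides a built-in AdaBoost ensemble via fitensemble, assuming a release where it is available; the synthetic data is illustrative only:

% Boost 50 tree stumps with AdaBoost.M1 on synthetic 2-class data.
X = [randn(50, 2) + 1; randn(50, 2) - 1];
Y = [ones(50, 1); -ones(50, 1)];
ens  = fitensemble(X, Y, 'AdaBoostM1', 50, 'Tree');
Yhat = predict(ens, X);       % strong-classifier decisions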
14. K-means clustering
K-means
Unsupervised classification
Group data to minimize the sum of squared distances from each sample to its assigned cluster center
Iterative algorithm (see the sketch below)
(re-)assign each x_i to the class of its nearest center c_i
(re-)calculate each center c_i as the mean of its assigned samples
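A minimal Matlab sketch of these two alternating steps (illustrative only; Ex 11.4 below uses the built-in kmeans function):

% Manual k-means iteration on synthetic 2-D data with k = 2.
X = [randn(50, 2); randn(50, 2) + 4];
k = 2;
p = randperm(size(X, 1));
c = X(p(1:k), :);                        % initial centers: random samples
for iter = 1:100
    % (re-)assign each sample to its nearest center
    d = zeros(size(X, 1), k);
    for j = 1:k
        d(:, j) = sum((X - repmat(c(j, :), size(X, 1), 1)).^2, 2);
    end
    [~, idx] = min(d, [], 2);
    % (re-)calculate each center as the mean of its assigned samples
    cOld = c;
    for j = 1:k
        if any(idx == j)
            c(j, :) = mean(X(idx == j, :), 1);
        end
    end
    if isequal(c, cOld), break; end      % stop when the centers no longer move
end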
Demo
http://shabal.in/visuals/kmeans/3.html
Issues
Sensitive to the initial centroid values.
K (# of clusters) must be given.
Multiple trials needed => choose the best one
Trade-off between K (bigger) and the objective function (smaller); no optimal algorithm to determine K.
Nevertheless
it is used in most unsupervised clustering today.
Ex11.4 & F11.10
kmeans function
[classIndexes, centers] = kmeans(data, k, options)
k: # of clusters
Options: 'Replicates', 'Display'
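For example, a call in the spirit of Ex 11.4 (the data here is synthetic; 'Replicates' reruns k-means from several random initializations and keeps the best result):

% Cluster two synthetic blobs, repeating 5 times and reporting the result.
data = [randn(100, 2); randn(100, 2) + 5];
[classIndexes, centers] = kmeans(data, 2, 'Replicates', 5, 'Display', 'final');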