Classification
Ellen L. Walker
Category Recognition
Associating information extracted from images with categories (classes) of objects
Requires prior knowledge about the objects (models)
Requires compatible representation of model with data
Requires appropriate reasoning techniques
Reasoning techniques include:
Classification (supervised & unsupervised)
Graph matching
Rule-based processing
Hybrid techniques
Knowledge Representation
Syntax = symbols and how they are used
Semantics = meanings of symbols & their arrangement
Representation = syntax + semantics
Types of representations
Feature vectors: [area=200, eccentricity=1, ...]
Grammars: person => head+trunk+legs
Predicate logic: Long(x) and thin(x) -> road(x)
Production rules: if R is long and R is thin then R is a road segment
Graphs
Classification
Feature-based object recognition
Unknown object is represented by a feature vector (e.g., height, weight)
Known objects are also represented by feature vectors, grouped into classes
Class = set of objects that share important properties
Reject class = generic class for all unidentifiable objects
Classification = assigning the unknown object the label of the appropriate class
Types of Classification
Discriminant classification (supervised): create dividing lines (discriminants) to separate classes based on (positive and negative) examples
Distance classification (unsupervised): create clusters in feature space to collect items of the same class
A priori knowledge = prespecified discriminant functions or cluster centers
Classification Systems
Pre-production (training data)
Extract relevant features from training examples of each class (feature vectors)
Construct (by hand) or use machine learning to develop discrimination functions to correctly classify training examples
Production (test data and real data)
Extract a feature vector from the image
Apply the discrimination functions determined in pre-production to determine the closest class to the object
Report the result (label) of the object
Evaluating Classification Systems
Classification error = object classified into wrong class
False positive = item identified as class, should be not-class
False negative = item identified as not-class, should be class
Increasing sensitivity to true positives often increases false positives as well
True positive rate = number of true positives / total number of positives (desired value: 1)
False positive rate = number of false positives / total number of negatives (desired value: 0)
Errors are measured on independent test data: these data have known classifications, but are not used in any way in the development (pre-production stage) of the system
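As a small illustration of the two rates, the Python sketch below computes them from lists of true and predicted binary labels; the label names and counts are invented for the example, not taken from the lecture.
```python
# Minimal sketch: true/false positive rates for a binary classifier.
# Labels below are illustrative, not data from the slides.

def rates(true_labels, predicted_labels, positive="road"):
    tp = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t == positive and p == positive)
    fp = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t != positive and p == positive)
    positives = sum(1 for t in true_labels if t == positive)
    negatives = len(true_labels) - positives
    return tp / positives, fp / negatives   # TP rate (want 1), FP rate (want 0)

true_labels      = ["road", "road", "road", "field", "field", "field"]
predicted_labels = ["road", "road", "field", "road", "field", "field"]
print(rates(true_labels, predicted_labels))   # (0.666..., 0.333...)
```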
Discrimination functions
Let g(x) be “goodness” of x as a member of class g
Discrimination function between g1 and g2 is simply g1(x) – g2(x) = 0 (i.e., both classes are equally good on the dividing line)
An object’s class is the “g” that gives the largest value for x
Linear functions are often used for g(x)
With one example per class, this reduces to nearest mean
Perceptrons represent linear discrimination functions (see NN notes)
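A minimal sketch of this idea in Python, assuming hypothetical 2-D feature vectors and hand-picked linear weights that are not from the slides: each class has its own linear g, and the object receives the label whose g(x) is largest.
```python
# Sketch: linear "goodness" functions g_class(x) = w . x + b, one per class.
# Class names, weights, and biases here are invented for illustration.
classes = {
    "cat":  ([0.5, -0.2], 1.0),
    "lion": ([0.1,  0.4], -3.0),
}

def g(x, weights, bias):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify(x):
    # The object's class is the g that gives the largest value for x.
    return max(classes, key=lambda c: g(x, *classes[c]))

print(classify([0.4, 30.0]))   # prints whichever class scores higher
```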
Nearest Mean
Let g(x) be the distance of x from the average of all training objects in g
Compute Euclidean distance:
||x2 - x1|| = sqrt(sum over all dimensions d of (x2[d] - x1[d])^2)
e.g., sqrt((height difference)^2 + (weight difference)^2)
Works beautifully if classes are well separated and compact
But consider a "horizontal class" or a "vertical class"!
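The following Python sketch shows nearest-mean classification with the Euclidean distance above; the class means and the test vector are invented (height, weight) values used only for illustration.
```python
import math

# Invented class means in (height, weight) feature space.
class_means = {"child": [120.0, 25.0], "adult": [175.0, 70.0]}

def euclidean(x, y):
    # ||x - y|| = sqrt(sum over dimensions of (x[d] - y[d])^2)
    return math.sqrt(sum((xd - yd) ** 2 for xd, yd in zip(x, y)))

def nearest_mean(x):
    return min(class_means, key=lambda c: euclidean(x, class_means[c]))

print(nearest_mean([130.0, 30.0]))   # "child": closer to that class mean
```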
Scaled Distance
Scaling the distance based on the "shape" of the class (the variance in each dimension) can help
Variance is the sum of squared distances of all related points from the mean, divided by (n - 1):
s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
In one dimension, we can measure distance in "standard deviations," i.e.
\frac{(X - \bar{X})^2}{s^2}
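A small sketch of this variance-scaled distance, assuming invented per-dimension training samples: each squared difference is divided by that dimension's variance s^2, so an elongated ("horizontal" or "vertical") class is handled better than with plain Euclidean distance.
```python
import math

def variance(samples):
    # s^2 = sum_i (x_i - mean)^2 / (n - 1)
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)

def scaled_distance(x, mean, variances):
    # Each dimension is scaled by its own variance (distance in "standard deviations").
    return math.sqrt(sum((xd - md) ** 2 / v
                         for xd, md, v in zip(x, mean, variances)))

# Invented training data for one elongated class: wide spread in height, narrow in weight.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [64.0, 65.0, 66.0, 67.0, 68.0]
mean = [sum(heights) / 5, sum(weights) / 5]
variances = [variance(heights), variance(weights)]
print(scaled_distance([175.0, 70.0], mean, variances))
```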
Mahalanobis Distance
In multiple dimensions, we have a covariance matrix.
A covariance matrix is a square matrix describing the relationship among the features in a feature vector:
M_{ij} = \sum_{k=1}^{N} (X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)
Mahalanobis distance effectively multiplies by the inverse of the covariance matrix:
D(X, Y) = (X - Y)^T M^{-1} (X - Y)
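As a sketch using NumPy: the covariance matrix is built from invented two-feature training samples, and the distance multiplies the difference vector by the inverse of that matrix, as in the formula above.
```python
import numpy as np

# Invented two-feature training samples (rows = samples, columns = features).
samples = np.array([[150.0, 64.0], [160.0, 66.0], [170.0, 65.0],
                    [180.0, 68.0], [190.0, 67.0]])
mean = samples.mean(axis=0)

# Covariance matrix M_ij = sum_k (X_ik - mean_i)(X_jk - mean_j), as on the slide.
centered = samples - mean
M = centered.T @ centered

def mahalanobis(x, y, M):
    d = x - y
    return float(d @ np.linalg.inv(M) @ d)   # (X - Y)^T M^{-1} (X - Y)

print(mahalanobis(np.array([175.0, 70.0]), mean, M))
```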
Nearest Neighbor
Save the vectors for all the training examples (instead of just the mean for each class)
The result of classifying a test vector is the class of the nearest neighbor in the training set
Extension: let the k nearest neighbors "vote"
Can easily accommodate overlapping and oddly shaped classes (e.g., dumbbell shape)
More costly than nearest mean because of more comparisons (use tree data structures to help)
Highly dependent on the choice and number of training examples
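A brief Python sketch of k-nearest-neighbor voting with made-up training points (not from the slides): store every training vector, find the k closest to the test vector, and return the majority class among them.
```python
import math
from collections import Counter

# Invented labelled training vectors: (feature vector, class label).
training = [([1.0, 1.0], "cat"), ([1.2, 0.9], "cat"), ([0.9, 1.1], "cat"),
            ([5.0, 5.0], "lion"), ([5.2, 4.8], "lion"), ([4.9, 5.1], "lion")]

def knn_classify(x, k=3):
    # Sort all stored examples by distance to x and let the k nearest vote.
    by_distance = sorted(training, key=lambda item: math.dist(x, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify([1.1, 1.0]))   # "cat"
```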
Statistical Method
Minimum error criterion: minimize the probability that a new element will be misclassified (need to know prior probabilities of feature vector elements & combinations)
Correct class is the one that maximizes (over all classes) P(class|vector)
P(class|vector) = P(vector|class) P(class) / P(vector)  (Bayes' rule)
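A tiny numeric sketch of Bayes' rule with invented probabilities (none of these numbers come from the lecture): the posterior P(class|vector) is proportional to P(vector|class) P(class), and the class with the largest posterior wins.
```python
# Invented priors P(class) and likelihoods P(vector|class) for one observed vector.
priors      = {"road": 0.3, "field": 0.7}
likelihoods = {"road": 0.8, "field": 0.2}   # P(vector | class)

# P(vector) = sum over classes of P(vector|class) P(class)
p_vector = sum(likelihoods[c] * priors[c] for c in priors)

posteriors = {c: likelihoods[c] * priors[c] / p_vector for c in priors}
best = max(posteriors, key=posteriors.get)
print(posteriors, "->", best)   # road ~0.63, field ~0.37 -> "road"
```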
Decision Trees
Each node is a question
Each leaf is a decision
[Figure: example decision tree with question nodes "hair?", "Pet?", and "legs?" and leaf classes Lion, Cat, frog, and snake]
Decision Trees
Build a classification tree to classify the training set
Each branch in the tree denotes a comparison & decision process
Each leaf of the tree is a classification
Make the tree as “balanced” as possible!
The branches in the tree represent (parts of) discriminant functions - you can classify an unknown object by walking the tree (see the sketch below)!
Can be constructed by hand or by algorithm
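A minimal sketch of walking such a tree in Python, using a nested-dictionary encoding of the hair/pet/legs example from the earlier figure; the yes/no orientation of the branches is an assumption about that figure.
```python
# Each internal node asks a question (a feature name); each leaf is a class label.
tree = {"question": "hair?",
        "yes": {"question": "pet?", "yes": "cat", "no": "lion"},
        "no":  {"question": "legs?", "yes": "frog", "no": "snake"}}

def classify(node, answers):
    # Walk from the root, following the answer to each question, until a leaf.
    while isinstance(node, dict):
        node = node[answers[node["question"]]]
    return node

print(classify(tree, {"hair?": "yes", "pet?": "no"}))   # "lion"
print(classify(tree, {"hair?": "no", "legs?": "yes"}))  # "frog"
```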
Automatic Construction of Decision Tree
Use the idea of information content: which feature gives the most information to divide the existing data at that node?
At the root: which feature contributes the most information to a class?
If all elements of a feature lead to a class, that is the most information
At a node: given the subset of features remaining based on decisions made, which contributes the most information?
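"Most information" is commonly measured with entropy; the sketch below computes the entropy drop (information gain) for one hypothetical binary feature split, with invented labels and feature values.
```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum_c p_c * log2(p_c), over class frequencies in this subset.
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(labels, feature_values):
    # Gain = entropy before the split minus the weighted entropy of each branch.
    total = len(labels)
    branches = {}
    for label, value in zip(labels, feature_values):
        branches.setdefault(value, []).append(label)
    after = sum(len(subset) / total * entropy(subset)
                for subset in branches.values())
    return entropy(labels) - after

# Invented example: a feature whose values line up well with the classes.
labels         = ["cat", "cat", "cat", "frog", "frog", "snake"]
feature_values = ["hair", "hair", "hair", "no-hair", "no-hair", "no-hair"]
print(information_gain(labels, feature_values))
```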
Clustering
No training set needed!
Hierarchical clustering: recursively divide data into most different (non-overlapping) subsets
Non-hierarchical methods: divide data directly among some (given?) number of clusters
K-means clustering
Fuzzy C-means clustering
Clustering to special shapes, e.g. shell clustering
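A compact sketch of k-means (one of the non-hierarchical methods listed above), with invented 2-D points and a fixed number of iterations instead of a convergence test.
```python
import math
import random

def kmeans(points, k, iterations=10, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)          # start from k random points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = [sum(c) / len(cluster) for c in zip(*cluster)]
    return centers

points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],    # invented cluster near (1, 1)
          [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]    # invented cluster near (5, 5)
print(kmeans(points, k=2))
```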