Classification


Category Recognition


Associating information extracted from images with
categories (classes) of objects

Requires prior knowledge about the objects (models)

Requires compatible representation of model with data

Requires appropriate reasoning techniques
Reasoning techniques include:

Classification (supervised & unsupervised)

Graph matching

Rule-based processing

Hybrid techniques
Knowledge Representation

Syntax = symbols and how they are used

Semantics = meanings of symbols & their arrangement

Representation = syntax + semantics
Types of representations

Feature vectors:
[area=200, eccentricity=1, ...]

Grammars:
person => head+trunk+legs

Production rules:
if R is long and R is thin then R is a road segment

Predicate logic:
Long(x) and thin(x) -> road(x)

Graphs
Classification

Feature-based object recognition

Unknown object is represented by a feature vector (e.g. height, weight)

Known objects are also represented by feature vectors, grouped into classes

Class = set of objects that share important properties

Reject class = generic class for all unidentifiable objects

Classification = assigning the unknown object the label of the appropriate class
Types of Classification

Discriminant classification (supervised)
Create dividing lines (discriminants) to separate classes based on (positive and negative) examples

Distance classification (unsupervised)
Create clusters in feature space to collect items of the same class

A priori knowledge = prespecified discriminant functions or cluster centers
Classification Systems


Pre-production (training data)

Extract relevant features from training examples of each
class (feature vectors)

Construct (by hand) or use machine learning to develop
discrimination functions to correctly classify training
examples
Production (test data and real data)

Extract a feature vector from the image

Apply the discrimination functions determined in pre-production to find the closest class to the object

Report the result (label) of the object
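A minimal Python sketch of the two stages, using a hand-constructed discrimination rule; the [length, width] features, values, and threshold are illustrative, not from the slides:

    # Pre-production: feature vectors extracted from labeled training examples
    # (hypothetical [length, width] measurements for two classes).
    training = {
        "road segment": [[120.0, 4.0], [200.0, 5.5], [90.0, 3.0]],
        "building":     [[40.0, 35.0], [55.0, 60.0], [30.0, 25.0]],
    }

    # Hand-constructed discrimination function: road segments are long and thin,
    # so a length/width ratio threshold separates the two training sets.
    def discriminate(feature_vector, ratio_threshold=5.0):
        length, width = feature_vector
        return "road segment" if length / width > ratio_threshold else "building"

    # Check that the rule classifies the training examples correctly.
    assert all(discriminate(v) == label for label, vs in training.items() for v in vs)

    # Production: extract a feature vector from a new image region,
    # apply the same function, and report the label.
    print(discriminate([150.0, 6.0]))   # -> road segment
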
Evaluating Classification Systems


Classification error = object classified into wrong class

False positive = item identified as class, should be not-class

False negative = item identified as not-class, should be class

Increasing sensitivity to true positives often increases false positives as well
True Positive rate = number of true positives / total number of positives (desired value: 1)

False Positive rate = number of false positives / total number of negatives (desired value: 0)
Errors are measured on independent test data - these data have
known classifications, but are not used in any way in the
development (pre-production stage) of the system
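A short sketch of computing both rates on test data with known classifications; the label lists are made up, and "road" is treated as the positive class:

    # Known classifications of independent test data vs. the system's output.
    actual    = ["road", "road", "not-road", "road", "not-road", "not-road"]
    predicted = ["road", "not-road", "road", "road", "not-road", "not-road"]

    true_pos  = sum(a == "road" and p == "road" for a, p in zip(actual, predicted))
    false_pos = sum(a == "not-road" and p == "road" for a, p in zip(actual, predicted))
    positives = actual.count("road")        # items that really are the class
    negatives = actual.count("not-road")    # items that really are not

    tpr = true_pos / positives      # desired value: 1
    fpr = false_pos / negatives     # desired value: 0
    print(tpr, fpr)                 # 2/3 and 1/3 for this made-up data
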
Discrimination functions

Let g(x) be “goodness” of x as a member of class g

Discrimination function between g1 and g2 is simply
g1(x) – g2(x) = 0 (i.e. both classes are equally good on
the dividing line)

An object’s class is the “g” that gives the largest value
for x

Linear functions are often used for g(x)

With one example/class, this reduces to nearest mean

Perceptrons represent linear discrimination functions (see
NN notes)
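A sketch of linear discrimination functions, one g per class, with made-up weights; the dividing line between two classes is where g1(x) - g2(x) = 0, and an unknown x gets the class whose g(x) is largest:

    # Hypothetical linear "goodness" functions g(x) = w . x + b
    # over a 2-D feature vector [height, weight].
    classes = {
        "cat": ([0.2, 0.5], -10.0),   # (weights, bias) -- illustrative values only
        "dog": ([0.4, 0.3], -8.0),
    }

    def g(x, weights, bias):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    def classify(x):
        # The object's class is the g that gives the largest value for x.
        return max(classes, key=lambda c: g(x, *classes[c]))

    print(classify([30.0, 4.0]))   # -> dog for these made-up weights
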
Nearest Mean

Let g(x) be distance of x from the average of all training
objects in g

Compute Euclidean distance:


||x2-x1|| = sqrt(sum over all dimensions of (x2[d]-x1[d])^2)

E.g. sqrt((height difference)^2 + (weight difference)^2)

Works beautifully if classes are well separated and compact

But consider a "horizontal class" or a "vertical class"!
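A nearest-mean sketch on made-up [height, weight] training data:

    import math

    # Hypothetical training examples for each class.
    training = {
        "cat": [[25.0, 4.0], [28.0, 5.0], [24.0, 3.5]],
        "dog": [[55.0, 20.0], [60.0, 25.0], [50.0, 18.0]],
    }

    def mean(vectors):
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]

    def euclidean(a, b):
        # ||a - b|| = sqrt(sum over all dimensions of (a[d] - b[d])^2)
        return math.sqrt(sum((ad - bd) ** 2 for ad, bd in zip(a, b)))

    # Pre-production: keep only the average vector of each class.
    means = {label: mean(vectors) for label, vectors in training.items()}

    def nearest_mean(x):
        return min(means, key=lambda label: euclidean(x, means[label]))

    print(nearest_mean([27.0, 4.5]))   # -> cat
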
Scaled Distance

Scaling the distance based on the "shape" of the class can help (variance in each dimension)

Variance is the sum of squared distances of all related points from the mean, divided by (n - 1):

s^2 = (sum over i=1..n of (X_i - Xbar)^2) / (n - 1)
In one dimension, we can measure "Standard Deviations," i.e.

(X - Xbar)^2 / s^2
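A one-dimensional sketch of the scaling: the class variance along one feature, then distance measured in standard deviations (the numbers are made up):

    import math

    # Hypothetical values of one feature over the training points of one class.
    samples = [24.0, 25.0, 28.0, 23.0, 26.0]
    n = len(samples)
    xbar = sum(samples) / n

    # s^2 = (sum over i of (X_i - Xbar)^2) / (n - 1)
    variance = sum((x - xbar) ** 2 for x in samples) / (n - 1)

    def scaled_distance(x):
        # sqrt((X - Xbar)^2 / s^2): distance in standard deviations, so an
        # elongated ("horizontal" or "vertical") class is handled better
        # than with raw Euclidean distance.
        return math.sqrt((x - xbar) ** 2 / variance)

    print(scaled_distance(30.0))   # how many standard deviations away 30.0 is
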

Mahalanobis Distance

In multiple dimensions, we have a covariance matrix.

A Covariance Matrix is a square matrix for describing the relationship among features in a feature vector:

M_ij = sum over k=1..N of (X_ik - Xbar_i)(X_jk - Xbar_j)
Mahalanobis Distance effectively multiplies by the
inverse of the Covariance Matrix
DX,Y   (X Y)M (X Y)
1
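A numpy sketch of the covariance matrix and the Mahalanobis distance of a point from a class mean; the 2-D data are made up:

    import numpy as np

    # Hypothetical training feature vectors of one class (one row per example).
    X = np.array([[25.0, 4.0], [28.0, 5.0], [24.0, 3.5], [27.0, 4.5]])
    xbar = X.mean(axis=0)

    # M_ij = sum over k of (X_ik - Xbar_i)(X_jk - Xbar_j), as on the slide
    # (np.cov(X, rowvar=False) would give this divided by N - 1).
    D = X - xbar
    M = D.T @ D

    def mahalanobis(x, y, M):
        # D(X,Y) = (X - Y)^T M^-1 (X - Y)
        diff = np.asarray(x) - np.asarray(y)
        return float(diff @ np.linalg.inv(M) @ diff)

    print(mahalanobis([30.0, 5.0], xbar, M))
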
Nearest Neighbor

Save the vectors for all the training examples (instead of
just the mean for each class)

Result of classification of a test vector is the class of the
nearest neighbor in the training set

Extension - let k nearest neighbors "vote"

Can easily accommodate overlapping and oddly shaped
classes (e.g. dumbbell shape)

More costly than nearest mean because of more
comparisons (use tree data structures to help)

Highly dependent on choices and number of training
examples
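A k-nearest-neighbor sketch with voting (made-up training vectors, k = 3); every training example is stored, not just the class means:

    import math
    from collections import Counter

    # All training vectors are saved with their class labels.
    training = [
        ([25.0, 4.0], "cat"), ([28.0, 5.0], "cat"), ([24.0, 3.5], "cat"),
        ([55.0, 20.0], "dog"), ([60.0, 25.0], "dog"), ([50.0, 18.0], "dog"),
    ]

    def knn(x, k=3):
        # Sort stored examples by Euclidean distance (a tree structure such as
        # a k-d tree would avoid comparing against every example).
        by_distance = sorted(training, key=lambda item: math.dist(x, item[0]))
        # Let the k nearest neighbors vote on the class.
        votes = Counter(label for _, label in by_distance[:k])
        return votes.most_common(1)[0][0]

    print(knn([27.0, 4.5]))   # -> cat
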
Statistical Method

Minimum error criterion -- minimize probability that a
new element will be misclassified (need to know prior
probabilities of feature vector elements & combinations)

Correct class is the one that maximizes (over all
classes) P(class|vector)

P(class|vector) = P(vector|class) P(class) / P(vector) (Bayes' rule)
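A Bayes' rule sketch with made-up priors P(class) and likelihoods P(vector|class) for one observed feature vector; P(vector) is the same for every class, so comparing the numerators is enough:

    # Hypothetical prior probabilities and likelihoods for one feature vector.
    prior = {"road": 0.3, "building": 0.7}          # P(class)
    likelihood = {"road": 0.08, "building": 0.02}   # P(vector | class)

    # P(class | vector) = P(vector | class) P(class) / P(vector)   (Bayes' rule)
    numerator = {c: likelihood[c] * prior[c] for c in prior}
    best = max(numerator, key=numerator.get)
    print(best, numerator)   # road wins: 0.024 vs 0.014
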
Decision Trees

Each node is a question

Each leaf is a decision
[Diagram: example decision tree with questions "hair?", "Pet?", and "legs?" and leaves Lion, Cat, frog, and snake]
Decision Trees

Build a classification tree to classify the training set

Each branch in the tree denotes a comparison &
decision process

Each leaf of the tree is a classification

Make the tree as “balanced” as possible!

The branches in the tree represent (parts of)
discriminant functions - you can classify an unknown
object by walking the tree!

Can be constructed by hand or by algorithm
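A sketch of classifying by walking a hand-built tree; the nested-dict structure below is one plausible reading of the animal example on the previous slide, not a fixed format:

    # Each internal node is a question; each leaf is a decision (class label).
    tree = {
        "question": "has hair?",
        "yes": {"question": "is a pet?", "yes": "Cat", "no": "Lion"},
        "no":  {"question": "has legs?", "yes": "frog", "no": "snake"},
    }

    def classify(node, answers):
        # Walk the branches (parts of discriminant functions) until a leaf.
        while isinstance(node, dict):
            node = node["yes"] if answers[node["question"]] else node["no"]
        return node

    print(classify(tree, {"has hair?": True, "is a pet?": True}))   # -> Cat
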
Automatic Construction of Decision Tree

Use the idea of information content - which feature gives
the most information to divide the existing data at that
node

At the root: which feature contributes the most information to a class?
If all elements of a feature lead to a class, that is the most information

At a node: given the subset of features remaining based on decisions made, which contributes the most information?
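A sketch of choosing the root question by information gain (entropy of the node minus the weighted entropy of the subsets a feature creates), on a tiny made-up training set:

    import math
    from collections import Counter

    def entropy(labels):
        counts = Counter(labels)
        return -sum((c / len(labels)) * math.log2(c / len(labels)) for c in counts.values())

    def information_gain(examples, feature):
        # Information contributed by a feature = entropy before splitting
        # minus the weighted entropy of the subsets the feature creates.
        before = entropy([label for _, label in examples])
        after = 0.0
        for value in {features[feature] for features, _ in examples}:
            subset = [label for features, label in examples if features[feature] == value]
            after += len(subset) / len(examples) * entropy(subset)
        return before - after

    # Hypothetical training examples: feature values -> class.
    examples = [
        ({"hair": True, "legs": True}, "mammal"),
        ({"hair": True, "legs": True}, "mammal"),
        ({"hair": False, "legs": True}, "amphibian"),
        ({"hair": False, "legs": False}, "reptile"),
    ]

    # The feature with the highest gain becomes the root question;
    # here "hair" separates the classes best.
    root = max(["hair", "legs"], key=lambda f: information_gain(examples, f))
    print(root)
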
Clustering

No training set needed!

Hierarchical clustering: recursively divide data into most
different (non-overlapping) subsets

Non-hierarchical methods: divide data directly among
some (given?) number of clusters

K-means clustering

Fuzzy C-means clustering

Clustering to special shapes, e.g. shell clustering
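A k-means sketch (non-hierarchical, with the number of clusters given); the 2-D points are made up, and no class labels are needed:

    import math
    import random

    def kmeans(points, k, iterations=20, seed=0):
        random.seed(seed)
        centers = random.sample(points, k)           # initial cluster centers
        for _ in range(iterations):
            # Assign every point to its nearest center...
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
                clusters[nearest].append(p)
            # ...then move each center to the mean of its assigned points.
            for i, cluster in enumerate(clusters):
                if cluster:
                    centers[i] = [sum(dim) / len(cluster) for dim in zip(*cluster)]
        return centers, clusters

    points = [[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],    # one compact group
              [8.0, 8.0], [8.5, 9.0], [7.8, 8.2]]    # another
    centers, clusters = kmeans(points, k=2)
    print(centers)
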