Pattern Recognition:
Bayesian Decision Theory
Charles Tappert
Seidenberg School of CSIS, Pace University
Pattern Classification
Most of the material in these slides was
taken from the figures in
Pattern Classification (2nd ed) by R. O.
Duda, P. E. Hart and D. G. Stork, John
Wiley & Sons, 2001
Bayesian Decision Theory
Fundamental, purely statistical approach
Assumes relevant probabilities are
known perfectly
Makes theoretically optimal decisions
Bayesian Decision Theory
Based on Bayes formula
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
which is easily derived by writing the joint
probability density in two ways:
p(ωj, x) = P(ωj | x) p(x)
p(ωj, x) = p(x | ωj) P(ωj)
Note: uppercase P(.) denotes a probability mass function and lowercase p(.) a density function
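As a worked illustration of the formula, here is a minimal Python sketch; the priors and likelihood values are made up for the example, not taken from any dataset:

# Hypothetical two-class example of Bayes formula.
prior = [0.6, 0.4]        # P(w1), P(w2)
likelihood = [0.3, 0.5]   # p(x|w1), p(x|w2) evaluated at one observed x

# Evidence: p(x) = sum over j of p(x|wj) P(wj)
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Posteriors: P(wj|x) = p(x|wj) P(wj) / p(x)
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
print(posterior)  # [0.4736..., 0.5263...] -> Bayes decision: class 2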
Bayes Formula
Bayes formula
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
can be expressed informally in English as
posterior = likelihood x prior / evidence
and the Bayes decision chooses the class ωj with the
greatest posterior probability
Bayes Formula
Bayes formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x)
The Bayes decision chooses the class ωj with the
greatest P(ωj | x)
Since p(x) is the same for all classes, the greatest
P(ωj | x) means the greatest p(x | ωj) P(ωj)
Special case: if all classes are equally likely, i.e.,
the same P(ωj), we get a further simplification – the
greatest P(ωj | x) is the greatest likelihood p(x | ωj)
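A minimal sketch of this decision rule in Python; the Gaussian class-conditionals and their parameters are illustrative assumptions, not the slides' data:

import math

def bayes_decide(x, likelihoods, priors):
    # Choose the class j maximizing p(x|wj) * P(wj);
    # the evidence p(x) is dropped since it is common to all classes.
    scores = [f(x) * p for f, p in zip(likelihoods, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

def gaussian(mu, sigma):
    # 1-D normal density, used here as a stand-in class-conditional.
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

likelihoods = [gaussian(10.0, 2.0), gaussian(14.0, 2.0)]
priors = [0.5, 0.5]
print(bayes_decide(11.5, likelihoods, priors))  # 0: closer to the first class mean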
Bayesian Decision Theory
Now, let’s look at the fish example of
two classes – sea bass and salmon –
and one feature – lightness
Let p(x | ω1) and p(x | ω2) describe
the lightness distributions of the
sea bass and salmon populations
(see next slide)
Bayesian Decision Theory
In the previous slide, if the two classes are
equally likely, we get the simplification –
greatest posterior means greatest
likelihood, and Bayes decision is to choose
class ω1 when p(x | ω1) > p(x | ω2), i.e.,
when lightness is greater than approximately 12.4
However, if the two classes are not equally
likely, we get a case like the next slide
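To make the effect of unequal priors concrete, here is a small sketch for two 1-D Gaussian class-conditionals with equal variance; the means, variance, and priors are hypothetical, so the boundary it prints is not the slides' 12.4 value:

import math

def boundary(mu1, mu2, sigma, p1, p2):
    # Solve p(x|w1) P(w1) = p(x|w2) P(w2) for x, assuming
    # equal-variance Gaussians N(mu1, sigma) and N(mu2, sigma).
    return (mu1 + mu2) / 2 + sigma ** 2 * math.log(p1 / p2) / (mu2 - mu1)

print(boundary(10.0, 14.0, 2.0, 0.5, 0.5))  # 12.0: midpoint when priors are equal
print(boundary(10.0, 14.0, 2.0, 0.8, 0.2))  # ~13.39: boundary shifts toward the rarer class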
Bayesian Parameter Estimation
Because the actual probabilities are rarely
known, they are usually estimated after
assuming the form of the distributions
The most commonly assumed form is the
multivariate normal
Bayesian Parameter Estimation
Assuming multivariate normal probability
density functions, it is necessary to
estimate for each pattern class (see the sketch below)
Feature means
Feature covariance matrices
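A minimal sketch of these estimates with NumPy; the function name and array-shape conventions here are my own, not the slides':

import numpy as np

def estimate_class_parameters(X, y):
    # X: (n_samples, n_features) feature matrix, y: class labels.
    # Returns, per class, the feature mean vector and covariance matrix
    # needed under the multivariate normal assumption.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        sigma = np.cov(Xc, rowvar=False)  # rows are samples, columns are features
        params[c] = (mu, sigma)
    return params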
Multivariate Normal Densities
Simplifying assumptions can be made for
multivariate normal density functions
Statistically independent features with equal
variances yield hyperplane decision surfaces
Equal covariance matrices for each class also
yield hyperplane decision surfaces
Arbitrary normal distributions yield
hyperquadric decision surfaces (see the sketch below)
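One way to see where these surfaces come from is the log-posterior discriminant gj(x) = ln p(x | ωj) + ln P(ωj). A sketch for the general (arbitrary covariance) case follows, with illustrative names of my own:

import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    # g_j(x) = ln p(x|wj) + ln P(wj) for a multivariate normal class model.
    # The term quadratic in x gives hyperquadric boundaries in general;
    # with equal covariance matrices it cancels between classes,
    # leaving hyperplane boundaries.
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * np.log(np.linalg.det(sigma))
            - 0.5 * d * np.log(2 * np.pi)
            + np.log(prior))

Classification then picks the class whose discriminant is largest at x.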
Nonparametric Techniques
Probabilities are not known
Two approaches
Estimate the density functions from
sample patterns
Bypass probability estimation entirely
Use a non-parametric method
Such as k-Nearest-Neighbor
k-Nearest-Neighbor
k-Nearest-Neighbor (k-NN) Method
Used where probabilities are not known
Bypasses probability estimation entirely
Easy to implement
Asymptotic error is never worse than twice
the Bayes error
Computationally intensive, therefore slow
Simple PR System with k-NN
Good for feasibility studies – easy to implement
Typical procedural steps
Extract feature measurements
Normalize features to 0-1 range
x' = (x - xmin) / (xmax - xmin)
Classify by k nearest neighbor
Using Euclidean distance (see the sketch below)
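A minimal sketch of these two steps (min-max normalization, then a Euclidean k-NN vote); the function names are mine, and ties in the vote are broken arbitrarily:

import numpy as np
from collections import Counter

def minmax_normalize(X, x_min, x_max):
    # Scale each feature to the 0-1 range: x' = (x - xmin) / (xmax - xmin)
    return (X - x_min) / (x_max - x_min)

def knn_classify(x, X_train, y_train, k=3):
    # Label x by majority vote among its k nearest training samples,
    # using Euclidean distance.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]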
Simple PR System with k-NN (cont):
Two Modes of Operation
Leave-one-out procedure
One input file of training/test patterns
Repeatedly train on all samples except one,
which is held out for testing (sketched below)
Good for feasibility study with little data
Train and test on separate files
One input file for training and one for testing
Good for measuring performance change when
varying an independent variable (e.g., different
keyboards for keystroke biometric)
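A sketch of the leave-one-out mode; the classify argument stands for any classifier with the k-NN sketch's signature, and this evaluation loop is my own illustration:

import numpy as np

def leave_one_out_accuracy(X, y, classify, k=3):
    # Classify each sample using all the other samples as training data,
    # then report the fraction classified correctly.
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        if classify(X[i], X[mask], y[mask], k) == y[i]:
            correct += 1
    return correct / len(X)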
Simple PR System with k-NN (cont)
Used in keystroke biometric studies
Also used in other studies based on keystroke data
Feasibility study – Dr. Mary Curtin
Different keyboards/modes – Dr. Mary Villani
Study of procedures for handling incomplete and
missing data – e.g., fallback procedures in the
keystroke biometric system – Dr. Mark Ritzmann
New kNN-ROC procedures – Dr. Robert Zack
Used in other biometric studies
Mouse movement – Larry Immohr
Stylometry + keystroke study – John Stewart
Conclusions
Bayes decision method best if probabilities known
Bayes method is okay if you are good with statistics
and the form of the probability distributions can be
assumed, especially if there is justification for
simplifying assumptions like independent features
Otherwise, stay with easier-to-implement methods
that provide reasonable results, like k-NN