Transcript DHSch2part2

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Chapter 2 (Part 2):
Bayesian Decision Theory
(Sections 2.3-2.5)
• Minimum-Error-Rate Classification
• Classifiers, Discriminant Functions and Decision Surfaces
• The Normal Density
Minimum-Error-Rate Classification
• Actions are decisions on classes:
if action α_i is taken and the true state of nature is ω_j, then
the decision is correct if i = j and in error if i ≠ j
• Seek a decision rule that minimizes the probability
of error, which is the error rate
• Introduction of the zero-one loss function:

$$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \dots, c$$

Therefore, the conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$
“The risk corresponding to this loss function is the
average probability of error”
• Minimizing the risk requires maximizing P(ω_i | x)
(since R(α_i | x) = 1 − P(ω_i | x))
• For minimum error rate:
Decide ω_i if P(ω_i | x) > P(ω_j | x) for all j ≠ i
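Not part of the original slides: a minimal Python sketch of this rule, assuming the posteriors P(ω_i | x) at a given x are already available as an array; the numbers are made up for illustration. Under zero-one loss the conditional risk is 1 minus the posterior, so picking the largest posterior is the same as picking the smallest risk.

```python
import numpy as np

# Hypothetical posteriors P(w_i | x) for c = 3 classes at a given x
posteriors = np.array([0.2, 0.5, 0.3])

# Conditional risk under zero-one loss: R(a_i | x) = 1 - P(w_i | x)
risks = 1.0 - posteriors

# Minimum-error-rate decision: maximize the posterior
# (equivalently, minimize the conditional risk)
decision = np.argmax(posteriors)
assert decision == np.argmin(risks)
print(f"decide class {decision + 1}, risk = {risks[decision]:.2f}")
```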
• Decision regions and the zero-one loss function, therefore:

Let

$$\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$

then decide ω_1 if:

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$$

• If λ is the zero-one loss function, which means

$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \text{then } \theta_\lambda = \frac{P(\omega_2)}{P(\omega_1)} = \theta_a$$

• If

$$\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \quad \text{then } \theta_\lambda = \frac{2\,P(\omega_2)}{P(\omega_1)} = \theta_b$$
Classifiers, Discriminant Functions
and Decision Surfaces
• The multi-category case
• Set of discriminant functions g_i(x), i = 1, …, c
• The classifier assigns a feature vector x to class ω_i
if:
g_i(x) > g_j(x) for all j ≠ i
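A classifier in this form is just an argmax over the discriminants. A minimal sketch, not from the slides, assuming each g_i is given as a Python callable and using made-up quadratic discriminants:

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class w_i with the largest g_i(x)."""
    scores = [g(x) for g in discriminants]
    return int(np.argmax(scores)) + 1  # classes numbered 1..c

# Hypothetical discriminants for c = 2 classes
g = [lambda x: -(x - 1.0) ** 2,   # peaks at x = 1
     lambda x: -(x - 4.0) ** 2]   # peaks at x = 4
print(classify(2.0, g))  # -> 1
print(classify(3.0, g))  # -> 2
```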
• Let g_i(x) = −R(α_i | x)
(max. discriminant corresponds to min. risk!)
• For the minimum error rate, we take
g_i(x) = P(ω_i | x)
(max. discriminant corresponds to max. posterior!)
• Any monotonically increasing function of P(ω_i | x) yields the same decisions, so equivalently:
g_i(x) = P(x | ω_i) P(ω_i)
g_i(x) = ln P(x | ω_i) + ln P(ω_i)
(ln: natural logarithm!)
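A small numeric check, not from the slides, that the three discriminant choices give identical decisions: dividing by the evidence P(x) and taking ln are both monotonic, so the argmax is unchanged. All numbers are assumed.

```python
import numpy as np

likelihoods = np.array([0.3, 0.1, 0.05])  # P(x|w_i), assumed
priors      = np.array([0.2, 0.5, 0.3])   # P(w_i), assumed

joint      = likelihoods * priors                  # P(x|w_i) P(w_i)
posteriors = joint / joint.sum()                   # divide by evidence P(x)
log_form   = np.log(likelihoods) + np.log(priors)  # ln P(x|w_i) + ln P(w_i)

# All three discriminants pick the same class
assert np.argmax(joint) == np.argmax(posteriors) == np.argmax(log_form)
```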
• Feature space divided into c decision regions:
if g_i(x) > g_j(x) for all j ≠ i, then x is in R_i
(R_i means: assign x to ω_i)
• The two-category case
• A classifier is a “dichotomizer” that has two discriminant
functions g_1 and g_2
Let g(x) ≡ g_1(x) − g_2(x)
Decide ω_1 if g(x) > 0; otherwise decide ω_2
• The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

or equivalently (both forms have the same sign, hence give the same decisions):

$$g(x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$
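A sketch of the dichotomizer, not from the slides, assuming the class-conditional densities p(x|ω_i) are available as callables (Gaussians here, via scipy, purely for illustration); it decides ω_1 exactly when the log-likelihood ratio plus the log-prior ratio is positive.

```python
import numpy as np
from scipy.stats import norm

def dichotomize(x, p1, p2, prior1, prior2):
    """g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)]; decide w1 if g(x) > 0."""
    g = np.log(p1(x) / p2(x)) + np.log(prior1 / prior2)
    return 1 if g > 0 else 2

# Hypothetical Gaussian class-conditional densities
p1 = norm(loc=0.0, scale=1.0).pdf
p2 = norm(loc=2.0, scale=1.0).pdf
print(dichotomize(0.8, p1, p2, 0.5, 0.5))  # closer to mean 0 -> decides 1
```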
The Normal Density
• Univariate density
• A density that is analytically tractable
• Continuous density
• Many processes are asymptotically Gaussian
• Handwritten characters and speech sounds can be modeled as an ideal
or prototype pattern corrupted by random processes (central limit theorem)
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$$

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance
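A direct transcription of the univariate density into Python, not from the slides; the values μ = 0, σ = 1 are assumed for illustration.

```python
import numpy as np

def univariate_normal(x, mu, sigma):
    """P(x) = 1/(sqrt(2 pi) sigma) * exp(-0.5 * ((x - mu)/sigma)^2)"""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

print(univariate_normal(0.0, mu=0.0, sigma=1.0))  # ~0.3989, the peak value
```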
• Multivariate density
• Multivariate normal density in d dimensions is:
$$P(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t\, \Sigma^{-1} (x - \mu) \right]$$

where:
x = (x_1, x_2, …, x_d)^t (t stands for the transpose vector form)
μ = (μ_1, μ_2, …, μ_d)^t is the mean vector
Σ = d×d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively
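A transcription of the multivariate density using NumPy, not from the slides; μ and Σ below are made-up 2-dimensional values. (Inverting Σ directly is fine for a sketch; a production version would typically use a Cholesky factorization instead.)

```python
import numpy as np

def multivariate_normal(x, mu, sigma):
    """P(x) = exp(-0.5 (x-mu)^t Sigma^{-1} (x-mu)) / ((2 pi)^{d/2} |Sigma|^{1/2})"""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm_const

mu    = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(multivariate_normal(np.array([0.0, 0.0]), mu, sigma))  # peak density
```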