Transcript: Lecture 9

Nanjing University of Science & Technology
Pattern Recognition:
Statistical and Neural
Lonnie C. Ludeman
Lecture 9
Sept 28, 2005
Review 1: Classifier Performance Measures
1. A'Posteriori Probability (Maximize)
2. Probability of Error (Minimize)
3. Bayes Average Cost (Minimize)
4. Probability of Detection (Maximize with fixed Probability of False Alarm) (Neyman Pearson Rule)
5. Losses (Minimize the maximum)
Review 2: MAP, MPE, and Bayes Classification Rule

If l(x) > N decide C1; if l(x) < N decide C2,
where l(x) is the likelihood ratio and N is the threshold:

N_MAP = P(C2) / P(C1)
N_MPE = P(C2) / P(C1)
N_BAYES = (C22 - C12) P(C2) / [ (C11 - C21) P(C1) ]
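To make the review concrete, here is a minimal Python sketch of the two-class likelihood ratio test with the three thresholds above. The Gaussian class-conditional densities, the priors, and the cost values are illustrative assumptions, not values from the lecture.

```python
# Minimal sketch of the two-class likelihood ratio test, assuming
# 1-D Gaussian class-conditional densities (means, priors, and costs
# below are illustrative, not from the lecture).
from scipy.stats import norm

p1 = lambda x: norm.pdf(x, loc=0.0, scale=1.0)   # p(x | C1)
p2 = lambda x: norm.pdf(x, loc=2.0, scale=1.0)   # p(x | C2)
P1, P2 = 0.7, 0.3                                # a'priori probabilities

# Costs Cij = cost of deciding Ci when Cj is true (illustrative values)
C11, C12, C21, C22 = 0.0, 10.0, 1.0, 0.0

N_MAP   = P2 / P1                                # MAP threshold
N_MPE   = P2 / P1                                # MPE threshold (same as MAP)
N_BAYES = (C22 - C12) * P2 / ((C11 - C21) * P1)  # Bayes threshold

def classify(x, N):
    """Decide C1 if l(x) = p(x|C1)/p(x|C2) exceeds the threshold N."""
    return 'C1' if p1(x) / p2(x) > N else 'C2'

print(classify(0.8, N_MAP), classify(0.8, N_BAYES))
```

With these illustrative costs, N_BAYES exceeds N_MPE because deciding C1 when C2 is true is expensive (C12 = 10), so the Bayes rule demands stronger evidence before declaring C1.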
Review 3: General Calculation of Probability of Error

[Figure: the decision rule stated in three equivalent spaces.
Pattern Space (x): decide C1 if x is in R1, decide C2 if x is in R2.
Feature Space (y = g(x)): decide C1 if y is in F1, decide C2 if y is in F2.
Likelihood Ratio Space (L(x) = p(x | C1) / p(x | C2)): decide C1 if L(x) > N, decide C2 if L(x) < N, where N is the threshold.]
Topics for Lecture 9
1. Neyman Pearson Decision Rule
   Receiver Operating Characteristic (ROC)
2. M-Class Case MAP Decision Rule
3. M-Class Case MPE Decision Rule
4. M-Class Bayes Decision Rule
Motivation: Falling Rock
Small probability of a falling rock
Difficult to assign realistic costs to consequences
Very high cost of failing to detect
Low cost for a false alarm
Definitions:
P(decide target | target)          Detection
P(decide no target | target)       Miss
P(decide target | no target)       False Alarm
P(decide no target | no target)    Correct Dismissal
Neyman Pearson Classifier - 2 Classes
A. Assumptions:
C1 (target): known p(x | C1)
C2 (no target): known p(x | C2)
No a'priori probabilities specified
No cost assignments available
Acceptable false alarm rate specified
B. Performance: Probability of Detection and Probability of False Alarm
PD = P(decide target | target is present)
PFA = P(decide target | target is NOT present)
C. Decision Rule: Maximize the probability of detection subject to an acceptable false alarm rate
Neyman Pearson Decision Rule - rough derivation

PD = P(decide target | target) = ∫_R1 p(x | C1) dx = 1 - PM

PFA = P(decide target | no target) = ∫_R1 p(x | C2) dx ≤ α0

where α0 is the acceptable false alarm rate.

Use Lagrange multipliers to minimize J as follows:

J = PM + λ (PFA - α0)
Neyman Pearson Decision Rule - rough derivation (continued)

J = 1 - ∫_R1 p(x | C1) dx + λ ( ∫_R1 p(x | C2) dx - α0 )

  = 1 - λ α0 + ∫_R1 [ -p(x | C1) + λ p(x | C2) ] dx

To minimize J we select x to be in R1 if the term in [ ... ] is negative.
That is, x is assigned to R1 if

-p(x | C1) + λ p(x | C2) < 0

which can be rearranged as follows:
Neyman Pearson Decision Rule

If p(x | C1) / p(x | C2) > λ = N_NP, decide C1; otherwise decide C2,

where λ is the solution of the constraining equation

α0 = ∫_R1(λ) p(x | C2) dx
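A sketch of solving the constraining equation for λ, assuming C1 (target) is N(2, 1) and C2 (no target) is N(0, 1); these densities and the value of α0 are illustrative assumptions, not from the lecture. With this Gaussian pair the likelihood ratio is monotone increasing in x, so R1(λ) is a one-sided interval {x > t} and the constraint can be inverted in closed form.

```python
# Sketch of solving alpha0 = ∫_{R1(λ)} p(x|C2) dx for the NP threshold,
# assuming C1 ~ N(2, 1) (target) and C2 ~ N(0, 1) (no target).
from scipy.stats import norm

alpha0 = 0.05                       # acceptable false alarm rate (illustrative)

# The likelihood ratio l(x) = p(x|C1)/p(x|C2) is monotone increasing in x
# here, so R1(λ) = {x > t}.  The constraint alpha0 = P(x > t | C2) fixes t:
t = norm.ppf(1.0 - alpha0, loc=0.0, scale=1.0)

# The NP threshold is the likelihood ratio evaluated at the boundary point t.
N_NP = norm.pdf(t, loc=2.0) / norm.pdf(t, loc=0.0)

PD = 1.0 - norm.cdf(t, loc=2.0)     # probability of detection at this operating point
print(f"t = {t:.3f}, N_NP = {N_NP:.3f}, PD = {PD:.3f}, PFA = {alpha0}")
```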
Receiver Operating Characteristic (ROC)

[Figure: ROC curve plotting PD (vertical axis, 0 to 1) against PFA (horizontal axis, 0 to 1). The curve runs from (0, 0), "always say NO target," up to (1, 1), "always say target." The slope of the curve at the operating point (pFA, pD) equals the Neyman Pearson threshold N_NP.]
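The ROC can be traced numerically by sweeping the decision boundary. A sketch using the same illustrative Gaussian pair as above (the densities are again assumptions, not from the lecture):

```python
# Sketch of tracing the ROC by sweeping the decision boundary t
# (equivalently the threshold N = l(t)), assuming C1 ~ N(2,1), C2 ~ N(0,1).
import numpy as np
from scipy.stats import norm

ts = np.linspace(-5.0, 7.0, 200)
PFA = 1.0 - norm.cdf(ts, loc=0.0)   # P(decide target | no target)
PD  = 1.0 - norm.cdf(ts, loc=2.0)   # P(decide target | target)

# PFA -> 1 and PD -> 1 ("always say target") as t -> -inf;
# PFA -> 0 and PD -> 0 ("always say NO target") as t -> +inf.
for pfa, pd in list(zip(PFA, PD))[::40]:
    print(f"PFA = {pfa:.3f}  PD = {pd:.3f}")
```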
Extension of MAP, MPE, & Bayes to M Classes

Shorthand notation for the M class case:
C1 : x ~ p(x | C1) , P(C1)
C2 : x ~ p(x | C2) , P(C2)
...
CM : x ~ p(x | CM) , P(CM)
1. Maximum A'Posteriori Classification Rule (M Class Case)

A. Basic Assumptions:
Known: conditional probability density functions p(x | C1), p(x | C2), ... , p(x | CM)
Known: a'priori probabilities P(C1), P(C2), ... , P(CM)

B. Performance Measure: A'Posteriori Probability P(Ci | x)
1. Maximum A'Posteriori Classification Rule (M Class Case)

C. Decision Rule: for an observed vector x, select the class with maximum a'posteriori probability.

If P(Ci | x) > P(Cj | x) for all j = 1, 2, ... , M, j ≠ i,
then decide x is from Ci.
If equality, then decide x from among the boundary classes by random choice.
Derivation of MAP Decision Rule

Determine the a'posteriori probabilities P(Ci | x) for i = 1, 2, ... , M.

Using one form of Bayes' theorem,
P(Ci | x) = p(x | Ci) P(Ci) / p(x)

Substituting this for each P(Ci | x) gives the terms
p(x | Ci) P(Ci) / p(x), i = 1, 2, ... , M

But p(x) is the same for all terms, so the decision rule simplifies to:
MAP Decision Rule for an observed vector x

Select class Ci
if p(x | Ci) P(Ci) > p(x | Cj) P(Cj)
for all j = 1, 2, ... , M, j ≠ i.
If equality, then decide x from among the boundary classes by random choice.
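A minimal sketch of this rule in Python, assuming three 1-D Gaussian class-conditional densities and illustrative priors (made-up values, not from the lecture):

```python
# Minimal sketch of the M-class MAP rule: pick the class maximizing
# p(x|Ci)P(Ci).  Densities and priors are illustrative assumptions.
import numpy as np
from scipy.stats import norm

means  = [0.0, 2.0, 4.0]            # class-conditional means for C1, C2, C3
priors = [0.5, 0.3, 0.2]            # a'priori probabilities P(Ci)

def map_decide(x):
    """Return the index i maximizing p(x|Ci)P(Ci) (ties broken arbitrarily)."""
    scores = [norm.pdf(x, loc=m) * P for m, P in zip(means, priors)]
    return int(np.argmax(scores)) + 1   # classes numbered C1..CM

print(map_decide(1.2), map_decide(3.5))
```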
2. Minimum Probability of Error Classification Rule (M Class Case)

A. Basic Assumptions:
Known conditional probability density functions p(x | C1), p(x | C2), ... , p(x | CM)
Known a'priori probabilities P(C1), P(C2), ... , P(CM)

B. Performance: (Total Probability of Error)
P(error) = P(error | C1) P(C1) + P(error | C2) P(C2) + ... + P(error | CM) P(CM)

C. Decision Rule: Minimizes P(error)
2. Derivation: Minimum Probability of Error Classification Rule (M Class Case)

Select the decision regions such that P(error) is minimized.

[Figure: the pattern space X partitioned into regions R1, R2, ... , RM; decide Ci when x falls in Ri.]

But P(error) = 1 - P(correct), where
P(correct) = P(correct | C1) P(C1) + P(correct | C2) P(C2) + ... + P(correct | CM) P(CM)
Derivation Continued

P(correct | C1) = P(decide C1 | C1) = ∫_R1 p(x | C1) dx
...
P(correct | Ck) = P(decide Ck | Ck) = ∫_Rk p(x | Ck) dx
...
P(correct | CM) = P(decide CM | CM) = ∫_RM p(x | CM) dx
Derivation Continued

P(error) = 1 - Σ_{k=1}^{M} ∫_Rk p(x | Ck) P(Ck) dx

The minimum probability of error decision rule selects the Rk, k = 1, 2, ... , M, such that P(error) is minimized.

By selecting x to be a member of Rk if the term p(x | Ck) P(Ck) is the MAXIMUM over all classes, we minimize P(error).
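This expression can be checked numerically. A sketch for an illustrative two-class Gaussian problem, using grid integration over the MPE regions (the densities, priors, and integration scheme are assumptions of the sketch, not the lecture's method):

```python
# Sketch of evaluating P(error) = 1 - Σ_k ∫_{Rk} p(x|Ck)P(Ck) dx numerically,
# assuming C1 ~ N(0,1), C2 ~ N(2,1) with illustrative priors.
import numpy as np
from scipy.stats import norm

P1, P2 = 0.7, 0.3
xs = np.linspace(-10.0, 12.0, 20001)
dx = xs[1] - xs[0]
g1 = norm.pdf(xs, loc=0.0) * P1     # p(x|C1)P(C1)
g2 = norm.pdf(xs, loc=2.0) * P2     # p(x|C2)P(C2)

# MPE regions: x belongs to R1 where g1 is the larger term, to R2 otherwise,
# so P(correct) integrates the pointwise maximum of the two terms.
P_correct = np.sum(np.maximum(g1, g2)) * dx
print(f"P(error) ≈ {1.0 - P_correct:.4f}")
```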
Thus the MPE Decision Rule for an observed vector x:

Select class Ck
if p(x | Ck) P(Ck) > p(x | Cj) P(Cj)
for all j = 1, 2, ... , M, j ≠ k.
If equality, then decide x from among the boundary classes by random choice.

Note that this is identical to the M-class MAP decision rule.
Bayes Classifier (M Class Case)

A. Statistical Assumptions
Known: for each class, the conditional probability density function of the observed pattern vector x and the a'priori probability:
C1 : x ~ p(x | C1) , P(C1)
C2 : x ~ p(x | C2) , P(C2)
...
Ck : x ~ p(x | Ck) , P(Ck)
...
CM : x ~ p(x | CM) , P(CM)
Bayes Classifier - Cost Definitions

Define the costs associated with the decisions:

C11 , C12 , ... , C1M
C21 , C22 , ... , C2M
...
CM1 , CM2 , ... , CMM

where Cij is the cost associated with deciding class Ci when the true class is Cj.
Bayes Classifier - Risk Definition, M-Class Case

Risk is defined as the average cost associated with making a decision:

R = Risk = Σ_{i=1}^{M} Σ_{j=1}^{M} Cij P(decide Ci | Cj) P(Cj)

where

P(decide Ck | Cj) = ∫_Rk p(x | Cj) dx
Derivation Continued

Risk = ∫_R1 Σ_{j=1}^{M} C1j p(x | Cj) P(Cj) dx
     + ∫_R2 Σ_{j=1}^{M} C2j p(x | Cj) P(Cj) dx
     + ...
     + ∫_RM Σ_{j=1}^{M} CMj p(x | Cj) P(Cj) dx
Bayes Decision Rule: M-Class Case

Define y_i(x) = Σ_{j=1}^{M} Cij p(x | Cj) P(Cj)

To MINIMIZE the risk we assign x to the region Ri if
y_i(x) < y_j(x) for all j ≠ i
Bayes Decision Rule: M-Class Case - Final Step of Derivation

y_i(x) = Σ_{j=1}^{M} Cij p(x | Cj) P(Cj)

If y_i(x) < y_j(x) for all j ≠ i,
then decide x is from Ci.
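A minimal sketch of this rule: evaluate the conditional risks y_i(x) and choose the smallest. The cost matrix, Gaussian densities, and priors below are illustrative made-up values, not from the lecture.

```python
# Minimal sketch of the M-class Bayes rule: compute the conditional risks
# y_i(x) = Σ_j Cij p(x|Cj)P(Cj) and decide the class with the smallest one.
# Cost matrix, densities, and priors are illustrative assumptions.
import numpy as np
from scipy.stats import norm

means  = np.array([0.0, 2.0, 4.0])          # p(x|Cj) = N(means[j], 1)
priors = np.array([0.5, 0.3, 0.2])          # P(Cj)
C = np.array([[0.0, 2.0, 4.0],              # Cij = cost of deciding Ci
              [1.0, 0.0, 2.0],              #       when Cj is true
              [2.0, 1.0, 0.0]])

def bayes_decide(x):
    """Return the index i (1-based) minimizing y_i(x) = Σ_j Cij p(x|Cj)P(Cj)."""
    weighted = norm.pdf(x, loc=means) * priors   # p(x|Cj)P(Cj), one entry per j
    y = C @ weighted                             # conditional risks y_i(x)
    return int(np.argmin(y)) + 1

print(bayes_decide(1.2), bayes_decide(3.5))
```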
Summary
1. Neyman Pearson Decision Rule
   Receiver Operating Characteristic (ROC)
2. M-Class Case MAP Decision Rule
3. M-Class Case MPE Decision Rule
4. M-Class Bayes Decision Rule
End of Lecture 9