Pattern Recognition
Topic 2: Bayes Rule
Dr. Ng Teck Khim, CS4243 Computer Vision and Pattern Recognition

Expectant mother: boy or girl?
P(boy) = 0.5
P(girl) = 0.5
P(boy or girl) = 1
Conditional Probability Density

[Figure: the conditional probability densities P(wt | girl) and P(wt | boy) plotted against weight increase. The x-axis runs roughly from 6 to 10; P(wt | girl) is the larger density around wt = 7, and P(wt | boy) is the larger density around wt = 9.]
Let's suppose the weight increase of an expectant mother depends on whether the baby is a boy or a girl. The conditional probability of weight increase, over a large sample of expectant mothers, is as shown in the graph above.

Suppose now I tell you the weight increase of the expectant mother. Then you are "better informed" as far as guessing boy or girl is concerned:

If I tell you the weight increase is 9, then you have a better chance of winning the bet if you guess "boy".
If I tell you the weight increase is 7, then you have a better chance of winning the bet if you guess "girl".
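To make this guessing rule concrete, here is a minimal sketch in Python. The slides only show the two conditional densities as curves, so the Gaussian shapes, the means (9 for boy, 7 for girl) and the standard deviation below are illustrative assumptions, chosen only to reproduce the "guess boy at 9, girl at 7" behaviour described above.

```python
import math

def gaussian_pdf(x, mean, std):
    """Evaluate a 1-D Gaussian density at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical conditional densities; the true curves are only sketched
# in the slides, so these parameters are illustrative assumptions.
p_wt_given_boy  = lambda wt: gaussian_pdf(wt, mean=9.0, std=1.0)
p_wt_given_girl = lambda wt: gaussian_pdf(wt, mean=7.0, std=1.0)

for wt in (7, 9):
    # With equal priors, guess whichever class makes wt more likely.
    guess = "boy" if p_wt_given_boy(wt) > p_wt_given_girl(wt) else "girl"
    print(f"weight increase = {wt}: guess {guess}")
```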
So the conditional probability tells you the likelihood of event A happening given that event B has happened: P(A | B).

Sometimes you want to compute P(A | B), but you are not given the conditional probability function P(A | B); instead, you are given P(B | A). Can you make use of P(B | A) to compute P(A | B)? The answer is "yes", with some additional information. Bayes rule allows you to do this.
Bayes Decision Theory

Suppose we have two classes, c1 and c2, with a priori probabilities P(c1) and P(c2) respectively. (In the expectant-mother example, the classes are boy and girl, and the measurement is the weight increase.)

Suppose we have a measurement (observation) x that will give some clue to allow us to guess whether the measurement comes from class c1 or class c2. Depending on whether the class is c1 or c2, the probability of observing x may be different. We describe this using the conditional probability densities P(x | c1) and P(x | c2).
Our goal is to find out which class is more likely, given the measurement x. In other words, we want to compare the a posteriori probabilities P(c1 | x) and P(c2 | x). Bayes rule allows us to express the a posteriori probability in terms of the a priori probability and the class-conditional density.

Bayes Rule:

$$ P(c_i \mid x) = \frac{P(x \mid c_i)\, P(c_i)}{P(x)} $$
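A small numeric sketch of the rule in Python, reusing the hypothetical class-conditional densities from the earlier snippet (an assumption) together with the priors P(boy) = P(girl) = 0.5 from the first slide:

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical class-conditional densities (illustrative assumptions)
# and the priors from the first slide.
likelihood = {"boy":  lambda wt: gaussian_pdf(wt, 9.0, 1.0),
              "girl": lambda wt: gaussian_pdf(wt, 7.0, 1.0)}
prior = {"boy": 0.5, "girl": 0.5}

wt = 8.5
evidence = sum(likelihood[c](wt) * prior[c] for c in prior)   # P(wt)
for c in prior:
    posterior = likelihood[c](wt) * prior[c] / evidence       # Bayes rule
    print(f"P({c} | wt={wt}) = {posterior:.3f}")
```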
So, applying Bayes rule to the expectant-mother problem:

$$ p(b \mid wt) = \frac{p(wt \mid b)\, p(b)}{p(wt)} \qquad \text{--- Eqn-1} $$

where

$$ p(b) = 0.5 $$

$$ p(wt) = p(wt \mid \text{girl})\, p(\text{girl}) + p(wt \mid \text{boy})\, p(\text{boy}), $$

i.e.

$$ P(x) = \sum_{i=1}^{2} P(x \mid c_i)\, P(c_i) $$
In this expectant-mother problem, we are actually not required to compute the actual probability p(b | wt). All we need is a good way to decide whether it is boy or girl. Since p(wt) is the same constant whether it is boy or girl, all we are interested in is the numerator of the right-hand side of Eqn-1.
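In symbols: because p(wt) > 0 is common to both classes, dividing by it does not change which class wins the comparison,

$$ \arg\max_{c \in \{b,\, g\}} \frac{p(wt \mid c)\, p(c)}{p(wt)} \;=\; \arg\max_{c \in \{b,\, g\}}\; p(wt \mid c)\, p(c) $$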
A natural approach:

Decide class c1 if P(c1 | x) > P(c2 | x).
Decide class c2 if P(c2 | x) > P(c1 | x).

If we have a more complicated situation whereby different wrong decisions result in different levels of penalty, the decision process will be slightly more complicated than the above.

Example: if the baby is a girl and you say "boy", you lose $10; if the baby is a boy and you say "girl", you lose $20. Then you should attach a "cost" to your decision making.
Let 11 be the penalty for deciding class
c1 if x comes from class c1
Let 12 be the penalty for deciding class
c1 if x comes from class c2
Let 21 be the penalty for deciding class c2 if x comes from class
c1
Let 22 be the penalty for deciding class c2 if x comes from class c2
(Usually, 11 and
22
are both set to zero)
Then given a measurement x, the risk of deciding class
c1 is
R(c1 | x)  11P(c1 | x)  12 P(c2 | x)
Similarly, the risk of deciding class
c2 is
R(c2 | x)  21P(c1 | x)  22 P(c2 | x)
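A minimal sketch of this risk computation, wiring in the $10/$20 penalties from the betting example (with λ11 = λ22 = 0); the posterior values are made-up numbers, purely for illustration:

```python
# Penalty matrix lam[i][j]: penalty for deciding class i when the true
# class is j. Index 0 = boy (c1), index 1 = girl (c2). From the example:
# saying "boy" for a girl costs $10, saying "girl" for a boy costs $20.
lam = [[0.0, 10.0],
       [20.0, 0.0]]

# Made-up posteriors for some observed weight increase (illustrative).
posterior = [0.4, 0.6]   # [P(boy | x), P(girl | x)]

# R(c_i | x) = sum_j lam[i][j] * P(c_j | x)
risk = [sum(lam[i][j] * posterior[j] for j in range(2)) for i in range(2)]
print("R(boy | x) =", risk[0], " R(girl | x) =", risk[1])
print("decision:", "boy" if risk[0] < risk[1] else "girl")
```

With these made-up numbers the minimum-risk decision is "boy" even though P(girl | x) is higher; the asymmetric costs are doing exactly what they were introduced to do.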
So we will decide class c1 if

$$ R(c_1 \mid x) < R(c_2 \mid x) $$

$$ \Leftrightarrow\; P(c_1 \mid x)\,[\lambda_{11} - \lambda_{21}] \;<\; P(c_2 \mid x)\,[\lambda_{22} - \lambda_{12}] $$

Using Bayes rule,

$$ P(c_1 \mid x)\,[\lambda_{11} - \lambda_{21}] \;<\; P(c_2 \mid x)\,[\lambda_{22} - \lambda_{12}] $$

$$ \Leftrightarrow\; \frac{P(x \mid c_1)\, P(c_1)}{P(x)}\,[\lambda_{11} - \lambda_{21}] \;<\; \frac{P(x \mid c_2)\, P(c_2)}{P(x)}\,[\lambda_{22} - \lambda_{12}] $$

$$ \Leftrightarrow\; P(x \mid c_1)\, P(c_1)\,[\lambda_{11} - \lambda_{21}] \;<\; P(x \mid c_2)\, P(c_2)\,[\lambda_{22} - \lambda_{12}] $$

How do we obtain $P(x \mid c_1)$, $P(x \mid c_2)$, $P(c_1)$ and $P(c_2)$ in practice?
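Before turning to that question, the final inequality can be exercised directly in code; a sketch below, again using hypothetical Gaussian class-conditionals and the $10/$20 penalties from the betting example (all illustrative assumptions):

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical class-conditional densities and priors (assumptions).
p_x_given_c1 = lambda x: gaussian_pdf(x, 9.0, 1.0)   # c1 = boy
p_x_given_c2 = lambda x: gaussian_pdf(x, 7.0, 1.0)   # c2 = girl
P_c1, P_c2 = 0.5, 0.5
lam11, lam12, lam21, lam22 = 0.0, 10.0, 20.0, 0.0

def decide(x):
    # Decide c1 iff P(x|c1)P(c1)[lam11 - lam21] < P(x|c2)P(c2)[lam22 - lam12].
    # Note that P(x) has cancelled out of the comparison.
    lhs = p_x_given_c1(x) * P_c1 * (lam11 - lam21)
    rhs = p_x_given_c2(x) * P_c2 * (lam22 - lam12)
    return "c1 (boy)" if lhs < rhs else "c2 (girl)"

for x in (7.0, 8.0, 9.0):
    print(f"x = {x}: decide {decide(x)}")
```

At x = 8.0 the two likelihoods are equal, yet the rule decides "boy", because misclassifying a boy carries the larger penalty.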
Gaussian (Normal) Density

Probability density functions (pdf) are often expressed in analytical form. One such pdf that is very commonly encountered is the Gaussian density, also known as the normal density.

Let x be a d-dimensional feature vector:

$$ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_d \end{bmatrix} $$
If each of the elements of the vector x is a Gaussian random variable, the multivariate normal density is given by

$$ P(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d\, |C|}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \bar{\mathbf{x}})^T C^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \right) $$
where $\bar{\mathbf{x}}$ is the mean vector and $C$ is the covariance matrix:

$$ \bar{\mathbf{x}} = E\{\mathbf{x}\} $$

$$ C = E\{ (\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T \} $$

For two-dimensional samples $(x(i), y(i))$, $i = 1, \ldots, N$, the covariance matrix can be estimated as

$$ C \approx \frac{1}{N} \sum_{i=1}^{N} \begin{bmatrix} (x(i) - \bar{x})^2 & (x(i) - \bar{x})(y(i) - \bar{y}) \\ (y(i) - \bar{y})(x(i) - \bar{x}) & (y(i) - \bar{y})^2 \end{bmatrix} $$

$|C|$ is the determinant of $C$.

(There's a mistake in the Lab4 instruction; see the correction below.)

A linear combination of Gaussian random vectors is also a Gaussian random vector.
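Putting the pieces together, the density formula above can be evaluated directly in numpy; a sketch, with a made-up mean and covariance just to exercise the formula:

```python
import numpy as np

def multivariate_gaussian_pdf(x, mean, C):
    """P(x) = exp(-(x - mean)^T C^{-1} (x - mean) / 2) / sqrt((2 pi)^d |C|)."""
    d = len(mean)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff) / norm

# Made-up 2-D example, purely illustrative.
mean = np.array([9.0, 7.0])
C = np.array([[1.0, 0.3],
              [0.3, 2.0]])
x = np.array([8.5, 7.5])
print(multivariate_gaussian_pdf(x, mean, C))
```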
In the Lab4 instruction sheet, it says

$$ \Sigma = E\left\{ \begin{bmatrix} x^2 & xy \\ yx & y^2 \end{bmatrix} \right\} $$

It should have been

$$ \Sigma = E\left\{ \begin{bmatrix} (x - \bar{x})^2 & (x - \bar{x})(y - \bar{y}) \\ (y - \bar{y})(x - \bar{x}) & (y - \bar{y})^2 \end{bmatrix} \right\} $$
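The corrected definition is exactly what a centered sample covariance computes; a quick numpy sketch on randomly generated data. Note the slide's estimator divides by N, which matches np.cov with bias=True (numpy's default divisor is N − 1):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))     # N samples of (x, y), illustrative
mean = data.mean(axis=0)              # sample mean vector

# C ~= (1/N) sum_i (x(i) - mean)(x(i) - mean)^T, i.e. the *centered* estimator
diff = data - mean
C = diff.T @ diff / len(data)

print(C)
print(np.cov(data.T, bias=True))      # same estimator, via numpy
```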
Properties of the covariance matrix:

• $C$ is always symmetric and positive semi-definite.
• $C$ is usually positive definite. It is positive semi-definite but not positive definite when at least one of the variables has zero variance, or when at least two of the variables are identical. These are degenerate cases that we will exclude.
• $C$ is a symmetric matrix (Hermitian if $\mathbf{x}$ is complex). Excluding the degenerate cases above, $|C| > 0$, and so $C^{-1}$ exists.
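These properties are easy to check numerically; a quick sketch verifying symmetry and positive eigenvalues for a covariance estimated from random data:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))
C = np.cov(data.T)

print("symmetric:", np.allclose(C, C.T))
eigvals = np.linalg.eigvalsh(C)        # real eigenvalues of a symmetric matrix
print("positive definite:", np.all(eigvals > 0))
print("det(C) =", np.linalg.det(C))    # > 0, so C^{-1} exists
```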
Maximum Likelihood Estimates (for parameter estimation)

Given a set of observations (measurements)

$$ \mathbf{x} = \{x_1, x_2, x_3, \ldots, x_N\} $$

corresponding to times

$$ \mathbf{t} = \{t_1, t_2, t_3, \ldots, t_N\}, $$

we want to estimate the parameter vector $\theta$ that satisfies

$$ x_i = f(\theta, t_i) $$
The maximum likelihood estimate of $\theta$ is the $\hat{\theta}$ that maximizes $P(\mathbf{x} \mid \theta)$.

In computing ML estimates, it is frequently more convenient and easier to work with the logarithm of $P(\mathbf{x} \mid \theta)$, i.e. $\log P(\mathbf{x} \mid \theta)$. This is valid because the logarithm is a monotonically increasing function, so the $\theta$ that maximizes $\log P(\mathbf{x} \mid \theta)$ also maximizes $P(\mathbf{x} \mid \theta)$.
If $\{x_1, x_2, x_3, \ldots, x_N\}$ are independent measurements of a random variable $x$, then

$$ P(\mathbf{x} \mid \theta) = \prod_{i=1}^{N} P(x_i \mid \theta) $$

$$ \log P(\mathbf{x} \mid \theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta) $$

Finding the $\hat{\theta}$ that maximizes $\log P(\mathbf{x} \mid \theta)$ can usually be done by setting

$$ \frac{\partial}{\partial \theta} \log P(\mathbf{x} \mid \theta) = 0 $$
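As one concrete instance of this recipe (an assumption on my part; the slides leave the form of $P(x_i \mid \theta)$ generic), take i.i.d. Gaussian measurements with unknown mean $\theta$ and known variance. Setting the derivative of the log-likelihood to zero gives $\hat{\theta} = $ the sample mean:

```python
import numpy as np

# Assume x_i ~ N(theta, sigma^2), i.i.d., sigma known (illustrative choice).
# log P(x | theta) = sum_i [ -(x_i - theta)^2 / (2 sigma^2) ] + const
# d/dtheta log P  = sum_i (x_i - theta) / sigma^2 = 0   =>   theta_hat = mean(x)

rng = np.random.default_rng(2)
true_theta, sigma = 3.0, 1.0
x = rng.normal(true_theta, sigma, size=1000)

theta_hat = x.mean()              # ML estimate from the zero-derivative condition
print("theta_hat =", theta_hat)

# Numerical sanity check: the log-likelihood is largest near theta_hat.
def log_lik(theta):
    return -0.5 * np.sum((x - theta) ** 2) / sigma**2
print(log_lik(theta_hat) >= max(log_lik(theta_hat - 0.1), log_lik(theta_hat + 0.1)))
```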