
Computer Vision
Lecture 6: Probabilistic Methods in Segmentation
This Lecture
• Probability Theory
• Estimation and Decision
• Random models
• Mixture models for segmentation and the EM algorithm (16.1, 16.2)
• Linear Gaussian models for segmentation
  – Templates
  – Search
Probability Theory
• A continuous random variable is a model for a measurement,
such as the air temperature in Lviv, that takes on a range of
possible values and that cannot be predicted exactly. The
density function p(t) conveys information about the values of
temperature that occur more or less often.
• The density function tells us how often the temperature takes on certain values. If, for example,

\int_{-\infty}^{15} p(t)\, dt = 0.50

then, if we take many measurements, the fraction of measurements that are less than 15° will be 1/2.

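The integral statement above can be checked numerically. The sketch below is a hypothetical illustration: the temperature distribution and its parameters (a Gaussian with mean 15 and standard deviation 5) are assumptions made for the example, not values from the lecture.

```python
import numpy as np

# Draw many "temperature measurements" from an assumed Gaussian model and
# compare the fraction below 15 degrees with the integral of p(t) up to 15.
rng = np.random.default_rng(0)
mu, sigma = 15.0, 5.0                        # assumed parameters
samples = rng.normal(mu, sigma, size=100_000)

fraction_below_15 = np.mean(samples < 15.0)
print(fraction_below_15)                     # close to 0.5, since 15 is the mean here
```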
More Probability
• A set of independent, identically distributed (iid) random variables with density function p(t) is a model for a collection of measurements (t1, ..., tn) such that the value of any one of the variables does not help predict any of the others. The joint density function is given by

p(t_1, t_2, \ldots, t_n) = p(t_1)\, p(t_2) \cdots p(t_n)

• We often use the iid model even when we have good reason to believe that the random variables are not independent.
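As a small illustration of the joint-density formula, the sketch below (the Gaussian measurement model and all numbers are assumptions chosen only for the example) evaluates the product of the individual densities and, equivalently, the sum of their logarithms.

```python
import numpy as np

def gaussian_pdf(t, mu, sigma):
    """Density of one Gaussian measurement."""
    return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

t = np.array([14.2, 15.1, 16.0, 13.7])   # hypothetical measurements
mu, sigma = 15.0, 1.0                    # assumed model parameters

# Under the iid model the joint density is the product of the individual
# densities; summing logs gives the same result and avoids underflow.
joint = np.prod(gaussian_pdf(t, mu, sigma))
log_joint = np.sum(np.log(gaussian_pdf(t, mu, sigma)))
print(joint, np.exp(log_joint))          # the two values agree
```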
Examples of Random Images
• The images below were produced by models based on
probability theory
[Four example images: iid, µ = 128, σ = 25; not independent, µ = 128, σ = 25; different µ, same σ; same µ, different σ.]
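A minimal sketch of how the iid example image could be generated; the 256×256 size is an assumption, and only the µ = 128, σ = 25 values come from the slide.

```python
import numpy as np

# Every pixel is an independent Gaussian sample with mean 128 and standard
# deviation 25, clipped to the valid 8-bit range.
rng = np.random.default_rng(0)
image = rng.normal(loc=128, scale=25, size=(256, 256))
image = np.clip(image, 0, 255).astype(np.uint8)

# Changing loc or scale gives the "different mu, same sigma" and
# "same mu, different sigma" variants; coupling neighbouring pixels
# (e.g. smoothing followed by rescaling the contrast) gives the
# "not independent" example.
```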
Mixture Distribution
• Suppose we have a sequence of random variables (y1, ..., yn). Each variable is a vector consisting of two components, y = (x, ω), where x is continuously distributed and ω takes on one of two values: ω = 1 or ω = 2. The variable x has density function p1(x) if ω = 1 and p2(x) if ω = 2. The variables ωi are iid with P(ω = 1) = π1 and P(ω = 2) = π2.
• The density function of x is given by

p(x) = \pi_1 p_1(x) + \pi_2 p_2(x)

This is called a mixture distribution, and the model that produces the sequence xi is called a mixture model. If only x is observed, then ω is a hidden random variable.
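The generative reading of the mixture model can be sketched as follows; the Gaussian component densities and all numeric values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
pi1, pi2 = 0.4, 0.6                  # assumed mixing probabilities P(omega = m)
mu = np.array([0.5, 1.5])            # assumed component means
sigma = np.array([0.2, 0.3])         # assumed component standard deviations

n = 1000
omega = rng.choice([0, 1], size=n, p=[pi1, pi2])   # hidden label for each sample
x = rng.normal(mu[omega], sigma[omega])            # observed value from the chosen component
# In the segmentation setting only x is observed; omega must be inferred.
```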
Decision Problem
• It is given that a sequence xi comes from a mixture model and that ω is unknown. Our goal is to find the values of ω from the values of x. This is called a decision problem. If the two density functions are as shown below, then it is reasonable to choose some threshold value t and to choose ω = 1 if x < t, and ω = 2 if x ≥ t.
• We see that the segmentation problem is closely related to mixtures and to decision theory.
[Figure: Components of a mixture — the densities p1(x) and p2(x) plotted against x, with the threshold t marked.]
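The threshold rule can be written in a couple of lines; the threshold value used here is only an assumption for illustration, not an optimal choice.

```python
import numpy as np

def classify(x, t=1.0):
    """Assign omega = 1 when x < t and omega = 2 otherwise."""
    return np.where(x < t, 1, 2)

x = np.array([0.3, 0.8, 1.2, 1.9])   # hypothetical pixel values
print(classify(x))                    # [1 1 2 2]
```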
Estimation Problem
• Often we do not know the exact form of the density function. We may know a formula, but not know some of the parameters. For example, we may know that the density function is given by

p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

We are also given a set of iid observations (x1, ..., xn), and we need to find the values of μ and σ. It is not possible to find the exact values, but a good estimate can be found from the principle of maximum likelihood. The likelihood function is defined as

L(x_1, x_2, \ldots, x_n; \mu, \sigma) = p(x_1; \mu, \sigma) \cdots p(x_n; \mu, \sigma)

• As our estimate we choose the values of μ and σ that produce the largest value of the likelihood function for the given values of (x1, ..., xn).
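One way to see the maximum-likelihood principle at work is a brute-force search: evaluate the log-likelihood over a grid of candidate parameter values and keep the best one. The sketch below holds σ fixed and searches over μ only; the data and all numbers are hypothetical.

```python
import numpy as np

def log_likelihood(x, mu, sigma):
    """Log of L(x1, ..., xn; mu, sigma) for a Gaussian density."""
    return np.sum(-0.5 * ((x - mu) / sigma) ** 2
                  - np.log(np.sqrt(2 * np.pi) * sigma))

rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=200)            # observations (simulated here)

grid = np.linspace(0.0, 6.0, 601)             # candidate values of mu
scores = [log_likelihood(x, mu, 1.0) for mu in grid]
mu_ml = grid[np.argmax(scores)]
print(mu_ml)                                   # close to the sample mean of x
```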
Estimation for Gaussian Densities
• For a Gaussian random variable the maximum likelihood estimates of μ and σ can be found in closed form. They are:

\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i

\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
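A direct check of the closed-form estimates (the observations are simulated for the example; note the 1/n rather than 1/(n−1) factor in the variance):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=200)        # simulated Gaussian observations

mu_hat = np.mean(x)                       # maximum likelihood estimate of mu
sigma2_hat = np.mean((x - mu_hat) ** 2)   # maximum likelihood estimate of sigma^2 (1/n)
print(mu_hat, np.sqrt(sigma2_hat))
```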
Estimation for Mixtures
• There is no closed-form solution for estimating mixture parameters. The following iterative procedure is often used. We describe the procedure for estimating the following density function:

p(x) = \pi_1 p_1(x; \mu_1, \sigma_1) + \pi_2 p_2(x; \mu_2, \sigma_2)

• We start with initial estimates of π_m, μ_m, and σ_m. We find the next estimates (indicated by a superscript +) as follows:
I_{lm} = \frac{\pi_m\, p(x_l; \mu_m, \sigma_m)}{\pi_1 p(x_l; \mu_1, \sigma_1) + \pi_2 p(x_l; \mu_2, \sigma_2)}, \quad m = 1, 2

\pi_m^{+} = \frac{1}{n} \sum_{l=1}^{n} I_{lm}

\mu_m^{+} = \frac{1}{n \pi_m^{+}} \sum_{l=1}^{n} x_l I_{lm}

(\sigma_m^{+})^2 = \frac{1}{n \pi_m^{+}} \sum_{l=1}^{n} I_{lm} (x_l - \mu_m^{+})^2
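A sketch of the iteration above for two Gaussian components. The function and variable names are assumptions of this example, and the stopping rule (a fixed number of iterations) is a simplification; in practice one would monitor the likelihood for convergence.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def em_two_gaussians(x, pi, mu, sigma, n_iter=100):
    """EM updates from the slide for a two-component Gaussian mixture."""
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(n_iter):
        # E step: responsibilities I[l, m] = P(omega_l = m | x_l)
        num = pi * gaussian_pdf(x[:, None], mu, sigma)        # shape (n, 2)
        I = num / num.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means and standard deviations
        n_m = I.sum(axis=0)                                    # effective counts
        pi = n_m / len(x)
        mu = (I * x[:, None]).sum(axis=0) / n_m
        sigma = np.sqrt((I * (x[:, None] - mu) ** 2).sum(axis=0) / n_m)
    return pi, mu, sigma

# Hypothetical usage on data drawn from a known mixture:
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.5, 0.2, 400), rng.normal(1.5, 0.3, 600)])
print(em_two_gaussians(x, [0.5, 0.5], [0.2, 1.0], [0.5, 0.5]))
```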
Expectation Maximization
• The above is an example of the Expectation-Maximization (EM) algorithm. It is used for many difficult parameter estimation problems. The algorithm alternates between computing Ilm, the expected values of the hidden variables, and finding the next values of the parameters.
• The k-means algorithm can be viewed as something analogous to the EM algorithm.
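For comparison, the analogy with k-means can be sketched by replacing the soft responsibilities Ilm with hard assignments to the nearest mean; the one-dimensional implementation below is an illustrative sketch under that reading, not an algorithm from the lecture.

```python
import numpy as np

def kmeans_1d(x, mu, n_iter=50):
    """k-means on scalars: a hard-assignment analogue of the EM iteration."""
    mu = np.asarray(mu, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - mu), axis=1)     # "hard" E step
        mu = np.array([x[labels == m].mean() if np.any(labels == m) else mu[m]
                       for m in range(len(mu))])                 # M step
    return mu
```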
Template Matching
• A multivariate Gaussian model can be used when the
components of a sequence or an image are not iid. A
general formula for the density function is given on
page 493. We will consider a simpler case. We
assume that the observations are given by vector x.
The components of the vector are Gaussian and
have identical standard deviations, but the means are
not identical. We represent the vector of mean values
by m. The density function is given by
p(x; m) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left( -\frac{(x - m)^T (x - m)}{2\sigma^2} \right), \quad x = [x_1, \ldots, x_d]^T, \; m = [\mu_1, \ldots, \mu_d]^T
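The log of the template density is usually what gets computed in practice. A minimal sketch, with names chosen for this example:

```python
import numpy as np

def log_template_density(x, m, sigma):
    """log p(x; m) for the isotropic Gaussian template model above."""
    d = x.size
    sq_dist = np.sum((x - m) ** 2)                    # (x - m)^T (x - m)
    return -0.5 * sq_dist / sigma ** 2 - 0.5 * d * np.log(2 * np.pi * sigma ** 2)
```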
Choice Among Templates
• Suppose we observe a vector x which is one of d possible objects, each described by the above model but having different means m1, ..., md. To identify the object we find the one that has the highest p(x; mi). This is equivalent to finding the value of i for which ||x − mi||² is as small as possible.
• We use algebra to show that

\| x - m_i \|^2 = x^T x - 2 x^T m_i + m_i^T m_i

To find the most likely i we need to find the largest value of

2 x^T m_i - m_i^T m_i

• If m_i^T m_i has the same value for all i, then the highest probability occurs when x^T m_i is as large as possible. This is the principle of maximum correlation.
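A sketch of the template-choice rule: score each candidate mean by 2xᵀmi − miᵀmi and pick the largest, which is equivalent to picking the nearest template in squared distance. The templates and data below are made up for the example.

```python
import numpy as np

def best_template(x, templates):
    """templates has shape (k, d), one candidate mean vector per row."""
    scores = 2 * templates @ x - np.sum(templates ** 2, axis=1)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
templates = rng.normal(size=(3, 16))            # three candidate mean vectors
x = templates[1] + 0.1 * rng.normal(size=16)    # noisy observation of template 1
print(best_template(x, templates))              # 1
```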