PPT - UNC Computer Science

Yilin Wang
11/5/2009
Background
• Labeling problem
  • Labeling: given an observed data set (X) and a label set (L), infer the labels of the data points
• Most vision problems can be posed as labeling problems
  • Stereo matching
  • Image segmentation
  • Image restoration
Examples of Labeling Problem
• Stereo matching
  • For a pixel in image 1, where is the corresponding pixel in image 2?
  • Label set: differences (disparities) between corresponding pixels
Picture source: S. Lazebnik
Examples of Labeling Problem
• Image segmentation
  • Partition an image into multiple disjoint regions
  • Label set: region IDs
Picture source: http://mmlab.ie.cuhk.edu.hk
Examples of Labeling Problem
• Image restoration
  • "Compensate for" or "undo" defects that degrade an image
  • Label set: restored intensities
Picture source: http://www.photorestoration.co.nz
Background
• Image labeling
  • Given an image, the system should automatically partition it into semantically meaningful areas, each labeled with a specific object class
(Figure: example images with regions labeled Cow, Sky, Lawn, Tree, Building, Plane)
Image Labeling Problem
• Given
  • $X = \{x_i\}_{i \in S}$: the observed data from an input image, where $x_i$ is the data at the $i$-th site (pixel or block) of the image site set $S$
  • A pre-defined label set
• Let $L = \{l_i\}_{i \in S}$ be the corresponding labels at the image sites. We want to find the $L$ that maximizes the conditional probability $P(L \mid X)$:
  $L^* = \arg\max_L P(L \mid X)$
Which kinds of information can be used for labeling?
• Features from individual sites
  • Intensity, color, texture, ...
• Interactions with neighboring sites
  • Contextual information
Two types of interactions:
• Interaction with neighboring labels (spatial smoothness of labels)
  • Neighboring sites tend to have similar labels (except at discontinuities)
• Interactions with neighboring observed data
(Figure: an ambiguous patch, "Sky or Building?", resolved by its context of neighboring Sky and Vegetation regions)
Information for Image Labeling
• Let $l_i$ be the label of the $i$-th site of the image site set $S$, and $N_i$ be the neighboring sites of site $i$
(Figure: site $i$, its neighborhood $N_i$, and the remaining sites $S - \{i\}$)
• Three kinds of information for image labeling
  • Features from the local site: $\mathrm{Info}(l_i)$
  • Interaction with neighboring labels: $\mathrm{Info}(l_i, l_{N_i})$
  • Interaction with neighboring observed data: $\mathrm{Info}(l_i, x_{N_i})$
Picture source: S. Xiang
Markov Random Fields (MRF)
• Markov Random Fields (MRFs) are the most popular models for incorporating local contextual constraints in labeling problems
• Let $l_i$ be the label of the $i$-th site of the image site set $S$, and $N_i$ be the neighboring sites of site $i$
• The label set $L$ ($= \{l_i\}_{i \in S}$) is said to be an MRF on $S$ w.r.t. a neighborhood system $N$ iff the following condition is satisfied:
  • Markov property: $P(l_i \mid l_{S-\{i\}}) = P(l_i \mid l_{N_i})$
• Maintain global spatial consistency by considering only relatively local dependencies!
Markov-Gibbs Equivalence
• Let $l$ be a realization of $L$ ($= \{l_i\}_{i \in S}$); then $P(l)$ has an explicit formulation (the Gibbs distribution):
  $P(l) = \frac{1}{Z} \exp\left(-\frac{1}{T} E(l)\right)$
  where
  $Z = \sum_{l \in \mathcal{L}} \exp\left(-\frac{1}{T} E(l)\right)$
  • $Z$: a normalizing factor, called the partition function
  • $T$: a constant
• Energy function:
  $E(l) = \sum_{c \in C} V_c(l) = \sum_{\{i\} \in C_1} V_1(l_i) + \sum_{\{i, i'\} \in C_2} V_2(l_i, l_{i'}) + \ldots$
  • Clique: $C_k = \{\{i, i', i'', \ldots\} \mid i, i', i'', \ldots \text{ are neighbors to one another}\}$
  • Potential functions represent a priori knowledge of the interactions between labels of neighboring sites
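
As an aside, here is a minimal Python sketch of the Gibbs distribution above, assuming a tiny 2x2 binary label field so the partition function Z can be computed by brute-force enumeration; the Ising-style pairwise energy and T = 1 are illustrative choices, not taken from the slides.

```python
# Minimal sketch: Gibbs distribution P(l) = (1/Z) exp(-E(l)/T) on a tiny
# binary label field, with Z computed by brute-force enumeration.
# The 2x2 grid, Ising-style pairwise energy, and T = 1.0 are illustrative.
import itertools
import numpy as np

T = 1.0
H, W = 2, 2  # tiny grid so enumerating all 2^(H*W) configurations is feasible

def energy(labels):
    """Pairwise (second-order clique) energy: minus the sum of l_i * l_i' over 4-neighbors."""
    l = labels.reshape(H, W)
    e = 0.0
    e -= np.sum(l[:, :-1] * l[:, 1:])   # horizontal neighbor pairs
    e -= np.sum(l[:-1, :] * l[1:, :])   # vertical neighbor pairs
    return e

configs = [np.array(c) for c in itertools.product([-1, 1], repeat=H * W)]
Z = sum(np.exp(-energy(c) / T) for c in configs)   # partition function

def gibbs_prob(labels):
    return np.exp(-energy(labels) / T) / Z

# All-equal configurations get the highest probability (smoothness prior).
print(gibbs_prob(np.array([1, 1, 1, 1])), gibbs_prob(np.array([1, -1, 1, -1])))
```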
Auto-Model
• With clique potentials up to two sites, the energy takes the form
  $E(L) = \sum_{i \in S} V_1(l_i) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'})$
• When $V_1(l_i) = l_i G_i(l_i)$ and $V_2(l_i, l_{i'}) = \beta_{i,i'} l_i l_{i'}$, where $G_i(\cdot)$ are arbitrary functions (or constants) and $\beta_{i,i'}$ are constants reflecting the pairwise interaction between $i$ and $i'$, the energy is
  $E(L) = \sum_{i \in S} l_i G_i(l_i) + \sum_{i \in S} \sum_{i' \in N_i} \beta_{i,i'} l_i l_{i'}$
  (the first sum carries $\mathrm{Info}(l_i)$; the second carries $\mathrm{Info}(l_i, l_{N_i})$)
• Such models are called auto-models (Besag 1974)
Parameter Estimation
• Given the functional form of the auto-model
  $E(L) = \sum_{i \in S} \alpha l_i + \sum_{i \in S} \sum_{i' \in N_i} \beta l_i l_{i'}$
  how do we specify its parameters $\theta$ ($= \{\alpha, \beta\}$)?
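
A small sketch of how the auto-model energy above could be evaluated on a binary label grid; the 4-neighborhood and the values of alpha and beta are assumptions chosen for illustration.

```python
# Minimal sketch: auto-model energy
#   E(L) = sum_i alpha*l_i + sum_i sum_{i' in N_i} beta*l_i*l_i'
# on a binary label grid with a 4-neighborhood. alpha and beta are illustrative.
import numpy as np

def auto_model_energy(labels, alpha=0.0, beta=-1.0):
    """labels: 2D array with entries in {-1, +1}."""
    unary = alpha * labels.sum()
    # Each unordered neighbor pair {i, i'} appears twice in sum_i sum_{i' in N_i};
    # counting each ordered pair once and doubling keeps that convention.
    pair = 2.0 * beta * (np.sum(labels[:, :-1] * labels[:, 1:]) +
                         np.sum(labels[:-1, :] * labels[1:, :]))
    return unary + pair

L = np.array([[1, 1, -1],
              [1, 1, -1],
              [1, 1, -1]])
print(auto_model_energy(L, alpha=0.0, beta=-1.0))  # lower energy favors smoother labelings
```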
Maximum Likelihood Estimation
• Given a realization $l$ of an MRF, the maximum likelihood (ML) estimate maximizes the conditional probability $P(l \mid \theta)$ (the likelihood of $\theta$), that is:
  $\theta^* = \arg\max_\theta P(l \mid \theta)$
• By Bayes' rule:
  $P(\theta \mid l) \propto P(l \mid \theta) P(\theta)$
• The prior $P(\theta)$ is assumed to be flat when prior information is totally unavailable. In this case, MAP estimation reduces to ML estimation.
Maximum Likelihood Estimation
• The likelihood function is in the Gibbs form
  $P(l \mid \theta) = \frac{1}{Z(\theta)} \exp(-E(l \mid \theta))$
  where
  $Z(\theta) = \sum_{l \in \mathcal{L}} \exp(-E(l \mid \theta))$
  $E(l \mid \theta) = \sum_{i \in S} \alpha l_i + \sum_{i \in S} \sum_{i' \in N_i} \beta l_i l_{i'}$
• However, the computation of $Z(\theta)$ is intractable even for moderately sized problems, because there is a combinatorial number of elements in the configuration space $\mathcal{L}$.
Maximum Pseudo-Likelihood
• Assumption: $l_i$ and $l_{N_i}$ are independent.
  $P(l \mid \theta) \approx \prod_{i \in S} P(l_i \mid l_{N_i}, \theta) = \prod_{i \in S} \frac{\exp\left(\alpha l_i + \beta \sum_{i' \in N_i} l_i l_{i'}\right)}{1 + \exp\left(\alpha + \beta \sum_{i' \in N_i} l_{i'}\right)}$
• Notice that the pseudo-likelihood does not involve the partition function $Z$.
• $\{\alpha, \beta\}$ can be obtained by solving
  $\frac{\partial \ln P(l \mid \alpha, \beta)}{\partial \alpha} = 0, \qquad \frac{\partial \ln P(l \mid \alpha, \beta)}{\partial \beta} = 0$
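
A minimal sketch of maximum pseudo-likelihood estimation for the auto-logistic form above, assuming binary labels in {0, 1}, a 4-neighborhood, and SciPy's generic optimizer in place of solving the score equations by hand; the observed label field is a toy example, not data from the slides.

```python
# Minimal sketch: maximum pseudo-likelihood estimation of (alpha, beta) for the
# auto-logistic model with labels in {0, 1} and a 4-neighborhood.
import numpy as np
from scipy.optimize import minimize  # assumes SciPy is available

def neighbor_sum(labels):
    """Sum of the 4-neighbors' labels at every site (zero padding at the border)."""
    s = np.zeros_like(labels, dtype=float)
    s[1:, :] += labels[:-1, :]
    s[:-1, :] += labels[1:, :]
    s[:, 1:] += labels[:, :-1]
    s[:, :-1] += labels[:, 1:]
    return s

def neg_log_pseudo_likelihood(theta, labels):
    alpha, beta = theta
    n = neighbor_sum(labels)
    # log P(l_i | l_Ni) = alpha*l_i + beta*l_i*n_i - log(1 + exp(alpha + beta*n_i))
    logp = alpha * labels + beta * labels * n - np.logaddexp(0.0, alpha + beta * n)
    return -logp.sum()

observed = np.zeros((16, 16))
observed[:, :8] = 1.0  # a spatially smooth toy labeling, so beta should come out positive
result = minimize(neg_log_pseudo_likelihood, x0=[0.0, 0.0],
                  args=(observed,), method="Nelder-Mead")
alpha_hat, beta_hat = result.x
print(alpha_hat, beta_hat)
```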
Inference
• Recall that in image labeling we want to find the $L$ that maximizes the posterior $P(L \mid X)$. By Bayes' rule:
  $P(L \mid X) \propto P(L, X) = P(X \mid L) P(L)$
  where the prior probability is $P(L) = Z^{-1} \exp(-E(L))$
• Let $P(X \mid L) = Z_1^{-1} \exp(-E(X \mid L))$ and $P(L \mid X) \propto \exp(-E(L \mid X)) \propto P(L, X)$; then:
  $E(L \mid X) = E(X \mid L) + E(L)$
  (posterior energy = likelihood energy + prior energy)
MAP-MRF Labeling
• Maximizing the posterior probability is equivalent to minimizing the posterior energy:
  $L^* = \arg\min_L E(L \mid X)$
• Steps of MAP labeling:
  • Define the neighborhood system $N$ and the cliques $C$
  • Specify the prior energy, e.g. $E(L) = \sum_{i \in S} \alpha l_i + \sum_{i \in S} \sum_{i' \in N_i} \beta l_i l_{i'}$
  • Specify the likelihood energy $E(X \mid L)$
  • Minimize the posterior energy $E(L \mid X) = E(X \mid L) + E(L)$
Picture source: S. Xiang
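
A small sketch of the posterior energy E(L | X) = E(X | L) + E(L), assuming a Gaussian likelihood energy (a common choice for restoration-style problems, not specified on the slide) together with the auto-model prior energy from the earlier sketch.

```python
# Minimal sketch of the MAP-MRF posterior energy E(L|X) = E(X|L) + E(L), assuming
# a Gaussian likelihood energy sum_i (x_i - l_i)^2 / (2*sigma^2) and the auto-model prior.
import numpy as np

def prior_energy(labels, alpha=0.0, beta=-1.0):
    # factor 2 matches the double-counted sum over ordered neighbor pairs
    pair = np.sum(labels[:, :-1] * labels[:, 1:]) + np.sum(labels[:-1, :] * labels[1:, :])
    return alpha * labels.sum() + 2.0 * beta * pair

def likelihood_energy(observed, labels, sigma=1.0):
    return np.sum((observed - labels) ** 2) / (2.0 * sigma ** 2)

def posterior_energy(observed, labels, **kwargs):
    return likelihood_energy(observed, labels) + prior_energy(labels, **kwargs)

observed = np.array([[0.9, 1.1], [-0.8, 1.0]])
candidate = np.array([[1, 1], [-1, 1]])
print(posterior_energy(observed, candidate))   # MAP labeling minimizes this quantity
```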
MRF for Image Labeling
• Difficulties and disadvantages
  • Very strict independence assumptions:
    $P(l \mid \theta) \approx \prod_{i \in S} P(l_i \mid l_{N_i}, \theta)$
  • The interactions among labels are modeled by the prior term $P(L)$ and are independent of the observed data, which prohibits one from modeling data-dependent interactions between labels.
Conditional Random Fields
• Let $G = (S, E)$ be a graph; then $(X, L)$ is said to be a Conditional Random Field (CRF) if, when conditioned on $X$, the random variables $l_i$ obey the Markov property with respect to the graph:
  $P(l_i \mid X, l_{S-\{i\}}) = P(l_i \mid X, l_{N_i})$
  Compare with the MRF: $P(l_i \mid l_{S-\{i\}}) = P(l_i \mid l_{N_i})$
  where $S - \{i\}$ is the set of all sites in the graph except site $i$, and $N_i$ is the set of neighbors of site $i$ in $G$.
CRF
• According to the Markov-Gibbs equivalence, we have
  $P(L \mid X) = \frac{1}{Z} \exp\left(-\frac{1}{T} E(L \mid X)\right)$
• If only the clique potentials up to pairwise cliques are nonzero, the posterior probability $P(L \mid X)$ has the form
  $P(L \mid X) = \frac{1}{Z} \exp\left\{\sum_{i \in S} V_1(l_i \mid X) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'} \mid X)\right\}$
  where $-V_1$ and $-V_2$ are called the association and interaction potentials, respectively, in the CRF literature.
CRF vs. MRF
• An MRF is a generative model (two steps):
  • Infer the likelihood $P(X \mid L)$ and the prior $P(L)$
  • Then use Bayes' theorem to determine the posterior $P(L \mid X)$
• A CRF is a discriminative model (one step):
  • Directly infer the posterior $P(L \mid X)$
CRF vs. MRF
• More differences between the CRF and the MRF
  • MRF: $P(L) = \frac{1}{Z} \exp\left(\sum_{i \in S} V_1(l_i) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'})\right)$
    (the unary term uses $\mathrm{Info}(l_i)$; the pairwise term uses $\mathrm{Info}(l_i, l_{N_i})$)
  • CRF: $P(L \mid X) = \frac{1}{Z} \exp\left\{\sum_{i \in S} V_1(l_i \mid X) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'} \mid X)\right\}$
    (the unary term uses $\mathrm{Info}(l_i)$ and $\mathrm{Info}(l_i, x_{N_i})$; the pairwise term uses $\mathrm{Info}(l_i, l_{N_i})$ and $\mathrm{Info}(l_i, x_{N_i})$)
• In a CRF, both the association and interaction potentials are functions of all the observed data as well as of the labels.
Discriminative Random Fields
• The Discriminative Random Field (DRF) is a special type of CRF with two extensions:
  • First, a DRF is defined over 2D lattices (such as the image grid)
  • Second, the unary (association) and pairwise (interaction) potentials are designed using local discriminative classifiers
Kumar, S. and M. Hebert: 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. ICCV 2003
DRF
• Formulation of the DRF:
  $P(L \mid X) = \frac{1}{Z} \exp\left\{\sum_{i \in S} A_i(l_i, X) + \sum_{i \in S} \sum_{i' \in N_i} I_{ii'}(l_i, l_{i'}, X)\right\}$
  where $A_i$ and $I_{ii'}$ are called the association potential and the interaction potential
Picture source: S. Xiang
Association Potential
• $A(l_i, X)$ is modeled using a local discriminative model that outputs the association of site $i$ with class $l_i$:
  $A(l_i, X) = \log P'(l_i \mid f_i(X))$
  where $f_i(\cdot)$ is a linear function that maps a patch centered at site $i$ to a feature vector.
Picture source: S. Srihari
Association Potential
• For binary classification ($l_i = -1$ or $1$), the posterior at site $i$ is modeled using a logistic function:
  $P'(l_i = 1 \mid f_i(X)) = \frac{1}{1 + \exp(-w^T f_i(X))} = \sigma(w^T f_i(X))$
• Since $l_i = -1$ or $1$, the probability can be compactly expressed as:
  $P'(l_i \mid X) = \sigma(l_i\, w^T f_i(X))$
• Finally, the association potential is defined as:
  $A(l_i, X) = \log\left(\sigma(l_i\, w^T f_i(X))\right)$
Picture source: S. Srihari
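
A minimal sketch of the association potential A(l_i, X) = log sigma(l_i w^T f_i(X)); the weight vector w and the feature vector f_i(X) below are made-up values for illustration, not learned parameters from the paper.

```python
# Minimal sketch of the DRF association potential A(l_i, X) = log(sigma(l_i * w^T f_i(X)))
# for binary labels l_i in {-1, +1}. The weights w and features f_i are illustrative.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def association_potential(l_i, w, f_i):
    """log sigma(l_i * w^T f_i); larger when the local classifier agrees with label l_i."""
    return np.log(sigmoid(l_i * np.dot(w, f_i)))

w = np.array([0.8, -0.3, 0.5])      # assumed learned parameters
f_i = np.array([1.2, 0.4, -0.7])    # assumed feature vector for site i
print(association_potential(+1, w, f_i), association_potential(-1, w, f_i))
```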
Interaction Potential
• The interaction potential can be seen as a measure of how the labels at neighboring sites $i$ and $i'$ should interact given the observed image $X$.
• Given the features at two different sites, a pairwise discriminative model is defined as:
  $P''(t_{ii'} \mid \psi_i(X), \psi_{i'}(X)) = \sigma(t_{ii'}\, v^T \mu_{ii'}(X))$
  $t_{ii'} = \begin{cases} +1 & \text{if } l_i = l_{i'} \\ -1 & \text{otherwise} \end{cases}$
  where $\psi_k(X)$ is a function that maps a patch centered at site $k$ to a feature vector, $\mu_{ii'}(\psi_i(X), \psi_{i'}(X))$ is a new feature vector, and $v$ are the model parameters.
• $P''(t_{ii'} \mid \psi_i(X), \psi_{i'}(X))$ is a measure of how likely sites $i$ and $i'$ are to have the same label given the image $X$.
Interaction Potential
• The interaction potential is modeled using a data-dependent term along with a constant smoothing term:
  $I(l_i, l_{i'}, X) = \beta \left\{ K l_i l_{i'} + (1 - K)\left(2\sigma(t_{ii'}\, v^T \mu_{ii'}(X)) - 1\right) \right\}$
  • The first term is a data-independent smoothing term, similar to the auto-model
  • The second term is a $[-1, 1]$ mapping of the pairwise logistic function, which ensures that both terms have the same range
• Ideally, the data-dependent term acts as a discontinuity-adaptive model that moderates smoothing when the data from the two sites are "different".
Discussion of I(l_i, l_{i'}, X)
  $I(l_i, l_{i'}, X) = \beta \left\{ K l_i l_{i'} + (1 - K)\left(2\sigma(t_{ii'}\, v^T \mu_{ii'}(X)) - 1\right) \right\}$
• Suppose $l_{i'} = a$ for all $i' \in N_i$, and $l_i \in \{a, b\}$. For simplicity, also assume $A(l_i = a) = A(l_i = b)$. Then
  $l_i = \begin{cases} a & \text{if } I(a, a, X) > I(a, b, X) \\ b & \text{if } I(a, a, X) < I(a, b, X) \end{cases}$
• If only the term $K l_i l_{i'}$ is considered, $l_i$ will never choose $b$: over-smoothing!
• The second term is used to compensate for the effect of the smoothness assumption.
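
A small sketch of the interaction potential I(l_i, l_{i'}, X); beta, K, v, and the pairwise feature vectors mu_{ii'}(X) below are assumed values, chosen to show how the data-dependent term moderates smoothing across an apparent discontinuity.

```python
# Minimal sketch of the DRF interaction potential
#   I(l_i, l_i', X) = beta * ( K*l_i*l_i' + (1-K)*(2*sigma(t_ii' * v^T mu_ii'(X)) - 1) ),
# with t_ii' = +1 if l_i == l_i' else -1.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def interaction_potential(l_i, l_ip, mu_iip, v, beta=1.0, K=0.5):
    t_iip = 1.0 if l_i == l_ip else -1.0
    smooth = K * l_i * l_ip                                     # data-independent smoothing
    data_term = (1.0 - K) * (2.0 * sigmoid(t_iip * np.dot(v, mu_iip)) - 1.0)
    return beta * (smooth + data_term)

v = np.array([0.5, -0.2])             # assumed pairwise model parameters
mu_similar = np.array([2.0, 0.1])     # features suggesting the two sites look alike
mu_different = np.array([-2.0, 3.0])  # features suggesting a discontinuity
print(interaction_potential(+1, +1, mu_similar, v),
      interaction_potential(+1, +1, mu_different, v))  # smoothing is moderated in the second case
```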
Parameter Estimation
• Parameters: $\theta = \{w, v, \beta, K\}$
• Maximum likelihood estimation:
  $\theta^* = \arg\max_\theta P(l \mid X, \theta)$
• In the conventional maximum-likelihood approach, the evaluation of $Z$ is an NP-hard problem.
• Approximate evaluation of the partition function $Z$ by the pseudo-likelihood:
  $\theta^* = \arg\max_\theta \prod_{m=1}^{M} \prod_{i \in S} P(l_i^m \mid l_{N_i}^m, X^m, \theta)$, subject to $0 \le K \le 1$
  where $m$ indexes the training images, $M$ is the total number of training images, and
  $P(l_i \mid l_{N_i}, X, \theta) = \frac{1}{z_i} \exp\left\{A(l_i, X) + \sum_{i' \in N_i} I(l_i, l_{i'}, X)\right\}$
  $z_i = \sum_{l_i \in \{-1, 1\}} \exp\left\{A(l_i, X) + \sum_{i' \in N_i} I(l_i, l_{i'}, X)\right\}$
Inference
• Objective function:
  $l^* = \arg\max_l P(l \mid X)$
• Iterated Conditional Modes (ICM) algorithm
  • Given an initial label configuration, ICM maximizes the local conditional probabilities iteratively, i.e.
    $l_i \leftarrow \arg\max_{l_i} P(l_i \mid l_{N_i}, X)$
  • ICM yields a local maximum of the posterior and has been shown to give reasonably good results
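
A small sketch of ICM for a binary label field, assuming a generic per-site unary score plus an Ising-style smoothing term in place of the full DRF potentials; the random unary scores are purely illustrative.

```python
# Minimal sketch of ICM inference for a binary MRF/DRF: starting from an initial
# labeling, repeatedly set each l_i to the label that maximizes its local score.
import numpy as np

def icm(unary, beta=0.5, n_iters=10):
    """unary[h, w, k]: score of label k (k=0 -> -1, k=1 -> +1) at site (h, w)."""
    H, W, _ = unary.shape
    labels = np.where(unary[:, :, 1] >= unary[:, :, 0], 1, -1)  # initialize from unary only
    for _ in range(n_iters):
        for h in range(H):
            for w in range(W):
                nbrs = [labels[hh, ww] for hh, ww in
                        ((h - 1, w), (h + 1, w), (h, w - 1), (h, w + 1))
                        if 0 <= hh < H and 0 <= ww < W]
                scores = [unary[h, w, k] + beta * lab * sum(nbrs)
                          for k, lab in enumerate((-1, 1))]
                labels[h, w] = -1 if scores[0] >= scores[1] else 1
    return labels

rng = np.random.default_rng(0)
unary = rng.normal(size=(8, 8, 2))
print(icm(unary))
```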
Experiment
• Task: detecting man-made structures in natural scenes
• Database
  • Corel (training: 108 images, test: 129 images)
  • Each image was divided into non-overlapping 16x16-pixel blocks
• Compared methods
  • Logistic
  • MRF
  • DRF
Experiment Results
• Detection Rates (DR) and False Positives (FP)
• The DRF reduces false positives from the MRF by more than 48%.
• Superscript '-' indicates that no neighborhood data interaction was used. K = 0 indicates the absence of the data-independent term in the interaction potential of the DRF.
Experiment Results
• For a similar detection rate, the DRF has lower false positives.
• For similar false positives, the detection rate of the DRF is higher than that of the MRF.
Conclusion of DRF
• Pros
  • Provides the benefits of discriminative models
  • Demonstrates good performance
• Cons
  • Although the model outperforms traditional MRFs, it is not strong enough to capture long-range correlations among the labels, due to the rigid lattice-based structure that allows only pairwise interactions
Problem
• Local information can be confusing when there are large overlaps between different classes (e.g., an ambiguous region: sky or water?)
• Solution: utilize global contextual information to improve performance
Multiscale Conditional Random Field (mCRF)
• Considers features at different scales
  • Local features (site)
  • Regional label features (small patch)
  • Global label features (big patch or the whole image)
• The conditional probability $P(L \mid X)$ is formulated as a product over features at different scales $s$:
  $P(L \mid X) = \frac{1}{Z} \prod_s P_s(L \mid X)$
  where
  $Z = \sum_L \prod_s P_s(L \mid X)$
He, X., R. Zemel, and M. Carreira-Perpinan: 2004, 'Multiscale conditional random fields for image labelling'. IEEE Int. Conf. CVPR.
Local Features
• The local feature of site $i$ is represented by the outputs of several filters.
• The aim is to associate the patch with one of a pre-defined set of labels.
Local Classifier
• Here a multilayer perceptron (MLP) is used as the local classifier.
• Independently at each site $i$, the local classifier produces a conditional distribution $P_C(l_i \mid x_i, \lambda)$ over the label variable $l_i$ given the filter outputs $x_i$ within an image patch centered on site (pixel) $i$, where $\lambda$ are the classifier parameters.
Regional Label Features
• Encode a particular constraint between the image and the labels within a region of the image
• Sample pattern: ground pixels (brown) above water pixels (cyan)
Global Label Features
• Operate at a coarser resolution, specifying common values for a patch of sites in the label field
• Sample pattern: sky pixels (blue) at the top of the image, hippo pixels (red) in the middle, and water pixels (cyan) near the bottom
Feature Function
• Global label features are trained with a Restricted Boltzmann Machine (RBM)
  • Two layers: label sites (L) and features (f)
  • Features and labels are fully inter-connected, with no intra-layer connections
• The joint distribution of the global label feature model is:
  $P_G(L, f) \propto \exp\left\{\sum_a f_a w_a^T L\right\}$
  where $w_a$ is the parameter vector connecting hidden global label feature $f_a$ and the label sites $L$
(Figure: hidden features $f_1, f_2, \ldots, f_m$ fully connected to label sites $l_1, l_2, \ldots, l_n$)
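
A quick numeric check of the marginalization used on the next slide: summing the unnormalized joint exp{sum_a f_a w_a^T L} over binary hidden features f gives prod_a (1 + exp(w_a^T L)); the label vector and the parameters w_a below are random stand-ins, not trained values.

```python
# Minimal sketch verifying: sum over f in {0,1}^m of exp(sum_a f_a * (w_a^T L))
#                          equals prod_a (1 + exp(w_a^T L)).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_features = 5, 3
L = rng.integers(0, 2, size=n_sites).astype(float)   # toy (flattened) label field
W = rng.normal(size=(n_features, n_sites))            # row a holds w_a

proj = W @ L                                           # w_a^T L for each feature a
brute = sum(np.exp(np.dot(f, proj))
            for f in itertools.product([0, 1], repeat=n_features))
closed_form = np.prod(1.0 + np.exp(proj))
print(np.isclose(brute, closed_form))                  # True
```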
Feature Function
• By marginalizing out the hidden variables (f), the global component of the model is:
  $P_G(L) \propto \prod_a \left(1 + \exp(w_a^T L)\right)$
• Similarly, the regional component of the model can be represented as:
  $P_R(L) \propto \prod_{r, b} \left(1 + \exp(u_b^T l_r)\right)$
• Multiplicatively combining the component conditional distributions:
  $P(L \mid X, \theta) = \frac{1}{Z} \prod_i P_C(l_i \mid x_i, \lambda) \prod_{r, b} \left(1 + \exp(u_b^T l_r)\right) \prod_a \left(1 + \exp(w_a^T L)\right)$
  $\theta = \{\lambda, \{u_b\}, \{w_a\}\}$
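
A minimal sketch of the (unnormalized) mCRF log-score of one candidate labeling, combining the local-classifier, regional, and global components above; the one-hot label encoding, region layout, and parameters are illustrative assumptions, and the intractable Z is left uncomputed.

```python
# Minimal sketch: log of the unnormalized mCRF score
#   prod_i P_C(l_i|x_i) * prod_{r,b} (1 + exp(u_b^T l_r)) * prod_a (1 + exp(w_a^T L)).
import numpy as np

def mcrf_log_score(local_probs, label_onehot, regions, U, W_glob):
    """
    local_probs[i, k]: classifier output P_C(l_i = k | x_i, lambda)
    label_onehot[i, k]: 1 if site i takes label k, else 0
    regions: list of site-index arrays; U[b]: regional feature params; W_glob[a]: global params
    """
    labels = label_onehot.argmax(axis=1)
    score = np.sum(np.log(local_probs[np.arange(len(labels)), labels]))  # local component
    for r in regions:                                                    # regional component
        l_r = label_onehot[r].ravel()
        score += np.sum(np.logaddexp(0.0, U @ l_r))                      # sum_b log(1 + exp(u_b^T l_r))
    L_flat = label_onehot.ravel()
    score += np.sum(np.logaddexp(0.0, W_glob @ L_flat))                  # global component
    return score

rng = np.random.default_rng(0)
n_sites, n_labels = 12, 3
local_probs = rng.dirichlet(np.ones(n_labels), size=n_sites)
label_onehot = np.eye(n_labels)[rng.integers(0, n_labels, n_sites)]
regions = [np.arange(0, 6), np.arange(6, 12)]
U = rng.normal(size=(4, 6 * n_labels))               # 4 regional features over a 6-site region
W_glob = rng.normal(size=(2, n_sites * n_labels))    # 2 global features over the whole field
print(mcrf_log_score(local_probs, label_onehot, regions, U, W_glob))
```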
Parameter Estimation and Inference
• Parameter estimation
  • The conditional model is trained discriminatively based on the Conditional Maximum Likelihood (CML) criterion, which maximizes the log conditional likelihood:
    $\theta^* = \arg\max_\theta \sum_t \log P(L^t \mid X^t; \theta)$
• Inference
  • Maximum Posterior Marginals (MPM):
    $l_i^* = \arg\max_{l_i} P(l_i \mid X), \quad \forall i \in S$
Experiment Results
• Databases
  • Corel (100 images with 7 labels)
  • Sowerby (104 images with 8 labels)
• Compared methods
  • Single classifier (MLP)
  • MRF
  • mCRF
Labeling Results
Conclusion of mCRF
• Pros
  • Formulates the image labeling problem as a multiscale CRF model
  • Combines local and larger-scale contextual information in a unified framework
• Cons
  • Including additional classifiers operating at different scales in the mCRF framework introduces a large number of model parameters
  • The model assumes conditional independence of the hidden variables given the label field
More CRF models
• Hierarchical Conditional Random Field (HCRF)
  • S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. 2005
  • Jordan Reynolds and Kevin Murphy. Figure-ground segmentation using a hierarchical conditional random field. 2007
• Tree Structured Conditional Random Fields (TCRF)
  • P. Awasthi, A. Gagrani, and B. Ravindran. Image modeling using tree structured conditional random fields. 2007
References
• Li, S. Z.: 2009, 'Markov Random Field Modeling in Image Analysis'. Springer.
• Kumar, S. and Hebert, M.: 2003, 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. In Proc. IEEE International Conference on Computer Vision (ICCV).
• He, X., Zemel, R., and Carreira-Perpinan, M.: 2004, 'Multiscale conditional random fields for image labelling'. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
End
Thanks!