Yilin Wang
11/5/2009
Background
Labeling Problem
Labeling: given an observed data set (X)
and a label set (L),
infer the labels of the data points
Most vision problems can be posed as labeling
problems
Stereo matching
Image segmentation
Image restoration
Examples of Labeling Problem
Stereo Matching
For a pixel in image 1, where is the corresponding pixel
in image 2?
Label set: Differences (disparities)
between corresponding pixels
Picture source: S. Lazebnik
Examples of Labeling Problem
Image Segmentation
To partition an image
into multiple disjoint
regions.
Label set: Region IDs
Picture source: http://mmlab.ie.cuhk.edu.hk
Examples of Labeling Problem
Image Restoration
To "compensate for"
or "undo" defects
which degrade an
image.
Label set: Restored Intensities
Picture source: http://www.photorestoration.co.nz
Background
Image Labeling
Given an image, the system should automatically
partition it into semantically meaningful areas, each
labeled with a specific object class
(example classes in the figure: Cow, Sky, Lawn, Tree, Building, Plane)
Image Labeling Problem
Given
$X = \{x_i\}_{i \in S}$: the observed data from an input image,
where $x_i$ is the data from the $i$-th site (pixel or block) of the
image set S
A pre-defined label set
Let $L = \{l_i\}_{i \in S}$ be the corresponding labels at the image
sites; we want to find the proper L to maximize the
conditional probability $P(L \mid X)$:
$L^* = \arg\max_L P(L \mid X)$
Which kinds of information can be used for labeling?
•Features from individual sites
Intensity, color, texture, …
•Interactions with neighboring sites
Contextual information
Two types of interactions
•Interaction with neighboring labels (Spatial smoothness of labels)
•neighboring sites tend to have similar labels (except at discontinuities)
•Interactions with neighboring observed data
Information for Image Labeling
Let $l_i$ be the label of the $i$-th site of the image set S, and
$N_i$ be the neighboring sites of site i
[Figure: site i, its neighborhood $N_i$, and the remaining sites $S - \{i\}$]
Three kinds of information for image labeling:
Features from the local site: $\mathrm{Info}(l_i)$
Interaction with neighboring labels: $\mathrm{Info}(l_i, l_{N_i})$
Interaction with neighboring observed data: $\mathrm{Info}(l_i, x_{N_i})$
Picture source: S. Xiang
Markov Random Fields (MRF)
Markov Random Fields (MRFs) are the most popular
models to incorporate local contextual constraints in
labeling problems
Let $l_i$ be the label of the $i$-th site of the image set S, and
$N_i$ be the neighboring sites of site i
The label set $L = \{l_i\}_{i \in S}$ is said to be an MRF on S w.r.t. a
neighborhood system N iff the following condition is satisfied:
Markov property: $P(l_i \mid l_{S-\{i\}}) = P(l_i \mid l_{N_i})$
Maintain global spatial consistency by only
considering relatively local dependencies!
Markov-Gibbs Equivalence
Let l be a realization of $L = \{l_i\}_{i \in S}$; then P(l) has an
explicit formulation (Gibbs distribution):
$P(l) = \frac{1}{Z} \exp\left(-\frac{1}{T} E(l)\right)$
where
$Z = \sum_{l \in \mathcal{L}} \exp\left(-\frac{1}{T} E(l)\right)$
Z: a normalizing factor, called the partition function
T: a constant
Energy function: $E(l) = \sum_{c \in C} V_c(l)$
Cliques: $C_k = \{\{i, i', i'', \ldots\} \mid i, i', i'', \ldots \text{ are neighbors to one another}\}$
$E(l) = \sum_{\{i\} \in C_1} V_1(l_i) + \sum_{\{i, i'\} \in C_2} V_2(l_i, l_{i'}) + \ldots$
Potential functions represent a priori knowledge of interactions
between labels of neighboring sites
Auto-Model
With clique potentials up to two sites, the energy
takes the form
$E(L) = \sum_{i \in S} V_1(l_i) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'})$
When $V_1(l_i) = l_i G_i(l_i)$ and $V_2(l_i, l_{i'}) = \beta_{i,i'} l_i l_{i'}$, where $G_i(\cdot)$ are
arbitrary functions (or constants) and $\beta_{i,i'}$ are constants
reflecting the pairwise interaction between i and i', the
energy is
$E(L) = \sum_{i \in S} l_i G_i(l_i) + \sum_{i \in S} \sum_{i' \in N_i} \beta_{i,i'} l_i l_{i'}$
(first term: $\mathrm{Info}(l_i)$; second term: $\mathrm{Info}(l_i, l_{N_i})$)
Such models are called auto-models (Besag 1974)
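To make the auto-model concrete, here is a minimal sketch (my own illustration, not from the slides) that evaluates $E(L) = \alpha \sum_i l_i + \beta \sum_i \sum_{i' \in N_i} l_i l_{i'}$ on a small binary label grid with a 4-neighborhood. The grid size, the choice of a constant $G_i(\cdot) = \alpha$, the shared $\beta_{i,i'} = \beta$, and the {0, 1} label values are illustrative assumptions.

```python
import numpy as np

def four_neighbors(i, j, h, w):
    """4-connected neighborhood N_i on an h x w lattice."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w:
            yield ni, nj

def auto_model_energy(labels, alpha, beta):
    """E(L) = sum_i alpha*l_i + sum_i sum_{i' in N_i} beta*l_i*l_i'.

    labels: 2D array of site labels (here 0/1, matching the auto-logistic
    form used later for pseudo-likelihood). Each unordered neighbor pair
    is counted twice, matching the double sum over i and i' in N_i.
    """
    h, w = labels.shape
    unary = alpha * labels.sum()
    pairwise = 0.0
    for i in range(h):
        for j in range(w):
            for ni, nj in four_neighbors(i, j, h, w):
                pairwise += beta * labels[i, j] * labels[ni, nj]
    return unary + pairwise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = rng.integers(0, 2, size=(4, 4))     # a random 4x4 binary labeling
    print("E(L) =", auto_model_energy(L, alpha=-0.5, beta=-1.0))
```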
Parameter Estimation
Given the functional form of the auto-model
$E(L) = \alpha \sum_{i \in S} l_i + \beta \sum_{i \in S} \sum_{i' \in N_i} l_i l_{i'}$
how do we specify its parameters $\theta = \{\alpha, \beta\}$?
Maximum Likelihood Estimation
Given a realization l of an MRF, the maximum
likelihood (ML) estimate maximizes the conditional
probability P(l | θ) (the likelihood of θ), that is:
$\hat{\theta} = \arg\max_\theta P(l \mid \theta)$
By Bayes' rule:
$P(\theta \mid l) \propto P(l \mid \theta) P(\theta)$
The prior P(θ) is assumed to be flat when the prior
information is totally unavailable. In this case, the
MAP estimation reduces to the ML estimation.
Maximum Likelihood Estimation
The likelihood function is in the Gibbs form
$P(l \mid \theta) = \frac{1}{Z(\theta)} \exp(-E(l \mid \theta))$
where
$Z(\theta) = \sum_{l \in \mathcal{L}} \exp(-E(l \mid \theta))$
$E(l \mid \theta) = \alpha \sum_{i \in S} l_i + \beta \sum_{i \in S} \sum_{i' \in N_i} l_i l_{i'}$
However, the computation of Z(θ) is intractable even
for moderately sized problems because there are a
combinatorial number of elements in the
configuration space L.
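As a sanity check on why Z(θ) is intractable, the toy sketch below (not from the slides) computes Z exactly by enumerating all $2^{|S|}$ binary configurations of a tiny lattice; every added site doubles the cost, so even a modest image grid is far out of reach. The auto-model energy and the 3×3 lattice are illustrative assumptions.

```python
import itertools
import numpy as np

def energy(labels, alpha, beta):
    """Auto-model energy with a 4-neighborhood on a 2D grid (labels in {0,1})."""
    h, w = labels.shape
    e = alpha * labels.sum()
    for i in range(h):
        for j in range(w):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    e += beta * labels[i, j] * labels[ni, nj]
    return e

def brute_force_Z(h, w, alpha, beta):
    """Z(theta) = sum over all 2^(h*w) configurations of exp(-E(l | theta))."""
    Z = 0.0
    for bits in itertools.product((0, 1), repeat=h * w):
        l = np.array(bits).reshape(h, w)
        Z += np.exp(-energy(l, alpha, beta))
    return Z

if __name__ == "__main__":
    # 3x3 -> 512 terms; a 16x16 field of blocks would already need 2^256 terms.
    print("Z for a 3x3 lattice:", brute_force_Z(3, 3, alpha=-0.5, beta=-1.0))
```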
Maximum Pseudo-Likelihood
Assumption: $l_i$ and $l_{N_i}$ are independent.
$P(l \mid \theta) \approx \prod_{i \in S} P(l_i \mid l_{N_i}, \theta) = \prod_{i \in S} \frac{\exp(\alpha l_i + \beta \sum_{i' \in N_i} l_i l_{i'})}{1 + \exp(\alpha + \beta \sum_{i' \in N_i} l_{i'})}$
i 'N i
Notice that the pseudo-likelihood does not involve the
partition function Z.
$\{\alpha, \beta\}$ can be obtained by solving
$\frac{\partial \ln P(l \mid \alpha, \beta)}{\partial \alpha} = 0, \qquad \frac{\partial \ln P(l \mid \alpha, \beta)}{\partial \beta} = 0$
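A minimal sketch of how {α, β} could be fit in practice, assuming labels in {0, 1}, a 4-neighborhood, and plain gradient ascent on the log pseudo-likelihood instead of solving the two stationarity conditions in closed form; the local conditional follows the form shown above, and the step size, iteration count, and random stand-in labeling are arbitrary illustrative choices.

```python
import numpy as np

def neighbor_sums(labels):
    """s_i = sum of the 4-neighborhood labels at every site (zero padding at borders)."""
    s = np.zeros_like(labels, dtype=float)
    s[1:, :] += labels[:-1, :]
    s[:-1, :] += labels[1:, :]
    s[:, 1:] += labels[:, :-1]
    s[:, :-1] += labels[:, 1:]
    return s

def log_pseudo_likelihood(labels, alpha, beta):
    """log PL = sum_i [ alpha*l_i + beta*l_i*s_i - log(1 + exp(alpha + beta*s_i)) ],
    i.e. the product of local conditionals from the slide, in log form."""
    s = neighbor_sums(labels)
    return np.sum(alpha * labels + beta * labels * s
                  - np.logaddexp(0.0, alpha + beta * s))

def fit_alpha_beta(labels, lr=1e-3, iters=2000):
    """Gradient ascent on the log pseudo-likelihood (an illustrative optimizer choice)."""
    alpha, beta = 0.0, 0.0
    s = neighbor_sums(labels)
    for _ in range(iters):
        sigma = 1.0 / (1.0 + np.exp(-(alpha + beta * s)))   # P(l_i = 1 | l_Ni)
        g_alpha = np.sum(labels - sigma)                    # d logPL / d alpha
        g_beta = np.sum(s * (labels - sigma))               # d logPL / d beta
        alpha += lr * g_alpha
        beta += lr * g_beta
    return alpha, beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    l = (rng.random((32, 32)) < 0.5).astype(float)          # stand-in for an observed labeling
    a_hat, b_hat = fit_alpha_beta(l)
    print("alpha =", a_hat, "beta =", b_hat,
          "logPL =", log_pseudo_likelihood(l, a_hat, b_hat))
```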
Inference
Recall that in image labeling, we want to find the L that
maximizes the posterior $P(L \mid X)$; by Bayes' rule:
$P(L \mid X) \propto P(L, X) = P(X \mid L) P(L)$
where the prior probability is $P(L) = Z^{-1} \exp(-E(L))$
Let $P(X \mid L) = Z_1^{-1} \exp(-E(X \mid L))$ and
$P(L \mid X) \propto \exp(-E(L \mid X)) \propto P(L, X)$,
then:
$E(L \mid X) = E(X \mid L) + E(L)$
(posterior energy = likelihood energy + prior energy)
MAP-MRF Labeling
Maximizing a posterior probability is equivalent to
minimizing the posterior energy:
$L^* = \arg\min_L E(L \mid X)$
Steps of MAP:
1. Define the neighborhood system N and the cliques C
2. Define the prior energy $E(L) = \alpha \sum_{i \in S} l_i + \beta \sum_{i \in S} \sum_{i' \in N_i} l_i l_{i'}$
3. Define the likelihood energy $E(X \mid L)$
4. Minimize the posterior energy $E(L \mid X) = E(X \mid L) + E(L)$
Picture source: S. Xiang
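A toy sketch of the MAP-MRF recipe above on a tiny binary image: the auto-model supplies the prior energy E(L), a Gaussian-noise term $E(X \mid L) = \sum_i (x_i - l_i)^2 / 2\sigma^2$ stands in for the likelihood energy (my illustrative choice; the slides do not fix a specific likelihood), and the posterior energy is minimized by brute force, which is only feasible for a handful of sites. The ICM algorithm discussed later replaces the brute force on real images.

```python
import itertools
import numpy as np

def prior_energy(L, alpha, beta):
    """Auto-model prior energy E(L) over a 4-neighborhood (labels in {0,1})."""
    e = alpha * L.sum()
    e += beta * (L[1:, :] * L[:-1, :]).sum() * 2      # vertical pairs, counted both ways
    e += beta * (L[:, 1:] * L[:, :-1]).sum() * 2      # horizontal pairs, counted both ways
    return e

def likelihood_energy(X, L, sigma=0.5):
    """E(X|L): assumed Gaussian noise around the (binary) label values."""
    return ((X - L) ** 2).sum() / (2.0 * sigma ** 2)

def map_labeling(X, alpha=0.0, beta=-0.3):
    """Brute-force L* = argmin_L E(L|X) = E(X|L) + E(L) (tiny images only)."""
    h, w = X.shape
    best, best_e = None, np.inf
    for bits in itertools.product((0, 1), repeat=h * w):
        L = np.array(bits).reshape(h, w)
        e = likelihood_energy(X, L) + prior_energy(L, alpha, beta)
        if e < best_e:
            best, best_e = L, e
    return best, best_e

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = np.zeros((3, 4)); truth[:, 2:] = 1          # a tiny two-region "image"
    X = truth + 0.4 * rng.standard_normal(truth.shape)  # noisy observation
    L_star, e = map_labeling(X)
    print(L_star, "posterior energy:", e)
```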
MRF for Image Labeling
Difficulties and disadvantages
Very strict independence assumptions:
$P(l \mid \theta) \approx \prod_{i \in S} P(l_i \mid l_{N_i}, \theta)$
The interactions among labels are modeled by the prior
term P(L) and are independent of the observed
data, which prohibits one from modeling data-dependent interactions in the labels.
Conditional Random Fields
Let G = (S, E) be a graph, then (X, L) is said to be a
Conditional Random Field (CRF) if, when conditioned
on X, the random variables $l_i$ obey the Markov property
with respect to the graph:
$P(l_i \mid X, l_{S-\{i\}}) = P(l_i \mid X, l_{N_i})$
Compare with the MRF:
$P(l_i \mid l_{S-\{i\}}) = P(l_i \mid l_{N_i})$
where S-{i} is the set of all sites in the graph except
site i, and $N_i$ is the set of neighbors of site i in G.
CRF
According to the Markov-Gibbs equivalence, we have
$P(L \mid X) = \frac{1}{Z} \exp\left(-\frac{1}{T} E(L \mid X)\right)$
If only up to pairwise clique potentials are nonzero, the
posterior probability P(L | X) has the form
$P(L \mid X) = \frac{1}{Z} \exp\{-\sum_{i \in S} V_1(l_i \mid X) - \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'} \mid X)\}$
where −V1 and −V2 are called the association and
interaction potentials, respectively, in the CRF
literature
CRF vs. MRF
MRF is a generative model (two-step)
Infer the likelihood P(X | L) and the prior P(L)
Then use Bayes' theorem to determine the posterior P(L | X)
CRF is a discriminative model (one-step)
Directly infer the posterior P(L | X)
CRF vs. MRF
More differences between the CRF and MRF
MRF: $P(L) = \frac{1}{Z} \exp(-\sum_{i \in S} V_1(l_i) - \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'}))$
uses $\mathrm{Info}(l_i)$ and $\mathrm{Info}(l_i, l_{N_i})$
CRF: $P(L \mid X) = \frac{1}{Z} \exp\{-\sum_{i \in S} V_1(l_i \mid X) - \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'} \mid X)\}$
uses $\mathrm{Info}(l_i) + \mathrm{Info}(l_i, x_{N_i})$ and $\mathrm{Info}(l_i, l_{N_i}) + \mathrm{Info}(l_i, x_{N_i})$
In the CRF, both the association and interaction
potentials are functions of all the observed data
as well as of the labels
Discriminative Random Fields
The Discriminative Random Field (DRF) is a special
type of CRF with two extensions.
First, a DRF is defined over 2D lattices (such as the
image grid)
Second, the unary (association) and pairwise
(interaction) potentials therein are designed using local
discriminative classifiers
Kumar, S. and M. Hebert: `Discriminative Random Fields: A Discriminative
Framework for Contextual Interaction in Classification'. ICCV 2003
DRF
Formulation of DRF
$P(L \mid X) = \frac{1}{Z} \exp\{\sum_{i \in S} A_i(l_i, X) + \sum_{i \in S} \sum_{i' \in N_i} I_{ii'}(l_i, l_{i'}, X)\}$
where $A_i$ and $I_{ii'}$ are called the association potential and the
interaction potential
Picture source: S. Xiang
Association Potential
$A(l_i, X)$ is modeled using a local discriminative
model that outputs the association of site i
with class $l_i$ as:
$A(l_i, X) = \log P'(l_i \mid f_i(X))$
where $f_i(\cdot)$ is a linear function that maps a patch
centered at site i to a feature vector.
Picture source: S. Srihari
Association Potential
For binary classification ($l_i \in \{-1, 1\}$), the
posterior at site i is modeled using a
logistic function:
$P'(l_i = 1 \mid f_i(X)) = \frac{1}{1 + \exp(-w^T f_i(X))} = \sigma(w^T f_i(X))$
Since $l_i = -1$ or $1$, the probability can be
compactly expressed as:
$P'(l_i \mid X) = \sigma(l_i\, w^T f_i(X))$
Finally, the association potential is
defined as:
$A(l_i, X) = \log(\sigma(l_i\, w^T f_i(X)))$
Picture source: S. Srihari
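A minimal sketch of this logistic association potential, where the feature map $f_i(X)$ is taken to be the flattened image patch around site i plus a bias term (a stand-in for the richer features used in the DRF paper) and the weight vector w is a random placeholder that would normally come from training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def patch_features(X, i, j, radius=1):
    """f_i(X): an illustrative feature map -- the flattened patch around site (i, j)
    plus a constant bias term (the DRF paper uses richer features)."""
    patch = X[i - radius:i + radius + 1, j - radius:j + radius + 1]
    return np.concatenate(([1.0], patch.ravel()))

def association_potential(label, X, i, j, w):
    """A(l_i, X) = log sigma(l_i * w^T f_i(X)), with l_i in {-1, +1}."""
    assert label in (-1, 1)
    return np.log(sigmoid(label * (w @ patch_features(X, i, j))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((8, 8))                  # toy observed image
    w = rng.standard_normal(1 + 9) * 0.1    # placeholder weights (bias + 3x3 patch)
    print("A(+1, X) =", association_potential(+1, X, 4, 4, w))
    print("A(-1, X) =", association_potential(-1, X, 4, 4, w))
```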
Interaction Potential
The interaction potential can be seen as a measure of
how the labels at neighboring sites i and i' should
interact given the observed image X.
Given the features at two different sites, a pairwise
discriminative model is defined as:
$P''(t_{ii'} \mid \psi_i(X), \psi_{i'}(X)) = \sigma(t_{ii'}\, v^T \mu_{ii'}(X))$
$t_{ii'} = \begin{cases} 1 & \text{if } l_i = l_{i'} \\ -1 & \text{otherwise} \end{cases}$
where $\psi_k(X)$ is a function that maps a patch centered
at site k to a feature vector, $\mu_{ii'}(\psi_i(X), \psi_{i'}(X))$ is a new
feature vector, and v are the model parameters
$P''(t_{ii'} \mid \psi_i(X), \psi_{i'}(X))$ is a measure of how likely sites i and i'
are to have the same label given image X
Interaction Potential
The interaction potential is modeled using a data-dependent term
along with a constant smoothing term:
$I(l_i, l_{i'}, X) = \beta \{ K l_i l_{i'} + (1 - K)(2\sigma(t_{ii'} v^T \mu_{ii'}(X)) - 1) \}$
The first term is a data-independent smoothing term,
similar to the auto-model
The second term is a [-1, 1] mapping of the pairwise logistic
function, which ensures that both terms have the same range
Ideally, the data-dependent term will act as a discontinuity
adaptive model that will moderate smoothing when the data
from two sites is 'different'.
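A sketch of this interaction potential under the same toy setup: $\mu_{ii'}(X)$ is taken here to be the concatenation of the two site feature vectors $\psi_i(X)$ and $\psi_{i'}(X)$ (one simple choice of pairwise feature map), and β, K, and v are placeholder values rather than trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_features(X, i, j, i2, j2, radius=1):
    """mu_ii'(X): here simply the concatenation of the two patch feature
    vectors psi_i(X) and psi_i'(X) (an illustrative choice)."""
    def psi(a, b):
        patch = X[a - radius:a + radius + 1, b - radius:b + radius + 1]
        return np.concatenate(([1.0], patch.ravel()))
    return np.concatenate((psi(i, j), psi(i2, j2)))

def interaction_potential(li, lj, X, i, j, i2, j2, v, beta=1.0, K=0.5):
    """I = beta * { K*l_i*l_i' + (1-K)*(2*sigma(t_ii' v^T mu_ii'(X)) - 1) },
    where t_ii' = 1 if l_i == l_i' and -1 otherwise."""
    t = 1.0 if li == lj else -1.0
    mu = pairwise_features(X, i, j, i2, j2)
    data_term = 2.0 * sigmoid(t * (v @ mu)) - 1.0
    return beta * (K * li * lj + (1.0 - K) * data_term)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((8, 8))
    v = rng.standard_normal(2 * (1 + 9)) * 0.1   # placeholder parameters
    print("I(+1,+1) =", interaction_potential(+1, +1, X, 4, 4, 4, 5, v))
    print("I(+1,-1) =", interaction_potential(+1, -1, X, 4, 4, 4, 5, v))
```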
Discussion of $I(l_i, l_{i'}, X)$
$I(l_i, l_{i'}, X) = \beta \{ K l_i l_{i'} + (1 - K)(2\sigma(t_{ii'} v^T \mu_{ii'}(X)) - 1) \}$
Suppose $l_{i'} = a$ for all $i' \in N_i$, and $l_i \in \{a, b\}$
Also, for simplicity, assume $A(l_i = a) = A(l_i = b)$
Then
if $I(a, a, X) > I(a, b, X)$, site i takes label a;
if $I(a, a, X) < I(a, b, X)$, site i takes label b
If we only consider the $K l_i l_{i'}$ term, $l_i$ will never choose b. Oversmoothed!
The second term is used to compensate for the effect of the
smoothness assumption.
Parameter Estimation
$\theta = \{w, v, \beta, K\}$
Maximum likelihood estimation:
$\hat{\theta} = \arg\max_\theta P(l \mid X, \theta)$
In the conventional maximum-likelihood approach, the
evaluation of Z is an NP-hard problem.
Approximate evaluation of the partition function Z by
pseudo-likelihood:
$\hat{\theta} = \arg\max_\theta \prod_{m=1}^{M} \prod_{i \in S} P(l_i^m \mid l_{N_i}^m, X^m, \theta)$, subject to $0 \le K \le 1$
where m indexes over the training images and M is the
total number of training images, and
$P(l_i \mid l_{N_i}, X, \theta) = \frac{1}{z_i} \exp\{A(l_i, X) + \sum_{i' \in N_i} I(l_i, l_{i'}, X)\}$
$z_i = \sum_{l_i \in \{-1, 1\}} \exp\{A(l_i, X) + \sum_{i' \in N_i} I(l_i, l_{i'}, X)\}$
Inference
Objective function:
$l^* = \arg\max_l P(l \mid X)$
Iterated Conditional Modes (ICM) algorithm
Given an initial label configuration, ICM maximizes the
local conditional probabilities iteratively, i.e.
$l_i \leftarrow \arg\max_{l_i} P(l_i \mid l_{N_i}, X)$
ICM yields a local maximum of the posterior and has been
shown to give reasonably good results
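An illustrative ICM implementation (not the authors' code) for a grid model with unary and pairwise potentials: each sweep sets every site to the label that maximizes its local conditional, i.e. $A(l_i, X) + \sum_{i' \in N_i} I(l_i, l_{i'}, X)$ with the neighbors held fixed, and the loop stops at a local maximum. The potentials are passed in as callables, so the same loop serves an MRF or a DRF; the toy potentials in the usage example are made up.

```python
import numpy as np

def icm(init_labels, unary, pairwise, labels=(-1, 1), sweeps=10):
    """Iterated Conditional Modes on a 2D lattice with a 4-neighborhood.

    unary(l, i, j)                -> association potential A(l_i, X)
    pairwise(l, l2, i, j, i2, j2) -> interaction potential I(l_i, l_i', X)
    Each update picks the label maximizing the local (log) conditional:
    the unary term plus the interactions with the current neighbor labels.
    """
    L = init_labels.copy()
    h, w = L.shape
    for _ in range(sweeps):
        changed = False
        for i in range(h):
            for j in range(w):
                neigh = [(i + di, j + dj)
                         for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                         if 0 <= i + di < h and 0 <= j + dj < w]
                scores = {l: unary(l, i, j)
                             + sum(pairwise(l, L[a, b], i, j, a, b) for a, b in neigh)
                          for l in labels}
                best = max(scores, key=scores.get)
                if best != L[i, j]:
                    L[i, j], changed = best, True
        if not changed:      # local maximum of the posterior reached
            break
    return L

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 6))
    # toy potentials: unary prefers +1 on bright pixels, pairwise rewards agreement
    unary = lambda l, i, j: l * (X[i, j] - 0.5)
    pairwise = lambda l, l2, i, j, a, b: 0.3 * l * l2
    init = np.where(rng.random((6, 6)) < 0.5, -1, 1)
    print(icm(init, unary, pairwise))
```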
Experiment
Task: detecting man-made structures in natural scenes
Database
Corel (training: 108 images, test: 129 images)
Each image was divided into non-overlapping 16×16-pixel
blocks
Compared methods
Logistic
MRF
DRF
Experiment Results
Detection Rates (DR) and False Positives (FP)
The DRF reduces
false positives from
the MRF by more
than 48%.
Superscript '-' indicates that no neighborhood data interaction was used. K = 0
indicates the absence of the data-independent term in the interaction
potential of the DRF.
Experiment Results
For a similar detection rate, the DRF has lower false positives
The detection rate of the DRF is higher than that of the MRF for similar false positives
Conclusion of DRF
Pros
Provide the benefits of discriminative models
Demonstrate good performance
Cons
Although the model outperforms traditional MRFs, it is
not strong enough to capture long-range correlations
among the labels, due to the rigid lattice-based structure
which allows for only pairwise interactions
Problem
Local information can be confused when there are
large overlaps between different classes
Sky or Water?
Solution: utilize the global contextual
information to improve performance
Multiscale Conditional Random
Field (mCRF)
Considering features in different scales
Local Features (site)
Regional Label Features (small patch)
Global Label Features (big patch or the whole image)
The conditional probability P(L | X) is formulated from
features at different scales s:
$P(L \mid X) = \frac{1}{Z} \prod_s P_s(L \mid X)$
where
$Z = \sum_L \prod_s P_s(L \mid X)$
He, X., R. Zemel, and M. Carreira-Perpinan: 2004, `Multiscale conditional random
fields for image labelling'. IEEE Int. Conf. CVPR.
Local Features
The local feature of site i is represented by the
outputs of several filters.
The aim is to associate the patch with one of a
predefined set of labels.
Local Classifier
Here a multilayer perceptron is used as the local
classifier.
Independently at each site i, the local classifier
produces a conditional distribution $P_C(l_i \mid x_i, \lambda)$ over the label
variable $l_i$ given the filter outputs $x_i$ within an image patch centered
on site (pixel) i,
where λ are the classifier parameters.
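A minimal sketch of such a local classifier, assuming a one-hidden-layer perceptron with a softmax output that maps the filter responses $x_i$ at a site to a distribution over the label set; the layer sizes, the number of filter outputs, and the random (untrained) weights standing in for λ are placeholders.

```python
import numpy as np

class LocalMLPClassifier:
    """P_C(l_i | x_i, lambda): a one-hidden-layer perceptron with softmax output."""

    def __init__(self, n_features, n_hidden, n_labels, seed=0):
        rng = np.random.default_rng(seed)
        # lambda = {W1, b1, W2, b2}: placeholder (untrained) parameters
        self.W1 = rng.standard_normal((n_hidden, n_features)) * 0.1
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_labels, n_hidden)) * 0.1
        self.b2 = np.zeros(n_labels)

    def predict_proba(self, x_i):
        """Distribution over labels for the filter outputs x_i at one site."""
        hidden = np.tanh(self.W1 @ x_i + self.b1)
        logits = self.W2 @ hidden + self.b2
        logits -= logits.max()                 # for numerical stability
        p = np.exp(logits)
        return p / p.sum()

if __name__ == "__main__":
    clf = LocalMLPClassifier(n_features=30, n_hidden=16, n_labels=7)  # e.g. 7 Corel labels
    x_i = np.random.default_rng(1).random(30)   # stand-in for filter outputs at a site
    print(clf.predict_proba(x_i))               # sums to 1 over the 7 labels
```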
Regional Label Features
Encoding a particular constraint between the
image and the labels within a region of the image
Sample pattern: ground pixels (brown) above
water pixels (cyan)
Global Label Features
Operate at a coarser resolution, specifying a common
value for a patch of sites in the label field.
Sample pattern: sky pixels (blue) at the top of the
image, hippo pixels (red) in the middle, and water
pixels (cyan) near the bottom.
Feature Function
Global label features are trained by Restricted
Boltzmann Machines (RBM)
two layers: label sites (L) and features (f)
features and labels are fully inter-connected, with no
intra-layer connections
The joint distribution of the
global label feature model is:
$P_G(L, f) \propto \exp\{\sum_a f_a w_a^T L\}$
where $w_a$ is the parameter vector
connecting the hidden global label
feature $f_a$ and the label sites L
[Figure: RBM with hidden features $f_1, f_2, \ldots, f_m$ fully connected to label sites $l_1, l_2, \ldots, l_n$]
Feature Function
By marginalizing out the hidden variables (f), the
global component of the model is:
$P_G(L) \propto \prod_a (1 + \exp(w_a^T L))$
Similarly, the regional component of the model can be
represented as:
$P_R(L) \propto \prod_{r,b} (1 + \exp(u_b^T l_r))$
By multiplicatively combining the component
conditional distributions:
$P(L \mid X, \theta) = \frac{1}{Z} \prod_i P_C(l_i \mid x_i, \lambda) \prod_{r,b} (1 + \exp(u_b^T l_r)) \prod_a (1 + \exp(w_a^T L))$
where $\theta = \{\lambda, \{u_b\}, \{w_a\}\}$
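A sketch of evaluating the (unnormalized) log of this combined mCRF score for a candidate labeling, with a local-classifier term, regional terms, and global terms multiplied exactly as above. The one-hot label encoding, the two-region tiling, and the random parameters $u_b$ and $w_a$ are illustrative assumptions; Z is ignored because it cancels when comparing labelings of the same image.

```python
import numpy as np

def log_score(labels_onehot, local_probs, regions, U, W):
    """Unnormalized log of prod_i P_C(l_i|x_i) * prod_{r,b}(1+exp(u_b^T l_r))
                                               * prod_a  (1+exp(w_a^T L)).

    labels_onehot: (n_sites, n_classes) one-hot encoding of a candidate labeling L
    local_probs:   (n_sites, n_classes) classifier outputs P_C(l_i | x_i, lambda)
    regions:       list of site-index arrays partitioning the image into regions
    U: (n_regional_features, region_size * n_classes)   regional parameters u_b
    W: (n_global_features,  n_sites * n_classes)        global parameters w_a
    """
    # local classifier term: sum_i log P_C(l_i | x_i)
    score = np.sum(np.log(np.sum(local_probs * labels_onehot, axis=1)))
    # regional label features: sum_{r,b} log(1 + exp(u_b^T l_r))
    for r in regions:
        l_r = labels_onehot[r].ravel()
        score += np.sum(np.logaddexp(0.0, U @ l_r))
    # global label features: sum_a log(1 + exp(w_a^T L))
    score += np.sum(np.logaddexp(0.0, W @ labels_onehot.ravel()))
    return score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sites, n_classes = 16, 3                       # a 4x4 toy label field
    lab = np.eye(n_classes)[rng.integers(0, n_classes, n_sites)]
    probs = rng.dirichlet(np.ones(n_classes), size=n_sites)
    regions = [np.arange(0, 8), np.arange(8, 16)]    # two 8-site "regions"
    U = rng.standard_normal((4, 8 * n_classes)) * 0.1
    W = rng.standard_normal((2, n_sites * n_classes)) * 0.1
    print("log score:", log_score(lab, probs, regions, U, W))
```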
Parameter Estimation and Inference
Parameter Estimation
The conditional model is trained discriminatively based
on the Conditional Maximum Likelihood (CML)
criterion, which maximizes the log conditional
likelihood:
$\theta^* = \arg\max_\theta \sum_t \log P(L^t \mid X^t; \theta)$
Inference
Maximum Posterior Marginals (MPM):
$l_i^* = \arg\max_{l_i} P(l_i \mid X), \quad \forall i \in S$
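To contrast MPM with MAP, here is a brute-force sketch (feasible only for a handful of sites) that enumerates every labeling, accumulates per-site marginals $P(l_i \mid X)$ from an arbitrary unnormalized posterior, and picks the marginal-maximizing label at each site; in practice the marginals would be approximated rather than enumerated, and the toy posterior used in the example is made up.

```python
import itertools
import numpy as np

def mpm_labels(n_sites, n_classes, unnormalized_posterior):
    """Maximum Posterior Marginals by exhaustive enumeration.

    unnormalized_posterior(L) -> a value proportional to P(L | X)
    Returns l_i* = argmax_{l_i} P(l_i | X) for every site i.
    """
    marginals = np.zeros((n_sites, n_classes))
    for L in itertools.product(range(n_classes), repeat=n_sites):
        p = unnormalized_posterior(np.array(L))
        for i, li in enumerate(L):
            marginals[i, li] += p
    marginals /= marginals.sum(axis=1, keepdims=True)   # normalize per site
    return marginals.argmax(axis=1), marginals

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sites, n_classes = 6, 3
    unary = rng.random((n_sites, n_classes))             # toy per-site scores
    # toy posterior: unary preferences plus a chain smoothness bonus
    post = lambda L: np.exp(unary[np.arange(n_sites), L].sum()
                            + 0.5 * np.sum(L[1:] == L[:-1]))
    labels, m = mpm_labels(n_sites, n_classes, post)
    print("MPM labels:", labels)
```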
Experiment Results
Database
Corel (100 images with 7 labels)
Sowerby (104 images with 8 labels)
Compared methods
Single classifier (MLP)
MRF
mCRF
Labeling Results
Conclusion of mCRF
Pros
Formulating the image labeling problem into a
multiscale CRF model
Combining the local and larger scale contextual
information in a unique framework
Cons
Including additional classifiers operating at different
scales into the mCRF framework introduces a large
number of model parameters
The model assumes conditional independence of
hidden variables given the label field
More CRF models
Hierarchical Conditional Random Field (HCRF)
–S. Kumar and M. Hebert. A hierarchical field
framework for unified context-based classification. 2005
–Jordan Reynolds and Kevin Murphy. Figure-ground
segmentation using a hierarchical conditional random
field. 2007
Tree Structured Conditional Random Fields (TCRF)
–P. Awasthi, A. Gagrani, and B. Ravindran, Image
Modeling using Tree Structured Conditional Random
Fields. 2007
Reference
Li, S. Z.: 2009, `Markov Random Field Modeling in
Image Analysis'. Springer.
Kumar, S. and M. Hebert: 2003, `Discriminative
Random Fields: A Discriminative Framework for
Contextual Interaction in Classification'. In Proc. IEEE
International Conference on Computer Vision (ICCV).
He, X., R. Zemel, and M. Carreira-Perpinan: 2004,
`Multiscale conditional random fields for image
labelling'. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
End
Thanks!