Saliency Map


黃文中
2009-05-04
1
Preview
2
3
• The saliency map is a topographically arranged map that represents the visual saliency of a corresponding visual scene.
4
• Two kinds of stimuli:
  – Bottom-up: depends only on the instantaneous sensory input, without taking into account the internal state of the organism.
  – Top-down: takes into account the internal state, such as the goals the organism has at this time, personal history and experiences, etc.
5
6
• Nine spatial scales are created using dyadic Gaussian pyramids.
• Each feature is computed by a set of linear "center-surround" operations akin to visual receptive fields.
• Normalization.
• Across-scale combination into three "conspicuity maps."
• Linear combination to create the saliency map.
• Winner-take-all (a rough code sketch of this pipeline follows below).
7
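The list above only names the stages of the bottom-up pipeline, so here is a minimal sketch of the center-surround idea for a single intensity channel. The function names, the use of scipy.ndimage for blurring and resizing, and the simple range normalization standing in for the model's N(·) operator are assumptions of this sketch, not the original implementation.

```python
# Minimal sketch of a center-surround saliency map (intensity channel only).
# Assumes a grayscale image with values in [0, 1].
import numpy as np
from scipy import ndimage


def gaussian_pyramid(img, levels=9):
    """Dyadic Gaussian pyramid: blur then downsample by 2 at each level."""
    pyr = [img]
    for _ in range(1, levels):
        blurred = ndimage.gaussian_filter(pyr[-1], sigma=1.0)
        pyr.append(blurred[::2, ::2])
    return pyr


def center_surround(pyr, centers=(2, 3, 4), deltas=(3, 4)):
    """Across-scale differences: |center level - surround level| at the center's size."""
    maps = []
    for c in centers:
        for d in deltas:
            s = c + d
            if s >= len(pyr):
                continue
            # Upsample the coarse (surround) level to the center level's size.
            factor = (pyr[c].shape[0] / pyr[s].shape[0],
                      pyr[c].shape[1] / pyr[s].shape[1])
            surround = ndimage.zoom(pyr[s], factor, order=1)
            maps.append(np.abs(pyr[c] - surround))
    return maps


def normalize(m):
    """Simple range normalization standing in for the model's N(.) operator."""
    m = m - m.min()
    return m / (m.max() + 1e-12)


def saliency_map(img):
    pyr = gaussian_pyramid(img)
    maps = center_surround(pyr)
    # Resize every feature map to a common scale and combine linearly.
    target = maps[0].shape
    acc = np.zeros(target)
    for m in maps:
        factor = (target[0] / m.shape[0], target[1] / m.shape[1])
        acc += normalize(ndimage.zoom(m, factor, order=1))
    return normalize(acc)
```

A full model would repeat this for color-opponency and orientation channels, combine them into the three conspicuity maps, and select attended locations with a winner-take-all network.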
8
9
A Method of Calculating Image Saliency and of Optimizing Efficient Distribution of Image Windows
10
• In the learning process, feature vectors are extracted at all positions of numerous learning images. PCA is then applied to them to produce principal basis vectors, which are memorized in the dictionary.
• In the searching step, a feature vector is extracted at each position of the input image and is expanded on the basis memorized in the dictionary. The coefficients of the expansion are then analyzed by another PCA, and the residual when the feature vector is expanded on the principal basis of the second PCA is output as the saliency.
11
• Let u be a d = 3 × m × n dimensional vector which consists of the R, G and B pixel values of the m × n pixels in a window of size m × n.
• Principal component analysis (PCA) is applied to a set of N feature vectors u1, u2, · · ·, uN.
• When the image size is W × H and there are K learning images, the number of feature vectors is N = K × (W − m + 1) × (H − n + 1).
• Let 〈u〉 be the average of the feature vectors,
  〈u〉 = (u1 + u2 + · · · + uN) / N,
  and let C be the covariance matrix defined as
  C = (1/N) Σi (ui − 〈u〉)(ui − 〈u〉)ᵀ.
12
• Let λ1 ≥ λ2 ≥ · · · ≥ λd be the eigenvalues of the covariance matrix C, and let ξ1, ξ2, · · ·, ξd be the corresponding eigenvectors.
• These vectors are assumed to form an orthonormal basis, and among them the S vectors ξ1, ξ2, · · ·, ξS are memorized in the dictionary as the basis of the principal components (see the sketch below).
13
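As a concrete illustration of the learning step just described, the sketch below collects the d = 3 × m × n window vectors, computes 〈u〉 and C, and keeps the top-S eigenvectors as the dictionary. All function and variable names are illustrative, not from the paper.

```python
# Sketch of the learning step: window vectors -> PCA -> dictionary of S eigenvectors.
import numpy as np


def extract_window_vectors(image, m, n):
    """Flatten every m x n RGB window of an H x W x 3 image into a row vector."""
    H, W, _ = image.shape
    vecs = []
    for y in range(H - n + 1):
        for x in range(W - m + 1):
            vecs.append(image[y:y + n, x:x + m, :].reshape(-1))
    return np.asarray(vecs)                     # shape: (number of positions, 3*m*n)


def learn_dictionary(images, m, n, S):
    """Return the mean <u> and the S principal eigenvectors xi_1, ..., xi_S."""
    U = np.vstack([extract_window_vectors(img, m, n) for img in images])
    mean_u = U.mean(axis=0)                     # <u>
    C = np.cov(U - mean_u, rowvar=False)        # d x d covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # sort so lambda_1 >= lambda_2 >= ...
    basis = eigvecs[:, order[:S]]               # d x S dictionary of principal axes
    return mean_u, basis
```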
• A window of size m × n is scanned in the same way as in the learning step, and a feature vector vi (i = 1, 2, · · ·, M) is extracted at each point, where M = (W − m + 1) × (H − n + 1) is the number of points.
• Then the vector Vi = vi − 〈u〉 is expanded on the "memorized" basis as
  Vi ≈ αi1 ξ1 + αi2 ξ2 + · · · + αiS ξS, where αis = ξs · Vi.
14
• Then the set of coefficients αi1, αi2, · · ·, αiS is regarded as an S-dimensional vector αi, and PCA is applied again to them.
• The average of α1, α2, · · ·, αM is calculated as
  〈α〉 = (α1 + α2 + · · · + αM) / M.
• Then the eigenvalues μ1 ≥ μ2 ≥ · · · ≥ μS and the orthonormal eigenvectors φ1, φ2, · · ·, φS are obtained just as in the learning step.
15
• Now we define another orthonormal basis vector.
• The residue, i.e. the part of αi that is not captured when it is expanded on the principal basis of the second PCA, is output as the saliency (see the sketch below).
16
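The searching step can then be sketched as below, reusing extract_window_vectors from the previous sketch. Since the slide's exact residue formula is not reproduced in this transcript, the residue here is taken as the norm of each coefficient vector's component outside the principal basis of the second PCA, following the overview given earlier; the choice of T and all names are assumptions.

```python
# Sketch of the searching step: expand on the dictionary, run a second PCA,
# and read the residue of each coefficient vector as its saliency.
import numpy as np


def saliency_per_position(image, mean_u, basis, m, n, T=3):
    """basis: d x S dictionary from learn_dictionary; returns one value per window."""
    V = extract_window_vectors(image, m, n) - mean_u   # V_i = v_i - <u>
    A = V @ basis                                      # coefficient vectors alpha_i (M x S)

    # Second PCA on the coefficient vectors alpha_1, ..., alpha_M.
    B = A - A.mean(axis=0)                             # subtract <alpha>
    eigvals, eigvecs = np.linalg.eigh(np.cov(B, rowvar=False))
    phi = eigvecs[:, np.argsort(eigvals)[::-1][:T]]    # principal basis phi_1..phi_T (S x T)

    # Residue: the part of each alpha_i not explained by the principal basis.
    residue = B - (B @ phi) @ phi.T
    return np.linalg.norm(residue, axis=1)             # saliency of every window position
```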
17
• First, an initial distribution of L points is determined randomly with uniform probability.
• Let (xm1, ym1), (xm2, ym2), · · ·, (xmL, ymL) be the distribution at the m-th iteration.
• Feature vectors are extracted at these L points, and the saliency values sm1, sm2, · · ·, smL are calculated in the same way as discussed previously.
• Then those points whose saliency is greater than a predetermined value th are selected.
• Let H be the number of selected points, and let (xm1, ym1), (xm2, ym2), · · ·, (xmH, ymH) be the selected points.
18
• Next, the potential energy E(x, y) is calculated.
• Then the (m+1)-th distribution is determined stochastically according to the Gibbs distribution
  P(x, y) = exp(−E(x, y)) / Z.
• Z is the partition function, defined as
  Z = Σ(x, y) exp(−E(x, y))
  (a rough sketch of one redistribution iteration is given below).
19
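Below is a rough sketch of one redistribution iteration. The slide's expression for E(x, y) is not reproduced in this transcript, so the sketch assumes an attractive potential around the points whose saliency exceeded th; that energy form, the parameter sigma, and all names are assumptions.

```python
# Sketch of one redistribution iteration: build an (assumed) energy landscape
# around the selected salient points and resample L points from exp(-E)/Z.
import numpy as np


def redistribute(selected_pts, shape, L, sigma=20.0, rng=None):
    """Draw L new sample points (x, y) from a Gibbs distribution P ~ exp(-E)/Z."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]

    # Assumed energy: low (attractive) near previously selected salient points.
    E = np.zeros((H, W))
    for (px, py) in selected_pts:
        E -= np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))

    P = np.exp(-E)
    P /= P.sum()                                   # divide by the partition function Z
    idx = rng.choice(H * W, size=L, p=P.ravel())   # stochastic (m+1)-th distribution
    return [(int(i % W), int(i // W)) for i in idx]
```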
20
• The model integrates:
  – A bottom-up mechanism for extracting features to obtain salient information.
  – A top-down perceptual mechanism for perceiving face features such as face form and face color.
21
22
• The secondary visual areas deal with form and color of an object, 3-D position and motion information.
• The face-selective cells in the infero-temporal (IT) area contain complex shape coding information.
• The neurons in area V4 respond best to specific colors of objects, irrespective of lighting conditions.
23
24
25
• Suppose that the saliency map is one of the results of redundancy reduction in our brain.
• ICA is used to model the role of the visual cortex, because ICA is the best way to reduce this redundancy.
26
27
• Eri is obtained by the convolution between the r-th channel of the input image (Ir) and the i-th filter (ICsri) obtained by ICA learning:
  Eri = Ir * ICsri
• The feature map Eri represents the influence of the three channel images on each independent component.
• A saliency map is obtained by combining the feature maps Eri.
• A salient location P is the point whose local window has the maximum summed value in the saliency map (see the sketch below).
28
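A small sketch of this ICA-based bottom-up computation is given below. The ICA filters are assumed to be available in advance (e.g., learned on natural-image patches, as in Bell and Sejnowski), and combining the feature maps by summing their absolute values, as well as the window size for locating P, are assumptions of this sketch.

```python
# Sketch of the ICA-based bottom-up saliency map and the salient location P.
import numpy as np
from scipy.signal import convolve2d


def ica_saliency(channels, ica_filters, win=16):
    """channels: list of 2-D arrays (e.g., R, G, B); ica_filters[r][i]: 2-D ICA filter."""
    H, W = channels[0].shape
    sm = np.zeros((H, W))
    for r, I_r in enumerate(channels):
        for flt in ica_filters[r]:
            E_ri = convolve2d(I_r, flt, mode='same')   # feature map E_ri = I_r * ICs_ri
            sm += np.abs(E_ri)                          # combine feature maps (assumed: abs sum)

    # Salient location P: position whose local window has the largest summed saliency.
    box = np.ones((win, win)) / (win * win)
    local_sum = convolve2d(sm, box, mode='same')
    P = np.unravel_index(np.argmax(local_sum), local_sum.shape)
    return sm, P
```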
29
• Use an auto-associative multilayer perceptron (AAMLP) with 4 layers: a mapping layer, a bottleneck layer, a de-mapping layer, and an output layer (see the sketch below).
• An auto-associative neural network is basically a neural network whose input and target vectors are the same.
30
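To make the architecture concrete, here is a minimal numpy sketch of a 4-layer AAMLP (mapping → bottleneck → de-mapping → output) trained to reproduce its input, together with the input-output correlation read-out described just below. Layer sizes, tanh activations, and plain gradient descent are assumptions, not the paper's settings.

```python
# Minimal 4-layer auto-associative MLP: input is also the training target.
import numpy as np


class AAMLP:
    def __init__(self, dims, rng=None):
        # dims = [input, mapping, bottleneck, de-mapping, output(=input)]
        rng = np.random.default_rng(0) if rng is None else rng
        self.W = [rng.normal(0, 0.1, (a, b)) for a, b in zip(dims[:-1], dims[1:])]
        self.b = [np.zeros(b) for b in dims[1:]]

    def forward(self, X):
        acts = [X]
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = acts[-1] @ W + b
            acts.append(np.tanh(z) if i < len(self.W) - 1 else z)  # linear output layer
        return acts

    def train_step(self, X, lr=0.01):
        acts = self.forward(X)
        # Gradient of the mean-square reconstruction error E = (1/N) sum ||x_j - y_j||^2.
        delta = 2.0 * (acts[-1] - X) / X.shape[0]
        for i in reversed(range(len(self.W))):
            gW = acts[i].T @ delta
            gb = delta.sum(axis=0)
            if i > 0:
                delta = (delta @ self.W[i].T) * (1.0 - acts[i] ** 2)  # tanh derivative
            self.W[i] -= lr * gW
            self.b[i] -= lr * gb

    def novelty(self, x):
        # Correlation between the input and the network's reconstruction of it,
        # used as the face-relatedness read-out.
        y = self.forward(x[None, :])[-1][0]
        return float(np.corrcoef(x, y)[0, 1])
```

For example, a 20 × 20 face patch flattened to 400 dimensions could use AAMLP([400, 100, 30, 100, 400]); after training on face patches, novelty() should stay high for face-like inputs and drop for non-faces, which is what the top-down channel relies on.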
• The top-down perception mechanism in the IT and V4 areas is modeled using an AAMLP, in which characteristic information, such as face form and face color, is trained and memorized in the connections of the artificial neurons.
• Also, a human being perceives some important characteristic information about a specific object rather than very detailed information.
• To mimic this role, as well as for computational efficiency, some eigenvectors with large eigenvalues are extracted using principal component analysis (PCA) to capture the important features of a specific object.
• To perceive face-related information, the retrieval of face-related information from the AAMLP is mimicked using a correlation computation between the input and output of the AAMLP.
31
• Let F denote an auto-associative mapping function, and let xj and yj = F(xj) indicate an input and an output vector, respectively. Then the function F is usually trained to minimize the mean square error
  E = (1/N) Σj ||xj − yj||².
32
• From the top-down processing, we get the face shape feature map and the face color feature map.
• The synchronization for a biological binding process of different features is modeled by the summation of pixel values in the face form feature map, the face color feature map, and the bottom-up SM (a small sketch of this combination is given below).
33
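A tiny sketch of this binding-by-summation step; normalizing each map to [0, 1] before summing is an assumption of this sketch.

```python
# Pixel-wise summation of the face form map, face color map, and bottom-up SM.
import numpy as np


def bind_maps(face_form_map, face_color_map, bottom_up_sm):
    maps = [face_form_map, face_color_map, bottom_up_sm]
    norm = [(m - m.min()) / (m.max() - m.min() + 1e-12) for m in maps]
    return sum(norm)                      # combined attention/saliency map
```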
34
[1] T. Toriu and S. Nakajima, "A Method of Calculating Image Saliency and of Optimizing Efficient Distribution of Image Windows," First International Conference on Innovative Computing, Information and Control (ICICIC '06), vol. 1, pp. 290-293, 2006.

[2] S. Ban, M. Lee and H. Yang, "A face detection using biologically motivated bottom-up saliency map model and top-down perception model," Neurocomputing, vol. 56, pp. 475-480, 2004.

[3] A. J. Bell and T. J. Sejnowski, "The 'independent components' of natural scenes are edge filters," Vision Research, vol. 37, pp. 3327-3338, 1997.

[4] S. Park, K. An and M. Lee, "Saliency map model with adaptive masking based on independent component analysis," Neurocomputing, vol. 49, pp. 417-422, 2002.
35