
Transform-based Models
• Principal component analysis (PCA) or Karhunen-Loève transform (KLT)
• Application to Face Recognition and MATLAB demo
• DFT, DCT and Wavelet transforms
• Statistical modeling of transform coefficients (sparse representations)
• Application to Texture Synthesis and MATLAB demos
EE565 Advanced Image Processing, Copyright Xin Li 2008
PCA/KLT
What are principal components?
• the direction of maximum variance in the input space (physical interpretation)
• the principal eigenvector of the covariance matrix (mathematical definition)
Theoretical derivations (this is not a theory course like EE513):
• several different approaches exist in the literature of statistics, economics and communication theory
Standard Derivation (Covariance Method)
Given a zero-mean random vector x with covariance matrix C_x = E[x x^T], look for a linear transform y = P^T x with P^T P = I (unitary condition, refer to EE465).
Basic idea: diagonalization of the covariance matrix,
  C_x = P Λ P^T,  Λ = diag(λ1, …, λN),
so that the transformed coefficients y are uncorrelated, with variances given by the eigenvalues.
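A minimal MATLAB sketch of this covariance-method computation (a sketch only; it assumes the training samples are stored as the columns of a matrix X, and all variable names are illustrative):

% PCA/KLT by diagonalizing the sample covariance matrix
% X: L-by-M data matrix (M samples, each an L-dimensional column vector)
mu = mean(X, 2);                            % sample mean
Xc = X - repmat(mu, 1, size(X, 2));         % remove the mean
C  = Xc * Xc' / size(X, 2);                 % sample covariance matrix (L-by-L)
[P, Lambda] = eig(C);                       % C = P*Lambda*P', with P unitary
[lam, idx] = sort(diag(Lambda), 'descend'); % order eigenvalues from largest to smallest
P = P(:, idx);                              % principal components (columns of P)
Y = P' * Xc;                                % KLT coefficients: decorrelated, variance lam(i)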
Geometric Interpretation (direction of
maximum variation/information)
Why Does PCA/KLT Make Sense?
• In Pattern Recognition (e.g., R. Duda’s textbook “Pattern Classification”) or in Signal Processing (e.g., S. Mallat’s textbook “A Wavelet Tour of Signal Processing”):
  – Analytical results are available for stationary Gaussian processes, up to the unknown parameters (low-order statistics).
  – Classical ML/Bayes parameter estimation works most effectively under the independence assumption (recall the curse of dimensionality).
  – A transform facilitates the satisfaction of this assumption.
• In Economics, Google “Hotelling transform”.
Example: Transform Facilitates Modeling
[Scatter plot: a data cloud elongated along a diagonal; (x1, x2) are the original coordinates, (y1, y2) the principal axes.]
x1 and x2 are highly correlated: p(x1, x2) ≠ p(x1)p(x2)
y1 and y2 are less correlated: p(y1, y2) ≈ p(y1)p(y2)
Comparison Between LR and LT
• Linear regression (AR model)
  – Hyperplane fitting (a local strategy)
  – Dimensionality reduction: data space mapped to parameter space
  – Distortion not preserved (refer to EE467, closed-loop optimization in speech coding)
• Linear transform (PCA/KLT)
  – Rotation of coordinates (a global strategy)
  – Dimensionality reduction: only preserve the largest eigenvalues in the data space
  – Preserves distortion (unitary property of P)
Transform-based Models
• Principal component analysis (PCA) or Karhunen-Loève transform (KLT)
• Application to Face Recognition and MATLAB demo
• DFT, DCT and Wavelet transforms
• Statistical modeling of transform coefficients (sparse representations)
• Application to Texture Synthesis and MATLAB demos
Appearance-based Recognition
(adapted from CMU Class 15385-s06)
• Directly represent appearance (image brightness), not geometry.
• Why? It avoids modeling geometry and the complex interactions between geometry, lighting and reflectance.
• Why not? Too many possible appearances!
  m “visual degrees of freedom” (e.g., pose, lighting, etc.) with R discrete samples for each DOF gives on the order of R^m appearances.
“Nature is not economical of structures, only of principles.” – Abdus Salam
The Space of Faces
[Figure: a face image written as a sum of weighted basis images.]
• An image with N pixels is a point in N-dimensional space.
• A collection of M images is a cloud of M points in R^N.
• We can define vectors in this space as we did in the 2D case.
Key Idea: Linear Subspace
• Images in the possible set P = { x̂ ∈ R^L } are highly correlated.
• So, compress them to a low-dimensional linear subspace that captures the key appearance characteristics of the visual DOFs.
• The linearity assumption is a double-edged sword: it facilitates analytical derivation and computational solution, but nature seldom works in a linear fashion.
• EIGENFACES [Turk and Pentland, 1991]: USE PCA!
Example of Eigenfaces
Training set of face images.
Eigenfaces look somewhat
like ghost faces.
15 principal components
(eigenfaces or eigenvectors
corresponding to 15 largest
eigenvalues).
Linear Subspaces Explained by a 2D Toy Example (Easier for Visualization)
• Suppose the data points are arranged as above (clustered along a line).
• Classification can be expensive: we must either search (e.g., nearest neighbors) or store large probability density functions.
• Idea: fit a line; the classifier measures the distance to the line.
• Convert x into (v1, v2) coordinates.
• What does the v2 coordinate measure?
  – distance to the line; use it for classification (near 0 for the orange points)
• What does the v1 coordinate measure?
  – position along the line; use it to specify which orange point it is
Dimensionality Reduction
• We can represent the orange points with only their v1 coordinates, since the v2 coordinates are all essentially 0.
• This makes it much cheaper to store and compare points.
• This is a much bigger deal for higher-dimensional problems.
Linear Subspaces (PCA in 2D)
Consider the variation along a unit direction v among all of the orange points:
  var(v) = Σ_i ( (x_i − x̄)^T v )²,  ||v|| = 1
What unit vector v minimizes var(v)? What unit vector v maximizes var(v)?
Solution (with the scatter matrix A = Σ_i (x_i − x̄)(x_i − x̄)^T):
  v1 is the eigenvector of A with the largest eigenvalue;
  v2 is the eigenvector of A with the smallest eigenvalue.
PCA in Higher Dimensions
• Suppose each data point is N-dimensional.
• The same procedure applies:
  – The eigenvectors of A define a new coordinate system.
  – The eigenvector with the largest eigenvalue captures the most variation among the training vectors x.
  – The eigenvector with the smallest eigenvalue has the least variation.
• We can compress the data by using only the top few eigenvectors.
  – This corresponds to choosing a “linear subspace”: represent points on a line, plane, or “hyper-plane”.
  – These eigenvectors are known as the principal components.
Problem: Size of Covariance Matrix A
• Suppose each data point is N-dimensional (N pixels).
• The size of the covariance matrix A is N x N.
• The number of eigenfaces is N.
• Example: for images of 256 x 256 pixels, N = 256² = 65536.
  The size of A will be 65536 x 65536!
  The number of eigenvectors will be 65536!
• Typically, only 20-30 eigenvectors suffice, so this method is very inefficient!
Efficient Computation of Eigenvectors*
(You can skip this if you don’t like matrix algebra.)
If B is M x N with M << N (M = number of images, N = number of pixels), then A = B^T B is N x N, which is much larger than the M x M matrix B B^T.
Use B B^T instead; an eigenvector of B B^T is easily converted to an eigenvector of B^T B:
  (B B^T) y = λ y
  ⇒ B^T (B B^T) y = λ (B^T y)
  ⇒ (B^T B)(B^T y) = λ (B^T y)
  ⇒ B^T y is an eigenvector of B^T B
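A minimal MATLAB sketch of this trick (a sketch only; it assumes the mean-removed face images are stored as the rows of B, and all names are illustrative):

% B: M-by-N matrix, each row one mean-removed face image (M << N)
S = B * B';                                 % small M-by-M matrix B*B'
[Y, D] = eig(S);                            % eigenvectors y of B*B'
[lam, idx] = sort(diag(D), 'descend');
Y = Y(:, idx);
V = B' * Y;                                 % each column B'*y is an eigenvector of B'*B (an eigenface)
V = V ./ repmat(sqrt(sum(V.^2)), size(V, 1), 1);   % normalize the eigenfaces to unit length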
Eigenfaces – summary in words
• Eigenfaces are the eigenvectors of the covariance matrix of the probability distribution of the vector space of human faces.
• Eigenfaces are the ‘epitomized face ingredients’ derived from the statistical analysis of many pictures of human faces.
• A human face may be considered to be a combination of these epitomized faces.
Generating Eigenfaces – in words
1. A large set of images of human faces is taken.
2. The images are normalized to line up the eyes, mouths and other features.
3. The eigenvectors of the covariance matrix of the face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.
5. Please also keep in mind that if B^T B is too large, you can work with B B^T instead (the algebraic trick above).
Eigenfaces for Face Recognition
• When properly weighted, eigenfaces can be summed together to create an approximate grayscale rendering of a human face.
• Remarkably few eigenvector terms are needed to give a fair likeness of most people’s faces.
• Hence eigenfaces provide a means of applying “data compression” to faces for identification purposes (note: NOT for transmission purposes).
Detection with Eigenfaces
• The set of faces is a “subspace” of the set of images.
  – Suppose it is K-dimensional, spanned by vectors v1, v2, ..., vK.
  – We can find the best such subspace using PCA.
  – This is like fitting a “hyper-plane” to the set of faces.
• Any face x̂ can then be approximated as x̂ ≈ x̄ + a1 v1 + a2 v2 + … + aK vK, where x̄ is the mean face.
Eigenfaces
• PCA extracts the eigenvectors of A.
• This gives a set of vectors v1, v2, v3, ...
• Each one of these vectors is a direction in face space.
• What do these look like?
Projecting onto the Eigenfaces
(It is easier to understand the projection using the 2D toy example, though conceptually the high-dimensional case works the same way.)
• The eigenfaces v1, ..., vK span the space of faces.
• A face x̂ is converted to eigenface coordinates by projecting onto each eigenface:
  ai = vi^T (x̂ − x̄),  i = 1, …, K,  so that x̂ → (a1, …, aK).
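A minimal MATLAB sketch of this projection and the corresponding reconstruction (a sketch only; x is a face image as a column vector, mu the mean face, V an N-by-K matrix whose columns are eigenfaces; names are illustrative):

a    = V' * (x - mu);                       % eigenface coordinates a1, ..., aK
xrec = mu + V * a;                          % eigenspace reconstruction of x
err  = norm(x - xrec);                      % residual: small for faces, large for non-faces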
Key Property of Eigenspace Representation
Given two images x̂1 and x̂2 that are used to construct the eigenspace, let
• ĝ1 be the eigenspace reconstruction of image x̂1, and
• ĝ2 be the eigenspace reconstruction of image x̂2.
Then
  ||ĝ2 − ĝ1|| ≈ ||x̂2 − x̂1||
That is, the distance in eigenspace is approximately equal to the distance between the two images.
Advantage of Dimensionality Reduction
x ∈ R^(H×W) → a ∈ R^K,  with K << H·W
Training set: x1, x2, …, xM → a1, a2, …, aM
New image: x → a
For detection: threshold the distance d = mean(||a − ak||)
For recognition: select the index k minimizing ||a − ak||
Detection: Is this a face or not?
Recognition: Whose face is this?
Algorithm:
1. Process the image database (a set of images with labels):
   • Run PCA to compute the eigenfaces.
   • Calculate the K coefficients for each image.
2. Given a new image (to be recognized) x, calculate its K coefficients.
3. Detect whether x is a face.
4. If it is a face, who is it?
   • Find the closest labeled face in the database (nearest neighbor in K-dimensional space).
An Excellent Toy Example to Help Your Understanding
(M’ is the number of eigenfaces used)
Choosing the Dimension K
[Plot: eigenvalues λi versus index i = 1, …, NM, with the cutoff K marked on the decaying curve.]
• How many eigenfaces should we use?
• Look at the decay of the eigenvalues:
  – the eigenvalue tells you the amount of variance “in the direction” of that eigenface;
  – ignore eigenfaces with low variance.
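A small MATLAB sketch of this rule of thumb (a sketch only; lam is assumed to hold the eigenvalues sorted in descending order, e.g., from the PCA sketch earlier, and the 95% threshold is an arbitrary illustrative choice):

frac = cumsum(lam) / sum(lam);              % fraction of total variance captured by the first i eigenfaces
K    = find(frac >= 0.95, 1, 'first');      % smallest K capturing 95% of the variance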
New Ideas We Can Play With
A localized version of eigenfaces-based recognition (see the sketch below):
• For each subject k = 1, …, K, obtain its associated N eigenfaces vk1, …, vkN.
• For a new image x, project it onto all K sets of “localized” eigenface spaces (so we obtain K reconstructed copies x^1, …, x^K).
• Select the one minimizing ||x − x^k||.
• Connection with sparse representation (or l1 regularization): refer to Prof. Yi Ma’s homepage and his new PAMI paper.
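A minimal MATLAB sketch of this localized scheme (a sketch only; it assumes each cell Vk{k} holds the eigenfaces of subject k as columns and mu{k} the subject’s mean face; all names are illustrative):

% x: test image as a column vector
K = numel(Vk);
res = zeros(1, K);
for k = 1:K
    a      = Vk{k}' * (x - mu{k});          % project onto subject k's eigenface space
    xk     = mu{k} + Vk{k} * a;             % reconstructed copy x^k
    res(k) = norm(x - xk);                  % reconstruction residual
end
[dummy, label] = min(res);                  % select the subject minimizing ||x - x^k||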
Transform-based Models
• Principal component analysis (PCA) or Karhunen-Loève transform (KLT)
• Application to Face Recognition and MATLAB demo
• DFT, DCT and Wavelet transforms
• Statistical modeling of transform coefficients (sparse representations)
• Application to Texture Synthesis and MATLAB demos
Alternative Tools (more suitable for Telecommunication Applications)
• Discrete Fourier Transform (DFT)
  – Can be shown to be the KLT for circularly stationary processes (the eigenvectors take the form of the discrete Fourier basis).
• Discrete Cosine Transform (DCT)
  – A good approximation of the KLT for an AR(1) process with high correlation (e.g., a = 0.9); you are asked to show this in HW#1.
• Wavelet Transform
  – An effective tool for characterizing transient signals.
One-Minute Tour of Wavelet
[Figure: a test signal x(n) (200 samples, amplitude roughly ±1) and its one-level wavelet decomposition into a low-pass approximation s(n) and a high-pass detail d(n) (100 samples each); note the much smaller amplitude of d(n), roughly ±0.04.]
Wavelet Transform on Images
After the row transform, each row is decomposed into a low band s(m,n) (approximation) and a high band d(m,n) (detail).
Applying the same transform along the columns then splits the image into four subbands: LL, HL, LH, HH.
Note that the order of the row/column transforms does not matter.
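A quick MATLAB check of this decomposition (a sketch only; it assumes the Image Processing and Wavelet Toolboxes are available, and 'cameraman.tif' and 'db1' are just illustrative choices):

x = double(imread('cameraman.tif'));        % any grayscale test image
[LL, HL, LH, HH] = dwt2(x, 'db1');          % one-level 2D wavelet transform: approximation + 3 detail subbands
xrec = idwt2(LL, HL, LH, HH, 'db1');        % inverse transform reconstructs the image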
From one-level to multi-level
Relationship to Linear Regression (Sparsity Perspective)
• In the AR model, the effectiveness is measured by the energy (sparsity) of the prediction errors.
• In transform-based models, the effectiveness is measured by the energy (sparsity) of the transform coefficients.
• Improved sparsity (or lower energy) implies a better match between the assumed model and the observed data.
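A small MATLAB sketch comparing the two measures of sparsity (a sketch only; it assumes the toolboxes mentioned above, and the AR(1) coefficient a = 0.95 and the 'db1' filter are illustrative choices):

x = double(imread('cameraman.tif'));
e = x(:, 2:end) - 0.95 * x(:, 1:end-1);     % horizontal AR(1) prediction errors
[LL, HL, LH, HH] = dwt2(x, 'db1');
fprintf('prediction error energy:    %g\n', mean(e(:).^2));
fprintf('detail coefficient energy:  %g\n', mean([HL(:); LH(:); HH(:)].^2));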
Empirical Observation
[Figure: after the wavelet transform (WT), the HL band is extracted and the histogram of its coefficients is plotted; the histogram shows a single sharp peak at zero.]
Univariate Probability Model
Laplacian: p(x) = (1 / (√2 σ)) · exp(−√2 |x| / σ)
Gaussian: p(x) = (1 / (√(2π) σ)) · exp(−x² / (2σ²))
Gaussian Distribution
Laplacian Distribution
Statistical Testing
• How do we know which parametric model better fits the empirical distribution of wavelet coefficients?
• In addition to visual inspection (which is often subjective and less accurate), we can use various statistical testing tools to objectively evaluate the closeness of an empirical cumulative distribution function (ECDF) to the hypothesized one.
• One of the most widely used techniques is the Kolmogorov-Smirnov test (MATLAB function: >help kstest).
Kolmogorov-Smirnov Test*
The K-S test is based on the maximum distance between the empirical CDF (ECDF) Fn(x) and the hypothesized CDF F(x) (e.g., the normal distribution N(0,1)):
  D = max over x of |Fn(x) − F(x)|
Example
Usage: [H,P,KS,CV] = KSTEST(X,CDF)
If CDF is omitted, the standard normal distribution N(0,1) is assumed.
• x: computer-generated samples → accept the hypothesis.
• d: high-band wavelet coefficients of the lena image (note the normalization to unit variance) → reject the hypothesis.
(P is the p-value, 0 < P < 1; the larger P is, the more plausible the hypothesized distribution.)
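A minimal MATLAB sketch of this comparison (a sketch only; it assumes the Statistics and Wavelet Toolboxes, and 'lena.png' and 'db4' are placeholders):

x = randn(10000, 1);                        % computer-generated N(0,1) samples
[Hx, Px] = kstest(x);                       % Hx = 0: do not reject the N(0,1) hypothesis
img = double(imread('lena.png'));           % placeholder image file
[LL, HL, LH, HH] = dwt2(img, 'db4');
d = HL(:) / std(HL(:));                     % high-band coefficients, normalized to unit variance
[Hd, Pd] = kstest(d);                       % Hd = 1: reject the N(0,1) hypothesis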
Generalized Gaussian/Laplacian Distribution
  p(x) = [ p · η(p, σ) / (2 Γ(1/p)) ] · exp( −[ η(p, σ) |x| ]^p ),  where η(p, σ) = (1/σ) · √( Γ(3/p) / Γ(1/p) )
p: shape parameter
σ: variance (scale) parameter
p = 1: Laplacian;  p = 2: Gaussian
Model Parameter Estimation*
• Maximum likelihood estimation
• Method of moments
• Linear regression method
Ref.
[1] K. Sharifi and A. Leon-Garcia, “Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video,” IEEE Trans. on Circuits and Systems for Video Technology, no. 1, pp. 52-56, February 1995.
[2] www.cimat.mx/reportes/enlinea/I-01-18_eng.pdf
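A small MATLAB sketch of the method-of-moments idea for the shape parameter (a sketch based on the standard moment ratio E[|x|]² / E[x²] of the GGD, not necessarily the exact procedure of [1]; d is assumed to be a vector of subband coefficients):

r_emp  = mean(abs(d))^2 / mean(d.^2);       % empirical moment ratio
r_fun  = @(p) gamma(2./p).^2 ./ (gamma(1./p) .* gamma(3./p));   % theoretical ratio as a function of p
p_grid = 0.1:0.01:4;
[dummy, idx] = min(abs(r_fun(p_grid) - r_emp));
p_hat     = p_grid(idx);                    % estimated shape parameter
sigma_hat = sqrt(mean(d.^2));               % estimated standard deviation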
Transform-based Models
• Principal component analysis (PCA) or Karhunen-Loève transform (KLT)
• Application to Face Recognition and MATLAB demo
• DFT, DCT and Wavelet transforms
• Statistical modeling of transform coefficients (sparse representations)
• Application to Texture Synthesis and MATLAB demos
Wavelet-based Texture Synthesis
Basic idea: two visually similar textures will also have similar statistics.
• Pyramid-based (using steerable pyramids): facilitates the statistical modeling.
• Histogram matching: enforces the first-order statistical constraint.
• Texture matching: alternates histogram matching between the spatial and wavelet domains.
• Boundary handling: use periodic extension.
• Color consistency: use a color transformation.
Histogram Matching
Generalization of histogram equalization (the target is the histogram
of a given image instead of uniform distribution)
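A minimal MATLAB sketch of one way to perform this matching by rank ordering (a sketch only; it assumes x and the target image have the same number of pixels, and hist_match is an illustrative name):

function y = hist_match(x, target)
% Give the k-th smallest pixel of x the k-th smallest value of the target image,
% so y inherits the target histogram while preserving the spatial rank order of x.
[dummy, idx] = sort(x(:));
t_sorted     = sort(target(:));
y            = zeros(size(x));
y(idx)       = t_sorted;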
Histogram Equalization
  y = L · Σ_{t=0}^{x} h(t)
followed by uniform quantization of y to the discrete output levels ys.
Note: Σ_{t=0}^{L} h(t) = 1, i.e., the sum is the cumulative probability function of the normalized histogram h.
[Plot: the scaled cumulative probability function maps an input gray level x ∈ [0, L] to an output value in [0, L].]
MATLAB Implementation
function y=hist_eq(x)
% Histogram equalization of an 8-bit grayscale image x
[M,N]=size(x);
% Calculate the histogram of the input image
for i=1:256
    h(i)=sum(sum(x==i-1));
end
% Perform histogram equalization
y=x; s=sum(h);
for i=1:256
    I=find(x==i-1);
    y(I)=sum(h(1:i))/s*255;
end
Histogram Equalization Example
Histogram Specification
[Diagram: equalize histogram1 with mapping T and histogram2 with mapping S; the unknown mapping (“?”) from histogram1 to histogram2 is obtained as S⁻¹ ∘ T, i.e., equalize with T and then invert S.]
Texture Matching
Objective: the histograms of both the subbands and the synthesized image match those of the given template.
Basic hypothesis: if two texture images look visually similar, then they have similar histograms in both the spatial and wavelet domains.
Image Examples
I.I.D. Assumption Challenged
• If the wavelet coefficients of each subband were indeed i.i.d., then a random permutation of the coefficients within a subband should produce another image of the same class (natural images).
• The fundamental question here: does the WT completely decorrelate image signals?
Image Example
High-band
coefficients
permutation
You can run the MATLAB demo to check this experiment
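A minimal sketch of that experiment (a sketch only; it assumes x is a grayscale image in double format and the Wavelet Toolbox is available):

[LL, HL, LH, HH] = dwt2(x, 'db1');                    % one-level wavelet transform
HLp = reshape(HL(randperm(numel(HL))), size(HL));     % randomly permute the HL-band coefficients
xp  = idwt2(LL, HLp, LH, HH, 'db1');                  % reconstruct; xp looks visibly different from x,
                                                      % so the coefficients are not truly i.i.d.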
Another Experiment
[Scatter plot: joint pdf of two uncorrelated random variables X and Y.]
Joint PDF of Wavelet Coefficients
[Scatter plot: joint pdf of two correlated random variables X and Y formed from wavelet coefficients and their neighbors; the neighborhood I(Q) is {left, up, cousin and aunt}.]
Texture Synthesis via Parametric Models in the Wavelet Space
Basic idea: two visually similar textures will also have similar statistics.
Instead of matching histograms (nonparametric models), we can build parametric models for the wavelet coefficients and force the synthesized image to inherit the parameters of the given image.
Model parameters: 710 parameters were used in Portilla and Simoncelli’s experiments (4 orientations, 4 scales, 7x7 neighborhood).
Statistical Constraints
Four types of constraints:
• Marginal statistics
• Raw coefficient correlation
• Coefficient magnitude statistics
• Cross-scale phase statistics
Synthesis proceeds by alternating projections onto the four constraint sets: projection onto convex sets (POCS).
Convex Set
A set Ω is said to be convex if for any two points x ∈ Ω and y ∈ Ω we have
  a·x + (1 − a)·y ∈ Ω  for all 0 ≤ a ≤ 1.
[Figure: examples of convex sets vs. non-convex sets.]
Projection Operator
Projection onto a convex set C:
  g = Pf = { x ∈ C : ||x − f|| = min over x′ ∈ C of ||x′ − f|| }
In simple words, the projection of f onto a convex set C is the element in C that is closest to f in terms of Euclidean distance.
Alternating Projection
[Figure: starting from X0, the alternating projections X1, X2, … onto two convex sets C1 and C2 converge to a point X∞ in their intersection.]
Projection-Onto-Convex-Sets (POCS) theorem: if C1, …, Ck are convex sets, then alternating projections P1, …, Pk will converge to the intersection of C1, …, Ck, provided it is not empty.
Alternating projection does not always converge in the case of non-convex sets. Can you think of a counter-example?
Convex Constraint Sets
• Non-negativity set: { f | f ≥ 0 }
• Bounded-value set: { f | 0 ≤ f ≤ 255 }, or more generally { f | A ≤ f ≤ B }
• Bounded-variance set: { f | ||f − g||² ≤ T }, where g is a given signal
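A minimal MATLAB sketch of the corresponding projection operators and an alternating-projection loop (a sketch only; f0, g and T are assumed inputs, and the signals are treated as column vectors):

proj_nonneg = @(f) max(f, 0);                          % projection onto { f | f >= 0 }
proj_range  = @(f) min(max(f, 0), 255);                % projection onto { f | 0 <= f <= 255 }
proj_var    = @(f) g + (f - g) * min(1, sqrt(T) / (norm(f - g) + eps));  % onto { f | ||f-g||^2 <= T }

f = f0;                                                % initial guess
for iter = 1:50                                        % POCS: alternate the projections
    f = proj_nonneg(f);
    f = proj_var(f);
    f = proj_range(f);
end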
Non-convex Constraint Sets
• Histogram matching, as used in Heeger & Bergen (1995)
• Bounded skewness and kurtosis:
  skewness: E[(x − μ)³] / σ³
  kurtosis: E[(x − μ)⁴] / σ⁴
The derivations of the projection operators onto these constraint sets are tedious; readers are referred to the paper and MATLAB code by Portilla & Simoncelli.
Image Examples
original
synthesized
Image Examples (Con’d)
original
synthesized
When Does It Fail?
original
synthesized
Summary
• Textures represent an important class of structures in natural images: unlike edges, which characterize object boundaries, textures are often associated with the homogeneous properties of object surfaces.
• Wavelet-domain parametric models provide a parsimonious representation of the high-order statistical dependencies within textured images.