
Learning Optical Flow
Goren Gordon
and
Emanuel Milman
Advanced Topics in Computer Vision
May 28, 2006
After Roth and Black:
• On the Spatial Statistics of Optical Flow, ICCV 2005.
• Fields of Experts: A Framework for Learning Image Priors, CVPR 2005.
Overview
• Optical Flow Reminder and Motivation.
• Learning Natural Image Priors:
• Product of Experts (PoE).
• Markov Random Fields (MRF).
• Fields of Experts (FoE) = PoE + MRF.
• Training FoE:
• Markov Chain Monte Carlo (MCMC).
• Contrastive Divergence (CD).
• Applications of FoE:
• Denoising.
• Inpainting.
• Optical Flow Computation.
Optical Flow
(Reminder from last week)
…
(taken from Darya and Denis’s presentation)
Optical Flow  Reminder
Optical Flow (Reminder)
I(x,y,t) = Sequence of Intensity Images.
Brightness Constancy Assumption under optical flow field (u,v):
$I(x+u,\, y+v,\, t+1) = I(x, y, t)$
First-order Taylor approximation – Optical Flow Constraint Equation:
$I_x u + I_y v + I_t = 0$
(Figure: frame #1, frame #2, the flow field, and the partial derivatives; images taken from Darya and Denis's presentation.)
Aperture Problem: one equation, two unknowns.
Can only determine the normal flow = component of (u,v) parallel to (I_x, I_y).
Optical Flow  Reminder
Finding Optical Flow (Reminder)
Local Methods (Lucas-Kanade) – assume (u,v) is locally constant:
$E_{LK} := K_\sigma * (I_x u + I_y v + I_t)^2$
- Pros: robust under noise.
- Cons: if image is locally constant, need interpolation steps.
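To make the local approach concrete, a minimal NumPy sketch (mine; the window weighting and the conditioning threshold are arbitrary choices, not from the slides) of solving the Lucas-Kanade normal equations for a single window:

import numpy as np

def lucas_kanade_window(Ix, Iy, It, weights=None):
    """Solve the Lucas-Kanade 2x2 system for one window.

    Ix, Iy, It: flattened spatial/temporal derivatives inside the window.
    weights:    optional per-pixel weights (e.g. samples of the K_sigma window).
    Returns (u, v), the constant flow minimizing
        sum_k w_k * (Ix_k*u + Iy_k*v + It_k)^2 .
    """
    if weights is None:
        weights = np.ones_like(Ix)
    W = weights
    # Normal equations: A^T W A [u v]^T = -A^T W It
    A11 = np.sum(W * Ix * Ix)
    A12 = np.sum(W * Ix * Iy)
    A22 = np.sum(W * Iy * Iy)
    b1 = -np.sum(W * Ix * It)
    b2 = -np.sum(W * Iy * It)
    A = np.array([[A11, A12], [A12, A22]])
    b = np.array([b1, b2])
    # The aperture problem shows up as a (near-)singular A:
    # if the window has no 2D structure, fall back to zero flow.
    if np.linalg.cond(A) > 1e6:
        return 0.0, 0.0
    u, v = np.linalg.solve(A, b)
    return u, v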
Global Methods (Horn-Schunck) – use global regularization term:
$E_{HS} := \int_{\text{Spatial}} (I_x u + I_y v + I_t)^2 + \alpha\,(|\nabla u|^2 + |\nabla v|^2)\; dx\, dy$
- Pros: automatic filling-in in places where image is constant.
- Cons: less robust under noise.
Combined Local-Global Method (Weickert et al.)
$E_{CLG} := \int_{\text{Spatial}} K_\sigma * (I_x u + I_y v + I_t)^2 + \alpha\,(|\nabla u|^2 + |\nabla v|^2)\; dx\, dy$
Optical Flow  Reminder
CLG Energy Functional
$E_{CLG} := \int_{\text{Spatial}} K_\sigma * (I_x u + I_y v + I_t)^2 + \alpha\,(|\nabla u|^2 + |\nabla v|^2)\; dx\, dy$

$w := (u, v, 1)^T$
$|\nabla w|^2 := |\nabla u|^2 + |\nabla v|^2$
$\nabla I := (I_x, I_y, I_t)^T$
$J_\sigma(\nabla I) := K_\sigma * (\nabla I\, \nabla I^T)$
Kσ – smoothing kernel (spatial or spatio-temporal):
E2 D CLG ( w) :
 w J
T
Spatial
E3 D CLG ( w) :

2
(
I
)
w


|

w
|
dxdy

 w J
T
SpatialTemporal

2
(
I
)
w


|

w
|
dxdydt

Optical Flow  Motivation
Spatial Regularizer - Revisited

$E_{2D\text{-}CLG}(w) := \int_{\text{Spatial}} \rho_D\!\left(\sqrt{w^T J_\sigma(\nabla I)\, w}\right) + \rho_S(|\nabla w|)\; dx\, dy$
ρ_D, ρ_S – quadratic → robust (differentiable) penalty functions.
Motivation: why use ρ_S(|∇w|)?
Answer: Optical flow is piecewise smooth; let's hope that the
spatial term captures this behaviour.
Questions:
• Which ρS to use? Why are some functions better than others?
• Maybe there is more information in ∇w than the first-order |∇w|?
• Maybe w(x, y) and w(x+1, y) are dependent?
Optical Flow  Motivation
Learning Optical Flow
Roth and Black, “On the Spatial Statistics of Optical Flow”, ICCV 2005.
Idea: learn (from training set) prior distribution on w, and use its
energy-functional as spatial-term!
$E_{CLG}(w) := \int_{\text{Spatial}} \rho_D\!\left(\sqrt{w^T J_\sigma(\nabla I)\, w}\right) + \rho_S(|\nabla w|)\; dx\, dy$
(first-order, hand-selected prior)

$E(w) := \int_{\text{Spatial}} \rho_D\!\left(\sqrt{w^T J_\sigma(\nabla I)\, w}\right) dx\, dy + E_{FoE}(w)$
(higher-order, learned prior)
FoE = Fields of Experts
Optical Flow  Motivation
Fields of Experts (FoE)
Fields of Experts (FoE) = Product of Experts (PoE) + Markov Random Fields (MRF)
Roth and Black, “Fields of Experts: A framework …”, CVPR 2005.
Model rich prior distributions for natural images.
Many applications:
• Denoising. √
• Inpainting. √
• Segmentation.
• more…
Detour: review FoE model on natural images.
Natural Images
Natural Images 
Modeling Natural Images
Challenging:
• High dimensionality ( |Ω| ≥10000 ).
• Non-Gaussian statistics (even simplest models assume MoG).
• Need to model correlations in image structure over
extended neighborhoods.
Natural Images  Observations
Observations (Olshausen, Field, Mumford, Simoncelli, etc..)
• Many linear filters have non-Gaussian responses:
concentrated around 0 with “heavy tails”.
www.cvgpr.uni-mannheim.de/heiler/natstat
• Responses of different filters are usually not independent.
• Statistics of image pixels are higher-order than pair-wise correlations.
Natural Images  Image Patches
Modeling Image Patches
• Example-based learning (Freeman et al.) –
use measure of consistency between image patches.
• FRAME (Zhu, Wu and Mumford) – use hand selected filters and
discretized histograms to learn image prior for texture modeling.
• Linear models: n-dim patch x is stochastic linear combination
of m basis patches {Ji}.
$x = \sum_{i=1}^{m} a_i J_i$
Natural Images  Image Patches
Linear Patch Models
n-dim patch: $x = \sum_{i=1}^{m} a_i J_i$
1. PCA – if ai are Gaussian (decompose CoVar(x) into eigenvectors).
(Non-realistic.)
2. ICA – if ai are independent non-Gaussian and n=m.
(Generally impossible to find n independent basis patches.)
3. Sparse Coding (Olshausen and Field) – use m>n and
assume ai are highly concentrated around 0, to derive sparse
representation model with an over-complete basis.
(Need computational inference step to calculate ai.)
4. Product of Experts = PoE (Hinton).
Product of Experts
Natural Images  Image Patches  Product of Experts
Product of Experts (PoE)
• Model high-dim distributions as product of low-dim expert distributions.
$p(x \mid \theta_1 \cdots \theta_m) = \frac{\prod_{i=1}^{m} p_i(x \mid \theta_i)}{\int \prod_{i=1}^{m} p_i(x \mid \theta_i)\, dx}$

x – data; θ_i – the i-th expert's parameters.
• Each expert works on a low- (typically 1-) dimensional subspace – easy to model.
• Parameters {θi} can be learned on training sequence.
• PoEs produce sharper and more expressive distributions than individual
expert models (similar to Boosting techniques).
• Very compact model compared to mixture-models (like MoG).
Natural Images  Image Patches  Product of Experts
PoE Examples
• General framework, not restricted to CV applications.
• Sentences:
– One expert can ensure that tenses agree.
– Another expert can ensure that subject and verb agree.
– Grammar expert.
– Etc…
• Handwritten digits:
– One set of experts can model the overall shape of digit.
– Another set of experts can model the local stroke structure.
(Figure: user-written digits reconstructed under a '7' prior and under a '9' prior; Mayraz and Hinton.)
Natural Images  Image Patches  Product of Experts
Product of Student-T (PoT)
• Filter responses on images - concentrated, heavy tailed distributions.
• Welling, Hinton et al “Learning … with product of Student-t distributions”, 2003.
Model with Student-t:

$\phi(t; \alpha) = \left(1 + \tfrac{t^2}{2}\right)^{-\alpha}$

Polynomial tail decay!
Natural Images  Image Patches  Product of Experts
Product of Student-T (PoT)

$\phi(t; \alpha) = \left(1 + \tfrac{t^2}{2}\right)^{-\alpha}$

Apply N linear filters J_1, …, J_N to the patch x:

$p(x; \Theta) = \frac{\phi(J_1^T x;\, \alpha_1) \cdots \phi(J_N^T x;\, \alpha_N)}{Z(\Theta)}$
Natural Images  Image Patches  Product of Experts
Product of Student-T (PoT)

$\phi(t; \alpha) = \left(1 + \tfrac{t^2}{2}\right)^{-\alpha}$

$p(x; \Theta) = \frac{1}{Z(\Theta)} \prod_{i=1}^{N} \phi(J_i^T x;\, \alpha_i)$

Z(Θ) – partition function; parameters Θ = {θ_1, …, θ_N}, θ_i = {J_i, α_i}.

In Gibbs form:

$p(x; \Theta) = \frac{1}{Z(\Theta)} \exp(-E_{PoE}(x; \Theta)), \qquad E_{PoE}(x; \Theta) = -\sum_{i=1}^{N} \log \phi_i(J_i^T x;\, \alpha_i)$
Natural Images  Image Patches  Product of Experts
PoE Training Set
~60,000 5*5 patches randomly cropped from the Berkeley
Segmentation Benchmark DB.
Natural Images  Image Patches  Product of Experts
PoE Learned Filters
• Will discuss learning procedure in FoE model.
• 5*5-1=24 filters Ji were learned (no DC filter):
• Gabor-like filters accounting for local edge structures.
• Same characteristics when training more experts.
• Results are comparable to ICA.
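Given a set of learned filters J_i and exponents α_i, a minimal sketch (mine; the filters and α values below are random stand-ins, not the learned ones) of evaluating a Student-t expert and the PoE energy E_PoE(x; Θ) = −Σ_i log φ(J_i^T x; α_i) on a single 5*5 patch:

import numpy as np

def student_t_expert(t, alpha):
    """phi(t; alpha) = (1 + t^2 / 2) ** (-alpha)   (unnormalized)."""
    return (1.0 + 0.5 * t ** 2) ** (-alpha)

def poe_energy(x, filters, alphas):
    """E_PoE(x) = -sum_i log phi(J_i^T x; alpha_i) for one flattened patch x."""
    responses = filters @ x              # J_i^T x for every expert i
    return -np.sum(np.log(student_t_expert(responses, alphas)))

# Toy example: 24 random filters on a 5x5 patch (real filters are learned).
rng = np.random.default_rng(1)
filters = rng.normal(size=(24, 25))
alphas = np.full(24, 1.0)
patch = rng.normal(size=25)
print(poe_energy(patch, filters, alphas))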
Natural Images  Image Patches  Product of Experts
PoE – Final Thoughts
• PoE permits fewer, equal, or more experts than the data dimension.
• The over-complete case allows dependencies between different
filters to be modeled, and is thus more expressive than ICA.
• Product structure forces the learned filters to be “as independent
as possible”, capturing different characteristics of patches.
• Contrary to example-based approaches, the parametric
representation generalizes better and beyond the training data.
Back to Entire Images
Natural Images 
From Patches to Images
Extending former approach to entire images is problematic:
• Image-size is too big. Need huge number of experts.
• Model would depend on particular image-size.
• Model would not be translation-invariant.
Natural model for extending local patch model to entire
image: Markov Random Fields.
Markov Random Fields
(just 2 slides!)
Natural Images  Markov Random Fields
Markov Random Fields (MRF)
$G = (V, E)$; for each $v \in V$, $X_v$ is a random variable.
$(X_{v_1}, \ldots, X_{v_n})$ have joint distribution P.
$(X_{v_1}, \ldots, X_{v_n})$ is a Markov Random Field on G if:

$P(\{X_v\}_{v \in S} \mid \{X_w\}_{w \notin S}) = P(\{X_v\}_{v \in S} \mid \{X_w\}_{w \in N(S)})$
N(S) = {neighbors of S} \ S
Natural Images  Markov Random Fields
Gibbs Distributions
Hammersley-Clifford Theorem:
$X = (X_{v_1}, \ldots, X_{v_n})$ is an MRF with P > 0 iff P is a Gibbs distribution.

P is a Gibbs distribution on X if:

$P(X = x) = \frac{1}{Z} \exp\!\left(-\sum_{c \in C} V_c(\{x_v\}_{v \in c})\right)$
C = set of all maximal cliques (complete sub-graphs) in G.
Vc = potential associated to clique c.
Connects local property (MRF) with global property (Gibbs dist.)
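As a toy illustration of the Gibbs form (mine; a tiny binary lattice with hand-picked Ising-like pairwise potentials, not the FoE potentials), computing P(X = x) = (1/Z) exp(−Σ_c V_c) by brute force:

import numpy as np
from itertools import product

def gibbs_weight(x, beta=1.0):
    """Unnormalized Gibbs weight exp(-sum of pairwise clique potentials).
    Cliques = horizontally/vertically adjacent pixel pairs;
    V_c = beta if the two pixels disagree, 0 otherwise (Ising-like)."""
    energy = 0.0
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            if i + 1 < H:
                energy += beta * (x[i, j] != x[i + 1, j])
            if j + 1 < W:
                energy += beta * (x[i, j] != x[i, j + 1])
    return np.exp(-energy)

# Brute-force partition function Z on a 3x3 binary lattice.
shape = (3, 3)
states = [np.array(bits).reshape(shape) for bits in product([0, 1], repeat=9)]
Z = sum(gibbs_weight(s) for s in states)
x = np.zeros(shape, dtype=int)
print("P(all-zeros image) =", gibbs_weight(x) / Z)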
Fields of Experts
Natural Images  Fields of Experts
Fields of Experts (FoE)
Fields of Experts (FoE) = Product of Experts (PoE) + Markov Random Fields (MRF)
MRF: V = image lattice, E = connect all nodes in m*m patch x(k) .
$p(x) = \frac{1}{Z} \exp\!\left(-\sum_k V_k(x_{(k)})\right)$
(the m*m patches x_(k) are overlapping)

Make the model translation invariant: V_k ≡ W.

Model the potential W using a PoE:

$W(x_{(k)}) = E_{PoE}(x_{(k)}; \Theta) = -\sum_{i=1}^{N} \log \phi_i(J_i^T x_{(k)};\, \alpha_i)$
Natural Images  Fields of Experts
FoE Density
$p(x; \Theta) = \frac{1}{Z(\Theta)} \prod_k \prod_{i=1}^{N} \phi_i(J_i^T x_{(k)};\, \alpha_i) = \frac{1}{Z(\Theta)} \exp(-E_{FoE}(x; \Theta))$

$E_{FoE}(x; \Theta) = -\sum_k \sum_{i=1}^{N} \log \phi_i(J_i^T x_{(k)};\, \alpha_i)$
• Other MRF approaches typically use hand selected clique
potentials and small neighborhood systems.
• In FoE, translation invariant potential W is directly learned from
training images.
• FoE density = combination (MRF) of overlapping local experts (PoE).
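A minimal sketch (mine, not the authors' code; filter shapes and border handling are arbitrary choices) of evaluating the FoE energy on a whole image: each expert filter is correlated with the image so that every m*m clique contributes one response, and the Student-t log-penalties are summed:

import numpy as np
from scipy.ndimage import correlate

def foe_energy(image, filters, alphas):
    """image:   2D array.
    filters: array of shape (N, m, m) of expert filters J_i.
    alphas:  array of shape (N,) of Student-t exponents.
    Returns E_FoE(image) summed over all interior m x m cliques."""
    m = filters.shape[1]
    energy = 0.0
    for J_i, a_i in zip(filters, alphas):
        # correlate gives J_i^T x_(k) at every clique position k
        resp = correlate(image, J_i, mode='constant')
        # discard border cliques that extend outside the image
        r = m // 2
        resp = resp[r:-r, r:-r]
        # -log phi(t; a) = a * log(1 + t^2 / 2)
        energy += np.sum(a_i * np.log1p(0.5 * resp ** 2))
    return energy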
Natural Images  Fields of Experts
FoE Model Pros
• Overcomes previously mentioned problems:
- Parameters Θ depend only on patch’s dimensions.
- Applies to images of arbitrary size.
- Translation invariant by definition.
• Explicitly models overlap of patches, by learning
from training images.
• Overlapping patches are highly correlated;
the learned filters J_i and α_i must account for this.
Natural Images  Fields of Experts
Learned Filters
(Figure: filters learned by the FoE model vs. the PoE model.)
Training FoE
Natural Images  Training FoE
Training FoE
Given training-set X = (x_1, …, x_n), its likelihood is:

$\prod_{i=1}^{n} p_{FoE}(x_i; \Theta) = \prod_{i=1}^{n} \frac{1}{Z(\Theta)} \exp(-E_{FoE}(x_i; \Theta))$

Find Θ which maximizes the likelihood = minimizes the minus log-likelihood:

$LL_{FoE}(X; \Theta) = \frac{1}{n} \sum_{i=1}^{n} E_{FoE}(x_i; \Theta) + \log Z(\Theta)$

Difficulty: computation of Z(Θ) is severely intractable:

$Z(\Theta) = \int \exp(-E_{FoE}(x; \Theta))\, dx$
Natural Images  Training FoE
Gradient Descent
$\frac{\partial LL_{FoE}(X; \Theta)}{\partial \theta_i} = \frac{1}{n} \sum_{j=1}^{n} \frac{\partial E_{FoE}(x_j; \Theta)}{\partial \theta_i} + \frac{\partial \log Z(\Theta)}{\partial \theta_i}$

$\frac{\partial \log Z(\Theta)}{\partial \theta_i} = \frac{1}{Z(\Theta)} \frac{\partial Z(\Theta)}{\partial \theta_i} = \frac{1}{Z(\Theta)} \frac{\partial}{\partial \theta_i} \int \exp(-E_{FoE}(x; \Theta))\, dx = -\int \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i}\, \frac{\exp(-E_{FoE}(x; \Theta))}{Z(\Theta)}\, dx = -\int \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i}\, p_{FoE}(x; \Theta)\, dx$

$\frac{\partial LL_{FoE}(X; \Theta)}{\partial \theta_i} = \left\langle \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i} \right\rangle_X - \left\langle \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i} \right\rangle_{p_{FoE}(\cdot;\, \Theta)}$

X – empirical data distribution; p_FoE – model distribution.

Conclusion: need to calculate $\langle f \rangle_p = \int f(x)\, p(x)\, dx$, even if p is intractable.
Markov Chain Monte Carlo
(3 Slide Detour)
Natural Images  Training FoE  Markov Chain Monte Carlo
Markov Chain Monte Carlo
MCMC – a method for generating a sequence of random (correlated)
samples from an arbitrary density function $p(x) = \frac{1}{Z} q(x)$.

Calculating q is tractable; p may be intractable.

Use: approximate $\langle f \rangle_p \approx \frac{1}{k} \sum_{i=1}^{k} f(x_i)$ where $x_i \sim p$, using MCMC.
Developed by physicists in late 1940’s (Metropolis).
Introduced to CV community by Geman and Geman (1984).
Idea: build a Markov chain which converges from an arbitrary
distribution to p(x).
Pros: easy to mathematically prove convergence to p(x).
Cons: no convergence rate guaranteed; samples are correlated.
Natural Images  Training FoE  Markov Chain Monte Carlo
MCMC Algorithms
Metropolis Algorithm
• Select any initial position x_0.
• At iteration k:
  – Create a new trial position x* = x_k + Δx, with Δx drawn from a symmetric trial distribution.
  – Calculate the ratio $r = \frac{q(x^*)}{q(x_k)} = \frac{p(x^*)}{p(x_k)}$.
  – If r ≥ 1, or otherwise with probability r, accept: x_{k+1} = x*; otherwise stay put: x_{k+1} = x_k.
• The resulting distribution converges to p!
• This creates a Markov chain, since x_{k+1} depends only on x_k.
• The trial distribution is dynamically scaled to keep a fixed acceptance rate.
(Figure: proposed moves x* along the chain x_0, …, x_k, x_{k+1}.)
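A compact sketch of the Metropolis algorithm just described (mine; the target density q and the proposal scale are toy choices), sampling from p(x) = q(x)/Z without ever computing Z:

import numpy as np

def metropolis(q, x0, n_steps, step=0.5, rng=None):
    """Sample from p(x) = q(x)/Z using a symmetric Gaussian proposal."""
    rng = rng or np.random.default_rng()
    samples = [x0]
    x = x0
    for _ in range(n_steps):
        x_star = x + rng.normal(scale=step)      # trial position
        r = q(x_star) / q(x)                     # acceptance ratio
        if r >= 1 or rng.random() < r:
            x = x_star                           # accept
        samples.append(x)                        # otherwise stay put
    return np.array(samples)

# Example: unnormalized heavy-tailed density (Student-t-like expert).
q = lambda t: (1 + 0.5 * t ** 2) ** (-2.0)
chain = metropolis(q, x0=0.0, n_steps=10000)
print(chain.mean(), chain.var())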
Natural Images  Training FoE  Markov Chain Monte Carlo
MCMC Algorithms
Other algorithms to build sampling Markov chain:
Gibbs Sampler (Geman and Geman):
• Vary only one coordinate of x at a time.
• Draw a new value of x_j from the conditional p(x_j | x_1,..,x_{j-1},x_{j+1},..,x_n), which is usually tractable when p is an MRF.
Hamiltonian Hybrid Monte Carlo (HMC):
• State of the art; very efficient.
• Details omitted.
Natural Images  Training FoE
Back to FoE Gradient Descent

$\Delta\theta_i = -\eta\, \frac{\partial LL_{FoE}(x; \Theta)}{\partial \theta_i} = -\eta \left[ \left\langle \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i} \right\rangle_X - \left\langle \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i} \right\rangle_{p_{FoE}(\cdot;\, \Theta)} \right]$

η – step size.

X_0 = empirical data distribution (x_i with probability 1/n).
X_m = distribution of the MCMC sampler (initialized at X_0) after m iterations.
X_∞ = the MCMC sampler converges to the desired distribution p_FoE(·; Θ), i.e. X_0 → X_1 → … → X_∞.

Contrastive Divergence (Hinton): instead of sampling from X_∞ = p_FoE(·; Θ), use

$\left\langle \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i} \right\rangle_{X_m} \approx \frac{1}{k} \sum_{j=1}^{k} \frac{\partial E_{FoE}(y_j; \Theta)}{\partial \theta_i}, \qquad y_j \sim X_m \text{ obtained by MCMC (initialized at } X_0\text{).}$
Natural Images  Training FoE  Contrastive Divergence
Contrastive Divergence (CD)

$\Delta\theta_i = -\eta \left[ \left\langle \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i} \right\rangle_{X_0} - \left\langle \frac{\partial E_{FoE}(x; \Theta)}{\partial \theta_i} \right\rangle_{X_m} \right]$

Intuition: running the MCMC sampler for a few iterations from X_0 draws
samples closer to the target distribution X_∞ – enough to "feel" the gradient.

Formal justification of "Contrastive Divergence" (Hinton):
Maximizing the likelihood p(X_0 | X_∞) = minimizing the KL divergence KL(X_0 || X_∞).
CD is (almost) equivalent to minimizing KL(X_0 || X_∞) − KL(X_m || X_∞).
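To make the CD update concrete, here is a toy sketch (mine, heavily simplified: a single 1D Student-t expert with one parameter α, and a Metropolis chain standing in for the HMC sampler used by the authors) of repeated CD-1 steps Δα = −η[⟨∂E/∂α⟩_{X_0} − ⟨∂E/∂α⟩_{X_m}]:

import numpy as np

rng = np.random.default_rng(2)

def dE_dalpha(x):
    # E(x; alpha) = alpha * log(1 + x^2/2)  =>  dE/dalpha = log(1 + x^2/2)
    return np.log1p(0.5 * x ** 2)

def mcmc_step(x, alpha, step=0.5):
    """One Metropolis sweep under the current model p(x) ~ exp(-E(x; alpha))."""
    x_star = x + rng.normal(scale=step, size=x.shape)
    logr = alpha * (np.log1p(0.5 * x ** 2) - np.log1p(0.5 * x_star ** 2))
    accept = np.log(rng.random(size=x.shape)) < logr
    return np.where(accept, x_star, x)

def cd_update(data, alpha, eta=0.01, m=1):
    """One CD-m step: start the chain at the data (X_0), run m MCMC sweeps (X_m)."""
    samples = data.copy()
    for _ in range(m):
        samples = mcmc_step(samples, alpha)
    grad = dE_dalpha(data).mean() - dE_dalpha(samples).mean()
    return alpha - eta * grad

# Toy training loop on synthetic heavy-tailed "data".
data = rng.standard_t(df=3, size=5000)
alpha = 1.0
for _ in range(200):
    alpha = cd_update(data, alpha, m=1)
print("learned alpha:", alpha)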
Natural Images  Training FoE
FoE Training Implementation
• Size of training images should be substantially larger than patch
(clique) size to capture spatial dependencies of overlapping patches.
• Trained on 2000 randomly cropped 15*15 images (5*5 patch) from
50 images in the Berkeley Segmentation Benchmark DB.
• Learned 24 expert filters.
• FoE Training is computationally intensive but off-line feasible.
Natural Images  Training FoE
FoE Training – Question Marks
• Note that under the MRF model:
p(5*5 patch | rest of image) = p(5*5 patch | 13*13 patch \ 5*5 patch).
• Therefore we feel that:
- 15*15 images are too small to learn the MRF's 5*5 clique potentials.
- Better to use 13*13-1 filters instead of 5*5-1.
• Details which were omitted:
  – HMC details.
  – Parameter values.
  – Faster convergence by whitening patch pixels before computing gradient updates.
(Figure: a 5×5 patch inside its 13×13 neighborhood inside a 15×15 training image.)
Applications!
Natural Images  FoE Applications  General
General form:   E = (data term) + (spatial term)

• Denoising:    E = (noise term) + (FoE term)
• Inpainting:   E = (data term) + (FoE term)
• Optical flow: E = (local data term) + (FoE term)
Natural Images  FoE Applications  Denoising
Field of Experts: Denoising
(Figure: noisy image y and denoised image x; http://www.cs.brown.edu/~roth/)
Natural Images  FoE Applications  Denoising
Field of Experts: adding noise

$y = x + N(0, \sigma^2)$

y – noisy image; x – true image; N(0, σ²) – Gaussian noise.
Natural Images  FoE Applications  Denoising
Field of Experts: Denoising
Use the posterior probability distribution:

$p(x \mid y) \propto p(y \mid x)\; p(x)$

p(y | x) – the known noise distribution; p(x) – the learned FoE prior over images.
Natural Images  FoE Applications  Denoising
Field of Experts: Denoising
Use gradient ascent.

Find x which maximizes the probability = minimizes the minus log-likelihood:

$LL(x \mid y) = \log p(y \mid x) + \log p(x)$

Gradient ascent on the log-likelihood:

$\nabla_x LL(x \mid y) = \frac{1}{\sigma^2}(y - x) + \nabla_x \log p(x)$
Natural Images  FoE Applications  Denoising
Field of Experts: Denoising
Use gradient ascent
S. Zhu and D. Mumford. Prior learning and Gibbs reaction-diffusion.
PAMI, 19(11):1236–1250, 1997.
Field of Experts: Denoising
Use gradient ascent; the gradient of the FoE log-prior, ∇_x log p(x), is evaluated by convolutions with the learned filters (figure-only slides illustrating the convolution steps are omitted).
Natural Images  FoE Applications  Denoising
Field of Experts: Denoising
Use gradient ascent, starting from x^(0) = y.

Updating rate η:
• η < 0.02: stable, but slow computation.
• η > 0.02: unstable, but fast computation.
Optional weight λ:
• Gives experimentally better results; selected from a few candidates.
Schedule: many iterations with η > 0.02, then 250 iterations with η = 0.02 for "cleaning up".
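A sketch of this update (mine, not the authors' implementation; the filters, λ, η and the boundary handling are placeholder choices). The prior gradient ∇_x log p_FoE(x) is computed with one filter pass and one adjoint (mirrored-filter) pass per expert, as the convolution slides above indicate:

import numpy as np
from scipy.ndimage import correlate, convolve

def grad_log_prior(x, filters, alphas):
    """grad_x log p_FoE(x) = -grad_x E_FoE(x), via filter / mirrored-filter passes."""
    g = np.zeros_like(x)
    for J, a in zip(filters, alphas):
        r = correlate(x, J, mode='nearest')       # J_i^T x_(k) at every clique
        psi = a * r / (1.0 + 0.5 * r ** 2)        # derivative of a*log(1 + r^2/2)
        g -= convolve(psi, J, mode='nearest')     # adjoint pass (mirrored filter)
    return g

def denoise(y, filters, alphas, sigma, lam=1.0, eta=0.02, n_iter=250):
    """Gradient ascent on lam*log p(x) + log p(y|x), started at x = y."""
    x = y.copy()
    for _ in range(n_iter):
        x = x + eta * (lam * grad_log_prior(x, filters, alphas)
                       + (y - x) / sigma ** 2)
    return x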
Field of Experts: Denoising – Comparison
(Figures: original image; noisy image with σ = 25.)
Field of Experts: Denoising – Comparison

• Field of Experts: PSNR = 28.72 dB
• Wavelet approach: PSNR = 28.90 dB (J. Portilla, V. Strela, M. Wainwright, and E. Simoncelli. IEEE Trans. Image Proc., 12(11):1338–1351, 2003)
• Non-linear diffusion: PSNR = 27.18 dB (J. Weickert. Scale-Space Theory in Computer Vision, pp. 3–28, 1997)
Natural Images  FoE Applications  Denoising
Advantages of FoE
• Compared to non-Linear Diffusion:
– Uses many more filters
– Obtained filters in a principled way
• Compared to wavelets:
– Some results are even better
– Prior trained on different data
– Increased database can improve results
Natural Images  FoE Applications  Inpainting
Field of Experts: Inpainting
Natural Images  FoE Applications  Inpainting
Field of Experts: Inpainting
• Given an image y, find the true image x.
• An inpainting mask is provided.
(Figure: image y and the inpainting mask.)
Natural Images  FoE Applications  Inpainting
Inpainting
Diffusion Techniques
M. Bertalmío et al. Image inpainting. ACM SIGGRAPH, pp. 417–424, 2000.
Natural Images  FoE Applications  Inpainting
Field of Experts: Inpainting
Inside the mask there is no data term – only the FoE prior p(x) is used; outside the mask, the pixels keep their observed values from y.

Mask M: M = 0 outside the mask, M = 1 inside the mask; the prior-gradient update is applied only where M = 1.
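Correspondingly, a sketch of the masked inpainting update (mine, under the same assumptions as the denoising sketch above, and reusing its grad_log_prior helper): only pixels with M = 1 are changed, driven purely by the FoE prior:

import numpy as np

def inpaint(y, mask, filters, alphas, eta=0.02, n_iter=1000):
    """mask: 1 inside the region to fill, 0 where the pixels of y are known."""
    x = y.copy()
    for _ in range(n_iter):
        # grad_log_prior is the function from the denoising sketch above.
        x = x + eta * mask * grad_log_prior(x, filters, alphas)
    return x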
Natural Images  FoE Applications  Inpainting
Field of Experts: Inpainting
(Figure; reference: M. Bertalmío et al. Image inpainting. ACM SIGGRAPH, pp. 417–424, 2000.)
Optical Flow
Back to Optical Flow
http://www.cs.brown.edu/people/black/images.html
(Figure: example flow field – horizontal component u and vertical component v.)
Optical Flow  Previous Work
Previous Work
Finding basis optical flows:
A discontinuity is modeled as a weighted sum of basis flows, obtained by
Principal Component Analysis.
D. J. Fleet, M. J. Black, Y. Yacoob, and A.
D. Jepson. Design and use of linear
models for image motion analysis. IJCV,
36(3):171–193, 2000.
Optical Flow  FoE
Optical Flow and FoE
Prior: for images, the FoE prior is trained on a natural-image database; for optical flow we need a corresponding optical flow database – ?
Optical Flow  FoE  Database
Optical Flow and Field of Experts
• Required statistics: for good experts
• Required database: for training
Database
Optical Flow  FoE  Database
Optical Flow Spatial Statistics
1) scene depth
2) camera motion
3) the independent motion of objects
Optical Flow  FoE  Database
Optical Flow Spatial Statistics
scene depth
Brown range image database
http://www.dam.brown.edu/ptg/brid/index.html
Optical Flow  FoE  Database
Optical Flow Spatial Statistics
camera motion
• Hand-held or Car-mounted camera
• Walking, moving around object
• Analysis of camera motion:
boujou software system, http://www.2d3.com
Optical Flow  FoE  Database
Optical Flow Database generation
The optical flow is simply given by the difference in image
coordinates under which a scene point is viewed in each of
the two cameras.
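As a toy illustration of this construction (mine; the camera intrinsics, poses, and scene points below are made up), the flow of a scene point is the difference between its projections in the two cameras:

import numpy as np

def project(P, K, R, t):
    """Project 3D points P (N,3) with intrinsics K and pose (R, t) to pixels."""
    Pc = (R @ P.T).T + t                  # world -> camera coordinates
    p = (K @ Pc.T).T
    return p[:, :2] / p[:, 2:3]           # perspective divide

def flow_from_depth(P, K, R1, t1, R2, t2):
    """Optical flow = difference of the image coordinates of each scene
    point as seen by camera 1 and camera 2."""
    return project(P, K, R2, t2) - project(P, K, R1, t1)

# Toy example: two scene points, camera translating along x.
K = np.array([[500., 0., 180.], [0., 500., 128.], [0., 0., 1.]])
P = np.array([[0.1, 0.2, 3.0], [-0.4, 0.1, 5.0]])
R = np.eye(3)
uv = flow_from_depth(P, K, R, np.zeros(3), R, np.array([-0.05, 0., 0.]))
print(uv)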
Optical Flow  FoE  Learning
Optical Flow FoE Learning
Database:
• 100 video clips (~100 frames each) to
determine camera movement
• 197 indoor and outdoor depth scenes from
Brown range DB
• Generated a DB of 400 optical flow fields
(360x256 pixels)
Optical Flow  FoE  Database  Statistics
Optical Flow Velocity Statistics
Log histograms of:
• horizontal velocity u,
• vertical velocity v,
• velocity magnitude r,
• orientation θ.
(Figure: the four log histograms.)
Optical Flow  FoE  Database  Statistics
Optical Flow Derivative Statistics
Log histograms of the flow derivatives ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y:
• Have concentrated, heavy-tailed distributions.
• Model with the Student-t distribution

$\phi(t; \alpha) = \left(1 + \tfrac{t^2}{2}\right)^{-\alpha}$

– same as natural images.
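A small sketch (mine; the histogram range and bin count are arbitrary) of computing such log histograms of the flow-derivative statistics from a flow field (u, v):

import numpy as np

def log_histogram(values, bins=101, lo=-2.0, hi=2.0):
    """Return bin centers and log-probabilities of the empirical histogram."""
    hist, edges = np.histogram(values, bins=bins, range=(lo, hi), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, np.log(hist + 1e-12)

def derivative_statistics(u, v):
    """Log histograms of du/dx, du/dy, dv/dx, dv/dy for one flow field."""
    stats = {}
    for name, comp in (("u", u), ("v", v)):
        dy, dx = np.gradient(comp)        # axis 0 = y (rows), axis 1 = x (cols)
        stats[f"d{name}/dx"] = log_histogram(dx)
        stats[f"d{name}/dy"] = log_histogram(dy)
    return stats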
Optical Flow  FoE  Learning
Learning Optical Flow
• MRF of 3x3 or 5x5 cliques.
  – A larger neighborhood than in previous works.
(Figure: 3x3 and 5x5 clique layouts.)
Optical Flow  FoE  Learning
Learning Optical Flow
• Use FoE to learn an optical flow prior.
• Use two separate models: one for the horizontal flow u and one for the vertical flow v.
(Figure: horizontal and vertical components of example flow fields.)
Optical Flow  FoE  Learning
Learning Optical Flow
• Learn the experts from training data
– Contrastive Divergence
– Markov Chain Monte Carlo
Optical Flow  FoE  Evaluation
Optical Flow Evaluation
Combined Local Global (CLG) energy function (only 2D):

$E_{CLG}(w) := \int \rho_D\!\left(\sqrt{w^T J_\sigma(\nabla I)\, w}\right) + \lambda\, \rho_S(|\nabla w|^2)\; dx\, dy$

Data term: first order.  Spatial term: higher order (FoE).  λ: constant weight.
Optical Flow  FoE  Evaluation
Optical Flow Evaluation
Energy minimization:
• Look for a local minimum.
• Discretize: the constraint has the form  A(w_guess) · w = b.
• Iterate:  A(w_guess) · w^(1) = b,  A(w^(1)) · w^(2) = b,  …
• Solve the linear equations using standard techniques, e.g. GMRES (Generalized Minimal Residual).
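For illustration, a minimal SciPy sketch (mine; the sparse system below is a made-up Laplacian-like stand-in for the discretized Euler-Lagrange equations, not the actual A(w_guess)) of solving one lagged linear system with GMRES:

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Toy stand-in for one lagged-nonlinearity step A(w_guess) w = b:
# a sparse tridiagonal system, as discretized smoothness terms produce.
n = 1000
A = diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

w, info = gmres(A, b)              # info == 0 means the solver converged
print(info, np.linalg.norm(A @ w - b))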
Optical Flow  FoE  Examples
Optical Flow Examples: Yosemite
Database:
– Train the FoE prior on the ground truth data for the
Yosemite sequence, omitting frames 8 and 9
Evaluation:
– Frames 8 and 9
Experts:
– Use 3x3 patches and 8 filters
Optical Flow  FoE  Examples
Optical Flow Examples: Yosemite
(Figure: estimated horizontal flow u and vertical flow v.)
Optical Flow  FoE  Examples
Comparison: Yosemite

Method (ρ_D + ρ_S)            AAE (average angular error)
Quadratic + Quadratic          2.93
Charbonnier + Charbonnier      1.70
Lorentzian + Charbonnier       1.76
Lorentzian + FoE               1.32

FoE trained on the synthetic database: AAE 1.82.
(ρ_S: Lorentzian experts???)
Optical Flow  FoE  Examples
Optical Flow Examples: Flower Garden
(Figure: estimated horizontal flow u and vertical flow v.)
Remarks:
• Initial results of a promising technique:
– Generalization to U\V
– Improved optical flow database
– Include 3D data term
– 5x5 cliques can give better results (?)
Summary
• Field of Experts is a combination of MRF and
PoE
• Field of Experts can learn spatial dependence of
optical flow sequences
• In contrast to other methods, the FoE prior does
not require any tuning of parameters besides λ
• Combining FoE with CLG gives best results
• Given more general training data, generalization
can be improved
Special thanks to:
Denis and Darya
Oren Boiman.