Biologically Motivated
Computer Vision
Digital Image Processing
Sumitha Balasuriya
Department of Computing Science, University of Glasgow
General Vision Problem
• Machine vision has been very successful in finding
solutions to specific, well constrained problems such as
optical character recognition or fingerprint recognition. In
fact machine vision has surpassed human vision in many
such closed domain tasks.
• However it is only in biology where we find systems that
can handle unconstrained, diverse vision problems.
• How can a biological or machine system which just
captures two dimensional visual information from a view
of a cluttered field even attempt to reason with and
function in the environment? An accurate detailed spatial
model of the environment is difficult to compute and the
whole problem of scene analysis is ill-posed.
A problem is well posed if (1) a solution exists, (2)
the solution is unique, (3) the solution depends
continuously on the initial data (stability property).
Ill-posed problem: several possible solutions exist
The general vision problem isn’t
really solved in biology …
• For example I can't build an accurate spatial world model
of the scene I look at ...
• Biological systems have evolved to process visual data
to extract just enough information to perform the
reasoning for everyday tasks that are part of survival.
• Visual information is combined with higher level
knowledge and other sensory modalities that constrain
the reasoning in the solution space and finally makes
vision possible.
Visual cortex and a bit more …
Direct feedback projections to V1
originate from:
V2 (complex features)
V3 (orientation, motion, depth)
V4 (colour, attention)
MT (motion)
MST (motion)
FEF (saccades, spatial memory)
LIP (saccade planning)
IT (recognition)
[Figure: feedback from higher cortical areas to lower visual cortex: Frontal cortex → V2, V4, FEF, IT → V1. Panel labels: face, features, V1]
Held and Hein, 1963
• Newborn kittens
• Placed in a carousel
• One active, other passively
towed along
• Both receive same stimulation
• The actively moving kitten
receives visual stimulation which
results from its own movements
• Only the active kitten develops
sensory-motor coordination.
Conventional Computer Vision
Architecture
[Diagram: Input → Feature Extraction → Classification, Recognition, Disparity → Output (Action)]
The Future - Biologically Motivated
Computer Vision Architecture
[Diagram labels: Input; Feedforward processing; Lateral processing; Feedback processing; Hierarchical processing; More abstract features / symbols (square, triangle); Task / Goal: "Is there a square, triangle or circle?"; Other modalities; Optical illusions]
Biologically Motivated Computer Vision
Architectures in action
Simple colour cues.
Foveated sensors.
Also:
Learnt arm control,
Learn how to act on
objects
http://www.lira.dist.unige.it/babybotvideos.htm
Biologically Inspired features
• Machine vision and biological vision systems
process similar information (visual scenes) and
perform similar tasks (recognition, targeting)
• Not surprisingly, the optimal features that are
extracted by many machine vision systems look
surprisingly like those found in biology
• But first ….
Why bother with feature extraction?
• Why not use the actual image/video itself for
reasoning/analysis?
INVARIANCE!
• The information we extract (i.e. the features) from
the ‘entity’ must be insensitive to changes.
• The extracted features might be invariant to rotation
and scaling of objects in images, lighting conditions,
partial occlusions
What features should we extract?
• Depends….
• Modality (video/image/audio …)
• Task (eg: topic categorisation/face recognition/
audio compression)
• Dimensionality reduction / sparsification
• Invariance vs descriptiveness
If the features generalise too much,
everything looks just about
the same
If the features are too descriptive
they can’t generalise to new
examples
As the feature we extract becomes more
complex/descriptive it will also become less invariant to
even minor changes in the entity that we are measuring.
Human visual pathway
• Inspiration for feature extraction methodology
Receptive field: area in
the FOV in which
stimulation leads to a
response in the neuron
Circularly symmetric
retinal ganglion
receptive fields
Orientated simple cell
cortical receptive fields
(similar to Gabor filter)
Gabor filter
• A function f(t) can be decomposed into cosine
(even) and sine (odd) functions. Good for
defining periodic structures. Not localised.
• There is an uncertainty relation between a
signal's specificity in time and frequency.
• Dennis Gabor defined a family of signals that
optimised this trade-off
• Enables us to extract local features
• Daugman (1995) defined a 2D filter based on
the above, which was called a Gabor filter
• These filters resemble cortical simple cells
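The trade-off mentioned above can be written out as a short worked statement. The RMS width convention for Δt and Δf and the symbols t_0, f_0 and σ are assumptions made here for illustration; the slide does not fix them.

```latex
% Uncertainty relation with RMS widths of |g(t)|^2 and |G(f)|^2 (assumed convention)
\Delta t \,\Delta f \;\ge\; \frac{1}{4\pi}

% Equality holds for a Gaussian-modulated complex exponential (a Gabor signal)
g(t) = e^{-\frac{(t - t_0)^2}{2\sigma^2}}\, e^{j 2\pi f_0 t},
\qquad
\Delta t = \frac{\sigma}{\sqrt{2}},
\quad
\Delta f = \frac{1}{2\sqrt{2}\,\pi\sigma},
\quad
\Delta t\,\Delta f = \frac{1}{4\pi}
```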
Gabor filter
• Localise the sine and cosine functions using a
Gabor envelope.

$$h(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\, e^{j 2\pi (Ux + Vy)}$$
Assuming symmetric Gaussian envelope
$$H(u, v) = e^{-2\pi^2\sigma^2\left[(u - U)^2 + (v - V)^2\right]}$$
In the Fourier domain the Gabor is a Gaussian
centred about the central frequency (U,V). The
orientation of the Gabor in the spatial domain is
$$\theta = \tan^{-1}\left(\frac{V}{U}\right)$$
[Figures: Gaussian envelope with modulating cosine and modulating sine; spectrum centred at (U,V) in the (u,v) plane; even symmetric cosine Gabor wavelet; odd symmetric sine Gabor wavelet]
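A minimal sketch of the symmetric Gabor filter, evaluated point by point with the Python standard library only. The function names and the numeric values of sigma, U and V are illustrative assumptions, not taken from the slides.

```python
import cmath
import math

def gabor(x, y, sigma, U, V):
    """Complex Gabor h(x, y): circular Gaussian envelope times a complex carrier."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    carrier = cmath.exp(2j * math.pi * (U * x + V * y))
    return envelope * carrier

def gabor_spectrum(u, v, sigma, U, V):
    """Fourier transform H(u, v): a Gaussian centred on the central frequency (U, V)."""
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * ((u - U) ** 2 + (v - V) ** 2))

def orientation(U, V):
    """Orientation theta = atan(V / U) in radians (atan2 also handles U == 0)."""
    return math.atan2(V, U)

# Illustrative values (assumptions, not from the slides).
sigma, U, V = 4.0, 0.1, 0.1
h = gabor(1.0, 2.0, sigma, U, V)
even, odd = h.real, h.imag      # cosine (even) and sine (odd) components
theta = orientation(U, V)       # equals pi / 4 when U == V
```

The real part is the even symmetric cosine wavelet and the imaginary part is the odd symmetric sine wavelet.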
Spatial Frequency Bandwidth
• Bandwidth at half power point
$$u_1 - u_2 = \frac{0.2650}{\sigma}$$
[Figure: filter profile in the spatial and spectral (Fourier) domains; horizontal axis: frequency]
• Bandwidth depends on the symmetric
Gaussian envelope's sigma. A large
sigma results in a narrow bandwidth, so
that the Gabor filter passes only frequencies close to its central frequency. Also, due
to the uncertainty relation, a narrow frequency bandwidth will result
in reduced spatial localisation by the filter.
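The 0.2650 constant can be checked numerically. The sketch below assumes the half power point is where the amplitude of H(u) = exp(-2π²σ²(u - U)²) falls to 1/√2; the function name is hypothetical.

```python
import math

def half_power_bandwidth(sigma):
    """Full width u1 - u2 at which exp(-2 pi^2 sigma^2 (u - U)^2) equals 1/sqrt(2)."""
    # Solve 2 pi^2 sigma^2 d^2 = ln(sqrt(2)) for the half width d.
    half_width = math.sqrt(math.log(math.sqrt(2.0)) / (2.0 * math.pi ** 2)) / sigma
    return 2.0 * half_width

# Larger sigma gives a narrower frequency bandwidth (about 0.2650 / sigma).
for sigma in (1.0, 2.0, 4.0):
    print(sigma, half_power_bandwidth(sigma))
```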
[Figure: spatial filter profiles for wide bandwidth and narrow bandwidth; even symmetric cosine Gabor wavelet and odd symmetric sine Gabor wavelet]
Gabor filter with asymmetric Gaussian
• However, the Gabor's Gaussian envelope need not be
circularly symmetric! An elliptical spatial Gaussian
envelope lets us control orientation bandwidth.
• Better formulation for asymmetric Gaussian envelope
Spatial domain
$$\psi(x, y) = \frac{f_o^2}{\pi\gamma\eta}\, e^{-\left(\frac{f_o^2}{\gamma^2} x'^2 + \frac{f_o^2}{\eta^2} y'^2\right)}\, e^{j 2\pi f_o x'}$$
$$x' = x\cos\theta + y\sin\theta \quad \text{(along direction of wave propagation)}$$
$$y' = -x\sin\theta + y\cos\theta$$
Spectral domain
$$\Psi(u, v) = e^{-\frac{\pi^2}{f_o^2}\left(\gamma^2 (u' - f_o)^2 + \eta^2 v'^2\right)}$$
$$u' = u\cos\theta + v\sin\theta \quad \text{(along direction of wave propagation)}$$
$$v' = -u\sin\theta + v\cos\theta$$
f_o = central frequency
θ = angle
γ = sigma in direction of propagation
η = sigma perpendicular to direction of propagation
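A minimal sketch of the asymmetric formulation in terms of f_o, θ, γ and η, using only the standard library. The function names and the parameter values are assumptions chosen for illustration.

```python
import cmath
import math

def gabor_asym(x, y, f0, theta, gamma, eta):
    """Spatial Gabor with an elliptical Gaussian envelope, rotated by theta."""
    xr = x * math.cos(theta) + y * math.sin(theta)    # along wave propagation
    yr = -x * math.sin(theta) + y * math.cos(theta)   # perpendicular to it
    envelope = math.exp(-((f0 / gamma) ** 2 * xr ** 2 + (f0 / eta) ** 2 * yr ** 2))
    norm = f0 ** 2 / (math.pi * gamma * eta)
    return norm * envelope * cmath.exp(2j * math.pi * f0 * xr)

def gabor_asym_spectrum(u, v, f0, theta, gamma, eta):
    """Spectral Gabor: an elliptical Gaussian centred at distance f0 along theta."""
    ur = u * math.cos(theta) + v * math.sin(theta)
    vr = -u * math.sin(theta) + v * math.cos(theta)
    return math.exp(-(math.pi / f0) ** 2 * (gamma ** 2 * (ur - f0) ** 2 + eta ** 2 * vr ** 2))

# Illustrative parameters (assumptions, not from the slides).
f0, theta, gamma, eta = 0.1, math.pi / 6, 1.0, 2.0
centre_value = gabor_asym(0.0, 0.0, f0, theta, gamma, eta)
peak = gabor_asym_spectrum(f0 * math.cos(theta), f0 * math.sin(theta), f0, theta, gamma, eta)
```

With this normalisation the spectrum peaks at 1 at the central frequency, which makes responses at different central frequencies easier to compare.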
Bandwidth of Gabor with asymmetric
Gaussian
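Half power points
$$\frac{1}{\sqrt{2}} = e^{-\frac{\pi^2}{f_o^2}\left(\gamma^2 (u' - f_o)^2 + \eta^2 v'^2\right)}$$
Along direction of wave propagation, v' = 0
$$\frac{1}{\sqrt{2}} = e^{-\frac{\pi^2\gamma^2}{f_o^2}(u' - f_o)^2}$$
$$(u' - f_o)^2 = -\frac{f_o^2}{\pi^2\gamma^2}\ln\left(\frac{1}{\sqrt{2}}\right)$$
$$u' = f_o \pm \frac{f_o}{\pi\gamma}\sqrt{-\ln\left(\frac{1}{\sqrt{2}}\right)}$$
Spatial frequency bandwidth in direction of wave propagation
$$\Delta u' = \frac{2 f_o}{\pi\gamma}\sqrt{-\ln\left(\frac{1}{\sqrt{2}}\right)}$$
Perpendicular to direction of wave propagation, u' = f_o
$$\frac{1}{\sqrt{2}} = e^{-\frac{\pi^2\eta^2}{f_o^2} v'^2}$$
$$v'^2 = -\frac{f_o^2}{\pi^2\eta^2}\ln\left(\frac{1}{\sqrt{2}}\right)$$
$$v' = \pm\frac{f_o}{\pi\eta}\sqrt{-\ln\left(\frac{1}{\sqrt{2}}\right)}$$
Spatial frequency bandwidth perpendicular to wave propagation
$$\Delta v' = \frac{2 f_o}{\pi\eta}\sqrt{-\ln\left(\frac{1}{\sqrt{2}}\right)}$$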
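The two bandwidths can be sanity-checked by evaluating the spectral Gaussian at the computed half power points. This is a sketch under the assumption that "half power" means an amplitude of 1/√2; the function names and parameter values are hypothetical.

```python
import math

# sqrt(-ln(1 / sqrt(2))), the constant shared by both bandwidths.
HALF_POWER = math.sqrt(-math.log(1.0 / math.sqrt(2.0)))

def half_power_bandwidths(f0, gamma, eta):
    """Return (bandwidth along propagation, bandwidth perpendicular to it)."""
    along = 2.0 * f0 / (math.pi * gamma) * HALF_POWER
    perpendicular = 2.0 * f0 / (math.pi * eta) * HALF_POWER
    return along, perpendicular

def spectrum(u_r, v_r, f0, gamma, eta):
    """Spectral Gabor in rotated coordinates (u', v')."""
    return math.exp(-(math.pi / f0) ** 2 * (gamma ** 2 * (u_r - f0) ** 2 + eta ** 2 * v_r ** 2))

# Illustrative parameters (assumptions, not from the slides).
f0, gamma, eta = 0.2, 1.0, 2.0
du, dv = half_power_bandwidths(f0, gamma, eta)
at_u_edge = spectrum(f0 + du / 2.0, 0.0, f0, gamma, eta)   # should be 1/sqrt(2)
at_v_edge = spectrum(f0, dv / 2.0, f0, gamma, eta)         # should be 1/sqrt(2)
```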
Orientation Bandwidth
• Orientation bandwidth is related to the number of orientations we
want to extract. The half power points of the filters should
coincide in the spectral domain.
If the filter bank consists of k orientated filters, and redundancy in orientation sampling
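l = rθ (arc length, valid for small θ)
With k orientations the filters are spaced Δθ = π/k apart, so the half power point lies at an angle of π/(2k) from each filter's central frequency:
$$\frac{\Delta v'}{2} \approx f_o \frac{\pi}{2k}$$
$$\frac{f_o}{\pi\eta}\sqrt{-\ln\left(\frac{1}{\sqrt{2}}\right)} \approx f_o \frac{\pi}{2k}$$
$$\eta \approx \frac{2k}{\pi^2}\sqrt{-\ln\left(\frac{1}{\sqrt{2}}\right)}$$
[Figure: frequency domain (u, v axes) showing the half power contour, spatial frequency bandwidth, orientation bandwidth Δθ and central frequency ωo]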
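A small sketch of how η could be chosen from the number of orientations k. It assumes the perpendicular half power bandwidth (2 f_o / (π η)) · sqrt(-ln(1/√2)) is set equal to the arc length f_o · π / k between neighbouring orientations (small-angle approximation); the function names are hypothetical.

```python
import math

HALF_POWER = math.sqrt(-math.log(1.0 / math.sqrt(2.0)))

def eta_for_orientations(k):
    """Eta for which neighbouring filters meet at their half power points."""
    return 2.0 * k / math.pi ** 2 * HALF_POWER

def perpendicular_bandwidth(f0, eta):
    """Half power bandwidth perpendicular to the direction of wave propagation."""
    return 2.0 * f0 / (math.pi * eta) * HALF_POWER

# Illustrative values (assumptions, not from the slides).
k, f0 = 8, 0.2
eta = eta_for_orientations(k)
arc_length = f0 * math.pi / k                  # l = r * theta with r = f0
bandwidth = perpendicular_bandwidth(f0, eta)   # matches arc_length by construction
```

More orientations give a larger η, that is, a narrower orientation bandwidth per filter.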
Orientation Bandwidth
[Figures: filter in the spatial domain; frequency domain (u, v axes) showing the half power contour, spatial frequency bandwidth, orientation bandwidth Δθ and central frequency ωo; filter bank in the frequency domain]
Hypercolumn
• Experiments by Hubel and Wiesel (1962, 1968)
• A set of orientation selective units over a common
patch of the FOV.
• Organised as a vertical column in the visual cortex
• In a computational system, use the information in the
hypercolumn for higher level reasoning
Only using the even symmetric
component in the filter bank
Feature
vector
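A minimal sketch of a hypercolumn feature vector: the responses of k even symmetric Gabor filters centred on one image patch. The synthetic grating patch, the function names and all parameter values are assumptions for illustration; a real system would use image data and a convolution routine.

```python
import math

def even_gabor(x, y, f0, theta, gamma=1.0, eta=1.0):
    """Even symmetric (cosine) Gabor with an elliptical Gaussian envelope."""
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-((f0 / gamma) ** 2 * xr ** 2 + (f0 / eta) ** 2 * yr ** 2))
    return f0 ** 2 / (math.pi * gamma * eta) * envelope * math.cos(2.0 * math.pi * f0 * xr)

def hypercolumn(patch, f0, k=8):
    """Responses of k orientated even Gabor filters centred on a square patch."""
    size = len(patch)
    half = size // 2
    features = []
    for i in range(k):
        theta = i * math.pi / k   # orientations sample [0, pi)
        response = sum(
            patch[r][c] * even_gabor(c - half, r - half, f0, theta)
            for r in range(size)
            for c in range(size)
        )
        features.append(response)
    return features

# Synthetic patch (assumption): a cosine grating aligned with filter index 2.
f0, half = 0.15, 15
phi = 2 * math.pi / 8
patch = [
    [math.cos(2.0 * math.pi * f0 * (x * math.cos(phi) + y * math.sin(phi)))
     for x in range(-half, half + 1)]
    for y in range(-half, half + 1)
]
features = hypercolumn(patch, f0)
strongest = max(range(len(features)), key=lambda i: features[i])
```

The filter whose orientation matches the grating gives the largest response, which is what makes the vector usable as an orientation-selective feature.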
Properties of the hypercolumn
feature vector
• Invariance to rotation in image plane
$$\sum_{i=1}^{8} R_{\theta,i}^{2} = \sum_{i=1}^{8} R_{0,i}^{2}$$
where R_{θ,i} is the response of the i-th filter to the stimulation rotated by θ
• Cycle responses in feature vector to canonical orientation
[Figure: stimulation; hypercolumn responses (even symmetric detector); cycle to canonical orientation]
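Cycling the feature vector to a canonical orientation can be sketched as a cyclic shift that puts the strongest response first. Treating an in-plane rotation as an exact cyclic shift of the responses is an idealisation assumed here, and the function names and sample values are hypothetical.

```python
import math

def cycle_to_canonical(responses):
    """Cyclically shift the vector so the strongest response comes first."""
    peak = max(range(len(responses)), key=lambda i: abs(responses[i]))
    return responses[peak:] + responses[:peak]

def energy(responses):
    """Sum of squared responses over all orientations."""
    return sum(r * r for r in responses)

# Hypothetical responses of 8 orientated filters to some stimulation.
original = [0.1, 0.2, 0.9, 0.4, 0.1, 0.0, 0.05, 0.1]
# Idealised effect of rotating the stimulation by three orientation steps.
rotated = original[-3:] + original[:-3]

same_vector = cycle_to_canonical(original) == cycle_to_canonical(rotated)
same_energy = math.isclose(energy(original), energy(rotated))
```

Both the cycled vector and the summed squared response are unchanged by the shift, which is the sense in which the feature vector is rotation invariant.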
Properties of the hypercolumn
feature vector
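• Invariance to scaling (i.e. spatial frequency)
$$\sum_{i=1}^{8} R_{s,i}^{2} = \sum_{i=1}^{8} R_{0,i}^{2}$$
where R_{s,i} is the i-th response to the stimulation scaled by s
[Figure: stimulation; responses plotted against central frequency]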
Scale Invariant Feature Transform
• Pandemonium model (Selfridge, 1959!)
• Build ever more complex
/ abstract features along
the hierarchy
• Aggregate hypercolumn
feature vectors to
complex feature
SIFT features
Rotate hypercolumn
features to the canonical orientation of the
large support region
Rotate descriptor to the
canonical orientation of the large
support region
Complex feature vector
Hypercolumn features
Recognition
• Extract SIFT features at corner locations (Harris corner
detector), and scale space peaks
Training
Recognition
Recap
• Biologically motivated computer vision architecture
• Feedforward, feedback, lateral processing in
architecture
• Hierarchical processing
• Feature extraction provides information about entities
which are (somewhat!) invariant to changes
• Gabor filter
• Hypercolumn feature vector.
• SIFT features
The End