NRL Presentation - Laboratory for Intelligent Imaging and Neural Computing
MURI Projects
Cortical mechanisms of integration and inference
Biomimetic models uniting cortical architectures with graphical/statistical models
Application of Bayesian Hypercolumn model to target recognition in HSI
Image statistics of hyperspectral and spatiotemporal imagery
Transition models to MR spectroscopy for biomedical applications
University of Pennsylvania
Columbia University
MIT
Leif Finkel
Paul Sajda
Ted Adelson
Kwabena Boahen
Josh Ze’evi
Yair Weiss
Diego Contreras
The Fundamental Problem: Integration
Integration of multiple cues (contour, surface, texture, color, motion, depth…)
Bottom-up and top-down integration
Horizontal integration
The Fundamental Problem: Integration
Spatiotemporal and spatiospectral integration
Integration of multiple cues (contour, surface, texture, color, motion, depth…)
Horizontal integration
Probabilistic models offer a unifying approach to integration
How Does the Brain Do It?
The Cortical Hypercolumn
Fundamental module for processing a localized “patch” (~2° of visual angle) of the visual field
Contains the neural machinery needed to construct a statistical description (i.e., a multivariate PDF constructed across orientation, scale, wavelength, disparity, velocity, etc.)
The Generalized Aperture Problem:
Capturing Non-local Dependencies
(i.e. Context)
[Figure: array of hypercolumns h1–h8]
Possible mechanism for capturing non-local dependency structure: long-range corticocortical connections (Bosking et al., 1997)
Statistical properties of natural images are consistent with such a mechanism (Geisler et al., 2001)
Approach: Bayesian Hypercolumn
Network
[Figure: network of hypercolumns h1–h3, each encoding a conditional distribution P(*|h1), P(*|h2), P(*|h3) over space and rate, coupled by transforms T11, T12, T21, T22, T23, T32, T33]
Bayesian Hypercolumn as a Canonical
Unit in Biological Visual Processing
[Figure: hierarchy of visual areas, each computing a distribution — FC: P(Logan Airport) = 1; IT: P(DC10, 747, F-15, …); V2–V3–V4: P(dof, ab, dp, cl, c1, c2, …); V1: P(…, d)]
Bridging Bayesian Networks and Cortical Processing
A Hypercolumn Architecture for Computing Target Salience
A Bayesian Network Model for Capturing Contextual Cues: Applications to Target Classification, Synthesis and Compression
Sha’ashua & Ullman, 1988
Orientation Pinwheels in Visual Cortex
Shmuel and Grinvald (2000)
Anatomical Connectivity in Striate Cortex
Bosking et al. (1997), J. Neurosci.
Co-circularity
Physiology/Psychophysics
Sigman et al. (2001), PNAS 98
Natural Image Statistics
Geisler et al. (2001), Vis. Res. 41
Contour Salience
Hess & Field (1999), Trends in Cog. Sci.
Intracellular In Vivo Physiological Recordings
D. Contreras & L. Palmer, unpublished data
A Hypercolumn-Based Model for Estimating Co-Circularity
Detect a match, D(·) = D(·), between the distributions encoded by a local and a distant hypercolumn
Hypercolumn receives “matched” inputs from multiple other hypercolumns
Multiple Matches Cause a Transition to “Chattering” Behavior
D. McCormick
Synchronization of Chattering Bursts
Detects Clique of Connected Hypercolumns
Same chattering frequency → synchronizes
Different frequencies → don’t synchronize
Sha’ashua & Ullman, 1988
Hypercolumn-Based Co-circularity Measure
Bridging Bayesian Networks and Cortical Processing
A Hypercolumn Architecture for Computing Target Salience
A Bayesian Network Model for Capturing Contextual Cues: Applications to Target Classification, Synthesis and Compression
Problem: Integrating Multi-scale Features for
Object Recognition/Detection
Detecting small objects having few features
Discriminating large objects having subtle differences
Aim is to do this within a machine learning framework
Analogous Problems in Medical Imaging
Anatomical and Physiological Context: breast cancers tend to be highly vascularized
Context provided by multiple modalities: leakage seen in fluorescein image can provide insight into clinical significance of drusen in fundus photo
Generative Probability Models
Statistical pattern recognizers are important components of Automatic Target Recognition (ATR) and Computer-Aided Detection (CAD) systems.
Most are trained as discriminative models: they model Pr(C | I), where C = class and I = image.
However, there are advantages to generative models: they model Pr(I | C) or Pr(I).
By applying Bayes’ rule, generative models can be used for classification (a minimal numeric sketch follows):
Pr(C|I) = Pr(I|C) Pr(C) / Pr(I)
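For concreteness, a minimal sketch (not from the slides) of Bayes-rule classification given per-class generative log-likelihoods log Pr(I|C); the class count, log-likelihood values, and uniform prior are illustrative assumptions.

```python
import numpy as np

def classify(log_lik, log_prior):
    """Bayes' rule in log space: Pr(C|I) = Pr(I|C) Pr(C) / Pr(I).

    log_lik   : log Pr(I|C) for each class C (from the generative models)
    log_prior : log Pr(C) for each class
    Pr(I) is simply the normalizer of the per-class products.
    """
    log_post = log_lik + log_prior
    log_post -= log_post.max()          # shift for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical two-class example (e.g. target vs. clutter), uniform prior:
posterior = classify(np.array([-1520.3, -1498.7]), np.log([0.5, 0.5]))
print(posterior)  # Pr(C|I); the class with the larger posterior wins
```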
[Figure: a discriminative model learns the decision boundary between the x and o classes, while a generative model learns a density for each class]
Utility of a Generative Model
novelty detection – compute the absolute value of Pr(I|C) to detect images very different from those used to construct the model (sketched below)
confidence measure on the output of the ATR/CAD system
synthesis – by sampling Pr(I|C) we can generate new images for class C
insight into the image structure captured by the model
compression – knowing Pr(I|C) gives the optimal code for compressing the image (sketched below)
object-optimized compression
also noise suppression, segmentation, etc.
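A hedged sketch of the novelty-detection and compression bullets above, assuming the trained generative model exposes a log-likelihood function; the name hip_log_prob and the rejection threshold are hypothetical stand-ins, not part of the original system.

```python
import numpy as np

def novelty_score(image, log_prob):
    """-log Pr(I|C): large values flag images unlike those used to build the model."""
    return -log_prob(image)

def ideal_code_length_bits(image, log_prob):
    """Shannon bound: a code matched to Pr(I|C) spends -log2 Pr(I|C) bits on the image."""
    return -log_prob(image) / np.log(2.0)

# Usage (hip_log_prob is a hypothetical handle to a trained HIP-style model):
#   score = novelty_score(roi, hip_log_prob)
#   is_novel = score > threshold   # threshold chosen on held-out data
#   bits = ideal_code_length_bits(roi, hip_log_prob)
```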
The Hierarchical Image Probability (HIP)
Model
• Coarse-to-fine conditional dependence.
• Short-range (relative to pyramid level) dependencies captured by modeling the distribution of feature vectors.
• Longer-range dependencies captured through a set of hidden variables.
• Factor probabilities over position to make the model tractable.
Coarse-to-fine Conditional Dependence
Pyramid divides image structure into scales.
Finer scales are conditioned on the coarser scale (i.e., objects contain parts, which contain sub-parts, etc.)
Pr(I) ∝ Pr(I0 | I1) Pr(I1 | I2) ⋯
Define G̃l ≡ (Il+1, Gl) and the map Γl : Il → G̃l (a toy invertible example is sketched below)
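As a toy illustration of such an invertible map (an assumption for exposition; the actual HIP feature extraction behind Γl is different), the sketch below splits Il into a coarser image Il+1 plus a detail image Gl and reconstructs Il exactly.

```python
import numpy as np

def downsample(img):
    """Coarser level Il+1: average 2x2 blocks (assumes even height and width)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbor expansion back onto the finer grid."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def gamma(img_l):
    """Gamma_l : Il -> (Il+1, Gl); Gl holds what the coarser level misses."""
    coarse = downsample(img_l)
    detail = img_l - upsample(coarse)
    return coarse, detail

def gamma_inverse(coarse, detail):
    """Invertibility is what lets Pr(I) factor across scales (next slide)."""
    return detail + upsample(coarse)

img = np.random.rand(64, 64)
coarse, detail = gamma(img)
assert np.allclose(gamma_inverse(coarse, detail), img)
```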
Factoring Across Scale
For any Pr(I), if Γl is invertible for l ∈ {0, …, L−1}, then
Pr(I) ∝ ∏l=0…L−1 Pr(Gl | Il+1) · Pr(IL)
Proof: Γl : Il → G̃l is a change of variables with Jacobian |Γl|.
So Pr(I0) = |Γ0| Pr(G̃0).
Since G̃0 = (G0, I1), Pr(I0) = |Γ0| Pr(G0 | I1) Pr(I1).
Repeat for I1, …, IL−1.
Models of Pr(Gl|Il+1)
Factor over position to make the computations tractable.
Need hidden variables (A) to capture non-local dependencies.
Assume Fl+1 and A carry relevant information of Il+1.
Pr(I) ∝ ΣA ∏l=0…L−1 ∏x∈Il+1 Pr(gl | fl+1, x, A) · Pr(A | IL) · Pr(IL)
where A and its dependencies are arbitrary.
Capturing Long-range Dependencies
(Context) with Hidden Variables
[Figure: an object OA spanning a region of Il+1 and the corresponding texture in Il; a second case where the object is OA or OB but not both]
If a large area of Il+1 implies object class A, and class A implies a certain texture in Il, then local structure in Il depends on non-local information in Il+1.
If Il+1 implies an object class which in turn implies a texture over the region of the object, but Il+1 contains no information for differentiating object classes A and B, then distant patches are mutually dependent.
Coarse-to-fine conditioning alone does not make dependencies local
Tree Structure of Hidden Variables
Choose a tree structure for the hidden variables/labels
Belief network or HMM on a tree
[Figure: tree of hidden labels Al+2 → Al+1 → Al, shown in 1-D and in 2-D (quadtree) form]
Hidden labels can be thought of as a “learned segmentation” of the image (see the sampling sketch below)
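A minimal sketch of sampling such a coarse-to-fine label tree top-down on a 2-D quadtree; the prior, transition matrix, and sizes here are illustrative assumptions, not trained HIP parameters. The finest map plays the role of the "learned segmentation".

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_label_quadtree(n_levels, n_labels, prior, transition):
    """Top-down sample of hidden labels a_l on a 2-D quadtree.

    prior      : Pr(a at the coarsest level), shape [n_labels]
    transition : Pr(a_l = i | a_{l+1} = j) stored as transition[i, j] (columns sum to 1)
    Returns label maps from coarsest (1x1) to finest.
    """
    maps = [np.array([[rng.choice(n_labels, p=prior)]])]
    for _ in range(n_levels - 1):
        parent = maps[-1].repeat(2, axis=0).repeat(2, axis=1)  # 4 children per parent
        child = np.empty_like(parent)
        for idx in np.ndindex(parent.shape):
            child[idx] = rng.choice(n_labels, p=transition[:, parent[idx]])
        maps.append(child)
    return maps

# e.g. 4 levels, 3 labels, uniform root prior, "sticky" child-given-parent transitions
labels = sample_label_quadtree(4, 3, np.full(3, 1/3),
                               np.full((3, 3), 0.1) + 0.7 * np.eye(3))
print(labels[-1])  # finest 8x8 label map: the "learned segmentation"
```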
One Model for Pr(Gl|Il+1)
We choose a local integer “label” al at each x in Gl, with coarse-to-fine conditioning of al on al+1 to make the dependency tree.
Pr(I) ∝ ΣA0,…,AL−1 ∏l=0…L−1 ∏x∈Il+1 Pr(gl | fl+1, al, x) Pr(al | al+1, x) · Pr(AL−1 | IL) · Pr(IL)
Pr(gl | fl+1, al, x) is a normal distribution with mean ḡl,al + Mal fl+1 and covariance Σal; can choose Mal and Σal diagonal to simplify.
Train the model using the Expectation-Maximization (EM) algorithm (the per-position Gaussian term is sketched below).
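A minimal sketch of evaluating the per-position Gaussian term Pr(gl | fl+1, al, x) with diagonal Mal and covariance, as the slide suggests; the array names, shapes, and values are assumptions, and the EM updates themselves are omitted.

```python
import numpy as np

def log_p_g_given_f_a(g, f, g_bar, M_diag, var_diag):
    """log N(g; g_bar[a] + M[a] * f, diag(var[a])) for every label a.

    g, f     : feature vectors at one position x, shape [D]
    g_bar    : per-label mean offsets, shape [A, D]
    M_diag   : per-label diagonal of M_a, shape [A, D]
    var_diag : per-label diagonal covariance of Sigma_a, shape [A, D]
    Returns log Pr(gl | fl+1, a, x) for each of the A labels.
    """
    mu = g_bar + M_diag * f                      # broadcast over labels
    resid = g - mu
    return -0.5 * np.sum(resid**2 / var_diag + np.log(2 * np.pi * var_diag), axis=1)

# Hypothetical sizes: 4 labels, 8-dimensional features at one position
A, D = 4, 8
ll = log_p_g_given_f_a(np.zeros(D), np.ones(D),
                       g_bar=np.zeros((A, D)),
                       M_diag=np.ones((A, D)),
                       var_diag=np.ones((A, D)))
print(ll)  # one log-likelihood per candidate label a
```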
Structure of the HIP Model
[Figure: HIP model structure across pyramid levels l, l+1, l+2 — at each level, feature vectors f, subsampled features g, and hidden labels a; a label a is the analog of a hypercolumn, and the links between labels are the analog of long-range cortico-cortical connections]
Example: X-Ray Mammography
…dataset and training…
Regions of Interest (ROIs) provided by Dr. Maryellen Giger from the University of Chicago (UofC). ROIs represent outputs from the UofC CAD system for mass detection. 72 positive and 96 negative ROIs. Half of the data used for training, half for testing.
Train two HIP models: masses (positives), non-masses (negatives).
Choose the architecture using a minimum description length (MDL) criterion (one common form is sketched below).
Bounded the number of labels above at 17.
Best architecture: 17, 17, 11, 2, 1 hidden labels in levels 0–4, respectively.
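A hedged sketch of one common (BIC-style) form of an MDL criterion that could drive this architecture choice; the slide does not give the exact penalty used, and fit_hip below is a hypothetical stand-in for fitting a candidate architecture.

```python
import numpy as np

def mdl_score(log_likelihood, n_params, n_samples):
    """Smaller is better: data cost (-log likelihood) plus a BIC-style
    (k/2) log N penalty on the number of free parameters."""
    return -log_likelihood + 0.5 * n_params * np.log(n_samples)

# Pick the labels-per-level architecture (each entry capped at 17) with the
# lowest score, e.g.:
#   best = min(candidates, key=lambda arch: mdl_score(*fit_hip(arch)))
```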
Mass Detection
Reducing false positives by 25% without loss in sensitivity
Novelty Detection
Use novelty detection to establish a confidence measure for the detector
Image Synthesis
ROI image synthesized from positive model
ROI images synthesized from negative model
Synthesized images can be used to develop intuition about how well the model represents the data
Compression
[Figure: original ROI vs. JPEG vs. HIP-based compression]
Results on Aerial Imagery
[Figure panels: example images; classification; synthesis; hidden variable probabilities (label 1, label 2); compression]
Classification: Az(HIP) = 0.87 vs. Az(HPNN) = 0.86; %correct(HIP) = 85% vs. %correct(D/V) = 78%