Gaussian Mixture Model


Gist of the Scene
The Essence
Trayambaka, KT Karra and Garold Fuks
The “Gist” of a scene
If this is a street, this must be a pedestrian
Physiological Evidence
• People are excellent at identifying pictures
(Standing, L., Q. J. Exp. Psychol., 1973)
• Change blindness (seconds)
(Simons, D.J., Levin, D.T., Trends Cogn. Sci., 1997)
Gist: abstract meaning of scene
• Obtained within 150 ms (Biederman, 1981; Thorpe, S. et al., 1996)
• Obtained without attention (Oliva & Schyns, 1997; Wolfe, J.M., 1998)
• Possibly derived via statistics of low-level structures
(e.g. Swain & Ballard, 1991)
What is the “gist”?
• Inventory of the objects
(2–3 objects in 150 msec; Luck & Vogel, Nature 390, 1997)
• Relation between objects (layout)
(J. Wolfe, Curr. Biol. 1998, 8)
• Presence of other objects
• “Visual stuff” – impression of low-level features
How does the “Gist” work?
[Diagram: the gist draws on both statistical properties and object properties]
R.A. Rensink, lecture notes
Outline
• Context Modeling
– Previous Models
– Scene based Context Model
• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification
• Joint Local and Global Features Applications
– Object Detection and Localization
• Summary
Probabilistic Framework
MAP Estimator
P(O | v) = P(v | O) · P(O) / P(v)
v – image measurements
O – object properties: category (o), location (x), scale (σ)
Object-Centered Object Detection
• The only image features relevant to object detection
are those belonging to the object and not the
background
P(O | v) ≈ P(O | v_L) = P(v_L | O) · P(O) / P(v_L)
B. Moghaddam, A. Pentland, IEEE PAMI-19, 1997
The “Gist” of a scene
Local features can be ambiguous
Context can provide prior
Scene Based Context Model
Background provides a likelihood of finding an object
P(O | v) = P(O | v_L, v_C) = P(v_L | O, v_C) · P(O | v_C) / P(v_L | v_C)
Prob(Car | image) = low
Prob(Person | image) = high
Context Modeling
• Previous Context Models
(Fu, Hammond and Swain, 1994; Haralick, 1983; Song et al., 2000)
• Rule Based Context Model
• Object Based Context Model
• Scene centered context representation
(Oliva and Torralba, 2001, 2002)
Rule Based Context Model
Structural Description
[Relation graph: objects O1–O4 linked by relations such as Above, Touch,
Right-of and Left-of]
Rule Based Context Model
Fu, Hammond and Swain, 1994
Object Based Context Model
• Context is incorporated only through the prior
probability of object combinations in the world
P(O_1, …, O_N, v_1, …, v_N) = [ Π_{i=1}^N P(v_i | O_i) ] · P(O_1, …, O_N)
R. Haralick, IEEE PAMI-5, 1983
Scene Based Context Model
What are the features representing the scene?
• Statistics of local low-level features
• Color histograms
• Oriented band-pass filters
Context Features - Vc
v(x, k) = Σ_{x′} I(x′) · g_k(x − x′)
[Filter bank: g_1(x) → v(x, 1), g_2(x) → v(x, 2), …, g_K(x) → v(x, K)]
Context Features - Vc
Gabor filter
g_k(x) = g_0 · e^(−|x|²/σ_k²) · e^(2πi⟨f_k, x⟩)
[Example responses: people, no car vs. car, no people]
Context Features - Vc
PCA
v(x, k) ≈ Σ_{n=1}^D a_n · ψ_n(x, k)
[Principal functions ψ_1(x, k), ψ_2(x, k), ψ_3(x, k)]
Context Features - Summary
I(x) → Bank of Filters → { v(x, k) }_{k=1..K} → Dimension Reduction (PCA)
Context features: v_C = { a_n }_{n=1..D}, where
a_n = Σ_x Σ_k v(x, k) · ψ_n(x, k)
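As a concrete toy illustration of this pipeline, the sketch below builds a small complex Gabor bank, computes response magnitudes v(x, k) on random stand-in "images", and learns the principal functions ψ_n by PCA. Every filter size, frequency, and image here is an invented placeholder, not the configuration used in the original work.

```python
import numpy as np

rng = np.random.default_rng(0)

def gabor(size, f, theta, sigma):
    # complex Gabor filter: Gaussian envelope times a complex sinusoid
    ax = np.arange(size) - size // 2
    X, Y = np.meshgrid(ax, ax)
    fx, fy = f * np.cos(theta), f * np.sin(theta)
    return np.exp(-(X**2 + Y**2) / sigma**2) * np.exp(2j * np.pi * (fx * X + fy * Y))

def filter_responses(img, bank):
    # |I * g_k| via the Fourier domain: one response map v(x, k) per filter
    F = np.fft.fft2(img)
    maps = [np.abs(np.fft.ifft2(F * np.fft.fft2(g, s=img.shape))) for g in bank]
    return np.stack(maps, axis=-1)

bank = [gabor(15, f, t, 6.0) for f in (0.1, 0.25)
        for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]   # K = 8 filters

# toy "image set": random textures standing in for training scenes
imgs = [rng.standard_normal((32, 32)) for _ in range(20)]
V = np.stack([filter_responses(im, bank).ravel() for im in imgs])

# PCA: the principal functions psi_n(x, k) are the top right-singular vectors
Vc = V - V.mean(axis=0)
_, _, Wt = np.linalg.svd(Vc, full_matrices=False)
D = 5
psi = Wt[:D]                    # psi_n, flattened over (x, k)
a = Vc @ psi.T                  # context features a_n = sum_{x,k} v(x,k) psi_n(x,k)
print(a.shape)                  # one D-dimensional gist vector per image
```

With real images, v_C = (a_1, …, a_D) is the low-dimensional "gist" vector that feeds the probability models.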
Probability from Features
P(O | v) = P(O | v_L, v_C) = P(v_L | O, v_C) · P(O | v_C) / P(v_L | v_C)
How to obtain context-based probability priors P(O | v_C)
on object properties?
• GMM - Gaussian Mixture Model
• Logistic regression
• Parzen window
Probability from Features
GMM
P(Object Property | Context) = P(O | v_C) = P(v_C | O) · P(O) / P(v_C)
P(v_C) = P(v_C | O) · P(O) + P(v_C | ¬O) · P(¬O)
Need to study two probabilities:
P(v_C | O) – likelihood of the features given the presence of an object
P(v_C | ¬O) – likelihood of the features given the absence of an object
Gaussian Mixture Model:
P(v_C | O) = Σ_{i=1}^M w_i · G(v_C; μ_i, Σ_i)
The unknown parameters { w_i, μ_i, Σ_i }_{i=1}^M are learned by the EM algorithm.
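A minimal sketch of this EM fit, assuming 1-D spherical components and synthetic data standing in for the context features of object-present images:

```python
import numpy as np

rng = np.random.default_rng(1)

def em_gmm(X, M, iters=50):
    """Fit P(x) = sum_i w_i G(x; mu_i, var_i) to 1-D data by EM (toy version)."""
    n = len(X)
    w = np.full(M, 1.0 / M)
    mu = np.quantile(X, (np.arange(M) + 0.5) / M)   # spread-out initialisation
    var = np.full(M, X.var())
    for _ in range(iters):
        # E-step: responsibility r[i, j] = P(component i | x_j)
        d = X[None, :] - mu[:, None]
        lik = (w[:, None] * np.exp(-0.5 * d**2 / var[:, None])
               / np.sqrt(2 * np.pi * var[:, None]))
        r = lik / lik.sum(axis=0, keepdims=True)
        # M-step: re-estimate weights, means and variances from responsibilities
        Nk = r.sum(axis=1)
        w, mu = Nk / n, (r @ X) / Nk
        var = (r * (X[None, :] - mu[:, None]) ** 2).sum(axis=1) / Nk
    return w, mu, var

# synthetic stand-in for P(v_C | O): two well-separated feature clusters
X = np.concatenate([rng.normal(-5, 1, 300), rng.normal(4, 1, 300)])
w, mu, var = em_gmm(X, M=2)
print(np.sort(mu))          # means recovered near -5 and 4
```

In the paper's setting the same fit runs on the D-dimensional gist vectors with full covariances; the 1-D case only shows the alternation.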
Probability from Features
Logistic Regression
Logit = log[ P(O | v_C) / P(¬O | v_C) ] = F(v_C)
F(v_C) = a_0 + Σ_{i=1}^D a_i · v_C(i)
P(O | v_C) = 1 / (1 + e^(−F(v_C)))
Probability from Features
Logistic Regression
Example
O = having back problems
v_C = age
log[ P(O | v_C) / P(¬O | v_C) ] = a_0 + a_1 · (age − 20)
Training Stage
a_0 – the log odds for a 20-year-old person
a_1 – the log odds ratio when comparing two persons
who differ by 1 year in age
Working Stage
P(O | age) = 1 / (1 + e^(−(a_0 + a_1 · (age − 20))))
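The back-problems example can be run end to end. The sketch below generates hypothetical (age, outcome) data from assumed "true" coefficients and recovers a_0 and a_1 by Newton's method (IRLS), one standard way to maximise the logistic likelihood; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical data: log odds of back problems grow linearly with (age - 20)
true_a0, true_a1 = -2.0, 0.1
age = rng.uniform(20, 80, 2000)
x = age - 20
p_true = 1 / (1 + np.exp(-(true_a0 + true_a1 * x)))
y = (rng.random(2000) < p_true).astype(float)

# maximum-likelihood fit of (a0, a1) by Newton's method (IRLS)
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(25):
    p_hat = 1 / (1 + np.exp(-X @ beta))
    W = p_hat * (1 - p_hat)                      # per-example curvature
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p_hat))

a0, a1 = beta
print(round(a0, 1), round(a1, 2))                # near -2.0 and 0.1
```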
Probability from Features
Parzen Window
P(v_C | O) = (1/N) · Σ_j K(v_C − v_j)
Radial Gaussian Kernel
K(v) = k · e^(−|v|²/2)
What did we have so far…
• Context Modeling
• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification
Place Identification
Goal: Recognize specific locations
P v C / Place j P Place j 
P Place j / v C  
 Pv / Place PPlace 
   K v  v 
C
P v C / Place j
j
j
C
j
j
j
Place Identification
A. Torralba, K. Murphy, W. Freeman, M. Rubin, ICCV 2003
Place Identification
Decide only when
max_j P(Place_j | v_C) > θ
Precision vs. Recall rate:
A. Torralba, P. Sinha, MIT AIM 2001-015
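Putting the pieces together, the sketch below computes Parzen-based place posteriors with a rejection threshold θ on invented 2-D "gist" features for two hypothetical places; the place names, cluster centres, and θ are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)

def parzen(v, samples, sigma=1.0):
    # unnormalised Parzen likelihood; the constant cancels in the posterior
    d2 = np.sum((samples - v) ** 2, axis=1)
    return np.mean(np.exp(-d2 / (2 * sigma**2)))

# two hypothetical "places", each with training gist vectors around a centre
train = {"office": rng.normal([0, 0], 1, (200, 2)),
         "street": rng.normal([5, 5], 1, (200, 2))}
prior = {p: 0.5 for p in train}

def identify(v, theta=0.9):
    lik = {p: parzen(v, s) * prior[p] for p, s in train.items()}
    post = {p: l / sum(lik.values()) for p, l in lik.items()}
    best = max(post, key=post.get)
    # decide only when max_j P(Place_j | v_C) exceeds the threshold
    return best if post[best] > theta else "don't know"

print(identify(np.array([5.2, 4.8])))   # confidently "street"
print(identify(np.array([2.5, 2.5])))   # ambiguous, so "don't know"
```

Raising θ trades recall for precision, which is exactly the precision-vs-recall curve the slide refers to.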
Object Priming
• How do we detect objects in an image?
– Search the whole image for the object
model.
– What if I am searching in images where the
object doesn’t exist at all?
• Obviously, wasting “my precious”
computational resources. (Gollum)
• Can we do better, and if so, how?
– Use the “great eye”, the contextual features
of the image (v_C), to predict the probability
of finding our object of interest o in the
image, i.e. P(o | v_C).
Object Priming …..
• What to do?
– Use my experience to learn
P(o | v_C) = P(v_C | o) · P(o) / P(v_C)
from a database of images, with
P(v_C) = P(v_C | o) · P(o) + P(v_C | ¬o) · P(¬o)
• How to do it?
– Learn the PDF P(v_C | o) = Σ_{i=1}^M w_i · G(v_C; v_i, V_i) by a mixture of
Gaussians
– Also, learn the PDF P(v_C | ¬o) = Σ_{i=1}^M w′_i · G(v_C; v′_i, V′_i)
(a separate mixture for the background)
Object Priming …..
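Once the two mixtures are learned, priming is a Bayes ratio. The sketch below evaluates P(o | v_C) with 1-D, hand-picked mixture parameters; the weights, means, variances, and prior are invented stand-ins for what EM would produce.

```python
import numpy as np

def gauss(v, mu, var):
    return np.exp(-0.5 * (v - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_pdf(v, w, mu, var):
    # P(v_C | .) = sum_i w_i G(v_C; mu_i, var_i); parameters would come from EM
    return sum(wi * gauss(v, mi, vi) for wi, mi, vi in zip(w, mu, var))

# hypothetical learned mixtures: context features near 1 suggest object present
on  = ([0.6, 0.4], [1.0, 1.5], [0.1, 0.2])   # P(v_C | o)
off = ([1.0], [-1.0], [0.5])                 # P(v_C | not o)
p_o = 0.3                                    # prior P(o)

def priming(vc):
    num = gmm_pdf(vc, *on) * p_o
    den = num + gmm_pdf(vc, *off) * (1 - p_o)
    return num / den                         # P(o | v_C)

print(round(priming(1.2), 2), round(priming(-1.0), 2))   # high vs. low priming
```

A detector would then be run only on images (or prioritised on images) where this probability is non-negligible.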
Control of Focus of Attention
• How do biological visual systems deal with the analysis of
complex real-world scenes?
– By focusing attention on image regions that require detailed analysis.
Modeling the Control of Focus of Attention
How to decide which regions are “more” important than
others?
• Local-type methods
1. Low-level saliency maps – regions that have different
properties than their neighborhood are considered salient.
2. Object-centered methods.
• Global-type methods
1. Contextual control of focus of attention
Contextual Control of Focus of Attention
• Contextual control is both
– Task driven (looking for a particular object o) and
– Context driven (given global context information: vC)
• No use of object models (i.e. ignores
object centered features)
Contextual Control of Focus of Attention …
• Focus on spatial regions that have high probability of
containing the target object o given context information
(vC)
• For each location x, let’s calculate the probability of
presence of the object o given the context v_C.
• Evaluate the PDF P(Location | Object, Context)
based on the past experience of the system.
Contextual Control of Focus of Attention …
PLocation / Object , Contexti.e. P x / o, v C   ?
M
P  x / o, v C
w  G ( x; x , X )  G (v

  P  x, v / o  
Pv / o 
 w  G (v ; v , V
C
i 1
i
i
i
C
; v i ,V i )
i
)
M
C
i 1
i
C
i
Learning Stage: Use the Swiss Army Knife, the EM algorithm, to estimate
the parameters
wi , x i , X i v i ,V i iM1
Contextual Control of Focus of Attention …
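In code, the conditional P(x | o, v_C) is just a context-reweighted mixture over preferred locations. The sketch below uses a 1-D location (e.g. image-height fraction) and two invented components; in practice the parameters come from EM on joint (x, v_C) training data.

```python
import numpy as np

def gauss(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# hypothetical joint mixture: each component couples a context prototype v_i
# with a preferred object location x_i (all numbers are made up)
w   = np.array([0.5, 0.5])
v_i = np.array([-1.0, 1.0])    # context prototypes
V_i = np.array([0.3, 0.3])
x_i = np.array([0.2, 0.8])     # preferred locations (fraction of image height)
X_i = np.array([0.01, 0.01])

def p_x_given(x, vc):
    # P(x | o, v_C): mixture over locations, weighted by context responsibility
    r = w * gauss(vc, v_i, V_i)
    return (r * gauss(x, x_i, X_i)).sum() / r.sum()

xs = np.linspace(0, 1, 101)
vc = -1.0                      # context resembling the first prototype
best = xs[np.argmax([p_x_given(x, vc) for x in xs])]
print(best)                    # attention drawn near 0.2
```

Attention is then allocated to the locations where this density is highest, with no object model involved.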
Scale Selection
• Scale selection is
– a fundamental problem in computer vision.
– a key bottleneck for object-centered object
detection algorithms.
• Can we estimate scale in a pre-processing
stage?
– Yes, using saliency measures of low-level
operators across spatial scales.
• Other methods? Of course, …
Context-Driven Scale Selection
PScale / Location, Object , Context  PScale / Object , Context
i.e. P / o, v C   ?
M
P  / o, v C  
wi  G ( ;  i , S i )  G (v C ; v i , V i )



P  , vC / o
 i 1
M
P v C / o 
w  G (v ; v , V )

i 1
i
C
i
i
M
Preferred Scale,     P / o, v C  d 
  w  G (v
i 1
M
i
i
 w  G (v
i 1
i
C
C
; v i ,V i )
; v i ,V i )
Context-Driven Scale Selection ….
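The preferred-scale formula reduces to a responsibility-weighted average of per-component scales. A minimal sketch with two invented components (context prototypes tied to typical object scales in pixels; none of these numbers come from the original work):

```python
import numpy as np

def gauss(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# hypothetical mixture tying context prototypes to typical object scales
w     = np.array([0.7, 0.3])
v_i   = np.array([0.0, 3.0])     # context prototypes (1-D gist feature)
V_i   = np.array([0.5, 0.5])
sig_i = np.array([24.0, 96.0])   # mean object scale per component, in pixels

def preferred_scale(vc):
    # <sigma> = sum_i sigma_i w_i G(v_C; v_i, V_i) / sum_i w_i G(v_C; v_i, V_i)
    r = w * gauss(vc, v_i, V_i)
    return (r * sig_i).sum() / r.sum()

print(round(preferred_scale(0.0)), round(preferred_scale(3.0)))
```

A context near the first prototype yields a small preferred scale, and near the second a large one, so the detector can be run at (or around) a single scale instead of all of them.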
Scene Classification
• Strong correlation between the presence of many types of objects.
• Do not model this correlation directly. Rather, use a “common” cause, which
we shall call “scene”.
• Train a classifier to identify scenes.
• Then all we need is to calculate
P(Scene = s | v_C) = P_Classifier(Scene = s | v_C) / Σ_{s′} P_Classifier(Scene = s′ | v_C)
What did we have so far…
• Context Modeling
• Context Based Applications
• Joint Local and Global Features Applications
– Object Detection and Localization
Need new tools: Learning and Boosting
Weak Learners
• Given (x_1, y_1), …, (x_m, y_m) where
x_i ∈ X = {set of emails}
y_i ∈ Y = {spam, non-spam}
• Can we extract “rules of thumb” for classification
purposes?
• A weak learner finds a weak hypothesis (rule of thumb)
h : X → {spam, non-spam}
Decision Stumps
• Consider the following simple family of component
classifiers generating ±1 labels:
h(x; p) = a·[x_k > t] − b
where p = {a, b, k, t}. These are called decision stumps.
• Use sign(h) for classification and |h| as a confidence
measure.
• Each decision stump pays attention to only a single
component of the input vector.
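A brute-force stump learner is only a few lines. The sketch below searches features, thresholds, and polarities for the lowest weighted error on toy data whose label depends on a single feature; the data and weights are invented for illustration.

```python
import numpy as np

def best_stump(X, y, Dw):
    """Exhaustively pick feature k, threshold t and polarity s minimising the
    weighted error of the stump h(x) = s if x[k] > t else -s."""
    best = (np.inf, 0, 0.0, 1)
    for k in range(X.shape[1]):
        for t in np.unique(X[:, k]):
            for s in (1, -1):
                pred = np.where(X[:, k] > t, s, -s)
                err = Dw[pred != y].sum()
                if err < best[0]:
                    best = (err, k, t, s)
    return best                                  # (error, feature, threshold, sign)

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, (200, 3))
y = np.where(X[:, 1] > 0.2, 1, -1)               # label depends on feature 1 only
Dw = np.full(200, 1 / 200)                       # uniform example weights
err, k, t, s = best_stump(X, y, Dw)
print(k, round(err, 3))                          # picks feature 1, near-zero error
```

Candidate thresholds only need to be the observed feature values, since the weighted error is piecewise constant between them.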
Ponders his maker, ponders his will
• Can we combine weak classifiers to produce a single
strong classifier in a simple manner?
h_m(x) = h(x; p_1) + … + h(x; p_m)
where the predicted label for x is the sign of h_m(x).
• Is it beneficial to allow some of the weak classifiers to
have more “votes” than others?
h_m(x) = α_1·h(x; p_1) + … + α_m·h(x; p_m)
where the non-negative votes α_i can be used to
emphasize the more reliable components.
Boosting
What is boosting?
– A general method for improving the accuracy of any
given weak learning algorithm.
– Introduced in the framework of the PAC learning model.
– But it works with any weak learner (in our case the
decision stumps).
Boosting …..
• A boosting algorithm sequentially estimates and
combines classifiers by re-weighting training examples
(each time concentrating on the harder examples)
– each component classifier is presented with a slightly
different problem depending on the weights
• Base Algorithms
– a set of “weak” binary (±1) classifiers h(x;p) such as
decision stumps
– normalized weights D1(i) on the training examples,
initially set to uniform (D1(i) = 1 / m)
AdaBoost
1. At the t-th iteration we find a weak classifier h(x; p_t) for which the
classification error is better than chance:
ε_t = 0.5 − 0.5 · Σ_{i=1}^m D_t(i) · y_i · h(x_i; p_t)
2. The new component classifier is assigned “votes” based on its
performance:
α_t = 0.5 · log((1 − ε_t) / ε_t)
3. The weights on the training examples are updated according to
D_{t+1}(i) = D_t(i) · exp(−α_t · y_i · h_t(x_i)) / Z_t
where Z_t is a normalization factor.
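The three steps above, with exhaustive decision stumps as the weak learners, fit in a short loop. The sketch below uses an invented 1-D-interval labelling that no single stump can solve, so several rounds of re-weighting are needed.

```python
import numpy as np

def stump_pred(X, k, t, s):
    # decision stump: +s where feature k exceeds threshold t, -s elsewhere
    return np.where(X[:, k] > t, s, -s)

def fit_stump(X, y, Dw):
    # exhaustive search for the stump with the lowest weighted error
    best = (np.inf, 0, 0.0, 1)
    for k in range(X.shape[1]):
        for t in np.unique(X[:, k]):
            for s in (1, -1):
                err = Dw[stump_pred(X, k, t, s) != y].sum()
                if err < best[0]:
                    best = (err, k, t, s)
    return best[1:]

def adaboost(X, y, rounds=40):
    m = len(y)
    Dw = np.full(m, 1 / m)                            # D_1(i) = 1/m
    ensemble = []
    for _ in range(rounds):
        k, t, s = fit_stump(X, y, Dw)
        h = stump_pred(X, k, t, s)
        eps = Dw[h != y].sum()
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))   # the "votes"
        Dw = Dw * np.exp(-alpha * y * h)              # harder examples gain weight
        Dw /= Dw.sum()                                # Z_t normalisation
        ensemble.append((alpha, k, t, s))
    return ensemble

def predict(ensemble, X):
    H = sum(a * stump_pred(X, k, t, s) for a, k, t, s in ensemble)
    return np.sign(H)

rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, (300, 2))
y = np.where(np.abs(X[:, 0]) < 0.5, 1, -1)            # interval label: needs >1 stump
ens = adaboost(X, y)
acc = (predict(ens, X) == y).mean()
print(round(acc, 2))                                  # high training accuracy
```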
AdaBoost
[Gambling demo with Gari, Uri and KT]
Object Detection and Localization
• 3 Families of Approaches
– Parts based
• Object defined as a spatial arrangement of small parts.
– Region based
• Use segmentation to extract a region of the image from the background
and deduce shape and texture info from its local features.
– Patch based
• Use local features to classify each rectangular image region as
object or background.
• Object detection is reduced to a binary classification problem,
i.e. compute just P(O_i^C = 1 | v_i^C)
where O_i^C = 1 if patch i contains (part of) an object of class C, and
v_i^C = the feature vector for patch i computed for class C.
Feature Vector for a Patch: Step 1
Feature Vector for a Patch: Step 2
Feature Vector for a Patch: Step 3
Summary: Feature Vector Extraction
12 * 30 * 2 = 720 features
Filters and Spatial Templates
Object Detection …..
• Do I need all the features for a
given object class?
• If so, what features should I
extract for a given object class?
– Use training to learn which features
are more important than others.
Classifier: Boosted Features
• What is available?
– Training data is v = the features of the patch containing an object o.
• Weak learners pay attention to single features:
– h_t(v) picks the best feature and threshold: h_t(v) = [v(k) > θ]
• Output is β(v) = Σ_t α_t · h_t(v)
– h_t(v) = output of the weak classifier at round t
– α_t = weight assigned by boosting
• ~100 rounds of boosting
Examples of Learned Features
Example Detections
Using the Gist for Object Localization
• Use the gist to predict the possible location of the object.
• Should I run my detectors only in that region?
– No! That misses the detection if the object is at any other location.
– So, search everywhere but penalize detections that are far from the
predicted locations.
• But how?
Using the Gist for Object Localization ….
• Construct a feature vector
f = ( β(v_i^C), x_i^C − x̂^C )
which combines the output of the boosted classifier, β(v_i^C), and the
difference x_i^C − x̂^C, where
x_i^C – location of the patch
x̂^C – predicted location for objects of this class
• Train another classifier to compute
P(O_i^C = 1 | f)
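A minimal sketch of this second-stage scoring: a logistic combination of the detector output β and the distance to the gist-predicted location. The weights b0, b1, b2 are invented placeholders for what the trained classifier would learn.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# made-up weights for the combined classifier: reward a strong detector
# response, penalise distance from the gist-predicted location
b0, b1, b2 = -1.0, 2.0, -3.0

def p_object(beta, x_patch, x_pred):
    # P(O = 1 | f) for the hypothetical feature f = (beta, ||x_patch - x_pred||)
    d = np.linalg.norm(np.asarray(x_patch) - np.asarray(x_pred))
    return sigmoid(b0 + b1 * beta + b2 * d)

# same detector score, two patch positions: near vs. far from the prediction
print(round(p_object(1.0, (0.5, 0.5), (0.5, 0.6)), 2),
      round(p_object(1.0, (0.5, 0.5), (0.1, 0.9)), 2))
```

This keeps the search exhaustive (nothing is ever skipped) while letting the gist down-weight implausible locations, matching the "search everywhere but penalize" idea above.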
Using the Gist for Object Localization ….
Summary
• Context Modeling
– Previous Models
– Scene based Context Model
Summary
• Context Modeling
• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification
Summary
• Context Modeling
• Context Based Applications
• Joint Local and Global Features Applications
– Object Detection and Localization