Gaussian Mixture Model
Gist of the Scene
The Essence
Trayambaka, KT Karra, and Garold Fuks
The “Gist” of a scene
If this is a street, this must be a pedestrian.
Psychological Evidence
• People are excellent at identifying pictures
(Standing, L., Q. J. Exp. Psychol., 1973)
• Change blindness (seconds)
(Simons, D.J., Levin, D.T., Trends Cogn. Sci., 1997)
Gist: abstract meaning of a scene
• Obtained within 150 ms (Biederman, 1981; Thorpe, S. et al., 1996)
• Obtained without attention (Oliva & Schyns, 1997; Wolfe, J.M., 1998)
• Possibly derived via statistics of low-level structures (e.g. Swain & Ballard, 1991)
What is the “gist”?
• Inventory of the objects (2–3 objects in 150 msec; Luck & Vogel, Nature 390, 1997)
• Relation between objects (layout) (J. Wolfe, Curr. Biol. 1998, 8)
• Presence of other objects
• “Visual stuff” – impression of low-level features
How does the “Gist” work?
[Diagram: gist combines statistical properties and object properties; after R.A. Rensink, lecture notes]
Outline
• Context Modeling
– Previous Models
– Scene based Context Model
• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification
• Joint Local and Global Features Applications
– Object Detection and Localization
• Summary
Probabilistic Framework
MAP Estimator
P(O|v) = P(v|O) · P(O) / P(v)
v – image measurements
O – object properties: category (o), location (x), scale (σ)
Object-Centered Object Detection
• The only image features relevant to object detection are those belonging to the object, not the background:
P(O|v) ≈ P(O|v_L) = P(v_L|O) · P(O) / P(v_L)
(B. Moghaddam and A. Pentland, IEEE PAMI-19, 1997)
The “Gist” of a scene
Local features can be ambiguous.
Context can provide a prior.
Scene Based Context Model
Background provides a likelihood of finding an object:
P(O|v) = P(O|v_L, v_C) = P(v_L|O, v_C) · P(O|v_C) / P(v_L|v_C)
Prob(Car|image) = low
Prob(Person|image) = high
Context Modeling
Previous Context Models
(Fu, Hammond and Swain, 1994; Haralick, 1983; Song et al., 2000)
• Rule Based Context Model
• Object Based Context Model
• Scene-centered context representation (Oliva and Torralba, 2001, 2002)
Rule Based Context Model
Structural Description
[Diagram: graph of pairwise spatial relations between objects O1–O4, e.g. “above”, “touch”, “right-of”, “left-of”]
Rule Based Context Model
(Fu, Hammond and Swain, 1994)
Object Based Context Model
• Context is incorporated only through the prior probability of object combinations in the world:
P(O_1, …, O_N, v_1, …, v_N) = [ ∏_{i=1}^{N} P(v_i|O_i) ] · P(O_1, …, O_N)
(R. Haralick, IEEE PAMI-5, 1983)
Scene Based Context Model
What are the features representing a scene?
• Statistics of local low level features
• Color histograms
• Oriented band pass filters
Context Features - Vc
Each filter g_k produces an output map v(x, k):
v(x, k) = Σ_{x'} I(x') · g_k(x − x'),   k = 1, …, K
Context Features - Vc
Gabor filters:
g_k(x) = g_0 · e^{−|x|²/σ_k²} · e^{2πi⟨f_k, x⟩}
[Example images: “People, no car” vs. “Car, no people”]
Context Features - Vc
PCA:
v(x, k) ≈ Σ_{n=1}^{D} a_n ψ_n(x, k)
[Figure: leading principal components ψ_1(x, k), ψ_2(x, k), ψ_3(x, k)]
Context Features - Summary
I(x) → Bank of Filters → {v(x, k)}_{k=1}^{K} → Dimension Reduction (PCA)
Context features: v_C = {a_n}, n = 1, …, D, where
a_n = Σ_x Σ_k v(x, k) · ψ_n(x, k)
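To make the pipeline concrete, here is a minimal Python sketch of gist-style context features: a small Gabor filter bank followed by PCA. The filter sizes, frequencies, and D = 10 are illustrative choices, not the exact settings of Oliva and Torralba; numpy, scipy, and scikit-learn are assumed.

```python
# A sketch of gist-style context features: Gabor filter bank + PCA.
import numpy as np
from scipy.signal import fftconvolve
from sklearn.decomposition import PCA

def gabor_kernel(size, sigma, freq, theta):
    """Complex Gabor: g(x) = exp(-|x|^2 / sigma^2) * exp(2*pi*i*<f, x>)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    fx, fy = freq * np.cos(theta), freq * np.sin(theta)
    return np.exp(-(xx**2 + yy**2) / sigma**2) * np.exp(2j * np.pi * (fx * xx + fy * yy))

def filter_responses(image, kernels):
    """v(x, k) = sum_{x'} I(x') g_k(x - x'): convolve with each filter, keep magnitudes."""
    return np.stack([np.abs(fftconvolve(image, k, mode='same')) for k in kernels], axis=-1)

# A tiny bank: 3 frequencies x 4 orientations (K = 12 filters).
kernels = [gabor_kernel(15, sigma=4.0, freq=f, theta=t)
           for f in (0.1, 0.2, 0.3)
           for t in np.linspace(0, np.pi, 4, endpoint=False)]

# Flatten the response maps of each image and reduce to D components:
# a_n = sum_{x,k} v(x,k) psi_n(x,k), with psi_n learned by PCA.
images = [np.random.rand(64, 64) for _ in range(50)]     # stand-in dataset
V = np.stack([filter_responses(im, kernels).ravel() for im in images])
pca = PCA(n_components=10)                               # D = 10
v_C = pca.fit_transform(V)                               # one gist vector per image
```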
Probability from Features
P(O|v) = P(O|v_L, v_C) = P(v_L|O, v_C) · P(O|v_C) / P(v_L|v_C)
How to obtain the context-based prior P(O|v_C) on object properties?
• GMM – Gaussian Mixture Model
• Logistic regression
• Parzen window
Probability from Features
GMM
P(Object Property|Context) = P(O|v_C) = P(v_C|O) · P(O) / P(v_C)
P(v_C) = P(v_C|O) · P(O) + P(v_C|¬O) · P(¬O)
Need to study two probabilities:
P(v_C|O) – likelihood of the features given the presence of an object
P(v_C|¬O) – likelihood of the features given the absence of an object
Gaussian Mixture Model:
P(v_C|O) = Σ_{i=1}^{M} w_i G(v_C; μ_i, Σ_i)
The unknown parameters {w_i, μ_i, Σ_i}_{i=1}^{M} are learned by the EM algorithm.
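A minimal sketch of this two-mixture construction, assuming scikit-learn's GaussianMixture (whose fit runs EM internally); the gist vectors here are synthetic stand-ins, and M = 3 and P(O) = 0.5 are illustrative.

```python
# Class-conditional GMMs for P(v_C|O) and P(v_C|~O), combined by Bayes rule.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
vC_with    = rng.normal(loc=+1.0, size=(200, 10))   # gist vectors, object present
vC_without = rng.normal(loc=-1.0, size=(200, 10))   # gist vectors, object absent

M = 3  # number of mixture components
gmm_with    = GaussianMixture(n_components=M).fit(vC_with)      # P(v_C | O)
gmm_without = GaussianMixture(n_components=M).fit(vC_without)   # P(v_C | ~O)
prior = 0.5  # P(O), e.g. the fraction of training images with the object

def p_object_given_context(vC):
    """P(O|v_C) = P(v_C|O)P(O) / [P(v_C|O)P(O) + P(v_C|~O)P(~O)]."""
    like_with = np.exp(gmm_with.score_samples(vC))
    like_without = np.exp(gmm_without.score_samples(vC))
    num = like_with * prior
    return num / (num + like_without * (1 - prior))

print(p_object_given_context(vC_with[:3]))   # should be close to 1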
Probability from Features
How to obtain the context-based prior P(O|v_C) on object properties?
• GMM – Gaussian Mixture Model
• Logistic regression
• Parzen window
Probability from Features
Logistic Regression
Logit ≡ log [ P(O|v_C) / P(¬O|v_C) ] = F(v_C)
F(v_C) = a_0 + Σ_{i=1}^{D} a_i · v_C(i)
P(O|v_C) = 1 / (1 + e^{−F(v_C)})
Probability from Features
Logistic Regression – Example
O = having back problems
v_C = age
log [ P(O|age) / P(¬O|age) ] = a_0 + a_1 · (age − 20)
Training stage:
a_0 – the log odds for a 20-year-old person
a_1 – the log odds ratio when comparing two persons who differ by 1 year in age
Working stage:
P(O|age) = 1 / (1 + e^{−(a_0 + a_1 · (age − 20))})
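A tiny numeric sketch of the working stage, with made-up coefficients a_0 and a_1 rather than fitted ones:

```python
# Working-stage check of the back-problems example with made-up coefficients.
import math

a0, a1 = -2.0, 0.1   # illustrative, not fitted values

def p_back_problems(age):
    """P(O|age) = 1 / (1 + exp(-(a0 + a1*(age - 20))))."""
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * (age - 20))))

print(p_back_problems(20))   # sigmoid(-2.0) ~ 0.12
print(p_back_problems(60))   # sigmoid(+2.0) ~ 0.88
```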
Probability from Features
How to obtain the context-based prior P(O|v_C) on object properties?
• GMM – Gaussian Mixture Model
• Logistic regression
• Parzen window
Probability from Features
Parzen Window
P(v_C|O) ∝ Σ_j K(v_C − v_j)
Radial Gaussian kernel:
K(v) = k · e^{−|v|²/(2σ²)}
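A short sketch of the Parzen estimate in Python; the bandwidth σ and the normalization (averaging the kernels so the estimate integrates to 1) are illustrative choices.

```python
# Parzen-window estimate of P(v_C|O) with a radial Gaussian kernel.
import numpy as np

def parzen_density(vC, samples, sigma=0.5):
    """Average of kernels K(v) = exp(-|v|^2 / (2*sigma^2)), normalized to a density."""
    d = samples.shape[1]
    norm = (2 * np.pi * sigma**2) ** (d / 2)            # each kernel integrates to 1
    sq = np.sum((samples - vC) ** 2, axis=1)
    return np.exp(-sq / (2 * sigma**2)).mean() / norm

# samples: gist vectors of the training images that contain the object.
samples = np.random.default_rng(3).normal(size=(100, 4))
print(parzen_density(np.zeros(4), samples))
```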
What did we have so far…
• Context Modeling
• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification
Place Identification
Goal: recognize specific locations.
P(Place_j|v_C) = P(v_C|Place_j) · P(Place_j) / Σ_{j'} P(v_C|Place_{j'}) · P(Place_{j'})
P(v_C|Place_j) ∝ Σ_j K(v_C − v_j)   (Parzen window over the training views of each place)
Place Identification
(A. Torralba, K. Murphy, W. Freeman, M. Rubin, ICCV 2003)
Place Identification
Decide only when the most likely place is confident enough:
max_j P(Place_j|v_C) > threshold
Precision vs. recall rate:
(A. Torralba, P. Sinha, MIT AIM 2001-015)
Object Priming
• How do we detect objects in an image?
– Search the whole image for the object model.
– What if I am searching in images where the object doesn’t exist at all?
• Obviously, wasting “my precious” computational resources. – GOLLUM
• Can we do better, and if so, how?
– Use the “great eye”, the contextual features of the image (v_C), to predict the probability of finding our object of interest o in the image, i.e. P(o|v_C).
Object Priming …..
• What to do?
– Use my experience to learn P(o|v_C) = P(v_C|o) · P(o) / P(v_C) from a database of images, with
P(v_C) = P(v_C|o) · P(o) + P(v_C|¬o) · P(¬o)
• How to do it?
– Learn the PDF P(v_C|o) = Σ_{i=1}^{M} w_i G(v_C; v_i, V_i) by a mixture of Gaussians.
– Also learn the PDF P(v_C|¬o) as a second mixture of Gaussians with its own parameters.
Object Priming …..
Control of Focus of Attention
• How do biological visual systems deal with the analysis of complex real-world scenes?
– By focusing attention on image regions that require detailed analysis.
Modeling the Control of Focus of Attention
How to decide which regions are “more” important than others?
• Local-type methods
1. Low-level saliency maps – regions that have different properties than their neighborhood are considered salient.
2. Object-centered methods.
• Global-type methods
1. Contextual control of focus of attention
Contextual Control of Focus of Attention
• Contextual control is both
– task driven (looking for a particular object o), and
– context driven (given global context information v_C).
• No use of object models (i.e., it ignores object-centered features).
Contextual Control of Focus of Attention …
• Focus on spatial regions that have a high probability of containing the target object o given the context information v_C.
• For each location x, calculate the probability of the presence of the object o given the context v_C.
• Evaluate the PDF P(Location|Object, Context) based on the past experience of the system.
Contextual Control of Focus of Attention …
P(Location|Object, Context), i.e. P(x|o, v_C) = ?
P(x|o, v_C) = P(x, v_C|o) / P(v_C|o)
            = Σ_{i=1}^{M} w_i G(x; x_i, X_i) · G(v_C; v_i, V_i) / Σ_{i=1}^{M} w_i G(v_C; v_i, V_i)
Learning stage: use the Swiss Army knife, the EM algorithm, to estimate the parameters {w_i, x_i, X_i, v_i, V_i}_{i=1}^{M}.
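A sketch of evaluating the conditional mixture P(x|o, v_C): conditioning on the context reweights the spatial components by their context responsibilities. The mixture parameters below are stand-ins for EM output; scipy is assumed.

```python
# Conditional mixture P(x | o, v_C) from a joint Gaussian mixture over (x, v_C).
import numpy as np
from scipy.stats import multivariate_normal

M = 3
w  = np.full(M, 1.0 / M)                                   # mixture weights w_i
xs = [np.array([0.3, 0.5]), np.array([0.5, 0.2]), np.array([0.7, 0.6])]  # x_i
Xs = [0.01 * np.eye(2)] * M                                # spatial covariances X_i
vs = [np.zeros(10), np.ones(10), -np.ones(10)]             # context means v_i
Vs = [np.eye(10)] * M                                      # context covariances V_i

def p_location_given_context(x, vC):
    """P(x|o,v_C) = sum_i w_i G(x;x_i,X_i) G(vC;v_i,V_i) / sum_i w_i G(vC;v_i,V_i)."""
    ctx = np.array([wi * multivariate_normal.pdf(vC, vi, Vi)
                    for wi, vi, Vi in zip(w, vs, Vs)])
    resp = ctx / ctx.sum()                                 # responsibilities given v_C
    return sum(r * multivariate_normal.pdf(x, xi, Xi)
               for r, xi, Xi in zip(resp, xs, Xs))

print(p_location_given_context(np.array([0.3, 0.5]), np.zeros(10)))
```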
Contextual Control of Focus of Attention …
Scale Selection
• Scale selection is
– a fundamental problem in computer vision.
– a key bottleneck for object-centered object detection algorithms.
• Can we estimate scale in a pre-processing stage?
– Yes, using saliency measures of low-level operators across spatial scales.
• Other methods? Of course, …
Context-Driven Scale Selection
P(Scale|Location, Object, Context) ≈ P(Scale|Object, Context), i.e. P(σ|o, v_C) = ?
P(σ|o, v_C) = P(σ, v_C|o) / P(v_C|o)
            = Σ_{i=1}^{M} w_i G(σ; σ_i, S_i) · G(v_C; v_i, V_i) / Σ_{i=1}^{M} w_i G(v_C; v_i, V_i)
Preferred scale:
σ̂ = ∫ σ · P(σ|o, v_C) dσ = Σ_{i=1}^{M} w_i σ_i G(v_C; v_i, V_i) / Σ_{i=1}^{M} w_i G(v_C; v_i, V_i)
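The preferred scale thus has a closed form: a context-weighted average of the component means σ_i. A short sketch, again with stand-in mixture parameters:

```python
# Preferred scale as the expectation of sigma under P(sigma | o, v_C).
import numpy as np
from scipy.stats import multivariate_normal

def preferred_scale(vC, w, sigma_means, vs, Vs):
    """sigma_hat = sum_i w_i sigma_i G(vC;v_i,V_i) / sum_i w_i G(vC;v_i,V_i)."""
    ctx = np.array([wi * multivariate_normal.pdf(vC, vi, Vi)
                    for wi, vi, Vi in zip(w, vs, Vs)])
    return float(np.dot(ctx, sigma_means) / ctx.sum())

# Example with M = 2 stand-in components:
w = np.array([0.6, 0.4])
sigma_means = np.array([24.0, 80.0])          # sigma_i: typical object scales (pixels)
vs = [np.zeros(10), np.ones(10)]
Vs = [np.eye(10)] * 2
print(preferred_scale(np.full(10, 0.8), w, sigma_means, vs, Vs))
```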
Context-Driven Scale Selection ….
Scene Classification
• There is a strong correlation between the presence of many types of objects.
• Do not model this correlation directly. Rather, use a “common” cause, which we shall call the “scene”.
• Train a classifier to identify scenes. Then all we need is to calculate
P(Scene = s|v_C) ≈ P(Classifier = s|v_C) / Σ_{s'} P(Classifier = s'|v_C)
What did we have so far…
• Context Modeling
• Context Based Applications
• Joint Local and Global Features Applications
– Object Detection and Localization
Need new tools: Learning and Boosting
Weak Learners
• Given (x_1, y_1), …, (x_m, y_m) where
x_i ∈ X = {set of emails}
y_i ∈ Y = {spam, non-spam}
• Can we extract “rules of thumb” for classification purposes?
• A weak learner finds a weak hypothesis (a rule of thumb) h: X → {spam, non-spam}
Decision Stumps
• Consider the following simple family of component classifiers generating ±1 labels:
h(x; p) = a·[x_k > t] − b,   where p = {a, b, k, t}.
These are called decision stumps (a sketch follows below).
• Sign(h) is used for classification and the magnitude |h| as a confidence measure.
• Each decision stump pays attention to only a single component of the input vector.
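A one-function sketch of a decision stump in Python, following the h(x; p) = a·[x_k > t] − b form above:

```python
# A decision stump: thresholds a single component x[k].
import numpy as np

def stump(x, a, b, k, t):
    """h(x; p) = a*[x_k > t] - b; sign(h) is the label, |h| the confidence."""
    return a * (np.asarray(x)[..., k] > t) - b

x = np.array([0.2, 1.5, -0.3])
print(stump(x, a=2.0, b=1.0, k=1, t=0.0))   # x[1] > 0, so 2*1 - 1 = +1
```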
Ponders his maker, ponders his will
• Can we combine weak classifiers to produce a single strong classifier in a simple manner?
h_m(x) = h(x; p_1) + … + h(x; p_m)
where the predicted label for x is the sign of h_m(x).
• Is it beneficial to allow some of the weak classifiers to have more “votes” than others?
h_m(x) = α_1 h(x; p_1) + … + α_m h(x; p_m)
where the non-negative votes α_i can be used to emphasize the components that are more reliable than others.
Boosting
What is boosting?
– A general method for improving the accuracy of any given weak learning algorithm.
– Introduced in the framework of the PAC learning model.
– But it works with any weak learner (in our case, decision stumps).
Boosting …..
• A boosting algorithm sequentially estimates and combines classifiers by re-weighting the training examples (each time concentrating on the harder examples).
– Each component classifier is presented with a slightly different problem depending on the weights.
• Base algorithm:
– a set of “weak” binary (±1) classifiers h(x; p), such as decision stumps
– normalized weights D_1(i) on the training examples, initially set to uniform (D_1(i) = 1/m)
AdaBoost
1. At the t-th iteration, find a weak classifier h(x; p_t) whose classification error is better than chance:
ε_t = 0.5 − 0.5 · Σ_{i=1}^{m} D_t(i) · y_i · h(x_i; p_t)
2. The new component classifier is assigned “votes” based on its performance:
α_t = 0.5 · log((1 − ε_t) / ε_t)
3. The weights on the training examples are updated according to
D_{t+1}(i) = D_t(i) · exp(−α_t · y_i · h_t(x_i)) / Z_t
where Z_t is a normalization factor.
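A compact AdaBoost sketch over decision stumps, following the three steps above; the data, the number of rounds T = 20, and the exhaustive stump search are all illustrative.

```python
# AdaBoost over decision stumps on synthetic +/-1-labelled data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 2] + 0.5 * X[:, 0] > 0, 1, -1)      # toy labels

def fit_stump(X, y, D):
    """Exhaustively pick (k, t, s) minimizing the weighted error sum_i D(i)*[h(x_i) != y_i]."""
    best = None
    for k in range(X.shape[1]):
        for t in np.unique(X[:, k]):
            for s in (1, -1):
                pred = s * np.where(X[:, k] > t, 1, -1)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, k, t, s)
    return best

T, m = 20, len(y)
D = np.full(m, 1.0 / m)                                # D_1(i) = 1/m
H = np.zeros(m)
for _ in range(T):
    err, k, t, s = fit_stump(X, y, D)                  # step 1: weak classifier
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # step 2: votes alpha_t
    pred = s * np.where(X[:, k] > t, 1, -1)
    D = D * np.exp(-alpha * y * pred)                  # step 3: reweight examples
    D /= D.sum()                                       # Z_t normalization
    H += alpha * pred
print("training accuracy:", np.mean(np.sign(H) == y))
```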
AdaBoost
[Demo slide: “Gambling” example with Gari, Uri, and KT]
Object Detection and Localization
• 3 families of approaches:
– Parts based
• Object defined as a spatial arrangement of small parts.
– Region based
• Use segmentation to extract a region of the image from the background and deduce shape and texture information from its local features.
– Patch based
• Use local features to classify each rectangular image region as object or background.
• Object detection is reduced to a binary classification problem, i.e. compute just P(O_i^C = 1|v_i^C), where O_i^C = 1 if patch i contains (part of) an object of class C, and v_i^C is the feature vector for patch i computed for class C.
Feature Vector for a Patch: Step 1
Feature Vector for a Patch: Step 2
Feature Vector for a Patch: Step 3
Summary: Feature Vector Extraction
12 × 30 × 2 = 720 features
Filters and Spatial Templates
Object Detection …..
• Do I need all the features for a given object class?
• If not, what features should I extract for a given object class?
– Use training to learn which features are more important than others.
Classifier: Boosted Features
• What is available?
– Training data: v = the features of the patches containing an object o.
• Weak learners pay attention to single features:
– h_t(v) picks the best feature v(k) and threshold.
• Output is H(v) = Σ_t α_t · h_t(v), where
– h_t(v) = output of the weak classifier at round t
– α_t = weight assigned by boosting
• ~100 rounds of boosting
Examples of Learned Features
Example Detections
Using the Gist for Object Localization
• Use the gist to predict the possible location of the object.
• Should I run my detectors only in that region?
– No! That misses detections if the object is at any other location.
– So, search everywhere, but penalize detections that are far from the predicted locations.
• But how?
Using the Gist for Object Localization ….
• Construct a feature vector f(v_i^C, x_i^C, x̂^C) which combines the output of the boosted classifier, v_i^C, and the difference x_i^C − x̂^C, where
x_i^C = location of the patch,
x̂^C = predicted location for objects of this class.
• Train another classifier to compute P(O_i^C = 1|f(v_i^C, x_i^C, x̂^C)).
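A sketch of this two-stage scheme; the feature layout and the use of scikit-learn's LogisticRegression as the second-stage classifier are assumptions for illustration, and the training data is synthetic.

```python
# Second-stage classifier combining detector score and gist-predicted location.
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_feature(v_i, x_i, x_hat):
    """f = [v_i, x_i - x_hat]: boosted-classifier output plus offset from predicted location."""
    return np.concatenate([[v_i], x_i - x_hat])

# Synthetic training set: patches with a detector score, a location,
# a gist-predicted location, and a presence label.
rng = np.random.default_rng(2)
n = 300
scores = rng.normal(size=n)                     # v_i: boosted classifier outputs
patch_xy = rng.uniform(size=(n, 2))             # x_i: patch locations
pred_xy = np.full((n, 2), 0.5)                  # x_hat: gist-predicted location
labels = (scores + 1.0 - 3.0 * np.linalg.norm(patch_xy - pred_xy, axis=1) > 0).astype(int)

F = np.stack([make_feature(s, x, p) for s, x, p in zip(scores, patch_xy, pred_xy)])
clf = LogisticRegression().fit(F, labels)       # models P(O_i = 1 | f)
```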
Using the Gist for Object Localization ….
Summary
• Context Modeling
– Previous Models
– Scene based Context Model
Summary
• Context Modeling
• Context Based Applications
– Place Identification
– Object Priming
– Control of Focus of Attention
– Scale Selection
– Scene Classification
Summary
• Context Modeling
• Context Based Applications
• Joint Local and Global Features Applications
– Object Detection and Localization