Transcript Document

ECE271A – Fall 2007
Content-Based Image Retrieval
(at SVCL)
Nikhil Rasiwasia, Nuno Vasconcelos
Statistical Visual Computing Laboratory
University of California, San Diego
Why image retrieval?
• Helps you find the images you want.
Source: http://www.bspcn.com/2007/11/02/25-photographs-taken-at-the-exact-right-time/
But there is Google, right?
• Metadata-based retrieval systems
  – text, click-rates, etc.
  – Google Images
  – clearly not sufficient
• What if computers understood images?
  – content-based image retrieval (early 90's)
  – search based on the image content
Top 12 retrieval results for the query ‘Mountain’
Early understanding of images.
• Query by Visual Example (QBVE)
  – user provides a query image
  – system extracts image features (texture, color, shape)
  – returns nearest neighbors under a suitable similarity measure
[Figure: example matches under texture, color, and shape similarity]
This is a graduate class, so: details
• Bag-of-features representation
  – no spatial information
  – yet performs well
• Each feature is represented by DCT coefficients
  – others use SIFT, Gabor filters, etc.
[Figure: the DCT applied to an image patch]
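To make this concrete, here is a minimal sketch of building the bag of DCT vectors for one image; the 8x8 patch size, the 4-pixel stride, and the function name are illustrative assumptions, not code from the course.

```python
import numpy as np
from scipy.fftpack import dct

def extract_dct_features(image, patch=8, step=4):
    """Slide a patch x patch window over a grayscale image and keep
    the 2-D DCT coefficients of each patch as one feature vector."""
    feats = []
    H, W = image.shape
    for i in range(0, H - patch + 1, step):
        for j in range(0, W - patch + 1, step):
            block = image[i:i + patch, j:j + patch].astype(float)
            # separable 2-D DCT-II with orthonormal scaling
            coeffs = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
            feats.append(coeffs.ravel())
    return np.array(feats)  # the "bag of DCT vectors"
```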
Image representation
[Figure: the bag of DCT vectors extracted from an image, shown as a scatter of feature points, is summarized by a Gaussian mixture model (GMM)]
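As a sketch of this step (the component count and covariance structure are assumptions), an image's bag of DCT vectors can be summarized with an off-the-shelf GMM:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative stand-in for a real grayscale image.
image = np.random.rand(64, 64)

feats = extract_dct_features(image)  # from the earlier sketch
gmm = GaussianMixture(n_components=8, covariance_type='diag',
                      random_state=0).fit(feats)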
Query by visual example
[Diagram: the query image is scored under the model of each candidate image; candidates are ranked by probability, p1 > p2 > … > pn]
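A minimal sketch of this ranking step, assuming each database image already has a fitted GMM (all names are hypothetical):

```python
import numpy as np

def qbve_rank(query_feats, database_gmms):
    """Rank database images by the average log-likelihood that the
    query's DCT vectors receive under each image's GMM."""
    scores = [g.score(query_feats) for g in database_gmms]
    return np.argsort(scores)[::-1]  # indices of best matches first
```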
Query by visual example (QBVE)
[Figure: a query image and its top QBVE matches]
What can go wrong?
[Figure: a query image whose top QBVE matches are semantically wrong]
This can go wrong!
• visual similarity does not always correlate with "semantic" similarity
[Figure captions: "Both have visually dissimilar sky"; "Disagreement between the semantic notion of 'train' and the visual notion of 'arch'"]
Intelligent researchers (like you)…
• Semantic Retrieval (SR)
  – user provides a query text (keywords)
  – find images that contain the associated semantic concept
  – example query: "people, beach"
  – emerged around the year 2000
  – model semantic classes, learn to annotate images
  – provides a higher level of abstraction and supports natural language queries
Semantic Class Modeling
[Diagram: the bags of DCT vectors from all training images labeled w_i = "mountain" are pooled; efficient hierarchical estimation fits a Gaussian mixture model (GMM), yielding the semantic class model P_{X|W}(x | mountain)]
• "Formulating Semantic Image Annotation as a Supervised Learning Problem" [G. Carneiro et al., IEEE Trans. PAMI, 2007]
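A rough sketch of learning one semantic class model; for simplicity it runs EM directly on the pooled features rather than the efficient hierarchical estimation used in the paper, and all names and the component count are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def learn_class_model(images_of_class, n_components=64):
    """Pool the DCT vectors of every training image annotated with a
    concept and fit one GMM: the semantic class model P_{X|W}(x | w)."""
    pooled = np.vstack([extract_dct_features(im) for im in images_of_class])
    return GaussianMixture(n_components=n_components,
                           covariance_type='diag').fit(pooled)
```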
Semantic Retrieval
[Diagram: the query image is scored under the semantic class model of each candidate word (mountain, sky, house, sexy, girl, and so on); words are ranked by probability, p1 > p2 > … > pn]
First Five Ranked Results
• Query: mountain
• Query: pool
• Query: tiger
First Five Ranked Results
• Query: horses
• Query: plants
• Query: blooms
First Five Ranked Results
• Query: clouds
• Query: field
• Query: flowers
First Five Ranked Results
• Query: jet
• Query: leaf
• Query: sea
But: Semantic Retrieval (SR)
• Problem of lexical ambiguity
  – multiple meanings of the same word
    • Anchor – a TV anchor, or an anchor for a ship?
    • Bank – a financial institution, or a river bank?
• Multiple semantic interpretations of an image
  – Lake? Fishing? Boating? People?
• Limited by vocabulary size
  – what if the system was not trained for 'Fishing'?
  – in other words, it is outside the space of trained semantic concepts
[Figure caption: Fishing! What if it is not in the vocabulary?]
In Summary
• SR: a higher level of abstraction
  – better generalization inside the space of trained semantic concepts
  – but problems of:
    • lexical ambiguity
    • multiple semantic interpretations
    • limited vocabulary size
• QBVE is unrestricted by language
  – better generalization outside the space of trained semantic concepts
  – a query image of 'Fishing' would retrieve visually similar images
  – but only weakly correlated with the human notion of similarity
The two systems are in many respects complementary!
Query by Semantic Example (QBSE)
• Suggests an alternate query-by-example paradigm
[Diagram: an image is mapped to a vector of weights, the semantic multinomial, e.g. (.2, .3, .2, .1, …), a mapping to an abstract space of semantic concepts]
  – the user provides an image
  – the image is mapped to a vector of weights over all the semantic concepts in the vocabulary, using a semantic labeling system
  – this can be thought of as a projection onto an abstract space, called the semantic space
  – to retrieve images, this weight vector is matched against the database using a suitable similarity function
Query by Semantic Example (QBSE)
• As an extension of SR
  – (SR) query "water, boating" = (0, .5, 0, .5, …, 0)
  – the query is specified not as a set of a few words, but as a vector of weights over all the semantic concepts in the vocabulary
  – this can eliminate:
    • the problem of lexical ambiguity – 'Bank' plus more context
    • multiple semantic interpretations – Boating, People
    • queries outside the 'semantic space' – Fishing
• As an enrichment of QBVE
  – (QBVE) query image = (.1, .2, .1, .3, …, .2)
  – the query is still by an example paradigm, but the feature space is semantic
  – a mapping of the image to an abstract space (axes such as Water, Boating, Lake)
  – the similarity measure operates at a higher level of abstraction
QBSE System
[Diagram: the query image is passed through any semantic labeling system, which outputs the posterior probability of each of the L concepts; this weight vector (the SMN) is compared, under a suitable similarity measure, to the weight vectors 1…N of the database images, yielding a ranked retrieval]
Semantic Class Modeling
[Diagram repeated from the earlier Semantic Class Modeling slide: pooled bags of DCT vectors for w_i = "mountain", efficient hierarchical estimation of a Gaussian mixture model, yielding the semantic class model P_{X|W}(x | mountain)]
• "Formulating Semantic Image Annotation as a Supervised Learning Problem" [G. Carneiro et al., CVPR 2005]
Semantic Multinomial
• Posterior probabilities under a series of L independent class models:
  $P_{X|W}(x \mid \text{sky}), \; P_{X|W}(x \mid \text{mountain}), \; \ldots, \; P_{X|W}(x \mid \text{clouds})$
• collected into the semantic multinomial $\pi = (\pi_1, \pi_2, \pi_3, \ldots, \pi_L)^T$, with $\sum_{i=1}^{L} \pi_i = 1$
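A minimal sketch of computing an image's SMN from the L class models, assuming a uniform prior over concepts (the prior and all names are assumptions):

```python
import numpy as np
from scipy.special import logsumexp

def semantic_multinomial(query_feats, class_gmms):
    """Posterior probability of each semantic class given the image's
    features, collected into the SMN pi with sum_i pi_i = 1."""
    # total log-likelihood of the feature vectors under each class model
    loglik = np.array([g.score_samples(query_feats).sum()
                       for g in class_gmms])
    return np.exp(loglik - logsumexp(loglik))  # normalized posterior
```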
Query using QBSE
• Note that SMNs are probability distributions
• A natural similarity function is the Kullback-Leibler divergence: the query SMN $\pi$ is matched against the database SMNs $\pi^i$ via
  $f(\pi) = \arg\min_i KL(\pi \,\|\, \pi^i) = \arg\max_i \sum_{j=1}^{L} \pi_j \log \frac{\pi^i_j}{\pi_j}$
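A sketch of the KL-based ranking (the smoothing constant is an assumption to avoid log(0)):

```python
import numpy as np

def qbse_rank(query_smn, database_smns, eps=1e-10):
    """Rank database images by KL(query || database); a smaller
    divergence means a better match."""
    q = query_smn + eps
    kl = [np.sum(q * np.log(q / (p + eps))) for p in database_smns]
    return np.argsort(kl)  # indices of best matches first
```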
Semantic Feature Space
[Figure: the semantic space is the simplex of posterior concept probabilities, with axes such as sky, vegetation, and mountain; o marks the query, x the database images, and the closest matches are the query's nearest neighbors on the simplex. A "mountains + sky" query under QBSE is a dense interior point (e.g. sky .4, vegetation .2, …), whereas a traditional text-based query is a sparse corner point (e.g. 0, .5, 0, .5, 0, …)]
• The space is the simplex of posterior concept probabilities
• Each image/SMN is thus represented as a point in this simplex
Evaluation – Precision, Recall, Scope
• the database holds |A| irrelevant images and |B| relevant images; of these, |Ar| irrelevant and |Br| relevant images are retrieved, for |Ar| + |Br| retrieved images in total
• Precision = |Br| / (|Ar| + |Br|): the proportion of retrieved images that are relevant
• Recall = |Br| / |B|: the proportion of all relevant images that are retrieved
• Scope = |Ar| + |Br|: the number of images that are retrieved
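In code, these quantities reduce to a few counts over the ranked list (a sketch; the boolean-list representation is an assumption):

```python
def precision_recall_at_scope(relevance, scope):
    """Precision and recall after retrieving the first `scope` items.
    `relevance` is the ranked list of booleans for the whole database."""
    Br = sum(relevance[:scope])   # relevant images retrieved
    B = sum(relevance)            # all relevant images available
    return Br / scope, Br / B
```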
Worked example (a ranked list with 3 relevant images in the database):

  Ranking | Relevant | #retrieved | Precision | Recall
  --------|----------|------------|-----------|-------
  1       | Yes      | 1          | 1/1       | 1/3
  2       | No       | 2          | 1/2       | 1/3
  3       | Yes      | 3          | 2/3       | 2/3
  4       | No       | 4          | 2/4       | 2/3
  5       | Yes      | 5          | 3/5       | 3/3

[Plot: precision (0 to 1) against recall (0.33, 0.66, 1)]
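The table above also illustrates average precision, which the next slide uses: precision is averaged at the ranks where a relevant item appears. A minimal sketch:

```python
def average_precision(relevance):
    """Mean of the precision values at the ranks where a relevant item
    occurs; for the table above, (1/1 + 2/3 + 3/5) / 3 ≈ 0.756."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(len(precisions), 1)
```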
Experimental Setup
• Evaluation procedure [Feng, 04]
  – Precision-Recall (scope) curves: calculate precision at various recalls (scopes)
  – Mean Average Precision (MAP): precision averaged over all queries, at the recall points where relevant items occur
• Training the semantic space
  – Images: Corel Stock Photo CDs – "Corel50"
    • 5,000 images from 50 CDs; 4,500 used for training the space
  – Semantic concepts
    • a total of 371 concepts
    • each image has a caption of 1-5 concepts
    • a semantic class model is learned for each concept
Experimental Setup
• Retrieval inside the semantic space
  – Images: Corel Stock Photo CDs – same as Corel50
    • 4,500 used as the retrieval database
    • 500 used to query the database
• Retrieval outside the semantic space
  – Corel15: another 15 Corel Photo CDs (not used previously)
    • 1,200 as retrieval database, 300 as query database
  – Flickr18: 1,800 images downloaded from www.flickr.com
    • 1,440 as retrieval database, 360 as query database
    • harder than the Corel images, as they were shot by non-professional Flickr users
Inside the Semantic Space
• Precision of QBSE is significantly higher at most levels of recall
• MAP scores for all 50 classes
Inside the Semantic Space
[Figure: QBSE vs. QBVE retrieval examples – same colors, different semantics]
Inside the Semantic Space
[Figure: QBSE retrieves "train + railroad" where QBVE retrieves merely "whitish + darkish" images]
Outside the Semantic Space
Commercial Construction
[Figure: top seven concepts, with SMN weights, for a 'Commercial Construction' query image and its top QBSE vs. QBVE matches. The recovered concept lists are: People, Buildings, Street, Statue, Tables, Water, Restaurant (query); People, Restaurant, Sky, Tables, Street, Buildings, Statue (QBSE); People, Statue, Buildings, Tables, Street, Restaurant, House / Buildings, People, Street, Statue, Tree, Boats, Water / People, Statue, Buildings, Tables, Street, Door, Restaurant (QBVE); the weights range from 0.12 down to 0.03]
QBSE vs QBVE
• nearest-neighbor retrieval in this space is significantly more robust
[Figure: the query o among database images x on the semantic simplex; the closest matches are returned]
• both in terms of
  – metrics
  – subjective matching quality
• "Query by semantic example" [N. Rasiwasia, IEEE Trans. Multimedia, 2007]
Structure of the Semantic Space
• is the gain really due to the semantic structure of the SMN space?
• this can be tested by comparing against a space where the probabilities are relative to random image groupings
[Diagram: random image groupings w_i are fed to the same efficient hierarchical estimation, yielding class models P_{X|W}(x | w_i)]
The semantic gain
• with random groupings, performance is quite poor, indeed worse than QBVE
• there seems to be an intrinsic gain in relying on a space whose features are semantic
But what about this image ;)?
Questions?
Flickr18
• Automobiles
• Building Landscapes
• Facial Close Up
• Flora
• Flowers Close Up
• Food and Fruits
• Frozen
• Hills and Valley
• Horses and Foal
• Jet Planes
• Sand
• Sculpture and Statues
• Sea and Waves
• Solar
• Township
• Train
• Underwater
• Water Fun

Corel15
• Autumn
• Adventure Sailing
• Barnyard Animals
• Caves
• Cities of Italy
• Commercial Construction
• Food
• Greece
• Helicopters
• Military Vehicles
• New Zealand
• People of World
• Residential Interiors
• Sacred Places
• Soldier
Content-based image retrieval
• Query by Visual Example (QBVE)
  [Figure: a query image and a visually similar image]
  – color, shape, texture, spatial layout
  – the image is represented as a multidimensional feature vector
  – matched with a suitable similarity measure
• Semantic Retrieval (SR)
  – example query: "people, beach"
  – given a keyword w, find images that contain the associated semantic concept