Query Image Specific Adaptive Fusion of Local and Global


Query Specific Fusion
for Image Retrieval
Shaoting Zhang, Ming Yang
NEC Laboratories America
Outline
• Overview of image retrieval/search
– Basic paradigm
– Local features indexed by vocabulary trees
– Global features indexed by compact hash codes
• Query specific fusion
– Graph construction
– Graph fusion
– Graph-based ranking
• Experiments
Content-based Image retrieval/search
• Scalability !!!
– Computational efficiency
– Memory consumption
– Retrieval accuracy
[Pipeline diagram: Offline, database images go through feature extraction, then hashing and indexing into hash codes plus an inverted index. Online, the query image goes through feature extraction and hashing; its hash codes are searched against the database index, and the resulting rank list is re-ranked.]
Local Features
Indexed by Vocabulary Trees
• Features: SIFT features
• Hashing: visual word IDs by
hierarchical K-means.
• Indexing: vocabulary trees
• Search: voting and sorting
• An example:
• ~1K SIFT features per image
• ~10^6 (1M) leaf nodes in the tree
• Query time: ~100-200ms for 1M images in the database
Scalable Recognition with a Vocabulary Tree
D. Nister and H. Stewenius, CVPR’06
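The quantization step above can be sketched as follows: each SIFT descriptor descends the vocabulary tree, picking the nearest child center at every level, and the leaf it reaches acts as its visual-word ID. The tiny two-level tree and 2-D "descriptors" below are toy illustrations, not the actual million-leaf tree.

```python
# Sketch of vocabulary-tree quantization via hierarchical k-means descent
# (toy centers; real trees use 128-D SIFT descriptors and ~10^6 leaves).

def nearest(centers, x):
    """Index of the center closest to descriptor x (squared Euclidean)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], x)))

def quantize(node, x, path=()):
    """Descend the tree, choosing the nearest child at each level;
    the path to the reached leaf serves as the visual-word ID."""
    if not node.get("children"):
        return path
    i = nearest([c["center"] for c in node["children"]], x)
    return quantize(node["children"][i], x, path + (i,))

# Toy tree: branching factor 2, depth 2, over 2-D "descriptors".
tree = {"children": [
    {"center": (0.0, 0.0), "children": [
        {"center": (0.0, 0.0)}, {"center": (0.0, 1.0)}]},
    {"center": (10.0, 10.0), "children": [
        {"center": (10.0, 10.0)}, {"center": (10.0, 11.0)}]},
]}

word = quantize(tree, (10.2, 10.9))   # descends right, then right: (1, 1)
```

At query time, each descriptor's leaf ID votes for the database images indexed under that leaf, and images are sorted by accumulated votes.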
Global Features
Indexed by Compact Hash Codes
• Feature: GIST, RGB or HSV histograms, etc.
• Hashing: compact binary codes, e.g., PCA+rotation+binarization.
• Indexing: a flat storage with/without inverted indexes
• Search: exhaustive search with Hamming distances + re-ranking
• An example:
• GIST -> PCA -> Binarization
• 960 floats -> 256 floats -> 256 bits (217 times smaller)
• Query time: 50-100ms to search 1M images using Hamming distance
Modeling the Shape of the Scene: A Holistic Representation, A. Oliva and A. Torralba, IJCV’01
Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, Y. Weiss, CVPR’08
Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR’11
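The search side of this pipeline can be sketched in a few lines. A simple per-dimension threshold stands in for the learned PCA + rotation projection, and the 4-D toy vectors are illustrative, not real GIST descriptors.

```python
# Minimal sketch of compact-code search: binarize global descriptors into
# bit strings, then rank by Hamming distance (thresholding stands in for
# the learned PCA + rotation + binarization step).

def binarize(vec, thresholds):
    """One bit per dimension: 1 if the value exceeds that dimension's threshold."""
    bits = 0
    for v, t in zip(vec, thresholds):
        bits = (bits << 1) | (1 if v > t else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

# Toy database of 4-D global descriptors, thresholded at 0.5 per dimension.
db = {"img1": (0.9, 0.1, 0.8, 0.2), "img2": (0.1, 0.9, 0.2, 0.8)}
th = (0.5, 0.5, 0.5, 0.5)
codes = {name: binarize(v, th) for name, v in db.items()}

query = binarize((0.8, 0.2, 0.7, 0.1), th)
ranked = sorted(codes, key=lambda n: hamming(query, codes[n]))  # img1 first
```

With 256-bit codes the XOR-and-popcount comparison is why exhaustive search over 1M images stays in the tens of milliseconds.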
Motivation
• Pros and cons

|              | Retrieval speed | Memory usage | Retrieval precision | Applications   | Image properties to attend to |
|--------------|-----------------|--------------|---------------------|----------------|-------------------------------|
| Local feat.  | fast            | high         | high                | near duplicate | local patterns                |
| Global feat. | faster          | low          | low                 | general images | global statistics             |

• Can we combine or fuse these two approaches?
  – Improve the retrieval precision
  – No sacrifice of the efficiency
• Early fusion (feature level)?
• Late fusion (rank list level)?

[Diagram: vocabulary trees of local features serve near-duplicate image retrieval; hashing of global features serves Web image search.]
Challenges
• The features and algorithms are dramatically different.
– Hard for the feature-level fusion
– Hard for the rank aggregation
• The fusion is query specific and database dependent
– Hard to learn how to combine across different datasets
• No supervision and relevance feedback!
– Hard to evaluate the retrieval quality online
Query Specific Fusion
• How to evaluate online the quality of retrieval results
from methods using local or global features?
• Assumption: the consensus degree among top
candidate images reveals the retrieval quality
– The consistency of top candidates' nearest neighborhoods.
• A graph-based approach to fusing and re-ranking
retrieval results of different methods.
Graph Construction
• Construct a weighted undirected graph G = (V, E) to represent
a set of retrieval results of a query image q.
• Given the query q, the image database D, and a similarity
function S(·,·), define the top-k neighborhood
N_k(q) = {the k images in D ∪ {q} most similar to q under S}.
• Edge: the reciprocal neighbor relation, i.e., (i, j) ∈ E
iff i ∈ N_k(j) and j ∈ N_k(i)
• Edge weight: the Jaccard similarity between
neighborhoods, w(i, j) = |N_k(i) ∩ N_k(j)| / |N_k(i) ∪ N_k(j)|
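The construction above can be sketched directly: link two images only if each appears in the other's top-k list, and weight the link by how much their neighbor lists overlap. The neighbor lists below are toy data, not real retrieval output.

```python
# Sketch of the graph construction: reciprocal-neighbor edges weighted by
# the Jaccard similarity of the two images' top-k neighborhoods.

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two neighbor lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def build_graph(neighbors):
    """neighbors: image -> its top-k neighbor list (query included as a node)."""
    graph = {}
    for u, nu in neighbors.items():
        for v in nu:
            # reciprocal-neighbor test: v in N(u) and u in N(v)
            if v in neighbors and u in neighbors[v]:
                w = jaccard(nu, neighbors[v])
                graph.setdefault(u, {})[v] = w
                graph.setdefault(v, {})[u] = w
    return graph

nbrs = {
    "q": ["a", "b", "c"],
    "a": ["q", "b", "d"],
    "b": ["a", "q", "c"],
    "c": ["d", "e", "f"],   # c does not point back to q: no edge q-c
}
G = build_graph(nbrs)       # edges: q-a (0.2), q-b (0.5), a-b (0.2)
```

Candidates with no reciprocal support, like "c" here, simply drop out of the graph, which is how weak results from one method are suppressed.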
Graph Fusion
• Fuse multiple graphs into one graph
– Union of the nodes/edges and sum of the weights of shared edges
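This fusion rule is a one-liner over adjacency dictionaries: take every edge from every graph and add up the weights of edges both graphs share. The two small graphs below are toy stand-ins for the local-feature and global-feature graphs.

```python
# The fusion step as stated on the slide: union of nodes/edges,
# sum of weights on shared edges (graphs as nested adjacency dicts).

def fuse(*graphs):
    fused = {}
    for g in graphs:
        for u, edges in g.items():
            for v, w in edges.items():
                fused.setdefault(u, {})[v] = fused.get(u, {}).get(v, 0.0) + w
    return fused

g_local  = {"q": {"a": 0.5}, "a": {"q": 0.5}}
g_global = {"q": {"a": 0.3, "b": 0.4}, "a": {"q": 0.3}, "b": {"q": 0.4}}
G = fuse(g_local, g_global)   # q-a weight becomes 0.5 + 0.3 = 0.8
```

Summing shared weights rewards candidates that both methods retrieve, which is exactly the consensus cue the approach relies on, while candidates found by only one method still survive with their single-graph weight.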
Graph-based Ranking
• Ranking by a local PageRank
– Perform a link analysis on the fused graph G
– Rank the nodes by their connectivity in G
• Ranking by maximizing weighted density
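The PageRank option can be sketched as below: candidates with stronger weighted connectivity in the fused graph accumulate more rank mass. The damping factor 0.85 is the conventional choice, not necessarily the paper's setting, and the graph is toy data.

```python
# Sketch of the link-analysis ranking: a weighted PageRank over the small
# fused graph, so well-connected candidates rank first.

def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: sum(graph[n].values()) for n in nodes}   # total out-weight
    for _ in range(iters):
        new = {}
        for n in nodes:
            # each neighbor m votes with its rank, split by edge weight
            s = sum(rank[m] * graph[m][n] / out[m]
                    for m in nodes if n in graph[m])
            new[n] = (1 - damping) / len(nodes) + damping * s
        rank = new
    return rank

G = {"q": {"a": 0.8, "b": 0.4},
     "a": {"q": 0.8, "b": 0.2},
     "b": {"q": 0.4, "a": 0.2}}
scores = pagerank(G)
ranked = sorted(scores, key=scores.get, reverse=True)   # "q" ranks first
```

Because the graph only spans a query's top candidates, this "local" PageRank runs on tens of nodes and adds negligible query time.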
Experiments
• Datasets: 4 public benchmark datasets
– UKBench: 2,550*4 = 10,200 images (k=5)
– Corel-5K: 50*100 = 5,000 images (k=15)
– Holidays: 1,491 images in 500 groups (k=5)
– SFLandmark: 1.06M PCI and 638K PFI images (k=30)
• Baseline methods
– Local features: VOC (contextual weighting, ICCV'11)
– Global features: GIST (960D => 256 bits), HSV (2000D => 256 bits)
– Rank aggregation
– A fusion method based on an SVM classifier
• Nearest neighbors are stored offline for the database
UKBench
• Evaluation: 4 x recall at the first four returned images, referred to as the N-S score (maximum = 4).
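Since each UKBench query has exactly four relevant images (its own group of four, query included), 4 × recall@4 reduces to counting relevant hits among the top four results. A sketch with toy result lists and a hypothetical helper name:

```python
# N-S score as defined on the slide: 4 x recall at the first four returned
# images, i.e., the count of relevant images in the top 4 (max 4).

def ns_score(results, relevant):
    """Number of relevant images among the first four results."""
    return sum(1 for r in results[:4] if r in relevant)

relevant = {"q", "q_2", "q_3", "q_4"}       # the query's group of 4
score = ns_score(["q", "q_2", "x", "q_3"], relevant)   # 3 of 4 relevant
```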
Corel-5K
• Corel 5K: 50 categories, each category has 100 images.
Average top-1 precision for leave-one-out retrievals.
Holidays
• Evaluation: mAP (%) for 1491 queries.
San Francisco Landmark
• Database images:
– Perspective central images (PCIs): 1.06M
– Perspective frontal images (PFIs): 638K
• Query images: 803 images taken with a smartphone
• Evaluation: the recall rate in terms of buildings
San Francisco Landmark
• The fusion is applied to the top-50 candidates given by VOC.
Computation and Memory Cost
• The average query time
• Memory cost
– 340MB extra storage for the top-50 nearest neighbors of the 1.7M images in SFLandmark.
Sample Query Results (1)
• In the UKbench
Sample Query Results (2)
• In the Corel-5K
Sample Query Results (3)
• In the SFLandmark
Conclusions
• A graph-based query specific fusion of retrieval sets
based on local and global features
– Requires no supervision
– Retains the efficiency of both methods
– Improves the retrieval precision consistently on 4 datasets
– Easy to reproduce by other motivated researchers
• Limitations
– No reciprocal neighbors for certain queries in either method
– Dynamic insertion or removal of database images