Small Codes and Large Image Databases for

Download Report

Transcript Small Codes and Large Image Databases for

Small Codes and Large Image
Databases for Recognition
CVPR 2008
Antonio Torralba, MIT
Rob Fergus, NYU
Yair Weiss, Hebrew University
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Summary
• Goal
– efficient image search(real time on web-sized)
and fast, just require little memory, enable on
standard hardware or handheld devices
• Approach
– Use machine learning to convert Gist descriptor
to a compact binary code with a few hundred bits
per image
Gist descriptor
• Global image representation
• Describe the shapes occurring in an image
with one descriptor
– Subdivide image in 4×4 sub images
– Calculate Gabor responses in each of these
– Create histograms of Gabor responses in each sub
image
Slide by James Hays and Alexei Efros
Gist descriptor
Slide by James Hays and Alexei Efros
Gist descriptor
• In this paper
– 8 orientations ,4 frequency = 4×8×16 = 512
dimensional vector.
– For smaller images (32×32 pixels), use 3 frequency
= 3×8×16 = 384 dimensions.
Binary Code
• Three reason
– compression, it’s possible to represent images
with a very small number of bits and still maintain
the information for recognition
Binary Code
– scaling up to web-size databases requires doing
the calculations in memory. Fitting hundreds of
millions of images into a few GB of memory
means we have a budget of very few bytes per
image.
– short binary codes allow very fast querying in
standard hardware, either using hash tables or
efficient bit-count operations
Locality Sensitive Hashing (LSH)
• high dimensional Euclidean space
– finds nearest neighbors in constant time
• a number of random projections of that point
into R1
– each projection contributes a few bits
• when the number of bits is fixed and small
– LSH can perform quite poorly
• In this paper
– N = 30 bits
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Learning binary codes
•
•
•
•
•
A database of images {xi}
a distance function D(i, j)
a binary feature vector yi = f(xi)
Hamming distance
N100(xi) - the 100 nearest neighbors of xi
according to the distance function D(i, j)
• N100(yi) - the 100 descriptors yj that are closest
to yi in terms of Hamming distance
• we would like N100(xi) = N100(yi) for all examples
in our training set
BoostSSC
• Boosting similarity sensitive coding
• Learn original input space into a new space
– distances between images can be computed using
a weighted Hamming distance.
• Binary feature(M bits)
–
–
• weighted Hamming distance
–
BoostSSC
• positive examples
– pairs of images xi, xj , j ∈ N(xi).
• Negative examples
– pairs of images that are not neighbors
• regression stump
–
BoostSSC
• Minimize the square loss
–
– K is the number of training pairs
– Zk = 1, if the two images are neighbors; = −1,
otherwise
–
• In this paper
–
– M around 30 bits
Restricted Boltzmann Machines
• Network of binary stochastic units
•
– weights W, bias b
Hidden units: h
Symmetric weights: w
Visible units: v
Restricted Boltzmann Machines
• A probability can be assigned to a binary
vector at the visible units
–
• Convenient conditional distributions
–
–
Learn weights and biases using
Contrastive Divergence
Multi‐Layer RBM architecture
Training RBM models
• Pre‐training
– Unsupervised
– Use Contrastive Divergence to learn weights and
biases
– Gets parameters to right ballpark
• Fine‐tuning
–
–
–
–
Supervised
No longer stochastic
Backpropagate error to update parameters
Moves parameters to local minimum
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Two test datasets
• LabelMe
– 22,000 images
– Ground truth segmentations for all
– Can define distance between images using these
segmentations
• Web data[28]
– 12.9 million images 32 × 32 colorimages
– Subset of 80 million images
– No labels, so use L2 distance between GIST vectors as
ground truth
[28] A. Torralba, R. Fergus, and W. T. Freeman. Tiny images. Technical Report MIT-CSAIL-TR-2007024, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology,
2007.
LabelMe retrieval
LabelMe retrieval
• what ground truth semantic similarity is
– spatial pyramid matching over object labels
LabelMe retrieval
LabelMe retrieval
• On 2000 test images, N = 50
•
•
•
•
•
Web images retrieval
Web images retrieval
Retrieval speed evaluation
• Using multi-threading (M/T) on a quad-core
Pixel label
• On 2000 test images
Web images recognition
• On 2000 test images
Outline
•
•
•
•
Introduction
Methods
Experiment
Conclusion
Conclusion
• Possible to build compact codes for retrieval
– Fast and small on standard PC
– Suitable for use on large database
– Much room for improvement