Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
Piotr Indyk, Rajeev Motwani
The 30th Annual ACM Symposium on Theory of Computing (STOC), 1998
Problems
• Nearest neighbor (NN) problem:
– Given a set of n points P = {p1, …, pn} in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q ∈ X.
• Approximate nearest neighbor (ANN) problem:
– Find a point p ∈ P that is an ε-approximate nearest neighbor of the query q, in that for all p' ∈ P, d(p, q) ≤ (1 + ε) · d(p', q).
Motivation
• The nearest neighbors problem is of major importance to
a variety of applications, usually involving similarity
searching.
– Data compression
– Databases and data mining
– Information retrieval
– Image and video databases
– Machine learning
– Pattern recognition
– Statistics and data analysis
• Curse of dimensionality
– The curse of dimensionality is a term coined by Richard
Bellman to describe the problem caused by the exponential
increase in volume associated with adding extra dimensions to a
(mathematical) space.
Overview of results and techniques
• These results are obtained by reducing ε-NNS to a new problem: point location in equal balls (PLEB).
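A minimal sketch of the PLEB interface that this reduction targets, assuming Euclidean distance and illustrative names; it answers queries by a linear scan, whereas the point of the paper is to answer them with hashing-based data structures instead.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pleb_query(centers, r, q, d=euclidean):
    # Point Location in Equal Balls: given balls B(c, r) of one common
    # radius r, return some center c with d(c, q) <= r, or None if the
    # query q lies in no ball.
    for c in centers:
        if d(c, q) <= r:
            return c
    return None

# illustrative usage
centers = [(0.0, 0.0), (5.0, 5.0)]
print(pleb_query(centers, r=1.0, q=(0.5, 0.2)))   # -> (0.0, 0.0)
print(pleb_query(centers, r=1.0, q=(3.0, 3.0)))   # -> None
```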
Content
• Nearest neighbor search (NNS)
• ε-nearest neighbor search (ε-NNS)
• Ring-Cover Trees
• Point location in equal balls (PLEB)
• ε-point location in equal balls (ε-PLEB)
• Locality-Sensitive Hashing (Proposition 1)
• The Bucketing method (Proposition 2)
• Random projections (Proposition 3)
• Definitions
• Theorems
• Constructing Ring-cover trees
• Analysis of Ring-cover trees
• Definitions
• Locality-Sensitive Hashing
The Bucketing method
• We decompose each ball into a bounded number of cells
and store them in a dictionary.
• The bucketing algorithm works for any ℓp norm.
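A hedged sketch of this idea in Python (not the paper's code; Euclidean case only, names illustrative): impose a grid of side s = ε·r/√dim (dim the dimension), store every grid cell that intersects a ball B(c, r) in a dictionary keyed by the cell's integer coordinates, and answer a query by hashing the cell containing q; any returned center is then within (1 + ε)·r of q.

```python
import math
from itertools import product
from collections import defaultdict

def cell_distance(cell, s, c):
    # Euclidean distance from point c to the axis-aligned cell
    # [cell_i * s, (cell_i + 1) * s] in every coordinate
    d2 = 0.0
    for k, x in zip(cell, c):
        lo, hi = k * s, (k + 1) * s
        if x < lo:
            d2 += (lo - x) ** 2
        elif x > hi:
            d2 += (x - hi) ** 2
    return math.sqrt(d2)

def build_buckets(centers, r, eps, dim):
    # grid of side s = eps * r / sqrt(dim); each ball B(c, r) is decomposed
    # into the grid cells it intersects, and those cells are stored in a
    # dictionary keyed by their integer coordinates
    s = eps * r / math.sqrt(dim)
    reach = int(math.ceil(r / s)) + 1            # cells this far away may still touch the ball
    buckets = defaultdict(list)
    for c in centers:
        base = tuple(int(math.floor(x / s)) for x in c)
        for off in product(range(-reach, reach + 1), repeat=dim):
            cell = tuple(b + o for b, o in zip(base, off))
            if cell_distance(cell, s, c) <= r:   # keep only cells meeting the ball
                buckets[cell].append(c)
    return buckets, s

def bucket_query(buckets, s, q):
    # hash the cell containing q; any stored center c satisfies
    # d(c, q) <= r + s * sqrt(dim) = (1 + eps) * r
    cell = tuple(int(math.floor(x / s)) for x in q)
    return buckets.get(cell, [])

# illustrative usage in 2 dimensions
buckets, s = build_buckets(centers=[(0.0, 0.0)], r=1.0, eps=0.5, dim=2)
print(bucket_query(buckets, s, (0.9, 0.1)))      # -> [(0.0, 0.0)]
```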
The Johnson-Lindenstrauss (J. L.) Lemma