3D Hand Pose Estimation by Finding Appearance

Transcript 3D Hand Pose Estimation by Finding Appearance

Learning Embeddings for
Similarity-Based Retrieval
Vassilis Athitsos
Computer Science Department
Boston University
1
Overview
Background on similarity-based retrieval
and embeddings.
 BoostMap.



Embedding optimization using machine learning.
Query-sensitive embeddings.

Ability to preserve non-metric structure.
2
Problem Definition
database
(n objects)
x1
x2
x3
xn
3
Problem Definition
database
(n objects)

Goals:

find the k nearest neighbors of
query q.
x1
x2
q
x3
xn
4
Problem Definition
database
(n objects)

x1

x2
q
x3
Goals:

find the k nearest neighbors of
query q.
Brute force time is linear to:


n (size of database).
time it takes to measure a
single distance.
xn
5
Problem Definition
database
(n objects)

x1

x2
q
x3
Goals:

find the k nearest neighbors of
query q.
Brute force time is linear to:


n (size of database).
time it takes to measure a
single distance.
xn
6
Applications

Nearest neighbor
classification.

Similarity-based retrieval.





faces
Image/video databases.
Biological databases.
Time series.
Web pages.
Browsing music or movie
catalogs.
handshapes
letters/digits
7
Expensive Distance Measures

Comparing ddimensional vectors
is efficient:

O(d) time.
x1 x2 x3 x4 … xd
y1 y2 y3 y4 … yd
8
Expensive Distance Measures

Comparing ddimensional vectors
is efficient:


O(d) time.
Comparing strings of
length d with the edit
distance is more
expensive:


O(d2) time.
Reason: alignment.
x1 x2 x3 x4 … xd
i m m i g r a t i o n
y1 y2 y3 y4 … yd
i m i t a t i o n
9
Expensive Distance Measures

Comparing ddimensional vectors
is efficient:


O(d) time.
Comparing strings of
length d with the edit
distance is more
expensive:


O(d2) time.
Reason: alignment.
x1 x2 x3 x4 … xd
i m m i g r a t i o n
y1 y2 y3 y4 … yd
i m
i t
a t i o n
10
Matching Handwritten Digits
11
Matching Handwritten Digits
12
Matching Handwritten Digits
13
Shape Context Distance

Proposed by Belongie et al. (2001).




Error rate: 0.63%, with database of 20,000 images.
Uses bipartite matching (cubic complexity!).
22 minutes/object, heavily optimized.
Result preview: 5.2 seconds, 0.61% error rate.
14
More Examples

DNA and protein sequences:


Time series:


Dynamic Time Warping.
Probability distributions:


Smith-Waterman.
Kullback-Leibler Distance.
These measures are non-Euclidean,
sometimes non-metric.
15
Indexing Problem

Vector indexing methods NOT applicable.




PCA.
R-trees, X-trees, SS-trees.
VA-files.
Locality Sensitive Hashing.
16
Metric Methods

Pruning-based methods.



VP-trees, MVP-trees, M-trees, Slim-trees,…
Use triangle inequality for tree-based search.
Filtering methods.


AESA, LAESA…
Use the triangle inequality to compute upper/lower
bounds of distances.
Suffer from curse of dimensionality.
 Heuristic in non-metric spaces.
 In many datasets, bad empirical performance.

17
Embeddings
database
x1
Rd
x2
x3
embedding
F
x1
x2
xn
x4
x3
xn
18
Embeddings
database
x1
Rd
x2
x3
embedding
F
x1
x2
xn
x4
x3
xn
query
q
19
Embeddings
database
x1
Rd
x2
x3
embedding
F
x1
xn
x3
xn
x2
x4
q
query
q
20

Embeddings
Measure distances between vectors
(typically much faster).
database
x1
Rd
x2
x3
embedding
F
x1
xn
x3
xn
x2
x4
q
query
q
21

Embeddings
database
x1

Measure distances between vectors
(typically much faster).
Caveat: the embedding must
preserve similarity structure.
Rd
x2
x3
embedding
F
x1
xn
x3
xn
x2
x4
q
query
q
22
Reference Object Embeddings
database
23
Reference Object Embeddings
database
r1
r2
r3
24
Reference Object Embeddings
database
r1
r2
r3
x
F(x) = (D(x, r1), D(x, r2), D(x, r3))
25
F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))
F(Sacramento)....=
F(Las Vegas).....=
F(Oklahoma City).=
F(Washington DC).=
F(Jacksonville)..=
( 386,
( 262,
(1345,
(2657,
(2422,
1543, 2920)
1232, 2405)
437, 1291)
1207, 853)
1344, 141)
26
Existing Embedding Methods

FastMap, MetricMap, SparseMap,
Lipschitz embeddings.


Use distances to reference objects (prototypes).
Question: how do we directly optimize an
embedding for nearest neighbor retrieval?


FastMap & MetricMap assume Euclidean
properties.
SparseMap optimizes stress.


Large stress may be inevitable when embedding
non-metric spaces into a metric space.
In practice often worse than random construction.
27
BoostMap


BoostMap: A Method for Efficient Approximate
Similarity Rankings.
Athitsos, Alon, Sclaroff, and Kollios,
CVPR 2004.
BoostMap: An Embedding Method for Efficient
Nearest Neighbor Retrieval.
Athitsos, Alon, Sclaroff, and Kollios,
PAMI 2007 (to appear).
28
Key Features of BoostMap
Maximizes amount of nearest neighbor
structure preserved by the embedding.
 Based on machine learning, not on
geometric assumptions.



Can capture non-metric structure.


Principled optimization, even in non-metric spaces.
Query-sensitive version of BoostMap.
Better results in practice, in all datasets we
have tried.
29
Ideal Embedding Behavior
F
original space X
Rd
a
q
For any query q: we want F(NN(q)) = NN(F(q)).
30
Ideal Embedding Behavior
F
original space X
Rd
a
q
For any query q: we want F(NN(q)) = NN(F(q)).
31
Ideal Embedding Behavior
F
original space X
Rd
a
q
For any query q: we want F(NN(q)) = NN(F(q)).
32
Ideal Embedding Behavior
F
original space X
Rd
b
a
q
For any query q: we want F(NN(q)) = NN(F(q)).
For any database object b besides NN(q), we
want F(q) closer to F(NN(q)) than to F(b).
33
Embeddings Seen As Classifiers
b
a
q
For triples (q, a, b) such that:
- q is a query object
- a = NN(q)
- b is a database object
Classification task: is q
closer to a or to b?
34
Embeddings Seen As Classifiers
b
a
q

For triples (q, a, b) such that:
- q is a query object
- a = NN(q)
- b is a database object
Classification task: is q
closer to a or to b?
Any embedding F defines a classifier F’(q, a, b).

F’ checks if F(q) is closer to F(a) or to F(b).
35
Classifier Definition
b
a
q


Classification task: is q
closer to a or to b?
Given embedding F: X  Rd:


For triples (q, a, b) such that:
- q is a query object
- a = NN(q)
- b is a database object
F’(q, a, b) = ||F(q) – F(b)|| - ||F(q) – F(a)||.
F’(q, a, b) > 0 means “q is closer to a.”
F’(q, a, b) < 0 means “q is closer to b.”
36
Key Observation
F
original space X
Rd
b
a
q

If classifier F’ is perfect, then for every q,
F(NN(q)) = NN(F(q)).

If F(q) is closer to F(b) than to F(NN(q)), then triple
(q, a, b) is misclassified.
37
Key Observation
F
original space X
Rd
b
a
q

Classification error on triples (q, NN(q), b)
measures how well F preserves nearest
neighbor structure.
38
Optimization Criterion
Goal: construct an embedding F optimized for
k-nearest neighbor retrieval.
 Method: maximize accuracy of F’ on triples
(q, a, b) of the following type:





q is any object.
a is a k-nearest neighbor of q in the database.
b is in database, but NOT a k-nearest neighbor of q.
If F’ is perfect on those triples, then F
perfectly preserves k-nearest neighbors.
39
1D Embeddings as Weak Classifiers

1D embeddings define weak classifiers.

Better than a random classifier (50% error rate).
40
Lincoln
Detroit
Chicago
Cleveland
New
York
Chicago
Detroit
LA
LA
New
York
41
1D Embeddings as Weak Classifiers

1D embeddings define weak classifiers.


Better than a random classifier (50% error rate).
We can define lots of different classifiers.

Every object in the database can be a reference object.
42
1D Embeddings as Weak Classifiers

1D embeddings define weak classifiers.


Better than a random classifier (50% error rate).
We can define lots of different classifiers.

Every object in the database can be a reference object.
Question: how do we combine many such
classifiers into a single strong classifier?
43
1D Embeddings as Weak Classifiers

1D embeddings define weak classifiers.


Better than a random classifier (50% error rate).
We can define lots of different classifiers.

Every object in the database can be a reference object.
Question: how do we combine many such
classifiers into a single strong classifier?
Answer: use AdaBoost.

AdaBoost is a machine learning method designed for
exactly this problem.
44
Using AdaBoost
original space X
Real line
F1
F2
Fn

Output: H = w1F’1 + w2F’2 + … + wdF’d .



AdaBoost chooses 1D embeddings and weighs them.
Goal: achieve low classification error.
AdaBoost trains on triples chosen from the database.
45
From Classifier to Embedding
AdaBoost output
H = w1F’1 + w2F’2 + … + wdF’d
What embedding should we use?
What distance measure should we use?
46
From Classifier to Embedding
AdaBoost output
BoostMap
embedding
H = w1F’1 + w2F’2 + … + wdF’d
F(x) = (F1(x), …, Fd(x)).
47
From Classifier to Embedding
AdaBoost output
BoostMap
embedding
Distance
measure
H = w1F’1 + w2F’2 + … + wdF’d
F(x) = (F1(x), …, Fd(x)).
D((u1, …, ud), (v1, …, vd)) =

d
w |u – vi|
i=1 i i
48
From Classifier to Embedding
AdaBoost output
BoostMap
embedding
Distance
measure
H = w1F’1 + w2F’2 + … + wdF’d
F(x) = (F1(x), …, Fd(x)).
D((u1, …, ud), (v1, …, vd)) =

d
w |u – vi|
i=1 i i
Claim:
Let q be closer to a than to b. H misclassifies
triple (q, a, b) if and only if, under distance
measure D, F maps q closer to b than to a.
49
Proof
H(q, a, b) =

=

=

=
d
i=1
wiF’i(q, a, b)
d
i=1
wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)
d
i=1
(wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)
= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)
50
Proof
H(q, a, b) =

=

=

=
d
i=1
wiF’i(q, a, b)
d
i=1
wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)
d
i=1
(wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)
= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)
51
Proof
H(q, a, b) =

=

=

=
d
i=1
wiF’i(q, a, b)
d
i=1
wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)
d
i=1
(wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)
= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)
52
Proof
H(q, a, b) =

=

=

=
d
i=1
wiF’i(q, a, b)
d
i=1
wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)
d
i=1
(wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)
= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)
53
Proof
H(q, a, b) =

=

=

=
d
i=1
wiF’i(q, a, b)
d
i=1
wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)
d
i=1
(wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)
= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)
54
Proof
H(q, a, b) =

=

=

=
d
i=1
wiF’i(q, a, b)
d
i=1
wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)
d
i=1
(wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)
= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)
55
Significance of Proof
AdaBoost optimizes a direct measure of
embedding quality.
 We optimize an indexing structure for
similarity-based retrieval using machine
learning.


Take advantage of training data.
56
How Do We Use It?
Filter-and-refine retrieval:
 Offline step: compute embedding F of
entire database.
57
How Do We Use It?
Filter-and-refine retrieval:
 Offline step: compute embedding F of
entire database.
 Given a query object q:

Embedding step:

Compute distances from query to reference
objects  F(q).
58
How Do We Use It?
Filter-and-refine retrieval:
 Offline step: compute embedding F of
entire database.
 Given a query object q:

Embedding step:


Compute distances from query to reference
objects  F(q).
Filter step:

Find top p matches of F(q) in vector space.
59
How Do We Use It?
Filter-and-refine retrieval:
 Offline step: compute embedding F of
entire database.
 Given a query object q:

Embedding step:


Filter step:


Compute distances from query to reference
objects  F(q).
Find top p matches of F(q) in vector space.
Refine step:

Measure exact distance from q to top p matches.
60
Evaluating Embedding Quality
How often do we find the true nearest neighbor?

Embedding step:


Filter step:


Compute distances from query to reference
objects  F(q).
Find top p matches of F(q) in vector space.
Refine step:

Measure exact distance from q to top p matches.
61
Evaluating Embedding Quality
How often do we find the true nearest neighbor?

Embedding step:


Filter step:


Compute distances from query to reference
objects  F(q).
Find top p matches of F(q) in vector space.
Refine step:

Measure exact distance from q to top p matches.
62
Evaluating Embedding Quality
How often do we find the true nearest neighbor?
How many exact distance computations do we need?

Embedding step:


Filter step:


Compute distances from query to reference
objects  F(q).
Find top p matches of F(q) in vector space.
Refine step:

Measure exact distance from q to top p matches.
63
Evaluating Embedding Quality
How often do we find the true nearest neighbor?
How many exact distance computations do we need?

Embedding step:


Filter step:


Compute distances from query to reference
objects  F(q).
Find top p matches of F(q) in vector space.
Refine step:

Measure exact distance from q to top p matches.
64
Evaluating Embedding Quality
How often do we find the true nearest neighbor?
How many exact distance computations do we need?

Embedding step:


Filter step:


Compute distances from query to reference
objects  F(q).
Find top p matches of F(q) in vector space.
Refine step:

Measure exact distance from q to top p matches.
65
Evaluating Embedding Quality
What is the nearest neighbor classification error?
How many exact distance computations do we need?

Embedding step:


Filter step:


Compute distances from query to reference
objects  F(q).
Find top p matches of F(q) in vector space.
Refine step:

Measure exact distance from q to top p matches.
66
Results on Hand Dataset
nearest
neighbor
Database (80,640 images)
query
Chamfer distance: 112 seconds per query
67
Results on Hand Dataset
Database: 80,640 synthetic images of hands.
Query set: 710 real images of hands.
Brute
Force
Accuracy
100%
Distances
80640
Seconds
112
Speed-up
1
68
Results on Hand Dataset
Database: 80,640 synthetic images of hands.
Query set: 710 real images of hands.
Brute
Force
BM
RLP
FM
VP
Accuracy
100%
95%
95%
95%
95%
Distances
80640
450
1444
2647
5471
Seconds
112
0.6
2.0
3.7
7.6
Speed-up
1
179
56
30
15
69
Results on MNIST Dataset


MNIST: 60,000 database objects, 10,000 queries.
Shape context (Belongie 2001):
 0.63% error, 20,000 distances, 22 minutes.
 0.54% error, 60,000 distances, 66 minutes.
70
Results on MNIST Dataset
Distances
per query
Seconds
per query
Error
rate
Brute force
60,000
3,696
0.54%
VP-trees
21,152
1306
0.63%
Condensing
1,060
71
2.40%
BoostMap
800
53
0.58%
Zhang 2003
50
3.3
2.55%
BoostMap
50
3.3
1.50%
BoostMap*
50
3.3
0.83%
Method
71
Query-Sensitive Embeddings
 Richer
models.
Capture non-metric structure.
 Better embedding quality.


References:


Athitsos, Hadjieleftheriou, Kollios, and Sclaroff,
SIGMOD 2005.
Athitsos, Hadjieleftheriou, Kollios, and Sclaroff,
TODS, June 2007.
72
Capturing Non-Metric Structure



A human is not similar to a horse.
A centaur is similar both to a human and a horse.
Triangle inequality is violated:


Using human ratings of similarity (Tversky, 1982).
Using k-median Hausdorff distance.
73
Capturing Non-Metric Structure

Mapping to a metric space presents dilemma:


If D(F(centaur), F(human)) = D(F(centaur), F(horse)) = C,
then D(F(human), F(horse)) <= 2C.
Query-sensitive embeddings:

Have the modeling power to preserve non-metric structure.
74
Local Importance of Coordinates

How important is each coordinate in
comparing embeddings?
database
x1
x2
xn
Rd
x11 x12 x13 x14 … x1d
embedding
F
x21 x22 x23 x24 … x2d
xn1 xn2 xn3 xn4 … xnd
query
q
q1 q2 q3 q4 … qd
75
F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))
F(Sacramento)....=
F(Las Vegas).....=
F(Oklahoma City).=
F(Washington DC).=
F(Jacksonville)..=
( 386,
( 262,
(1345,
(2657,
(2422,
1543, 2920)
1232, 2405)
437, 1291)
1207, 853)
1344, 141)
76
General Intuition
original space X
1
2
3


Classifier: H = w1F’1 + w2F’2 + … + wjF’j.
Observation: accuracy of weak classifiers depends
on query.



F’1 is perfect for (q, a, b) where q = reference object 1.
F’1 is good for queries close to reference object 1.
Question: how can we capture that?
77
Query-Sensitive Weak Classifiers
original space X
1
2
3

V: area of influence (interval of real numbers).
F’(q, a, b) if F(q) is in V

QF,V(q, a, b) =
“I don’t know” if F(q) not in V
78
Query-Sensitive Weak Classifiers
original space X
1
2
j

V: area of influence (interval of real numbers).
F’(q, a, b) if F(q) is in V

QF,V(q, a, b) =
“I don’t know” if F(q) not in V
 If V includes all real numbers, QF,V = F’.
79
Applying AdaBoost
original space X
Real line
F1
F2
Fd

AdaBoost forms classifiers QFi,Vi.



Fi: 1D embedding.
Vi: area of influence for Fi.
Output: H = w1 QF1,V1 + w2 QF2,V2 + … + wd QFd,Vd .
80
Applying AdaBoost
original space X
Real line
F1
F2
Fd

Empirical observation:

At late stages of the training, query-sensitive weak
classifiers are still useful, whereas query-insensitive
classifiers are not.
81
From Classifier to Embedding
AdaBoost
output
H(q, a, b) =

d
i=1
wi QFi,Vi(q, a, b)
What embedding should we use?
What distance measure should we use?
82
From Classifier to Embedding

AdaBoost
output
H(q, a, b) =
BoostMap
embedding
F(x) = (F1(x), …, Fd(x))
Distance
measure
D(F(q), F(x)) =
d
i=1

wi QFi,Vi(q, a, b)
d
w
i=1 i
SFi,Vi (q) |Fi(q) – Fi(x)|
83
From Classifier to Embedding

AdaBoost
output
H(q, a, b) =
BoostMap
embedding
F(x) = (F1(x), …, Fd(x))
Distance
measure

D(F(q), F(x)) =
d
i=1

wi QFi,Vi(q, a, b)
d
w
i=1 i
SFi,Vi (q) |Fi(q) – Fi(x)|
Distance measure is query-sensitive.


Weighted L1 distance, weights depend on q.
SF,V(q) = 1 if F(q) is in V, 0 otherwise.
84
Centaurs Revisited

Reference objects: human, horse, centaur.



For centaur queries, use weights (0,0,1).
For human queries, use weights (1,0,0).
Query-sensitive distances are non-metric.

Combine efficiency of L1 distance and ability to capture nonmetric structure.
85
F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))
F(Sacramento)....=
F(Las Vegas).....=
F(Oklahoma City).=
F(Washington DC).=
F(Jacksonville)..=
( 386,
( 262,
(1345,
(2657,
(2422,
1543, 2920)
1232, 2405)
437, 1291)
1207, 853)
1344, 141)
86
Recap of Advantages
Capturing non-metric structure.
 Finding most informative reference
objects for each query.
 Richer model overall.


Choosing a weak classifier now also involves
choosing an area of influence.
87
Dynamic Time Warping on
Time Series
Database: 31818 time series.
Query set: 1000 time series.
QuerySensitive
QueryInsensitive
Accuracy
95%
95%
# of distances
1995
5691
Sec. per query
33
95
Speed-up factor
16
5.6
88
Dynamic Time Warping on
Time Series
Database: 32768 time series.
Query set: 50 time series.
QuerySensitive
Vlachos
KDD 2003
100%
100%
# of distances
640
over 6500
Sec. per query
10.7
over 110
Speed-up factor
51.2
under 5
Accuracy
89
BoostMap Recap - Theory

Machine-learning method for optimizing
embeddings.



Explicitly maximizes amount of nearest neighbor
structure preserved by embedding.
Optimization method is independent of underlying
geometry.
Query-sensitive version can capture non-metric
structure.
90
BoostMap Recap - Practice

BoostMap can significantly speed up nearest
neighbor retrieval and classification.

Useful in real-world datasets:
Hand shape classification.
 Optical character recognition (MNIST, UNIPEN).
In all four datasets, better results than other methods.
 In three benchmark datasets, better than methods
custom-made for those distance measures.
Domain-independent formulation.
 Distance measures are used as a black box.
 Application to proteins/DNA matching…



91
END
92

3D Hand Pose Estimation by Finding Appearance

Transcript 3D Hand Pose Estimation by Finding Appearance

Directory