
Fast N-Body Algorithms
for Massive Datasets
Alexander Gray
Georgia Institute of Technology
Is science in 2007
different from science in 1907?
Instruments
[Science, Szalay & J. Gray, 2001]
Data: CMB Maps
[Figure: data set size vs. year, log scale 1.0E+02 to 1.0E+07, 1985 to 2010]
1990 COBE          1,000
2000 Boomerang    10,000
2002 CBI          50,000
2003 WMAP      1 Million
2008 Planck   10 Million
[Science, Szalay & J. Gray, 2001]

Data: Local Redshift Surveys
1986 CfA       3,500
1996 LCRS     23,000
2003 2dF     250,000
2005 SDSS    800,000

Data: Angular Surveys
1970 Lick      1M
1990 APM       2M
2005 SDSS    200M
2008 LSST      2B
Sloan Digital Sky Survey
(SDSS)
1 billion objects
144 dimensions
(~250M galaxies in 5 colors,
~1M 2000-D spectra)
Size matters! Now possible:
• low noise: subtle patterns
• global properties and patterns
• rare objects and patterns
• more info: 3d, deeper/earlier, bands
• in parallel: more accurate simulations
• 2008: LSST – time-varying phenomena
Happening everywhere!
• Molecular biology: microarray chips
• Earth sciences: satellite topography
• Neuroscience: functional MRI
• Drug discovery: nuclear magnetic resonance
• Physical simulation: microprocessors
• Internet: fiber optics
Astrophysicist:
1. How did galaxies evolve?
2. What was the early universe like?
3. Does dark energy exist?
4. Is our model (GR+inflation) right?

R. Nichol, Inst. Cosmol. Gravitation
A. Connolly, U. Pitt Physics
C. Miller, NOAO
R. Brunner, NCSA
G. Kulkarni, Inst. Cosmol. Gravitation
D. Wake, Inst. Cosmol. Gravitation
R. Scranton, U. Pitt Physics
M. Balogh, U. Waterloo Physics
I. Szapudi, U. Hawaii Inst. Astronomy
G. Richards, Princeton Physics
A. Szalay, Johns Hopkins Physics

Machine learning/statistics guy:
• Kernel density estimator: O(N²)
• n-point spatial statistics: O(Nⁿ)
• Nonparametric Bayes classifier: O(N²)
• Support vector machine: O(N²)
• Nearest-neighbor statistics: O(N²)
• Gaussian process regression: O(N³)
• Bayesian inference: O(c^{DT(N)})

But I have 1 million points!
Data: The Stack

• Apps: Computer Vision, NLP, Machine Translation, Bibleome, Autonomous vehicles
• Perception (User, Science)
• ML / Opt: Machine Learning / Optimization / Linear Algebra / Privacy
• Data Abstractions: DBMS, MapReduce, VOTables, Data structures
• Clustering / Threading: Programming with 1000s of powerful compute nodes
• O/S
• Network
• Motherboards / Datacenter
• ICs
Making fast algorithms
• There are many large datasets. There are many questions we want to ask of them.
  – Why we must not get obsessed with one specific dataset.
  – Why we must not get obsessed with one specific question.
• The activity I’ll describe is about accelerating computations which occur commonly across many ML methods.
Scope
• Nearest neighbor
• K-means
• Hierarchical clustering
• N-point correlation functions
• Kernel density estimation
• Locally-weighted regression
• Mean shift tracking
• Mixtures of Gaussians
• Gaussian process regression
• Manifold learning
• Support vector machines
• Affinity propagation
• PCA
• …
Scope
• ML methods with distances underneath
– Distances only
– Continuous kernel functions
• ML methods with counting underneath
Scope
• Computational ideas in this tutorial:
– Data structures
– Monte Carlo
– Series expansions
– Problem/solution abstractions
• Challenges
– Don’t introduce error, if possible
– Don’t introduce tweak parameters, if
possible
Two canonical problems
• Nearest-neighbor search
  $NN(x_q) = \arg\min_r \|x_q - x_r\|$

• Kernel density estimation
  $\hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Series expansions
4. Problem/solution abstractions
Nearest Neighbor - Naïve Approach
• Given a query point X.
• Scan through each point Y:
  – Calculate the distance d(X,Y)
  – If d(X,Y) < best_seen then Y is the new nearest neighbor.
• Takes O(N) time for each query!
33 Distance Computations
Slides by Jeremy Kubica
Speeding Up Nearest Neighbor
• We can speed up the search for the nearest neighbor:
  – Examine nearby points first.
  – Ignore any points that are farther than the nearest point found so far.
• Do this using a KD-tree:
  – Tree-based data structure
  – Recursively partitions points into axis-aligned boxes.
KD-Tree Construction

Pt   X     Y
1    0.00  0.00
2    1.00  4.31
3    0.13  2.85
…    …     …

We start with a list of n-dimensional points.
KD-Tree Construction

Split on X > .5:

NO:                      YES:
Pt   X     Y             Pt   X     Y
1    0.00  0.00          2    1.00  4.31
3    0.13  2.85          …    …     …
…    …     …

We can split the points into 2 groups by choosing a
dimension X and value V and separating the points into
X > V and X <= V.
KD-Tree Construction

X > .5?
  YES: Pt 2 (1.00, 4.31), …
  NO:  split again, Y > .1?
    YES: Pt 3 (0.13, 2.85), …
    NO:  Pt 1 (0.00, 0.00), …

We can then consider each group separately and
possibly split again (along the same or a different dimension).
KD-Tree Construction
We can keep splitting the points in each set to create a
tree structure. Each node with no children (leaf node)
contains a list of points.
KD-Tree Construction
We will keep around one additional piece of
information at each node. The (tight) bounds of the
points at or below this node.
KD-Tree Construction

Use heuristics to make splitting decisions (a construction sketch follows below):
• Which dimension do we split along? The widest one.
• Which value do we split at? The median of that dimension's values over the points.
• When do we stop? When there are fewer than m points left OR the box has hit some minimum width.
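To make those heuristics concrete, here is a minimal construction sketch in Python. This is not the talk's code; the leaf size and node field names are illustrative assumptions.

import numpy as np

LEAF_SIZE = 10  # "m": stop splitting below this many points

def build(points):
    """Recursively build a kd-tree node, keeping tight bounds."""
    node = {"lo": points.min(axis=0),   # tight lower bound per dimension
            "hi": points.max(axis=0)}   # tight upper bound per dimension
    if len(points) <= LEAF_SIZE:
        node["points"] = points         # leaf: keep the point list
        return node
    d = int(np.argmax(node["hi"] - node["lo"]))  # split the widest dimension
    v = np.median(points[:, d])                  # at the median value
    mask = points[:, d] <= v
    if mask.all() or not mask.any():    # degenerate split: make a leaf
        node["points"] = points
        return node
    node["left"] = build(points[mask])
    node["right"] = build(points[~mask])
    return node

tree = build(np.random.rand(1000, 3))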
Exclusion and inclusion,
using point-node kd-tree bounds.

O(D) bounds on distance minima/maxima:

$\min_i \|x - x_i\|^2 \ge \sum_d^D \left[ \max(l_d - x_d, 0) + \max(x_d - u_d, 0) \right]^2$

$\max_i \|x - x_i\|^2 \le \sum_d^D \max\left( (u_d - x_d)^2, (x_d - l_d)^2 \right)$

Slides by Jeremy Kubica
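A direct transcription of these O(D) bounds, as a hedged Python sketch (squared distances; lo and hi are the node's corner arrays, i.e. l and u above):

import numpy as np

def point_node_bounds(x, lo, hi):
    """Lower and upper bounds on the squared distance from point x to
    any point inside the hyperrectangle [lo, hi], in O(D) time."""
    dmin = np.sum((np.maximum(lo - x, 0) + np.maximum(x - hi, 0)) ** 2)
    dmax = np.sum(np.maximum((hi - x) ** 2, (x - lo) ** 2))
    return dmin, dmax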
Nearest Neighbor with KD Trees
We traverse the tree looking for the nearest
neighbor of the query point.
Nearest Neighbor with KD Trees
Examine nearby points first: Explore the branch of
the tree that is closest to the query point first.
Nearest Neighbor with KD Trees
When we reach a leaf node: compute the distance
to each point in the node.
Nearest Neighbor with KD Trees
Then we can backtrack and try the other branch at
each node visited.
Nearest Neighbor with KD Trees
Each time a new closest node is found, we can
update the distance bounds.
Nearest Neighbor with KD Trees
Using the distance bounds and the bounds of the
data below each node, we can prune parts of the
tree that could NOT include the nearest neighbor.
Simple recursive algorithm
(k=1 case)

NN(xq, R, dlo, xsofar, dsofar)
{
  if dlo > dsofar, return.                             // prune: R cannot beat the best so far
  if leaf(R), [xsofar,dsofar] = NNBase(xq, R, dsofar). // scan the leaf's points
  else,
    [R1,d1,R2,d2] = orderByDist(xq, R.l, R.r).         // closer child first
    NN(xq, R1, d1, xsofar, dsofar).
    NN(xq, R2, d2, xsofar, dsofar).
}
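A runnable Python rendering of this recursion, reusing build() and point_node_bounds() from the sketches above. A sketch only, with squared distances throughout; not the original implementation.

import numpy as np

def nn(xq, node, dlo, best):
    """best = [x_sofar, d_sofar]; d_sofar is a squared distance."""
    if dlo > best[1]:
        return                                    # prune: node cannot help
    if "points" in node:                          # leaf: the NNBase scan
        d2 = np.sum((node["points"] - xq) ** 2, axis=1)
        i = int(np.argmin(d2))
        if d2[i] < best[1]:
            best[0], best[1] = node["points"][i], d2[i]
        return
    children = [node["left"], node["right"]]      # orderByDist: closer child first
    dists = [point_node_bounds(xq, c["lo"], c["hi"])[0] for c in children]
    for j in np.argsort(dists):
        nn(xq, children[j], dists[j], best)

query = np.random.rand(3)
best = [None, np.inf]
nn(query, tree, 0.0, best)   # tree from the construction sketch above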
Nearest Neighbor with KD Trees
Instead, some animations showing real data…
1. kd-tree with cached sufficient statistics
2. nearest-neighbor with kd-trees
3. range-count with kd-trees
For animations, see:
http://www.cs.cmu.edu/~awm/animations/kdtree
Range-count example
(animation frames: the recursion descends the tree; some nodes are
Pruned! by inclusion, others Pruned! by exclusion)
Some questions
• Asymptotic runtime analysis?
– In a rough sense, O(logN)
– But only under some regularity conditions
• How high in dimension can we go?
– Roughly exponential in intrinsic dimension
– In practice, in less than 100 dimensions,
still big speedups
Another kind of tree
• Ball-trees, metric trees
– Use balls instead of hyperrectangles
– Can often be more efficient in high
dimension (though not always)
– Can work with non-Euclidean metric (you
only need to respect the triangle inequality)
– Many non-metric similarity measures can
be bounded by metric quantities.
A Set of Points in a metric space
Ball Tree root node
A Ball Tree
• J. Uhlmann, 1991
• S. Omohundro, NIPS 1991
Ball-trees: properties

Let Q be any query point and let x be a point inside ball B:
|x − Q| ≥ |Q − B.center| − B.radius
|x − Q| ≤ |Q − B.center| + B.radius
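These two triangle-inequality bounds translate directly into code; a small sketch (field names illustrative):

import numpy as np

def ball_bounds(q, center, radius):
    """Bounds on |x - q| for any point x inside the ball B(center, radius)."""
    d = np.linalg.norm(q - center)
    return max(d - radius, 0.0), d + radius   # (lower, upper)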
How to build a metric tree,
exactly?
• Must balance quality vs. build-time
• ‘Anchors hierarchy’ (farthest-points
heuristic, 2-approx used in OR)
• Omohundro: ‘Five ways to build a balltree’
• Which is the best? A research topic…
Some other trees
• Cover-tree
– Provable worst-case O(logN) under an
assumption (bounded expansion constant)
– Like a non-binary ball-tree
• Learning trees
– In this conference
‘All’-type problems

• Nearest-neighbor search:
  $NN(x_q) = \arg\min_r \|x_q - x_r\|$
  All-nearest-neighbor (bichromatic):
  $\forall x_q : NN(x_q) = \arg\min_r \|x_q - x_r\|$

• Kernel density estimation:
  $\hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$
  ‘All’ version (bichromatic):
  $\forall x_q : \hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$
Almost always ‘all’-type problems
• Kernel density estimation
• Nadaraya-Watson & locally-weighted regression
• Gaussian process prediction
• Radial basis function networks
Always ‘all’-type problems
• Monochromatic all-nearest neighbor (e.g.
LLE)
• n-point correlation (n-tuples)
Dual-tree idea

If all the queries are available simultaneously, then it is faster to:
1. Build a tree on the queries as well
2. Effectively process the queries in chunks rather than individually,
   so work is shared between similar query points
Single-tree:
Dual-tree (symmetric):
Exclusion and inclusion,
using point-node kd-tree bounds.

O(D) bounds on distance minima/maxima:

$\min_i \|x - x_i\|^2 \ge \sum_d^D \left[ \max(l_d - x_d, 0) + \max(x_d - u_d, 0) \right]^2$

$\max_i \|x - x_i\|^2 \le \sum_d^D \max\left( (u_d - x_d)^2, (x_d - l_d)^2 \right)$
Exclusion and inclusion,
using kd-tree node-node bounds.

O(D) bounds on distance minima/maxima:
(Analogous to point-node bounds.)

Also needed: nodewise bounds.
Single-tree: simple recursive algorithm (k=1 case)

NN(xq, R, dlo, xsofar, dsofar)
{
  if dlo > dsofar, return.
  if leaf(R), [xsofar,dsofar] = NNBase(xq, R, dsofar).
  else,
    [R1,d1,R2,d2] = orderByDist(xq, R.l, R.r).
    NN(xq, R1, d1, xsofar, dsofar).
    NN(xq, R2, d2, xsofar, dsofar).
}
Single-tree → Dual-tree

• xq → Q
• dlo(xq,R) → dlo(Q,R)
• xsofar → xsofar, dsofar → dsofar
• store Q.dsofar = max over q in Q of dsofar
• 2-way recursion → 4-way recursion
• N × O(logN) → O(N)
Dual-tree: simple recursive algorithm (k=1)

AllNN(Q, R, dlo, xsofar, dsofar)
{
  if dlo > Q.dsofar, return.                      // prune for every query in Q at once
  if leaf(Q) & leaf(R),
    [xsofar,dsofar] = AllNNBase(Q, R, dsofar).    // exhaustive leaf-leaf comparison
    Q.dsofar = max over q in Q of dsofar.
  else if !leaf(Q) & leaf(R), …
  else if leaf(Q) & !leaf(R), …
  else if !leaf(Q) & !leaf(R),
    [R1,d1,R2,d2] = orderByDist(Q.l, R.l, R.r).   // closer reference child first
    AllNN(Q.l, R1, d1, xsofar, dsofar).
    AllNN(Q.l, R2, d2, xsofar, dsofar).
    [R1,d1,R2,d2] = orderByDist(Q.r, R.l, R.r).
    AllNN(Q.r, R1, d1, xsofar, dsofar).
    AllNN(Q.r, R2, d2, xsofar, dsofar).
    Q.dsofar = max(Q.l.dsofar, Q.r.dsofar).
}
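A compact runnable sketch of this dual-tree recursion (k=1, squared distances). Nothing here is the talk's code: build_indexed() makes kd-tree nodes over point indices, and the per-query best distances live in plain arrays.

import numpy as np

def build_indexed(pts, idx, leaf=16):
    node = {"lo": pts[idx].min(axis=0), "hi": pts[idx].max(axis=0),
            "n": len(idx), "dsofar": np.inf}
    if len(idx) <= leaf:
        node["idx"] = idx
        return node
    d = int(np.argmax(node["hi"] - node["lo"]))       # widest dimension
    m = pts[idx, d] <= np.median(pts[idx, d])         # median split
    if m.all() or not m.any():
        node["idx"] = idx
        return node
    node["l"] = build_indexed(pts, idx[m], leaf)
    node["r"] = build_indexed(pts, idx[~m], leaf)
    return node

def node_dist(a, b):
    """Node-node lower bound on squared distance between boxes a and b."""
    gap = np.maximum(a["lo"] - b["hi"], 0) + np.maximum(b["lo"] - a["hi"], 0)
    return float(np.sum(gap ** 2))

def all_nn(Q, R, qp, rp, best_d, best_i):
    if node_dist(Q, R) > Q["dsofar"]:
        return                                        # prune all queries in Q at once
    if "idx" in Q and "idx" in R:                     # AllNNBase: leaf-leaf scan
        d2 = np.sum((qp[Q["idx"]][:, None] - rp[R["idx"]][None, :]) ** 2, axis=-1)
        j = d2.argmin(axis=1)
        dmin = d2[np.arange(len(j)), j]
        upd = dmin < best_d[Q["idx"]]
        best_d[Q["idx"][upd]] = dmin[upd]
        best_i[Q["idx"][upd]] = R["idx"][j[upd]]
        Q["dsofar"] = float(best_d[Q["idx"]].max())   # maxQ dsofar
        return
    q_kids = [Q] if "idx" in Q else [Q["l"], Q["r"]]
    r_kids = [R] if "idx" in R else [R["l"], R["r"]]
    for q in q_kids:                                  # closer reference node first
        for r in sorted(r_kids, key=lambda c: node_dist(q, c)):
            all_nn(q, r, qp, rp, best_d, best_i)
    if "l" in Q:
        Q["dsofar"] = max(Q["l"]["dsofar"], Q["r"]["dsofar"])

qp, rp = np.random.rand(300, 2), np.random.rand(500, 2)
best_d, best_i = np.full(len(qp), np.inf), np.full(len(qp), -1)
all_nn(build_indexed(qp, np.arange(len(qp))), build_indexed(rp, np.arange(len(rp))),
       qp, rp, best_d, best_i)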
Dual-tree traversal (depth-first)
(animation frames: pairs of query-tree and reference-tree nodes
are visited recursively)
Meta-idea:
Higher-order
Divide-and-conquer
Generalizes divide-and-conquer of a single set
to divide-and-conquer of multiple sets.
Break each set into pieces.
Solving the sub-parts of the problem and
combining these sub-solutions appropriately
might be easier than doing this over only one set.
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Series expansions
4. Problem/solution abstractions
Characterization of an entire distribution?

2-point correlation function:
“How many pairs have distance < r?”

$\sum_i^N \sum_{j>i}^N I(\|x_i - x_j\| < r)$
The n-point correlation functions
• Spatial inferences: filaments, clusters, voids,
homogeneity, isotropy, 2-sample testing, …
• Foundation for theory of point processes
[Daley,Vere-Jones 1972], unifies spatial statistics [Ripley
1976]
• Used heavily in biostatistics, cosmology, particle
physics, statistical physics
2pcf definition:
$dP = \lambda^2 \, dV_1 dV_2 \, [1 + \xi(r)]$

3pcf definition:
$dP = \lambda^3 \, dV_1 dV_2 dV_3 \, [1 + \xi(r_{12}) + \xi(r_{23}) + \xi(r_{13}) + \zeta(r_{12}, r_{23}, r_{13})]$

Standard model: n>0 terms should be zero!
3-point correlation
“How many triples have pairwise distances < r?”

$\sum_i^N \sum_{j>i}^N \sum_{k>j}^N I(\delta_{ij} < r_1)\, I(\delta_{jk} < r_2)\, I(\delta_{ki} < r_3)$
How can we count n-tuples efficiently?
“How many triples have
pairwise distances < r ?”
Use n trees!
[Gray & Moore, NIPS 2000]
“How many valid triangles a-b-c (where a ∈ A, b ∈ B, c ∈ C)
could there be?”

count{A,B,C} = count{A,B,C.left} + count{A,B,C.right}

Exclusion: if a node-node distance bound already violates one of the
constraints r, then count{A,B,C} = 0!

Inclusion: if the node-node distance bounds already satisfy all of the
constraints, then count{A,B,C} = |A| × |B| × |C|
Key idea (combinatorial proximity problems):
for n-tuples: n-tree recursion
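Here is the n=2 instance of that recursion as an illustrative Python sketch: counting pairs with distance < r between two trees, with inclusion and exclusion pruning. It assumes the indexed kd-tree nodes (fields lo, hi, n, idx, l, r) from the dual-tree sketch earlier; for the monochromatic 2-point count you would additionally correct for self-pairs and double counting.

import numpy as np

def box_bounds(A, B):
    """Min and max squared distance between hyperrectangles A and B."""
    gap = np.maximum(A["lo"] - B["hi"], 0) + np.maximum(B["lo"] - A["hi"], 0)
    span = np.maximum(np.abs(A["hi"] - B["lo"]), np.abs(B["hi"] - A["lo"]))
    return float(np.sum(gap ** 2)), float(np.sum(span ** 2))

def pair_count(A, B, a_pts, b_pts, r2):
    dmin, dmax = box_bounds(A, B)
    if dmin >= r2:
        return 0                            # Exclusion: no pair can qualify -> 0!
    if dmax < r2:
        return A["n"] * B["n"]              # Inclusion: every pair qualifies -> |A| x |B|
    if "idx" in A and "idx" in B:           # base case: count directly
        d2 = np.sum((a_pts[A["idx"]][:, None] - b_pts[B["idx"]][None, :]) ** 2, axis=-1)
        return int(np.sum(d2 < r2))
    if "idx" in A:                          # recurse on whichever side still splits
        return sum(pair_count(A, c, a_pts, b_pts, r2) for c in (B["l"], B["r"]))
    return sum(pair_count(c, B, a_pts, b_pts, r2) for c in (A["l"], A["r"]))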
3-point runtime
(biggest previous: 20K)

n=2: O(N)
n=3: O(N^{log 3})
n=4: O(N²)

VIRGO simulation data, N = 75,000,000
naïve: 5×10⁹ sec. (~150 years)
multi-tree: 55 sec. (exact)
But…
Depends on r^{D-1}. Slow for large radii.

VIRGO simulation data, N = 75,000,000
naïve: ~150 years
multi-tree, large h: 24 hrs

Let’s develop a method for large radii.
c = pT
(c: hard. T: known. p: EASIER?)

$\hat{p} \pm z_{\alpha/2} \sqrt{ \frac{\hat{p}(1 - \hat{p})}{S} }$

No dependence on N! But it does depend on p.
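As a sketch, that interval in Python, assuming S sampled tuples and a known total tuple count T (z = 2.58 gives roughly 99% confidence):

import numpy as np

def estimate_count(satisfied, T, z=2.58):
    """satisfied: boolean array over S sampled tuples, True where the
    tuple meets the distance predicate. Returns (c_hat, half_width)."""
    S = len(satisfied)
    p_hat = satisfied.mean()
    half = z * np.sqrt(p_hat * (1 - p_hat) / S)
    return p_hat * T, half * T          # scale by T, since c = pT

c_hat, hw = estimate_count(np.random.rand(10000) < 0.3, T=10**12)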
c = pT
This is junk: don’t bother.
This is promising.
Basic idea:
1. Remove some junk (run the exact algorithm for a while) → makes p larger
2. Sample from the rest

Get disjoint sets from the recursion tree:
all possible $\binom{N}{n}$ n-tuples … [prune]
Now do stratified sampling (a code sketch follows below):

$\hat{p} = \frac{T_1}{T}\hat{p}_1 + \frac{T_2}{T}\hat{p}_2 + \frac{T_3}{T}\hat{p}_3$

$\hat{\sigma}^2 = \left(\frac{T_1}{T}\right)^2 \hat{\sigma}_1^2 + \left(\frac{T_2}{T}\right)^2 \hat{\sigma}_2^2 + \left(\frac{T_3}{T}\right)^2 \hat{\sigma}_3^2$
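The same combination rule as a small Python sketch; the strata sizes T_i are assumed to be the tuple counts of the disjoint sets left by the pruned recursion, and each stratum's samples are booleans:

import numpy as np

def stratified_estimate(samples_per_stratum, sizes):
    """p_hat = sum (T_i/T) p_i ; var_hat = sum (T_i/T)^2 sigma_i^2,
    with sigma_i^2 the variance of each stratum's sample mean."""
    T = float(sum(sizes))
    p_hat, var_hat = 0.0, 0.0
    for s, Ti in zip(samples_per_stratum, sizes):
        w = Ti / T
        p_i = s.mean()
        p_hat += w * p_i
        var_hat += w**2 * p_i * (1 - p_i) / len(s)
    return p_hat, var_hat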
Speedup Results
VIRGO simulation data, N = 75,000,000
naïve: ~150 years
multi-tree, large h: 24 hrs
multi-tree Monte Carlo, 99% confidence: 96 sec
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Multipole methods
4. Problem/solution abstractions
Kernel density estimation

$\forall x_q : \hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$
How to use a tree…
1. How to approximate?
2. When to approximate?
[Barnes and Hut, Science, 1987]

$\sum_i K(q, x_i) \approx N_R \, K(q, \mu_R)$  if  $\frac{s}{r} > \theta$

(following the figure’s labels: s is the distance from q to node R,
r is R’s radius, and $\mu_R$ is R’s centroid)
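A recursive sketch of that rule: descend until the node looks far enough away from q, then substitute the centroid. The node fields (n, mu, r for the radius) and the default theta are illustrative assumptions, not the talk's code.

import numpy as np

def kernel_sum(q, node, K, theta=2.0):
    """Approximate sum_i K(|q - x_i|) over the points under `node`."""
    s = np.linalg.norm(q - node["mu"])      # separation: q to centroid mu_R
    if node["r"] > 0 and s / node["r"] > theta:
        return node["n"] * K(s)             # well separated: N_R * K(q, mu_R)
    if "points" in node:                    # leaf: exact sum
        return K(np.linalg.norm(node["points"] - q, axis=1)).sum()
    return kernel_sum(q, node["l"], K, theta) + kernel_sum(q, node["r"], K, theta)

gaussian = lambda d, h=0.1: np.exp(-(d / h) ** 2 / 2)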
How to use a tree…
3. How to know potential error?

Let’s maintain bounds on the true kernel sum
$\Phi(q) = \sum_i K(q, x_i)$

At the beginning:
$\Phi^{lo}(q) = N K^{lo}$
$\Phi^{hi}(q) = N K^{hi}$

After visiting node R:
$\Phi^{lo}(q) \leftarrow \Phi^{lo}(q) + N_R \, K(\delta^{hi}_{qR}) - N_R K^{lo}$
$\Phi^{hi}(q) \leftarrow \Phi^{hi}(q) + N_R \, K(\delta^{lo}_{qR}) - N_R K^{hi}$
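In code form the bookkeeping is two lines per node visit; a sketch for a monotonically decreasing kernel K, where d_lo/d_hi are the point-node distance bounds from earlier:

def init_bounds(N, K_lo, K_hi):
    """Phi^lo(q), Phi^hi(q) before any node has been examined."""
    return N * K_lo, N * K_hi

def tighten(phi_lo, phi_hi, N_R, K, d_lo, d_hi, K_lo, K_hi):
    """Swap node R's placeholder terms for its kernel bounds:
    each of R's N_R terms lies in [K(d_hi), K(d_lo)]."""
    phi_lo += N_R * (K(d_hi) - K_lo)
    phi_hi += N_R * (K(d_lo) - K_hi)
    return phi_lo, phi_hi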
How to use a tree…
4. How to do the ‘all’ problem?

$\forall x_q : \hat{f}(x_q) = \frac{1}{N} \sum_{r \neq q}^{N} K_h(\|x_q - x_r\|)$

Single-tree:
Dual-tree (symmetric): [Gray & Moore 2000]
How to use a tree…
4. How to do the ‘all’ problem?

$\forall q \in Q : \sum_i K(q, x_i) \approx N_R \, K(q, \mu_R)$  if  $\frac{s}{\max(r_Q, r_R)} > \theta$

Generalizes Barnes-Hut to dual-tree.
BUT: we have a tweak parameter: θ

Case 1 – alg. gives no error bounds
Case 2 – alg. gives error bounds, but must be rerun
Case 3 – alg. automatically achieves error tolerance

So far we have case 2; let’s try for case 3.
Let’s try to make an automatic stopping rule

Finite-difference function approximation.

Taylor expansion:
$f(x) \approx f(a) + f'(a)(x - a)$

Gregory-Newton finite form:
$f(x) \approx f(x_i) + \frac{1}{2}\left[\frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i}\right](x - x_i)$

Applied to the kernel:
$K(\delta) \approx K(\delta^{lo}) + \frac{1}{2}\left[\frac{K(\delta^{hi}) - K(\delta^{lo})}{\delta^{hi} - \delta^{lo}}\right](\delta - \delta^{lo})$
Finite-difference function approximation.
(assumes a monotonic decreasing kernel)

$\bar{K} = \frac{1}{2}\left[ K(\delta^{lo}_{QR}) + K(\delta^{hi}_{QR}) \right]$

$err_{qR} = \left| \sum_r K(\delta_{qr}) - N_R \bar{K} \right| \le \frac{N_R}{2}\left[ K(\delta^{lo}_{QR}) - K(\delta^{hi}_{QR}) \right]$

$\forall q, R : \frac{err_{qR}}{\Phi(x_q)} \le \epsilon \frac{N_R}{N} \;\Rightarrow\; \forall q : \frac{err_q}{\Phi(x_q)} \le \epsilon$

approximate {Q,R} if $K(\delta^{lo}_{QR}) - K(\delta^{hi}_{QR}) \le \frac{2\epsilon}{N} \, \Phi^{lo}(Q)$
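The stopping rule and the approximation it licenses, as a short sketch (eps is the relative tolerance ε, phi_lo_Q the running lower bound Φ^lo(Q); names are illustrative):

def can_approximate(K, d_lo, d_hi, eps, N, phi_lo_Q):
    """Fire when the kernel varies little enough over [d_lo, d_hi]."""
    return K(d_lo) - K(d_hi) <= (2 * eps / N) * phi_lo_Q

def node_pair_contribution(K, d_lo, d_hi, N_R):
    """If the rule fires, credit N_R * K_bar to every query in Q."""
    return N_R * 0.5 * (K(d_lo) + K(d_hi))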
Speedup Results (KDE)

N        naïve     dualtree
12.5K        7       .12
25K         31       .31
50K        123       .46
100K       494      1.0
200K     1976*      2
400K     7904*      5
800K    31616*     10
1.6M    35 hrs     23     (5500x)

One order-of-magnitude speedup over single-tree at ~2M points
Ideas
1. Data structures and how to use them
2. Monte Carlo
3. Multipole methods
4. Problem/solution abstractions
These are all examples of…
Generalized N-body problems

All-NN:  {∀, argmin, ‖·‖}
2-point: {Σ, Σ, I_h(·), w}
3-point: {Σ, Σ, Σ, I_H(·), w}
KDE:     {∀, Σ, K_h(·); {h}}
SPH:     {∀, Σ, K_h(·), w; Δt}

General theory and toolkit for
designing algorithms for
such problems
For more…

In this conference:
• Learning trees
• Monte Carlo for statistical summations
• Large-scale learning workshop
• EMST
• GNP’s and MapReduce-like parallelization
• Monte Carlo SVD

[email protected]