
UMass Lowell Computer Science 91.504
Advanced Algorithms
Computational Geometry
Prof. Karen Daniels
Spring, 2001
Lecture 8
Approximate Nearest Neighbor Searching
Derandomization for Efficient Geometric
Partitioning
Monday, 4/30/01
Part 2
Advanced Topics
Applications
Manufacturing
Modeling/Graphics
Wireless Networks
Visualization
Techniques
(de)Randomization
Approximation
Robustness
Representations
Epsilon-net
Decomposition tree
Literature for Part II

Milenkovic/Daniels
- Title: Translational Polygon Containment and Minimal Enclosure using Mathematical Programming
- Source: Journal: ITOR
- Application Areas: manufacturing
- Input Objects: 2D nonconvex polygons

Wu/Li
- Title: On calculating connected dominating set for efficient routing in ad hoc wireless networks
- Source: Conf: Workshop on Discrete Algorithms and Methods for Mobile Computing & Communications
- Application Areas: dynamic wireless communications
- Input Objects: 2D points representing hosts

Arya et al.
- Title: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions
- Source: Journal: ACM
- Application Areas: knowledge discovery; data mining; pattern recognition; classification; machine learning; data compression; multimedia databases; document retrieval; statistics
- Input Objects: d-dimensional points

Goodrich/Ramos
- Title: Bounded-Independence Derandomization of Geometric Partitioning with Applications to Parallel Fixed-Dimensional Linear Programming
- Source: Journal: Discrete & Comp Geom
- Application Areas: linear programming
- Input Objects: range space

Shewchuk
- Title: Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator
- Source: Conf: 1st Workshop on Applied CG
- Application Areas: geometric modeling; graphics
- Input Objects: PSLG of object
Literature for Part II (continued)

Milenkovic/Daniels
- Dimensionality: 2D
- Problem/Task: translational containment; overlap elimination; distance-based subdivision; minimal enclosure; visibility
- Theory? Implementation?: both
- ADTs & Data Structures: convex hull; visibility polygon
- Algorithmic Paradigms & Techniques: subdivision; approximate algorithm; binary search
- Math Topics: Minkowski sum; linear programming; monotonicity; convex distance function

Wu/Li
- Dimensionality: 2D
- Problem/Task: dominating set
- Theory? Implementation?: some experiments
- ADTs & Data Structures: undirected graph
- Algorithmic Paradigms & Techniques: distributed; heuristic
- Math Topics: graph theory: dominating set

Arya et al.
- Dimensionality: arbitrary
- Problem/Task: partitioning; nearest-neighbor query
- Theory? Implementation?: both
- ADTs & Data Structures: balanced box-decomposition tree
- Algorithmic Paradigms & Techniques: geometric preprocessing; approximation algorithm
- Math Topics: Minkowski metric; probability

Goodrich/Ramos
- Dimensionality: arbitrary
- Problem/Task: geometric randomization; geometric derandomization
- Theory? Implementation?: theory
- ADTs & Data Structures: epsilon-net; epsilon-approximation
- Algorithmic Paradigms & Techniques: randomization; derandomization; parallel
- Math Topics: VC-dimension; linear programming; probability; duality

Shewchuk
- Dimensionality: 2D
- Problem/Task: (constrained) Delaunay triangulation; robustness
- Theory? Implementation?: implementation
- ADTs & Data Structures: triangular mesh; (constrained) Delaunay triangulation; Voronoi diagram; convex hulls; Guibas/Stolfi quad-edge; triangular data structure; PSLG; splay tree; heap
- Algorithmic Paradigms & Techniques: sweep-line; geometric divide-and-conquer; incremental insertion
Approximate Nearest Neighbor Searching

"An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions"
Arya, Mount, Netanyahu, Silverman, Wu
Goals

- Fast nearest neighbor query in a d-dimensional set of n points:
  - approximate nearest neighbor: distance within a factor of (1+e) of the true closest neighbor
  - preprocess using O(dn log n) time, O(dn) space: Balanced-Box Decomposition (BBD) tree
    - note that space and time are independent of e
  - query in O(c_{d,e} log n) time

C++ code for a simplified version is at http://www.cs.umd.edu/~mount/ANN
Approach: Distance Assumptions

- Use the L_p (also called Minkowski) metric
  - assume it can be computed in O(d) time
  - the pth root need not be computed when comparing distances

    d_p(q1, q2) = ( sum_{j=1}^{d} |x_j(q1) - x_j(q2)|^p )^(1/p)

- Approximate nearest neighbor
  - distance within a factor of (1+e) of the true closest neighbor p*:
    dist(p, q) <= (1+e) dist(p*, q)
  - can change e or the metric without rebuilding the data structure
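The root-free comparison can be sketched in Python (a toy illustration, not the authors' ANN library; the function names are my own):

```python
# Since the p-th root is monotone, comparing sums of p-th powers gives
# the same ordering as comparing true L_p distances, saving the root.

def lp_power(q1, q2, p):
    """Sum of p-th powers of coordinate differences: O(d) time."""
    return sum(abs(a - b) ** p for a, b in zip(q1, q2))

def lp_dist(q1, q2, p):
    """The actual L_p (Minkowski) distance: p-th root of the power sum."""
    return lp_power(q1, q2, p) ** (1.0 / p)

# Comparing powers agrees with comparing distances:
q, a, b = (0.0, 0.0), (3.0, 4.0), (5.0, 5.0)
assert (lp_power(q, a, 2) < lp_power(q, b, 2)) == (lp_dist(q, a, 2) < lp_dist(q, b, 2))
```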
Approach: Overview

- Preprocess points to create a Balanced-Box Decomposition (BBD) tree
- Query algorithm: for query point q
  - Locate the leaf cell containing q in O(log n) time
  - Priority search: enumerate leaf cells in increasing distance order from q
    - For each leaf cell, calculate the distance from q to the cell's point
    - Keep track of the closest point p seen so far
    - Stop when the distance from q to a leaf > dist(q,p)/(1+e)
    - Return p as the approximate nearest neighbor to q
Balanced Box Decomposition (BBD) Tree

- Similar to a kd-tree [Samet handout]
  - Binary tree; tree structure stored in main memory
  - Cutting planes orthogonal to axes
  - "Alternating" dimensions
  - O(log n) height
- Subdivides space into regions of O(d) complexity using d-dimensional rectangles
- Can be built in O(dn log n) time

[Figure: a point set partitioned by cuts x1, y1, y2, ..., and one possible kd-like tree for the above points (not a BBD tree, though)]
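The kd-tree construction the slide compares against can be sketched as follows (a toy kd-like tree, not a BBD tree; `build_kd`, `height`, and the dict representation are my own):

```python
# Split on alternating axes at the median; with equal splits the tree has
# O(log n) height, as the slide states for the BBD tree.

def build_kd(points, depth=0):
    if len(points) <= 1:
        return {"leaf": points}
    axis = depth % 2                      # "alternating" dimensions (2D)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis,
            "cut": pts[mid][axis],        # cutting plane orthogonal to axis
            "lo": build_kd(pts[:mid], depth + 1),
            "hi": build_kd(pts[mid:], depth + 1)}

def height(node):
    if "leaf" in node:
        return 0
    return 1 + max(height(node["lo"]), height(node["hi"]))
```

With 8 points the tree bottoms out after 3 median splits, matching the O(log n) height claim.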
Balanced Box Decomposition (BBD) Tree (continued)

- Distinguishing features of the BBD tree:
  - A cell is either
    - a d-dimensional rectangle, or
    - the difference of 2 nested d-dimensional rectangles
  - In this sense, the BBD tree is like:
    - an optimized kd-tree: partitions points into roughly equal-sized sets [inner box shrink]
      - while descending in the tree, the number of points on the path decreases exponentially
    - a specialized quadtree: the aspect ratio of each box is bounded by a constant [hyperplane split]
      - while descending in the tree, the size of the region on the path decreases exponentially
  - A leaf may be associated with more than 1 point in/on its cell: O(n) nodes
  - Inner boxes are "sticky": if an inner box is close to an edge of the outer box, it "sticks" to that edge
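A cell that is the difference of two nested rectangles can be sketched as a simple membership test (hypothetical helper names; this is not the paper's data structure, just the cell geometry):

```python
# A BBD cell is either a d-dimensional rectangle (inner=None), or the
# set difference "outer box minus inner box" of two nested rectangles.

def in_rect(p, rect):
    """rect = (lo, hi) corner tuples; closed-box membership."""
    lo, hi = rect
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))

def in_cell(p, outer, inner=None):
    """Point lies in the cell: inside outer, but not in the inner box."""
    if not in_rect(p, outer):
        return False
    return inner is None or not in_rect(p, inner)
```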
Midpoint Algorithm for Splitting/Shrinking

- Split: split box b using the hyperplane through the center of b and orthogonal to the ith coordinate axis (the longest dimension)
  - bounds the aspect ratio
  - (what's wrong with this approach?)
- Centroid shrink: produce O(1) subcells, each with <= 2nc/3 points [nc = # points in the current cell]
  - 3-stage: shrink, split, shrink
  - single-stage simplified shrink

Middle-Interval Algorithm for Splitting/Shrinking

- Flexibility for the splitting plane choice
- Choose the plane from a central strip of the current outer box
Packing Constraint

- Each subdivision cell satisfies this packing constraint:
  Given a BBD-tree for a set of data points in R^d, the number of leaf cells of size at least s > 0 intersecting a (Minkowski L_m) open ball of radius r > 0 is at most

    (1 + ceil(6r/s))^d

- Proof has 2 cases:
  - Overlapping boxes
  - Disjoint boxes:
    - A box of side 2r encloses the ball of radius r
    - Aspect ratio 3:1 implies the smallest side length is >= s/3
    - The densest packing is given by a regular grid of boxes of side length s/3
    - An interval of length 2r can intersect no more than 1 + ceil(6r/s) such intervals
    - Account for all dimensions by raising to the power d
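The counting step in the disjoint-boxes case can be checked numerically in one dimension (my own toy verification, not from the paper; `intervals_met` is a hypothetical helper):

```python
import math

def intervals_met(r, s, offset):
    """Count grid intervals [k*w, (k+1)*w), w = s/3, intersecting the
    interval [offset, offset + 2r] (the shadow of a ball of radius r)."""
    w = s / 3.0
    first = math.floor(offset / w)
    last = math.floor((offset + 2 * r) / w)
    return last - first + 1

# The proof's per-dimension bound: at most 1 + ceil(6r/s) grid intervals,
# hence (1 + ceil(6r/s))**d cells in d dimensions.
r, s = 1.0, 0.9
bound = 1 + math.ceil(6 * r / s)
assert all(intervals_met(r, s, off / 100.0) <= bound for off in range(100))
```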
Priority Search from Query Point

- Visit boxes in increasing order of distance from q
- Similar to kd-tree priority search
  - Maintain a priority queue of tree nodes
  - A node's priority is inversely related to dist(q, cell)
  - At the start, the root's cells v1, v2, v3, v4 are in the priority queue
- Search repeats:
  - Extract the highest-priority node (the node closest to the query point)
  - Descend its subtree to visit the leaf closest to q, adding siblings to the queue along the way
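The priority search can be sketched over a flat list of leaf cells (a deliberate simplification with no tree descent; `ann_query` and the cell representation are my own, and the stopping rule is the dist(q,p)/(1+e) test from the overview slide):

```python
import heapq, math

def dist_point_rect(q, rect):
    """Euclidean distance from q to an axis-aligned box (lo, hi)."""
    lo, hi = rect
    return math.hypot(*(max(l - x, 0.0, x - h) for x, l, h in zip(q, lo, hi)))

def ann_query(q, cells, eps):
    """cells: list of (rect, point). Visit cells in increasing distance
    from q; return an approximate nearest neighbor of q."""
    heap = [(dist_point_rect(q, rect), pt) for rect, pt in cells]
    heapq.heapify(heap)
    best, best_d = None, float("inf")
    while heap:
        cell_d, pt = heapq.heappop(heap)
        if cell_d > best_d / (1.0 + eps):   # all remaining cells too far
            break
        d = math.dist(q, pt)
        if d < best_d:
            best, best_d = pt, d
    return best
```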
Incremental, Relative Distance [Arya, Mount 93]

- Maintain the sum of appropriate powers of coordinate differences between the query point and the nearest point of the outer box
- Incrementally update the distance from the parent box to each child when a split is performed:
  - The closer child has the same distance as the parent
  - The further child's distance needs only a 1-coordinate update (along the splitting dimension)
- Can make a difference in higher dimensions!

[Figure: L1 distance from a query point to a box with corners (xL, yB), (xR, yT), before and after a split at xM; only the x-coordinate term changes for the further child]
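The incremental update can be sketched as follows (my own function names; squared Euclidean distance stands in for the "appropriate powers" of the slide):

```python
# Keep the sum of squared per-coordinate gaps from q to the box; after a
# split along one axis, only that axis's term changes, so the child's
# distance is an O(1) update rather than a fresh O(d) computation.

def gap(x, lo, hi):
    return max(lo - x, 0.0, x - hi)

def sq_dist_to_box(q, lo, hi):
    return sum(gap(x, l, h) ** 2 for x, l, h in zip(q, lo, hi))

def child_sq_dist(parent_sq, q, lo, hi, axis, new_lo=None, new_hi=None):
    """Update only the split axis: subtract its old term, add the new one."""
    old = gap(q[axis], lo[axis], hi[axis]) ** 2
    new = gap(q[axis],
              lo[axis] if new_lo is None else new_lo,
              hi[axis] if new_hi is None else new_hi) ** 2
    return parent_sq - old + new

# The O(1) update agrees with recomputing from scratch:
q, lo, hi = (0.0, 5.0), (1.0, 1.0), (4.0, 4.0)
parent = sq_dist_to_box(q, lo, hi)
assert child_sq_dist(parent, q, lo, hi, axis=0, new_lo=2.0) == \
       sq_dist_to_box(q, (2.0, 1.0), hi)
```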
Experiments

Experiments generated points from a variety of probability distributions:
- Uniform
- Gaussian
- Laplace
- Correlated Gaussian
- Correlated Laplacian
- Clustered Gaussian
- Clustered Segments
Conclusions

- The algorithm is not necessarily practical for large dimensions
  - but for dimensions <= ~20, it does well
- Shrinking helps with highly clustered datasets, but was not often needed in their experiments
  - only needed for 5-20% of tree nodes
- The BBD tree (in the paper's form) is primarily for a static point set
  - but an auxiliary data structure could maintain changes
Derandomization for Efficient Geometric Partitioning

"Bounded-Independence Derandomization of Geometric Partitioning with Applications to Parallel Fixed-Dimensional Linear Programming"
Goodrich, Ramos
Overview

- The paper concerns geometric partitioning:
  - Given:
    - a collection X of n hyperplanes in R^d
    - a parameter r
  - Partitioning Goal:
    - partition R^d into O(r^d) constant-sized cells
    - so that each cell intersects few hyperplanes
  - Previous Work:
    - Random sampling -> a partition in which each cell intersects at most en hyperplanes, where e = (log r)/r
    - Derandomization can be used for deterministic construction
  - Current Work:
    - Assume the set is a special space with a special property
    - For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the space
    - Apply this to efficiently & deterministically solve parallel fixed-dimensional linear programming

For other Goodrich papers, see http://www.cs.jhu.edu/~goodrich/cgc/pubs/
Background: Derandomization

- Common approach for randomized geometric algorithms:
  - use small-sized random samples
- Derandomize:
  - quantify combinatorial properties of the random samples
  - show that sets with these properties can be constructed efficiently without randomization
- Combinatorial properties are often characterized by what the next long series of slides is about....
Background: Configuration

- Given an abstract set (universe) N of geometric objects
- A configuration s over N is a pair (D, L) = (D(s), L(s)), where D, L are disjoint subsets of N
- Objects in D are:
  - triggers associated with s
  - objects that define s
  - d(s) = cardinality of D(s) = degree
- Objects in L are:
  - stoppers associated with s
  - objects that conflict with s
  - l(s) = cardinality of L(s) = level = (absolute) conflict size

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- N = {h1, h2, h3, h4} is a set of line segments in the plane
- s is feasible if s occurs in the trapezoidal decomposition H(R) for some subset R of N
  - e.g., trapezoids arising in the incremental computation of H(N)
  - Here, R = {h3, h4}

[Figure: the segments h1..h4 of N, and the trapezoidal decomposition H(R) with trapezoid s; segments in R distinguished from segments in N \ R]

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- For a feasible trapezoid s, define its:
  - trigger set D(s) = segments of N adjacent to the boundary of s
  - conflict set L(s) = segments of N \ D(s) intersecting s
- Configuration (D(s), L(s)) where D(s) = {h3, h4} and L(s) = {h1, h2}

[Figure: H(R) with trapezoid s bounded by h3, h4 and intersected by h1, h2]

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Space

- A configuration space P(N) over N is a (multi)set of configurations with the
  - Bounded Degree Property: the degree of each configuration in P(N) is bounded (by a constant -- something independent of N)

Note: The term "configuration space" is also used in motion planning. In that context, it refers to the motion planning search space.

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- Associate with each feasible s a configuration (D(s), L(s))
- If N is in general position, d(s) = cardinality of D(s) <= 4
  - since s is a trapezoid
- Due to the bounded degree d(s), the result P(N) is a configuration space of all feasible trapezoids over N

Configurations for feasible trapezoids s1 and s2:
  D(s1) = {h3, h4}; L(s1) = {h1, h2}
  D(s2) = {h3, h4}; L(s2) = {}

[Figure: H(R) showing trapezoids s1 and s2; segments in R distinguished from segments in N \ R]

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- If we restrict N to be {h3, h4}, then
  - s1, s2 are 2 feasible trapezoids
  - D(s1) = D(s2) = {h3, h4}
  - L(s1) = L(s2) = {}
  - counted as 2 "distinct" configurations, even though (D(s1), L(s1)) = (D(s2), L(s2))
- The size of P(N) includes such "duplicate" configurations
- The reduced size of P(N) excludes "duplicates"

[Figure: 2 feasible trapezoids s1, s2 for N = {h3, h4}]

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- Note that not every arrangement of line segments (before overlaying a trapezoidal decomposition on it) has the bounded degree property.
- In general, it can have d(s) = O(N)

[Figure: a face s of an arrangement of segments h1..h5, with all 5 segments on its boundary]

- Can you think of another type of decomposition that has bounded degree?

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- Definition: Pi(N) is the set of configurations in P(N) with level i
  - [recall the level is the size of L(s), the conflict set]
- Configurations in P0(N) are active over N.
  - Example: P0(N) = {(D(s1), L(s1)), (D(s2), L(s2))}

[Figure: 2 feasible trapezoids s1, s2 for N = {h3, h4}]

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- Definition: A configuration space P(N) has bounded valence if the number of configurations in P(N) sharing the same trigger set is bounded (by a constant).
- Example: our configuration space of all feasible trapezoids over N has bounded valence
  - all feasible trapezoids with the same trigger set can be identified with trapezoids in the trapezoidal decomposition formed by that trigger set
  - the size of that trigger set is bounded by a constant, so the number of such trapezoids is also bounded by a constant

[Figure: trapezoidal decomposition induced by trigger set = {h3, h4}, containing s1 and s2]

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Configuration Example

- Theorem:
  - Let:
    - P(N) be a configuration space of bounded valence
    - n = size of N
    - d = maximum degree of a configuration in P(N)
    - R = a random sample of N of size r
  - Then:
    - For each active configuration s in P0(R), with probability > 1/2, the conflict size of s relative to N is <= c(n/r) log r, for a large enough constant c
    - Expected reduced size: E[reduced size of P(R)] is in O(r^d)
- Example: For any random sample R of N of size r
  - each trapezoid in the trapezoidal decomposition H(R) has O((n/r) log r) conflict size with high probability
  - the size of P(R) is in O(r^d)
    - for bounded-valence P(N), size and reduced size differ only by a constant factor

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: Range Space

- Definition:
  - Let P(N) be a configuration space, n = size of N, and p'(r) be the maximum reduced size function of P(N) for r <= n
  - P(N) has bounded dimension if there is a constant d such that p'(r) is in O(r^d) for all r <= n
  - In this case, d is the dimension of P(N)
- Bounded valence -> bounded dimension
- Some important types of configuration spaces don't have bounded valence but have bounded dimension
  - Range space: a configuration space for which the trigger set of every configuration is empty. In this case, a configuration is a range.
  - Half-space range: range = points in a halfspace. P(N) = set of distinct ranges induced by (upper) halfspaces. Dualize -> line arrangement [it has bounded dimension]

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
Background: e-net of a Range Space

- Theorem:
  - If:
    - P(N) is a configuration space of bounded dimension d
    - e > 0
    - R is a random subset of N
      - formed via r independent draws from N with replacement
      - r >= 8/e
  - then:
    - with probability at least 1 - 2p'(2r)2^(-er/2),
    - the conflict size (relative to N) of every configuration s in P0(R) is < en
- For a range space P(N):
  - such an R is an e-net of the range space P(N)
  - for large enough r, a random sample of size r is an e-net with high probability

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
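A 1D toy version of the e-net condition can make this concrete (rightward halfspaces over points on a line, anticipating the VC-dimension example on the next slide; `is_eps_net` is my own illustration, not from Mulmuley or the paper):

```python
def is_eps_net(N, R, eps):
    """True iff every rightward halfspace x >= t capturing at least
    eps*len(N) data points also captures a point of R. (Testing the
    thresholds t at the data points themselves suffices.)"""
    n = len(N)
    for t in N:
        if sum(1 for x in N if x >= t) >= eps * n and not any(x >= t for x in R):
            return False
    return True

# Deterministic illustration; the theorem says r >= 8/e independent
# draws with replacement yield an e-net with high probability.
N = list(range(10))
assert is_eps_net(N, [8], 0.2)       # sample reaches the top eps*n = 2 points
assert not is_eps_net(N, [5], 0.2)   # sample misses the range x >= 6 (4 points)
```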
Background: VC-Dimension of a Range Space

- Use this to bound the dimension when a direct argument fails
- Let P(N) be a range space
- A subset M of N is shattered if every subset of M occurs as a range in P(M)
  - the reduced size of P(M) is then 2^m, where m = size of M
- The VC-dimension of P(N) is the maximum size of a shattered subset of N.
- Example:

[Figure: 1D points p1..p7 on a line, with halfspace boundaries h1..h8 between and around them]

  - N = set of 1D points
  - P(N) = space of ranges induced by rightward half-spaces
  - What is the VC-dimension of P(N)?

Source: "Computational Geometry: An Introduction Through Randomized Algorithms" by Ketan Mulmuley
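Shattering can be checked by brute force for the slide's 1D example (my own code, with integers standing in for p1..p7; note the asserts effectively answer the slide's closing question):

```python
from itertools import chain, combinations

def ranges(points):
    """Distinct subsets of `points` induced by rightward halfspaces x >= t."""
    out = {frozenset(points), frozenset()}   # thresholds beyond either end
    for t in points:
        out.add(frozenset(x for x in points if x >= t))
    return out

def shattered(M, points):
    """M is shattered if every subset of M occurs as M intersected with a range."""
    rs = ranges(points)
    subsets = chain.from_iterable(combinations(M, k) for k in range(len(M) + 1))
    return all(any(set(M) & r == set(s) for r in rs) for s in subsets)

N = [1, 2, 3, 4, 5, 6, 7]        # stand-ins for p1..p7
assert shattered([3], N)         # any single point is shattered
assert not shattered([2, 5], N)  # {2} alone can never be cut out rightwards
```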
What does all this have to do with the paper?

- The paper concerns geometric partitioning:
  - Given:
    - a collection X of n hyperplanes in R^d
    - a parameter r
  - Goal:
    - partition R^d into O(r^d) constant-sized cells
    - so that each cell intersects few hyperplanes
  - Previous Work:
    - Random sampling -> a partition in which each cell intersects at most en hyperplanes, where e = (log r)/r
    - Derandomization can be used for deterministic construction
  - Current Work:
    - Assume the set is a range space with bounded VC-exponent
      - the VC-exponent is a more general concept than the VC-dimension
    - For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the range space that is a variation on the e-net concept.
Additional Handouts

- Parallel programming
  - PRAM: CREW, EREW models
- Parallel geometric algorithms
Project Update

Project: 25% of course grade

Deliverable          Due Date        Grade %
Proposal             Monday, 4/9     2%
Interim Report       Monday, 4/23    5%
Final Presentation   Monday, 5/7     8%
Final Submission     Monday, 5/14    10%
Guidelines: Presentation

- 1/2 hour class presentation
- Explain to the class what you did
- Structure it any way you like!
- Some ideas:
  - slides (electronic or transparency)
  - demo
  - handouts
Guidelines: Final Submission

- Abstract: Concise overview (at most 1 page)
- Introduction:
  - Motivation: Why did you choose this project?
  - Related Work: Context with respect to the CG literature
  - Summary of Results
- Main Body of Paper: (one or more sections)
- Conclusion:
  - Summary: What did you accomplish?
  - Future Work: What would you do if you had more time?
- References: Bibliography (papers, books that you used)

Well-written final submissions with research content may be eligible for publishing as UMass Lowell CS technical reports.
Guidelines: Final Submission

- Main Body of Paper:
  - If your project involves Theory/Algorithm:
    - Informal algorithm description (& example)
    - Pseudocode
    - Analysis:
      - Correctness
        - Solutions generated by the algorithm are correct
          - account for degenerate/boundary/special cases
        - If a correct solution exists, the algorithm finds it
        - Control structures (loops, recursions, ...) terminate correctly
      - Asymptotic Running Time and/or Space Usage
Guidelines: Final Submission

- Main Body of Paper:
  - If your project involves Implementation:
    - Informal description
    - Resources & Environment:
      - what language did you code in?
      - what existing code did you use? (software libraries, etc.)
      - what equipment did you use? (machine, OS, compiler)
    - Assumptions
      - parameter values
    - Test cases
      - tables, figures
      - representative examples
Final Exam

Final Exam: Date, Format

- Date Choices
  - Friday, 18 May at
    - 1:00-4:00 pm or
    - 5:30-8:30 pm
  - Wednesday, 23 May at
    - 9:00 am-12:00 pm or
    - 1:00-4:00 pm or
    - 5:30-8:30 pm
- Format:
  - in class
  - open book, notes
  - similar to midterm:
    - 50% calculate/manipulate
    - 50% design, analyze
Final Exam: Part I Material

- O'Rourke CH 1-8: emphasis on chapters omitted from the midterm (CH 7-8)
- Some key themes
  - Common geometric/combinatorial structures:
    - Decomposition/Partition:
      - Triangulation
      - Trapezoidalization
      - Delaunay Triangulation
      - Voronoi Diagram
      - Arrangement (level, zone)
    - Enclosure:
      - Convex Hull
      - Nested Polytope Hierarchy
    - Visibility Polygon & Kernel of Star Polygon
Final Exam: Part I Material

- Some key themes (continued)
  - Algorithmic Paradigms
    - Sweep: sort, then sweep a line; parabolic front
    - Divide-and-Conquer
    - Incremental
    - Randomized
    - Output-Sensitive
    - Preprocessing for fast queries
  - Representations:
    - Quad-edge
    - O'Rourke Geometric Primitives
Final Exam: Part I Material

- Some key themes (continued)
  - Math:
    - Convexity
    - Monotonicity
    - Distance Metrics
    - Visibility/Star-shapedness
    - Euler's Formula
    - Duality
      - Graphs
      - Point <-> Line
      - Parabolic
    - Minkowski Sum
    - Randomness
    - Graph Theory: Independent Set
Final Exam: Part II Material

Part II
 Translational Polygon Containment
 Connected Dominating Sets for Wireless Networks
 Mesh Generation using Delaunay Triangulation
 Approximate Nearest Neighbor Searching
 Derandomization for Efficient Geometric Partitioning
Final Exam: Part II Material

Translational Polygon Containment
 Minkowski Sum properties & use for intersection
 Distance metrics: Lp, convex
 Linear programming: Basic model formulation




Linear equations representing constraints & objective function
Convex feasible region
Types of variables
Connected Dominating Sets for Wireless Networks
 Dominating set of an undirected graph
 Basic algorithm + 2 rules
Final Exam: Part II Material


Mesh Generation using Delaunay Triangulation
 Incremental Delaunay Triangulation Algorithm
 Edge flipping
 Constrained Delaunay Triangulation
 Mesh refinement
 Robustness: awareness of general issues
Derandomization for Efficient Geometric Partitioning
 Configurations, configuration spaces, conflict and
trigger sets
Final Exam: Part II Material

- Approximate Nearest Neighbor Searching
  - Distance computations for high dimensions:
    - approximate distance & stopping criterion
    - compare without taking roots
    - incremental for subdivision
  - Balanced Box Decomposition ("kd-like") tree representation for fast ~O(log n) queries:
    - Efficiency by eliminating at each step/level a constant fraction of:
      - Area (geometric consideration)
      - Vertices (combinatorial consideration)