CSE 326: Data Structures Lecture #20 Multidimensional Search Trees

CSE 326: Data Structures
Part 10
Advanced Data Structures
Henry Kautz
Autumn Quarter 2002
1
Outline
• Multidimensional search trees
– Range Queries
– k-D Trees
– Quad Trees
• Randomized Data Structures & Algorithms
– Treaps
– Primality testing
– Local search for NP-complete problems
2
Multi-D Search ADT
• Dictionary operations
– create
– destroy
– find
– insert
– delete
– range queries
[Figure: a 2-D search tree holding the points 5,2; 2,5; 4,4; 4,2; 8,4; 1,9; 3,6; 8,2; 5,7; 9,1]
• Each item has k keys for a k-dimensional search tree
• Searches can be performed on one, some, or all of the keys, or on ranges of the keys
3
Applications of Multi-D Search
• Astronomy (simulation of galaxies) - 3 dimensions
• Protein folding in molecular biology - 3 dimensions
• Lossy data compression - 4 to 64 dimensions
• Image processing - 2 dimensions
• Graphics - 2 or 3 dimensions
• Animation - 3 to 4 dimensions
• Geographical databases - 2 or 3 dimensions
• Web searching - 200 or more dimensions
4
Range Query
A range query is a search in a dictionary in which
the exact key may not be entirely specified.
Range queries are the primary interface
with multi-D data structures.
5
Range Query Examples:
Two Dimensions
• Search for items based on
just one key
• Search for items based on
ranges for all keys
• Search for items based on
a function of several keys:
e.g., a circular range
query
6
Range Querying in 1-D
Find everything in the rectangle…
[Figure: points along the x axis]
7
Range Querying in 1-D with a BST
Find everything in the rectangle…
[Figure: points along the x axis]
8
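A minimal sketch of the 1-D range query on a BST, assuming an ordinary unbalanced BST in which equal keys go right. The names `Node`, `insert`, and `range_query` are illustrative, not from the slides. The traversal descends into a subtree only when the query interval can overlap it, and it reports keys in sorted order.

```python
class Node:
    """Plain BST node (illustrative)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Ordinary BST insert: smaller keys go left, equal or larger keys go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def range_query(root, lo, hi, out):
    """Append every key k with lo <= k <= hi to out, in sorted order."""
    if root is None:
        return out
    if lo < root.key:            # left subtree holds keys < root.key
        range_query(root.left, lo, hi, out)
    if lo <= root.key <= hi:     # report this node if it is in the interval
        out.append(root.key)
    if root.key <= hi:           # right subtree holds keys >= root.key
        range_query(root.right, lo, hi, out)
    return out


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
print(range_query(root, 35, 65, []))  # [40, 50, 60]
```

Each query visits the nodes on the two boundary search paths plus the reported nodes, so its cost is O(height + M) for M reported keys.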
1-D Range Querying in 2-D
[Figure: points in the x-y plane]
9
2-D Range Querying in 2-D
[Figure: points in the x-y plane]
10
k-D Trees
• Split on the next dimension at each succeeding level
• If building in batch, choose the median along the
current dimension at each level
– guarantees logarithmic height and balanced tree
• In general, add as in a BST
[Figure: k-D tree node with fields keys, value, dimension (the dimension that this node splits on), left, and right]
11
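The batch-building rule (split at the median along the current dimension, cycling through the dimensions level by level) can be sketched as follows. This is a minimal Python sketch under assumed conventions: points are tuples, the split dimension is `depth % k`, and points whose split coordinate is smaller go left while equal or larger ones go right, the same rule find uses. `KDNode`, `build`, and `height` are hypothetical names.

```python
class KDNode:
    """k-D tree node: keys, an optional value, the split dimension, two children."""
    def __init__(self, keys, dim, left=None, right=None, value=None):
        self.keys = keys      # tuple of k coordinates
        self.value = value
        self.dim = dim        # the dimension this node splits on
        self.left = left
        self.right = right


def build(points, depth=0, k=2):
    """Batch-build a k-D tree by splitting at the median of dimension depth % k."""
    if not points:
        return None
    dim = depth % k
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    # Slide the split left past ties so the left subtree is strictly smaller
    # on this dimension and the right subtree is greater than or equal.
    while mid > 0 and pts[mid - 1][dim] == pts[mid][dim]:
        mid -= 1
    return KDNode(pts[mid], dim,
                  build(pts[:mid], depth + 1, k),
                  build(pts[mid + 1:], depth + 1, k))


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


points = [(5, 2), (2, 5), (4, 4), (4, 2), (8, 4),
          (1, 9), (3, 6), (8, 2), (5, 7), (9, 1)]
tree = build(points)
print(tree.keys, height(tree))  # (5, 2) 4
```

Because every split takes the median, each subtree holds at most half of its parent's points, which is where the logarithmic height comes from.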
Find in a k-D Tree
find(<x1,x2, …, xk>, root) finds the node
which has the given set of keys in it or returns
null if there is no such node
Node find(int[] keys, Node root) {
    if (root == null)
        return null;                              // empty subtree: no such node
    int dim = root.dimension;                     // read only after the null check
    if (java.util.Arrays.equals(root.keys, keys)) // compare all k keys
        return root;
    else if (keys[dim] < root.keys[dim])
        return find(keys, root.left);
    else
        return find(keys, root.right);
}
runtime: O(depth)
12
Find Example
find(<3,6>)
find(<0,10>)
[Figure: 2-D tree containing the points 5,2; 2,5; 4,4; 8,4; 1,9; 4,2; 8,2; 3,6; 5,7; 9,1]
13
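The two example finds can be tried in runnable form. This Python sketch assumes the tree is grown by plain BST-style insertion; the exact tree drawn on the slide is not recoverable, so the insertion order below is an assumption. `KDNode`, `insert`, and `find` are illustrative names. find follows exactly the path insert would take, so it costs O(depth).

```python
class KDNode:
    def __init__(self, keys, dim):
        self.keys = keys      # tuple of k coordinates
        self.dim = dim        # the dimension this node splits on
        self.left = None
        self.right = None


def insert(root, keys, depth=0, k=2):
    """Add as in a BST, cycling the split dimension with depth."""
    if root is None:
        return KDNode(keys, depth % k)
    d = root.dim
    if keys[d] < root.keys[d]:
        root.left = insert(root.left, keys, depth + 1, k)
    else:
        root.right = insert(root.right, keys, depth + 1, k)
    return root


def find(keys, root):
    """Return the node whose keys equal `keys`, or None if there is none."""
    while root is not None:
        if root.keys == keys:
            return root
        d = root.dim
        root = root.left if keys[d] < root.keys[d] else root.right
    return None


root = None
for p in [(5, 2), (2, 5), (4, 4), (4, 2), (8, 4),
          (1, 9), (3, 6), (8, 2), (5, 7), (9, 1)]:
    root = insert(root, p)
print(find((3, 6), root) is not None)   # True
print(find((0, 10), root))              # None
```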
Building a 2-D Tree (1/4)
[Figure: x-y plane]
14
Building a 2-D Tree (2/4)
[Figure: x-y plane]
15
Building a 2-D Tree (3/4)
[Figure: x-y plane]
16
Building a 2-D Tree (4/4)
[Figure: x-y plane]
17
k-D Tree
[Figure: k-D tree over points labeled a through m]
18
2-D Range Querying in 2-D Trees
[Figure: x-y plane with a query rectangle]
Search every partition that intersects the rectangle.
Check whether each node (including leaves) falls into the range.
19
Range Query in a 2-D Tree
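void print_range(int xlow, int xhigh, int ylow, int yhigh, Node root) {
    if (root == null) return;
    // Report this node if it lies inside the query rectangle
    if (xlow <= root.x && root.x <= xhigh &&
        ylow <= root.y && root.y <= yhigh)
        print(root);
    // Left subtree: smaller values on root's split dimension
    if ((root.dim == 'x' && xlow <= root.x) ||
        (root.dim == 'y' && ylow <= root.y))
        print_range(xlow, xhigh, ylow, yhigh, root.left);
    // Right subtree: equal or larger values on root's split dimension
    if ((root.dim == 'x' && root.x <= xhigh) ||
        (root.dim == 'y' && root.y <= yhigh))
        print_range(xlow, xhigh, ylow, yhigh, root.right);
}
runtime: O(N) in the worst case; O(√N + M) on a balanced tree reporting M points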
20
Range Query in a k-D Tree
void print_range(int[] low, int[] high, Node root) {
    if (root == null) return;
    boolean inrange = true;
    for (int i = 0; i < MAXD; i++) {      // MAXD = number of dimensions k
        if (root.coord[i] < low[i]) inrange = false;
        if (high[i] < root.coord[i]) inrange = false;
    }
    if (inrange) print(root);
    if (low[root.dim] <= root.coord[root.dim])
        print_range(low, high, root.left);
    if (root.coord[root.dim] <= high[root.dim])
        print_range(low, high, root.right);
}
runtime: O(N) in the worst case
21
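A runnable version of the k-D range query, written in Python rather than the slide's pseudocode. This sketch assumes the same conventions as find (smaller split coordinate goes left, equal or larger goes right) and a tree grown by BST-style insertion; all names are illustrative. The tests compare the result against a brute-force scan.

```python
class KDNode:
    def __init__(self, keys, dim):
        self.keys, self.dim = keys, dim   # point and the dimension it splits on
        self.left = self.right = None


def insert(root, keys, depth=0, k=2):
    if root is None:
        return KDNode(keys, depth % k)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1, k)
    else:
        root.right = insert(root.right, keys, depth + 1, k)
    return root


def range_query(root, low, high, out):
    """Collect every point p with low[i] <= p[i] <= high[i] in all dimensions i."""
    if root is None:
        return out
    if all(lo <= c <= hi for c, lo, hi in zip(root.keys, low, high)):
        out.append(root.keys)
    d = root.dim
    if low[d] < root.keys[d]:      # left subtree: split coordinate < root's
        range_query(root.left, low, high, out)
    if root.keys[d] <= high[d]:    # right subtree: split coordinate >= root's
        range_query(root.right, low, high, out)
    return out


points = [(5, 2), (2, 5), (4, 4), (4, 2), (8, 4),
          (1, 9), (3, 6), (8, 2), (5, 7), (9, 1)]
root = None
for p in points:
    root = insert(root, p)
print(sorted(range_query(root, (2, 2), (5, 6), [])))
# [(2, 5), (3, 6), (4, 2), (4, 4), (5, 2)]
```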
Other Shapes for Range Querying
[Figure: x-y plane with a circular query region]
Search every partition that intersects the shape (circle).
Check whether each node (including leaves) falls into the shape.
22
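A circular query needs only a different inside test and a different pruning test. In this Python sketch (same assumed k-D conventions as above; names are illustrative) a subtree is skipped only when the circle cannot reach its side of the splitting line, i.e. when the distance along the split dimension alone already exceeds the radius.

```python
class KDNode:
    def __init__(self, keys, dim):
        self.keys, self.dim = keys, dim
        self.left = self.right = None


def insert(root, keys, depth=0, k=2):
    if root is None:
        return KDNode(keys, depth % k)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1, k)
    else:
        root.right = insert(root.right, keys, depth + 1, k)
    return root


def circle_query(root, center, r, out):
    """Collect every point within Euclidean distance r of center."""
    if root is None:
        return out
    if sum((a - b) ** 2 for a, b in zip(root.keys, center)) <= r * r:
        out.append(root.keys)
    d = root.dim
    # Left points have coordinate < split; reachable only if center[d] - r < split.
    if center[d] - r < root.keys[d]:
        circle_query(root.left, center, r, out)
    # Right points have coordinate >= split; reachable only if center[d] + r >= split.
    if center[d] + r >= root.keys[d]:
        circle_query(root.right, center, r, out)
    return out


points = [(5, 2), (2, 5), (4, 4), (4, 2), (8, 4),
          (1, 9), (3, 6), (8, 2), (5, 7), (9, 1)]
root = None
for p in points:
    root = insert(root, p)
print(sorted(circle_query(root, (4, 4), 2, [])))  # [(4, 2), (4, 4)]
```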
k-D Trees Can Be Inefficient
(but not when built in batch!)
insert(<5,0>)
insert(<6,9>)
insert(<9,3>)
insert(<6,5>)
insert(<7,7>)
insert(<8,6>)
[Figure: the resulting tree is a single path 5,0 → 6,9 → 9,3 → 6,5 → 7,7 → 8,6]
suck factor: O(n)
24
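The six inserts above can be replayed to compare one-at-a-time insertion with batch construction. This is a Python sketch under the same assumed conventions as the earlier examples (split dimension `depth % k`, ties go right); names are illustrative. Inserting the points in this order yields a path of 6 nodes, while the median-based build of the same points has height 3.

```python
class KDNode:
    def __init__(self, keys, dim, left=None, right=None):
        self.keys, self.dim = keys, dim   # point and the dimension it splits on
        self.left, self.right = left, right


def insert(root, keys, depth=0, k=2):
    """One-at-a-time insertion, as in a BST (smaller split coordinate goes left)."""
    if root is None:
        return KDNode(keys, depth % k)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1, k)
    else:
        root.right = insert(root.right, keys, depth + 1, k)
    return root


def build(points, depth=0, k=2):
    """Batch construction: split at the median of dimension depth % k."""
    if not points:
        return None
    dim = depth % k
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    while mid > 0 and pts[mid - 1][dim] == pts[mid][dim]:
        mid -= 1              # keep ties on the right, matching insert
    return KDNode(pts[mid], dim, build(pts[:mid], depth + 1, k),
                  build(pts[mid + 1:], depth + 1, k))


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


pts = [(5, 0), (6, 9), (9, 3), (6, 5), (7, 7), (8, 6)]
one_at_a_time = None
for p in pts:
    one_at_a_time = insert(one_at_a_time, p)
print(height(one_at_a_time), height(build(pts)))  # 6 3
```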
Quad Trees
• Split on all (two) dimensions at each level
• Split key space into equal size partitions (quadrants)
• Add a new node by adding it to a leaf; if the leaf is already occupied, split until there is only one node per leaf quadrant
[Figure: quad tree node with fields keys, value, Center (x, y), and four Quadrants labeled 0,0  1,0  0,1  1,1]
25
Find in a Quad Tree
find(<x, y>, root) finds the node which has the given pair of keys in it, or returns the leaf quadrant where the point would be if there is no such node
Node find(Key x, Key y, Node root) {
    if (root == null)
        return null;                 // Empty tree
    if (root.isLeaf())
        return root;                 // Key may not actually be here
    // getQuadrant compares (x, y) against the node's center;
    // it always makes the same choice on ties.
    int quad = getQuadrant(x, y, root);
    return find(x, y, root.quadrants[quad]);
}
runtime: O(depth)
26
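Insertion and find for the quad tree described above can be sketched in Python. This is a minimal point-region quad tree, assuming all points lie inside the root square; the class and function names, the quadrant numbering (x bit + 2 × y bit, matching the labels 0,0 1,0 0,1 1,1), and sending ties to the high side are choices made for illustration.

```python
class QuadNode:
    """Point-region quad tree node covering a square of half-width `half`."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half   # center and half-width
        self.point = None        # the single point stored in a leaf, if any
        self.quadrants = None    # four children once the node has been split

    def is_leaf(self):
        return self.quadrants is None


def get_quadrant(x, y, node):
    """Index = x bit + 2 * y bit; ties go to the high side, always the same choice."""
    return (1 if x >= node.cx else 0) + (2 if y >= node.cy else 0)


def split(node):
    """Turn a leaf into an internal node with four empty child squares."""
    h = node.half / 2
    node.quadrants = [QuadNode(node.cx + (h if i & 1 else -h),
                               node.cy + (h if i & 2 else -h), h)
                      for i in range(4)]


def insert(node, x, y):
    """Descend to a leaf; if it is occupied, split until each point has its own leaf."""
    while not node.is_leaf():
        node = node.quadrants[get_quadrant(x, y, node)]
    if node.point is None or node.point == (x, y):
        node.point = (x, y)
        return
    old, node.point = node.point, None
    split(node)
    insert(node, *old)
    insert(node, x, y)


def find(node, x, y):
    """Return the leaf whose square contains (x, y); the key may not actually be there."""
    while not node.is_leaf():
        node = node.quadrants[get_quadrant(x, y, node)]
    return node


root = QuadNode(8, 8, 8)          # key space [0, 16] x [0, 16]
for p in [(10, 2), (5, 6), (3, 12), (12, 5)]:
    insert(root, *p)
print(find(root, 10, 2).point)    # (10, 2)
print(find(root, 1, 1).point)     # (5, 6): not the key we asked for
```

Two very close points force repeated splits, which is why depth grows with the log of the inverse minimum distance.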
Find Example
find(<10,2>) (i.e., c)
find(<5,6>) (i.e., d)
[Figure: quad tree over points labeled a through g]
27
Building a Quad Tree (1/5)
[Figure: x-y plane]
28
Building a Quad Tree (2/5)
[Figure: x-y plane]
29
Building a Quad Tree (3/5)
[Figure: x-y plane]
30
Building a Quad Tree (4/5)
[Figure: x-y plane]
31
Building a Quad Tree (5/5)
[Figure: x-y plane]
32
Quad Tree Example
[Figure: quad tree over points labeled a through g]
33
Quad Trees Can Suck
[Figure: two points, a and b, very close together]
suck factor: O(log (1/minimum distance between nodes)), taking the key space to have unit width
35
2-D Range Querying in Quad Trees
[Figure: x-y plane with a query rectangle]
36
2-D Range Query in a Quad Tree
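void print_range(int xlow, int xhigh, int ylow, int yhigh, Node root) {
    if (root == null) return;
    // Report this node if it lies inside the query rectangle
    if (xlow <= root.x && root.x <= xhigh &&
        ylow <= root.y && root.y <= yhigh)
        print(root);
    // Visit each quadrant only if the rectangle reaches into it
    if (xlow <= root.x && ylow <= root.y)
        print_range(xlow, xhigh, ylow, yhigh, root.lower_left);
    if (xlow <= root.x && root.y <= yhigh)
        print_range(xlow, xhigh, ylow, yhigh, root.upper_left);
    if (root.x <= xhigh && ylow <= root.y)
        print_range(xlow, xhigh, ylow, yhigh, root.lower_right);
    if (root.x <= xhigh && root.y <= yhigh)
        print_range(xlow, xhigh, ylow, yhigh, root.upper_right);
}
runtime: O(N) in the worst case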
37
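A Python sketch of the same quadrant-pruning idea on the point-region quad tree from the insertion example. It assumes (as that sketch does) that only leaves store points and internal nodes store just a center, so internal nodes are never reported themselves; names and the quadrant numbering are illustrative.

```python
class QuadNode:
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half   # center and half-width
        self.point = None                            # point stored in a leaf
        self.quadrants = None                        # children after a split

    def is_leaf(self):
        return self.quadrants is None


def get_quadrant(x, y, node):
    return (1 if x >= node.cx else 0) + (2 if y >= node.cy else 0)


def split(node):
    h = node.half / 2
    node.quadrants = [QuadNode(node.cx + (h if i & 1 else -h),
                               node.cy + (h if i & 2 else -h), h)
                      for i in range(4)]


def insert(node, x, y):
    while not node.is_leaf():
        node = node.quadrants[get_quadrant(x, y, node)]
    if node.point is None or node.point == (x, y):
        node.point = (x, y)
        return
    old, node.point = node.point, None
    split(node)
    insert(node, *old)
    insert(node, x, y)


def range_query(node, xlow, xhigh, ylow, yhigh, out):
    """Collect stored points with xlow <= x <= xhigh and ylow <= y <= yhigh."""
    if node.is_leaf():
        p = node.point
        if p is not None and xlow <= p[0] <= xhigh and ylow <= p[1] <= yhigh:
            out.append(p)
        return out
    # Internal node: (cx, cy) is only a center, so just choose quadrants to visit.
    x_side = (xlow <= node.cx, node.cx <= xhigh)   # reaches low half, high half
    y_side = (ylow <= node.cy, node.cy <= yhigh)
    for i, child in enumerate(node.quadrants):
        if x_side[i & 1] and y_side[(i >> 1) & 1]:
            range_query(child, xlow, xhigh, ylow, yhigh, out)
    return out


pts = [(10, 2), (5, 6), (3, 12), (12, 5), (13, 13), (1, 1), (7, 9)]
root = QuadNode(8, 8, 8)          # key space [0, 16] x [0, 16]
for p in pts:
    insert(root, *p)
print(sorted(range_query(root, 2, 12, 4, 12, [])))
# [(3, 12), (5, 6), (7, 9), (12, 5)]
```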
Delete Example
delete(<10,2>) (i.e., c)
[Figure: quad tree over points labeled a through g]
• Find and delete the node.
• If its parent now has just one child, delete the parent (the child takes its place).
• Propagate!
39
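The delete-and-propagate steps can be sketched on the point-region quad tree from the insertion example. In this Python sketch (names and representation are assumptions), "parent has just one child" is read as "all four quadrants are leaves and at most one holds a point"; such a parent is collapsed back into a leaf, and the check repeats on the way back up the recursion.

```python
class QuadNode:
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.point = None
        self.quadrants = None

    def is_leaf(self):
        return self.quadrants is None


def get_quadrant(x, y, node):
    return (1 if x >= node.cx else 0) + (2 if y >= node.cy else 0)


def split(node):
    h = node.half / 2
    node.quadrants = [QuadNode(node.cx + (h if i & 1 else -h),
                               node.cy + (h if i & 2 else -h), h)
                      for i in range(4)]


def insert(node, x, y):
    while not node.is_leaf():
        node = node.quadrants[get_quadrant(x, y, node)]
    if node.point is None or node.point == (x, y):
        node.point = (x, y)
        return
    old, node.point = node.point, None
    split(node)
    insert(node, *old)
    insert(node, x, y)


def delete(node, x, y):
    """Remove (x, y) if present and collapse emptied parents; return True if removed."""
    if node.is_leaf():
        if node.point == (x, y):
            node.point = None
            return True
        return False
    removed = delete(node.quadrants[get_quadrant(x, y, node)], x, y)
    if removed and all(c.is_leaf() for c in node.quadrants):
        left = [c.point for c in node.quadrants if c.point is not None]
        if len(left) <= 1:                    # at most one point remains below
            node.quadrants = None             # collapse: this node becomes a leaf
            node.point = left[0] if left else None
    return removed                            # callers above repeat the check


root = QuadNode(8, 8, 8)
for p in [(10, 2), (5, 6), (3, 12), (12, 5)]:
    insert(root, *p)
delete(root, 12, 5)
print(root.quadrants[1].is_leaf(), root.quadrants[1].point)  # True (10, 2)
```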
Nearest Neighbor Search
getNearestNeighbor(<1,4>)
[Figure: quad tree over points labeled a through g]
• Find a nearby node (do a find).
• Do a circular range query.
• As you get results, tighten the circle.
• Continue until no closer node in query.
Works on k-D Trees, too!
40
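Since the same idea works on k-D trees, here is a Python sketch of nearest-neighbor search on a k-D tree built by BST-style insertion (names and conventions are illustrative). It descends first into the side containing the query, like a find, keeps the best distance found so far as the radius of the "circle", and visits the other side only if that circle still crosses the splitting line.

```python
class KDNode:
    def __init__(self, keys, dim):
        self.keys, self.dim = keys, dim
        self.left = self.right = None


def insert(root, keys, depth=0, k=2):
    if root is None:
        return KDNode(keys, depth % k)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1, k)
    else:
        root.right = insert(root.right, keys, depth + 1, k)
    return root


def nearest(root, q):
    """Return the stored point closest to q (Euclidean distance)."""
    best, best_d2 = None, float("inf")   # current answer and radius^2 of the circle

    def visit(node):
        nonlocal best, best_d2
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.keys, q))
        if d2 < best_d2:                  # closer point found: tighten the circle
            best, best_d2 = node.keys, d2
        diff = q[node.dim] - node.keys[node.dim]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)                       # the side containing q, like a find
        if diff * diff < best_d2:         # circle still crosses the split line
            visit(far)

    visit(root)
    return best


points = [(5, 2), (2, 5), (4, 4), (4, 2), (8, 4),
          (1, 9), (3, 6), (8, 2), (5, 7), (9, 1)]
root = None
for p in points:
    root = insert(root, p)
print(nearest(root, (1, 4)))  # (2, 5)
```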
Quad Trees vs. k-D Trees
• k-D Trees
– Density balanced trees
– Number of nodes is O(n) where n is the number of points
– Height of the tree is O(log n) with batch insertion
– Supports insert, find, nearest neighbor, range queries
• Quad Trees
– Number of nodes is O(n(1 + log(Δ/n))) where n is the number of points and Δ is the ratio of the width (or height) of the key space and the smallest distance between two points
– Height of the tree is O(log n + log Δ)
– Supports insert, delete, find, nearest neighbor, range queries
41
To Do
• Read (a little) about k-D trees in Weiss 12.6
42
CSE 326: Data Structures
Part 10, continued
Data Structures
Henry Kautz
Autumn Quarter 2002
43
Pick a Card
Warning! The Queen of Spades
is a very unlucky card!
44
Randomized Data Structures
• We’ve seen many data structures with good
average case performance on random inputs, but
bad behavior on particular inputs
– Binary Search Trees
• Instead of randomizing the input (since we
cannot!), consider randomizing the data structure
– No bad inputs, just unlucky random numbers
– Expected case good behavior on any input
45
What’s the Difference?
• Deterministic with good average time
– If your application happens to always use the “bad” case,
you are in big trouble!
• Randomized with good expected time
– Once in a while you will have an expensive operation, but
no inputs can make this happen all the time
• Kind of like an
insurance policy
for your algorithm!
46
Treap Dictionary Data Structure
• Treaps have the binary
search tree
– binary tree property
– search tree property
• Treaps also have the
heap-order property!
– randomly assigned
priorities
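[Figure: example treap, nodes written priority/key. Root 2/9; its left child 6/7 has right child 7/8; its right child 4/18 has left child 9/15 and right child 10/30; 9/15 has left child 15/12. Priorities are in heap order, keys in search-tree order.]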
47
Treap Insert
• Choose a random priority
• Insert as in normal BST
• Rotate up until heap order is restored (maintaining
BST property while rotating)
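insert(15)
[Figure: 15 receives random priority 9 and is first added as the right child of 12 (priority 14); one rotation lifts it above 12, giving root 2/9 with left child 6/7 (right child 7/8) and right child 9/15 (left child 14/12). Nodes written priority/key.]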
48
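The three insert steps can be sketched in Python. This is a minimal min-heap treap, assuming priorities are floats from `random.random()` unless one is passed explicitly (as the example below does to reproduce the slide's insert(15)); class and function names are illustrative. Each rotation keeps the in-order key sequence intact, so the BST property survives.

```python
import random


class TreapNode:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority   # heap order: parent priority <= child priority
        self.left = None
        self.right = None


def rotate_right(node):
    """Lift node.left above node; in-order key sequence is unchanged."""
    child = node.left
    node.left, child.right = child.right, node
    return child


def rotate_left(node):
    """Lift node.right above node; in-order key sequence is unchanged."""
    child = node.right
    node.right, child.left = child.left, node
    return child


def insert(root, key, priority=None):
    """Insert as in a BST, then rotate the new node up until heap order holds."""
    if priority is None:
        priority = random.random()       # choose a random priority
    if root is None:
        return TreapNode(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority)
        if root.left.priority < root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, priority)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


# The insert(15) example, with the priorities shown on the slide
root = None
for key, pri in [(9, 2), (7, 6), (12, 14), (8, 7), (15, 9)]:
    root = insert(root, key, pri)
print(root.key, root.right.key, root.right.left.key)  # 9 15 12
```

With random priorities, inserting keys in sorted order still gives a tree whose expected height is O(log n), unlike a plain BST.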
Tree + Heap… Why Bother?
Insert data in sorted order into a treap; what shape
tree comes out?
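insert(7)
insert(8)
insert(9)
insert(12)
[Figure: the treap after each insertion, nodes written priority/key (6/7, 7/8, 2/9, 15/12). The final treap has root 2/9 with left child 6/7 (right child 7/8) and right child 15/12, not the list an ordinary BST would produce.]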
49
Treap Delete
• Find the key
• Increase its priority to ∞
• Rotate it to the fringe
• Snip it off
delete(9)
[Figure: the tree after each rotation as node 9, now with priority ∞, moves down toward the fringe]
50
Treap Delete, cont.
[Figure: the remaining rotations move 9 (priority ∞) down to a leaf; snip!]
51
Treap Summary
• Implements Dictionary ADT
– insert in expected O(log n) time
– delete in expected O(log n) time
– find in expected O(log n) time
– but worst case O(n)
• Memory use
– O(1) per node
– about the cost of AVL trees
• Very simple to implement, little overhead – less than AVL trees
52
Other Randomized Data
Structures & Algorithms
• Randomized skip list
– cross between a linked list and a binary search tree
– O(log n) expected time for finds, and then can simply
follow links to do range queries
• Randomized QuickSort
– just choose pivot position randomly
– expected O(n log n) time for any input
53
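Randomized QuickSort can be sketched in a few lines of Python. This sketch picks the pivot uniformly at random, as the slide says; it also uses a three-way partition (smaller / equal / larger), a choice added here so that inputs with many equal keys do not degrade the running time. The function name is illustrative.

```python
import random


def quicksort(a, lo=0, hi=None):
    """Sort list a in place using a uniformly random pivot; returns a."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[random.randint(lo, hi)]        # random pivot position
    # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort(a, lo, lt - 1)                 # strictly smaller keys
    quicksort(a, gt + 1, hi)                 # strictly larger keys
    return a


print(quicksort([5, 3, 8, 1, 9, 2, 7]))      # [1, 2, 3, 5, 7, 8, 9]
```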
Randomized Primality Testing
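• No polynomial-time deterministic algorithm for primality testing was known until 2002 (Agrawal, Kayal, and Saxena), and the problem was not believed to be NP-complete either
• Randomized tests remain much faster in practice:
1. Guess a random number 0 < A < N
2. If $A^{N-1} \bmod N \neq 1$, then N is not prime
3. Otherwise, N is probably prime: a composite N passes for a random A with probability at most 1/2
– unless N is a “Carmichael number”; a slightly more complex test (Miller–Rabin) rules out this case, and any composite N passes it with probability at most 1/4
4. Repeat to increase confidence in the answer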
54
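The steps above (the Fermat test) can be sketched in Python with the built-in three-argument `pow`, which computes $A^{N-1} \bmod N$ efficiently. This sketch draws A from 2 to N−2, an assumption made here because A = 1 and A = N−1 pass for every odd N and so reveal nothing; the function name and the default of 20 trials are illustrative. It does not handle Carmichael numbers.

```python
import random


def fermat_probably_prime(n, trials=20):
    """False means n is certainly composite; True means n is probably prime.
    Carmichael numbers (e.g. 561) pass for every A coprime to n."""
    if n < 2:
        return False
    if n < 4:
        return True                          # 2 and 3 are prime
    for _ in range(trials):
        a = random.randrange(2, n - 1)       # random A with 1 < A < N - 1
        if pow(a, n - 1, n) != 1:            # A^(N-1) mod N != 1: N is not prime
            return False
    return True                              # every trial agreed: probably prime


print(fermat_probably_prime(97), fermat_probably_prime(100))  # True False
```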
Randomized Search Algorithms
• Finding a goal node in very, very large graphs
using DFS, BFS, and even A* (using known
heuristic functions) is often too slow
• Alternative: random walk through the graph
55
N-Queens Problem
• Place N queens on an N by N chessboard so that
no two queens can attack each other
• Graph search formulation:
– Each way of placing from 0 to N queens on the
chessboard is a vertex
– Edge between vertices that differ by adding or removing
one queen
– Start vertex: empty board
– Goal vertex: any one with N non-attacking queens (there
are many such goals)
• Demo
56
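A random walk for N-Queens can be sketched in Python. Two assumptions differ from the graph formulation above: the walk works on complete placements with one queen per column (so a step moves a queen within its column rather than adding or removing one), and each step uses the min-conflicts rule, a specific randomized local-search heuristic swapped in for illustration. The step and restart limits are arbitrary.

```python
import random


def attackers(rows, col, row):
    """Number of queens (other than the one in column col) attacking square (row, col)."""
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))


def solve_queens(n, max_steps=2000, max_restarts=100):
    """Randomized local search: returns rows[c] = row of the queen in column c, or None."""
    for _ in range(max_restarts):
        rows = [random.randrange(n) for _ in range(n)]    # random starting placement
        for _ in range(max_steps):
            attacked = [c for c in range(n) if attackers(rows, c, rows[c]) > 0]
            if not attacked:
                return rows                               # goal: N non-attacking queens
            col = random.choice(attacked)                 # pick a random attacked queen
            scores = [attackers(rows, col, r) for r in range(n)]
            fewest = min(scores)
            # Move it to a random row among those with the fewest attackers
            rows[col] = random.choice([r for r in range(n) if scores[r] == fewest])
    return None                                           # unlucky every time: give up


random.seed(1)
print(solve_queens(8))
```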
Random Walk – Complexity?
• Random walk – closely related to “absorbing Markov chains”, “simulated annealing”, and the “Metropolis algorithm” (Metropolis et al. 1953)
• Can often prove that if you run it long enough, it will reach a goal state – but it may take exponential time
• In some cases can prove that with high probability a
goal is reached in polynomial time
– e.g., 2-SAT, Papadimitriou 1997
• Widely used for real-world problems where actual
complexity is unknown – scheduling, optimization
57
Traveling Salesman
Recall the Traveling Salesperson (TSP) Problem:
Given a fully connected, weighted graph G =
(V,E), is there a cycle that visits all vertices
exactly once and has total cost ≤ K?
– NP-complete: reduction from Hamiltonian circuit
• Occurs in many real-world transportation and
design problems
• Randomized simulated annealing algorithm demo
58
Latin Squares
• Randomization can be combined with depth first
search
• When a branch of the search terminates without
finding a solution, algorithm backs up to the last
choice point: backtracking search
• Instead of choosing the branch to follow systematically, choose it randomly
– If your random choices are unlucky, give up and start
over again
• Demo
59
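Randomized backtracking with restarts can be sketched in Python on Latin square completion: fill an N×N grid so that each of N symbols appears exactly once in every row and column, keeping any pre-filled cells. In this sketch the branch order at each cell is shuffled, and an attempt that exceeds a node budget counts as "unlucky" and starts over. The budget, restart count, and names are assumptions, and the pre-filled cells are assumed not to conflict with each other.

```python
import random


class GiveUp(Exception):
    """Raised when one randomized attempt exceeds its node budget."""


def complete_latin_square(grid, budget=10_000, max_restarts=50):
    """grid: N x N lists with symbols 0..N-1, or None for blanks.
    Returns a completed copy, or None if no completion was found."""
    n = len(grid)
    cells = [(r, c) for r in range(n) for c in range(n) if grid[r][c] is None]
    for _ in range(max_restarts):
        g = [row[:] for row in grid]
        steps = 0

        def fill(k):
            nonlocal steps
            if k == len(cells):
                return True
            steps += 1
            if steps > budget:
                raise GiveUp                   # unlucky choices: abandon this attempt
            r, c = cells[k]
            used = set(g[r]) | {g[i][c] for i in range(n)}
            choices = [v for v in range(n) if v not in used]
            random.shuffle(choices)            # random, not systematic, branch order
            for v in choices:
                g[r][c] = v
                if fill(k + 1):
                    return True
            g[r][c] = None                     # dead end: back up to the last choice
            return False

        try:
            return g if fill(0) else None      # exhaustive failure: no completion exists
        except GiveUp:
            continue                           # start over with fresh random choices
    return None


random.seed(0)
print(complete_latin_square([[0, None, None], [None, None, None], [None, None, 0]]))
```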
Final Review
(“We’ve covered way too much in this course…
What do I really need to know?”)
60
Be Sure to Bring
• 1 page of notes
• A hand calculator
• Several #2 pencils
61
Final Review: What you need to
know
• Basic Math
– Logs, exponents, summation of series, e.g. $\sum_{i=1}^{N} i = \frac{N(N+1)}{2}$ and $\sum_{i=0}^{N} A^i = \frac{A^{N+1}-1}{A-1}$ (for $A \neq 1$)
– Proof by induction
• Asymptotic Analysis
– Big-oh, Theta and Omega
– Know the definitions and how to show f(N) is big-O/Theta/Omega of g(N)
– How to estimate Running Time of code fragments
• E.g. nested “for” loops
• Recurrence Relations
– Deriving recurrence relation for run time of a recursive
function
– Solving recurrence relations by expansion to get run time
62
Final Review: What you need to
know
• Lists, Stacks, Queues
– Brush up on ADT operations – Insert/Delete, Push/Pop etc.
– Array versus pointer implementations of each data structure
– Amortized complexity of stretchy arrays
• Trees
– Definitions/Terminology: root, parent, child, height, depth
etc.
– Relationship between depth and size of tree
• Depth can be between O(log N) and O(N) for N nodes
63
Final Review: What you need to
know
• Binary Search Trees
– How to do Find, Insert, Delete
• Bad worst case performance – could take up to O(N) time
– AVL trees
• Balance factor is +1, 0, -1
• Know single and double rotations to keep tree balanced
• All operations are O(log N) worst case time
– Splay trees – good amortized performance
• A single operation may take O(N) time but in a sequence of
operations, average time per operation is O(log N)
• Every Find, Insert, Delete causes accessed node to be moved to the
root
• Know how to zig-zig, zig-zag, etc. to “bubble” node to top
64
Final Review: What you need to
know
• Priority Queues
– Binary Heaps: Insert/DeleteMin, Percolate up/down
• Array implementation
• BuildHeap takes only O(N) time (used in heapsort)
– Binomial Queues: Forest of binomial trees with heap order
• Merge is fast – O(log N) time
• Insert and DeleteMin based on Merge
• Hashing
– Hash functions based on the mod function
– Collision resolution strategies
• Chaining, Linear and Quadratic probing, Double Hashing
– Load factor of a hash table
65
Final Review: What you need to
know
• Sorting Algorithms: Know run times and how they work
– Elementary sorting algorithms and their run time
• Selection sort
– Heapsort – based on binary heaps (max-heaps)
• BuildHeap and repeated DeleteMax’s
– Mergesort – recursive divide-and-conquer, uses extra array
– Quicksort – recursive divide-and-conquer, Partition in-place
• fastest in practice, but $O(N^2)$ worst case time
• Pivot selection – median-of-three works best
– Know which of these are stable and in-place
– Lower bound on sorting, bucket sort, and radix sort
66
Final Review: What you need to
know
• Disjoint Sets and Union-Find
– Up-trees and their array-based implementation
– Know how Union-by-size and Path compression work
– No need to know run time analysis – just know the result:
• Sequence of M operations with Union-by-size and P.C. is $\Theta(M\,\alpha(M,N))$ – just a little more than $\Theta(1)$ amortized time per op
• Graph Algorithms
– Adjacency matrix versus adjacency list representation of
graphs
– Know how to Topological sort in O(|V| + |E|) time using a
queue
– Breadth First Search (BFS) for unweighted shortest path
67
Final Review: What you need to
know
• Graph Algorithms (cont.)
– Dijkstra’s shortest path algorithm
– Depth First Search (DFS) and Iterated DFS
• Use of memory compared to BFS
– A* - relation of g(n) and h(n)
– Minimum Spanning trees – Kruskal’s & Prim’s algorithms
– Connected components using DFS or union/find
• NP-completeness
– Euler versus Hamiltonian circuits
– Definition of P, NP, NP-complete
– How one problem can be “reduced” to another (e.g. input to HC
can be transformed into input for TSP)
68
Final Review: What you need to
know
• Multidimensional Search Trees
– k-d Trees – find and range queries
• Depth logarithmic in number of nodes
– Quad trees – find and range queries
• Depth logarithmic in inverse of minimal distance between
nodes
• But higher branching factor means shorter depth if points are well spread out (log base 4 instead of log base 2)
• Randomized Algorithms
– expected time vs. average time vs. amortized time
– Treaps, randomized Quicksort, primality testing
69