Document

Transcript Document

Data Structures & Algorithms
Union-Find Example
Richard Newman
Steps to Develop an Algorithm







Define the problem – model it
Determine constraints
Find or create an algorithm to solve it
Evaluate algorithm – speed, space, etc.
If algorithm isn’t satisfactory, why not?
Try to fix algorithm
Iterate until solution found (or give up)
Dynamic Connectivity Problem
• Given a set of N elements
• Support two operations:
•
•
Connect two elements
Given two elements, is there a path between
them?
Example
Connect (4, 3)
Connect (3, 8)
0
1
Connect (6, 5)
5
6
Connect (9, 4)
Connect (2, 1)
Are 0 and 7 connected (No)
Are 8 and 9 connected (Yes)
2
3
4
7
8
9
Example (con’t)
Connect (5, 0)
Connect (7, 2)
0
1
2
3
Connect (6, 1)
5
6
7
8
Connect (1, 0)
Are 0 and 7 connected (Yes)
Now consider a problem with 10,000
elements and 15,000 connections….
4
9
Modeling the Elements
Various interpretations of the elements:
 Pixels in a digital photo
 Computers in a network
 Socket pins on a PC board
 Transistors in a VLSI design
 Variable names in a C++ program
 Locations on a map
 Friends in a social network
 …
Convenient to just number 0 to N-1
Use as array index, suppress details
Modeling the Connections
Assume “is connected to” is an equivalence relation
 Reflexive: a is connected to a
 Symmetric: if a is connected to b, then
b is connected to a
 Transitive: if a is connected to b, and
b is connected to c, then
a is connected to c
Connected Components

A connected component is a maximal
set of elements that are mutually
connected (i.e., an equivalence set)
0
1
2
3
4
5
6
7
8
9
{0} {1,2} {3,4,8,9} {5,6} {7}
Implementing the Operations
Recall – connect two elements, and answer if two
elements have a path between them
 Find: in which component is element a?
 Union: replace components containging elements
a and b with their union
 Connected: are elements a and b in the same
component?
Example
0
1
2
3
4
5
6
7
8
9
{0} {1,2} {3,4,8,9} {5,6} {7}
Union(1,6)
Components?
0
1
2
3
4
5
6
7
8
9
{0} {1,2,5,6} {3,4,8,9} {7}
Union-Find Data Type
Goal: Design an efficient data structure for union-find
 Number of elements can be huge
 Number of operations can be huge
 Union and find operations can be intermixed
public class UF
UF int(N);
void
union(int a, int b);
int
find(int a);
boolean connected(int a, int b);
Dynamic Connectivity Client


Read in number of elements N from stdin
Repeat:
– Read in pair of integers from stdin
– If not yet connected, connect them and print out pair
read input int N
while stdin is not empty
read in pair of ints a and b
if not connected (a, b)
union(a, b)
print out a and b
Quick-Find

Data Structure
– Integer array id[] of length N
– Interpretation: id[a] is the id of the component containing a
i: 0 1 2 3 4 5 6 7 8 9
id[i]: 0 1 1 4 4 5 5 7 4 4
0
1
2
3
4
5
6
7
8
9
Quick-Find

Data Structure
– Integer array id[] of length N
– Interpretation: id[a] is the id of the component containing a



Find: what is the id of a?
Connected: do a and b have the same id?
Union: Change all the entries in id that have
the same id as a to be the id of b.
Quick-Find
i: 0 1 2 3 4 5 6 7 8 9
id: 0 1 1 4 4 5 5 7 4 4
0
1
2
3
4
Union(1,6)
5
6
7
8
9
i: 0 1 2 3 4 5 6 7 8 9
id: 0 5 5 4 4 5 5 7 4 4
It works – so is there a problem?
Well, there may be many values to
change, and many to search!
Quick-Find

Quick-Find operation times
–
–
–
–

Initialization takes time O(N)
Union takes time O(N)
Find takes time O(1)
Connected takes time O(1)
Union is too slow – it takes O(N2) array
accesses to process N union operations
on N elements
Quadratic Algos Do Not Scale!

Rough Standards (for now)
– 109 operations per second
– 109 words of memory
– Touch all words in 1 second (+/- truism since 1950!)

Huge problem for Quick-Find:




109 union commands on 109 elements
Takes more than 1018 operations
This is 30+ years of computer time!
Quadratic Algos Do Not Scale!

They do not keep pace with technology





New computer may be 10x as fast
But it has 10x as much memory
Want to solve problems 10x as big
With quadratic algorithm, it takes…
… 10 x as long!!!
Quick-Union

Data Structure
– Integer array id[] of length N
– Interpretation: id[a] is the parent of a
– Component is root of a = id[id[…id[a]…]] (fixed point)
i: 0 1 2 3 4 5 6 7 8 9
id[i]: 0 1 1 3 3 5 5 7 3 4
0
1
2
3
8
5
4
6
9
7
Quick-Union

Data Structure
– Find: What is root of tree of a?
– Connected: Do a and b have the same root?
– Union: Set id of root of b’s tree to be root of a’s tree
i: 0 1 2 3 4 5 6 7 8 9
id[i]: 0 1 1 3 3 5 5 7 3 4
0
1
2
3
8
5
4
6
9
7
Quick-Union
– Find 9
– Connected 8, 9:
– Union 7,5
i: 0 1 2 3 4 5 6 7 8 9
id[i]: 0 1 1 3 3 5 5 5
7 3 4
0
1
2
3
8
5
4
6
9
Only ONE value changes! = FAST
7
Quick-Union

Quick-Union operation times (worst case)
–
–
–
–

Initialization takes time O(N)
Union takes time O(N) (must find two roots)
Find takes time O(N)
Connected takes time O(N)
Now union AND find are too slow – it
takes O(N2) array accesses to process N
operations on N elements
Quick-Find/Quick-Union


Observations:
Problem with Quick-Find is unions
– May take N array accesses
– Trees are flat, but too expensive to keep them flat!

Problem with Quick-Union
– Trees may get tall
– Find (and hence, connected and union) may take N
array accesses
Weighted Quick-Union



Make Quick-Union trees stay short!
Keep track of tree size
Join smaller tree into larger tree
– May alternatively do union by height/rank
– Need to keep track of “weight”
Quick-Union
may do this b
a
But we always
want this
b
a
Weighted Quick-Union

Weighted Quick-Union operation times
–
–
–
–

Initialization takes time O(N)
Union takes time O(1) (given roots)
Find takes time O(depth of a)
Connected takes time O(max {depth of a, b})
Proposition:
Depth of any node x is at most lg N
Pf: What causes depth of x to increase?
Weighted Quick-Union

Proposition:
Depth of any node x is at most lg N
Pf: What causes depth of x to increase?
Only union! And if x is in smaller tree.
So x’s tree must at least double in size each
time union increases x’s depth
Which can happen at most lg N times.
(Why?)
WQU with Path Compression

After performing find
– Set parent of all nodes along path to root
– Time order is the same for the find (just
traverse twice)

One-pass Variant
– Set every other node’s parent to it’s
grandparent

No reason NOT to do this – other than a
bit of laziness
– Huge benefits – tree is almost flat!
WQU with Path Compression




Theorem: Starting from an empty data
structure, any sequence of M union and
find operations on N elements take
O(N+M lg* N) time
Proof: Difficult! (lg* is number of times you
have to take log to get to 1)
Performance: lg* is almost constant!
And, in theory, no linear time algorithm
exists!
Lg* Function

N
1
2
4
16
64K
264K
Lg* N
0
1
2
3
4
5
Performance: lg* is almost constant!
UF Summary
Algorithm
Worst-Case Time
Quick-Find
MN
Quick-Union
MN
Weighted Quick-Union
N + M lg N
Path Compression
N + M lg N
WQU/PC
N + M lg* N
Example



Huge problem: 1 billion nodes, 10 billion
edges
WQU/PC reduces time from 3000 years to
1 minute!
Faster computer won’t help!
– WQU/PC on cell phone in Java beats QF on
supercomputer!

Better algorithm will!
Applications

Percolation
– N by N grid, each space vacant or accupied
– Grid percolates if top is connected to bottom
by vacant spaces
– For large N, vacancy percentage for
percolation is about 0.6, known by simulation

Models
– Electrical systems
– Fluid flow
– Social networks
Next – Lecture 3




Read Chapter 2
Empirical analysis
Asymptotic analysis of algorithms
Basic recurrences

Document

Transcript Document

Directory