CSE 326 Randomized Data Structures
Download
Report
Transcript CSE 326 Randomized Data Structures
CSE 326
Randomized Data Structures
David Kaplan
Dept of Computer Science & Engineering
Autumn 2001
Motivation for
Randomized Data Structures
We’ve seen many data structures with good average
case performance on random inputs, but bad behavior
on particular inputs
e.g. Binary Search Trees
Instead of randomizing the input (since we cannot!),
consider randomizing the data structure
No bad inputs, just unlucky random numbers
Expected case good behavior on any input
Randomized Data Structures
CSE 326 Autumn 2001
2
Average vs. Expected Time
Average
Expectation
(1/N) xi
Pr(xi) xi
Deterministic with good average time
If your application happens to always (or often) use the
“bad” case, you are in big trouble!
Randomized with good expected time
Once in a while you will have an expensive operation, but no
inputs can make this happen all the time
Like an insurance policy for your algorithm!
Randomized Data Structures
CSE 326 Autumn 2001
3
Randomized Data Structures
Define a property (or subroutine) in an algorithm
Sample or randomly modify the property
Use altered property as if it were the true property
Can transform average case runtimes into expected
runtimes (remove input dependency).
Sometimes allows substantial speedup in exchange for
probabilistic unsoundness.
Randomized Data Structures
CSE 326 Autumn 2001
4
Randomization in Action
Quicksort
Randomized data structures
Treaps
Randomized skip lists
Random number generation
Primality testing
Randomized Data Structures
CSE 326 Autumn 2001
5
Treap
Dictionary Data Structure
Treap is a BST
binary tree property
search tree property
Treap is also a heap
heap-order property
random priorities
priority
key
2
9
6
7
4
18
7
8
9
15
10
30
15
12
Randomized Data Structures
CSE 326 Autumn 2001
6
Treap Insert
Choose a random priority
Insert as in normal BST
Rotate up until heap order is restored
2
9
6
7
2
9
insert(15)
14
12
7
8
Randomized Data Structures
6
7
2
9
14
12
7
8
CSE 326 Autumn 2001
6
7
9
15
9
15
7
8
14
12
7
Tree + Heap… Why Bother?
Insert data in sorted order into a treap …
What shape tree comes out?
insert(7)
insert(8)
insert(9)
insert(12)
6
7
6
7
2
9
2
9
7
8
6
7
6
7
7
8
Randomized Data Structures
CSE 326 Autumn 2001
15
12
7
8
8
Treap Delete
delete(9)
Find the key
Increase its value to
Rotate it to the fringe
Snip it off
6
7
6
7
2
9
6
7
7
8
9
15
15
12
15
12
6
7
7
8
9
15
7
8
9
15
9
15
12
15
12
9
9
7
8
9
15
6
7
7
8
9
15
15
12
Treap Summary
Implements Dictionary ADT
insert in expected O(log n) time
delete in expected O(log n) time
find in expected O(log n) time
but worst case O(n)
Memory use
O(1) per node
about the cost of AVL trees
Very simple to implement
little overhead – less than AVL trees
Randomized Data Structures
CSE 326 Autumn 2001
10
Perfect Binary Skip List
Sorted linked list
# of links of a node is its height
The height i link of each node (that has one)
links to the next node of height i or greater
22
29
23
20
13
10
CSE 326 Autumn 2001
19
11
8
2
Randomized Data Structures
11
Find in a Perfect Binary Skip List
Start i at the maximum height
Until the node is found or i is one and the
next node is too large:
If the next node along the i link is less than the
target, traverse to the next node
Otherwise, decrease i by one
Runtime?
Randomized Data Structures
CSE 326 Autumn 2001
12
Randomized Skip List Intuition
It’s far too hard to insert into a perfect skip list
But is perfection necessary?
What matters in a skip list?
We want way fewer tall nodes than short ones
Make good progress through the list with each
high traverse
Randomized Data Structures
CSE 326 Autumn 2001
13
Randomized Skip List
Sorted linked list
# of links of a node is its height
The height i link of each node (that has one) links to
the next node of height i or greater
There should be about 1/2 as many height i+1 nodes
as height i nodes
29
23
20
19
11
10
CSE 326 Autumn 2001
22
13
8
2
Randomized Data Structures
14
Find in a RSL
Start i at the maximum height
Until the node is found or i is one and the
next node is too large:
If the next node along the i link is less than the
target, traverse to the next node
Otherwise, decrease i by one
Same as for a perfect skip list!
Runtime?
Randomized Data Structures
CSE 326 Autumn 2001
15
Insertion in a RSL
Flip a coin until it comes up heads; that takes
i flips. Make the new node’s height i.
Do a find, remembering nodes where we go
down
Put the node at the spot where the find ends
Point all the nodes where we went down (up
to the new node’s height) at the new node
Point the new node’s links where those
redirected pointers were pointing
Randomized Data Structures
CSE 326 Autumn 2001
16
RSL Insert Example
insert(22)
with 3 flips
13
8
29
23
20
19
11
10
2
29
23
20
19
11
10
22
13
8
2
Runtime?
Range Queries and Iteration
Range query: search for everything that falls between
two values
find start point
walk list at the bottom
output each node until the end point
Iteration: successively return (in order) each element in
the structure
start at beginning, walk list to end
Just like a linked list!
Runtimes?
Randomized Data Structures
CSE 326 Autumn 2001
18
Randomized Skip List
Summary
Implements Dictionary ADT
insert in expected O(log n)
find in expected O(log n)
Memory use
expected constant memory per node
about double a linked list
Less overhead than search trees for iteration
over range queries
Randomized Data Structures
CSE 326 Autumn 2001
19
Random Number Generation
Linear congruential generator
xi+1 = Axi mod M
Choose A, M to minimize repetitions
M is prime
(Axi mod M) cycles through {1, … , M-1} before repeating
Good choices:
M = 231 – 1 = 2,147,483,647
A = 48,271
Must handle overflow!
Randomized Data Structures
CSE 326 Autumn 2001
20
Some Properties of Primes
If:
P is prime
0 < X < P
Then:
XP-1 = 1 (mod P)
the only solutions to X2 = 1 (mod P) are:
X = 1 and X = P - 1
Randomized Data Structures
CSE 326 Autumn 2001
21
Checking Non-Primality
Exponentiation (pow function, Weiss 2.4.4) can be
modified to detect non-primality (composite-ness)
// Computes x^n mod(M)
HugeInt pow(HugeInt x, HugeInt n, HugeInt M) {
if (n == 0) return 1;
if (n == 1) return x;
HugeInt squared = x * x % M;
// 1 < x < M - 1 && squared == 1 --> M isn’t prime!
if (isEven(n)) return pow(squared, n/2, M);
else return (pow(squared, n/2, M) * x);
}
Randomized Data Structures
CSE 326 Autumn 2001
22
Checking Primality
Systematic algorithm
For prime P, for all A such that 0 < A < P
Calculate AP-1 mod P using pow
Check at each step of pow and at end for primality conditions
Randomized algorithm
Use one random A
If the randomized algorithm reports failure, then P isn’t prime.
If the randomized algorithm reports success, then P might be prime.
At most (P-9)/4 values of A fool this prime check
if algorithm reports success, then P is prime with probability > ¾
Each new A has independent probability of false positive
We can make probability of false positive arbitrarily small
Randomized Data Structures
CSE 326 Autumn 2001
23
Evaluating
Randomized Primality Testing
Your probability of being struck by lightning this year:
0.00004%
Your probability that a number that tests prime 11 times
in a row is actually not prime: 0.00003%
Your probability of winning a lottery of 1 million people
five times in a row: 1 in 2100
Your probability that a number that tests prime 50 times
in a row is actually not prime: 1 in 2100
Randomized Data Structures
CSE 326 Autumn 2001
24
Cycles
Randomized
algorithms
Random numbers
Prime numbers
Randomized Data Structures
CSE 326 Autumn 2001
25