Some Thoughts on Treaps
Maverick Woo
<[email protected]>
[Figure: an example treap with (key, priority) pairs N,33 H,20 E,9 M,14 K,7 T,17 S,12 P,8 W,6 U,2 Z,4.]
Disclaimer
Articles of interest
Raimund Seidel and Cecilia R. Aragon.
Randomized search trees.
Algorithmica 16 (1996), 464-497.
Guy E. Blelloch and Margaret Reid-Miller.
Fast Set Operations Using Treaps.
In Proc. 10th Annual ACM SPAA, 1998.
Of course this is joint work with Guy.
Hopefully Daniel will also show up.
May 2, 2001
2
Background
Very high level talk
No analysis
To make this a technical talk
I will insert a math symbol ∑
Some background
Splay Trees (zig, zig zig, zig zig zig…)
Treaps, if you still remember…
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
Data Structure Research
I am not qualified to say yet, but I do
have some “feelings” about it.
Not that many high-level problems.
Representing a set/ordering
Support some operations
Some say it’s all about applications.
Applications don’t have to be very specific.
But they need to be specific enough that we can make assumptions.
What Operations?
Basic
Insert, Membership
Intermediate
Delete (e.g. Binomial vs. Fibonacci Heaps)
Disjoint-Union (e.g. Union-Find)
Higher Level
Union, Intersection, Difference
Finger Search
Behavior Restrictions
Persistence
“Functional”
More later…
Architecture Independence
Relatively new, a.k.a. “Cache-oblivious”
Runs efficiently on hierarchical memory
Avoid memory-specific parameterization
Forget data block size, cache line width etc.
Not my theme today
Why Persistence?
Many reasons for persistence
It’s practical with good garbage collectors.
Functional programming makes everyone’s
life easier.
For the theoretician
You don’t need to worry about side effects.
Better analysis possible: NESL
For the programmer
You don’t need to worry about side effects.
Fewer memory leaks, fewer dangling pointers
Real-life example 1
You have operations working on multiple instances.
You index the web.
You build your indices with your cool data
structures.
Conjunction query (AND) is intersection.
You do the intersection on two indices.
Now one of the indices can get corrupted.
Real-life example 2
You are rich.
Once upon a time, in a dot-com far away…
You run a multi-processor machine.
You learned that Splay Trees are cool.
You even learned how to write multi-threaded programs.
Thread1 searches for x on SplayInstance42.
Thread2 searches for y on SplayInstance42.
Real-world situation: search engines
Data Structure vs. Hacking
Examples
To learn more about Splay Trees
Dial (412)-HACKERS.
Ask for Danny Sleator…
OK, real example
(Persistent) FIFO Queues
Operations
IsEmpty(Q), Enqueue(Q,x), Dequeue(Q)
Need to grow, let’s use Linked List…
FIFO Queues
Linked List is “bad” though
Traversing to the tail takes linear time.
Either Enqueue or Dequeue is going to be linear time.
How about doubly-ended queues (deques)?
With that much extra space, it may be faster to use a tree.
If one is not good enough, use two.
Suppose the queue is x1 x2 … xi yi+1 yi+2 … yn.
Represent it as [x1, x2, …, xi] and [yn, yn-1, …, yi+1].
You can figure out the details yourself.
In the end, isn’t this just a hack?
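The two-list trick above can be sketched in runnable Python. I use cons cells (head, tail) so prepend really is O(1), as with a linked list; all names here are mine, not from the talk:

```python
# Purely functional FIFO queue as two cons lists (a sketch of the slide's
# two-list trick).  front holds the head of the queue in order; back holds
# the tail in reverse, so Enqueue can prepend in O(1).
EMPTY = (None, None)   # (front, back), both None-terminated cons lists

def enqueue(q, x):
    front, back = q
    return (front, (x, back))          # O(1) prepend onto the reversed back

def rev(xs, acc=None):
    # Reverse a cons list iteratively.
    while xs is not None:
        head, xs = xs
        acc = (head, acc)
    return acc

def dequeue(q):
    front, back = q
    if front is None:                  # front exhausted: reverse back once
        front, back = rev(back), None
    head, tail = front
    return head, (tail, back)
```

Each element is reversed at most once, so both operations are amortized O(1), and nothing is ever mutated: every old version of the queue stays valid, which is the persistence point made earlier.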
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
Treaps Refresher
A Treap is a recursive data structure.
datatype 'a Treap =
E | T of priority * 'a Treap * 'a * 'a Treap
Each node has a key and a priority.
Assume all unique
Arrange keys in in-order, priorities in heap-order
Priority is chosen uniformly at random.
8-wise independence suffices for the analysis
Can be computed with hash functions
Don’t need to store the priority
A key’s priority can be made consistent across runs
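As a concrete sketch (Python tuples standing in for the SML datatype above; the helper names are mine), a node can be (priority, left, key, right), with the priority computed by hashing the key so it is reproducible across runs:

```python
# Treap node as a tuple (priority, left, key, right); E = None is the empty
# treap.  Hashing the key gives a uniform-looking priority that is
# consistent across runs, as the slide notes, so it need not be stored.
import hashlib

E = None

def prio(key):
    # 64-bit priority derived from the key.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def T(left, key, right):
    return (prio(key), left, key, right)
```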
Treap Operations
Membership
As in binary search trees
Insert
Add as leaf by key (in-order)
Rotate up by priority (heap-order)
Delete
Reverse what insert does
Find-min, etc.
Walk on the left spine, etc.
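The insert described above, add as a leaf by key and then rotate up by priority, can be sketched persistently in Python on (priority, left, key, right) tuples with E = None (my own sketch, not the talk's code):

```python
# Insert: descend by key as in a BST, add the new node as a leaf, then
# rotate it up until heap order on priorities is restored.  No node is
# mutated, so the old tree remains valid.
import random

E = None

def insert(t, key, p=None):
    if p is None:
        p = random.random()            # priority chosen uniformly at random
    if t is E:
        return (p, E, key, E)
    tp, l, k, r = t
    if key < k:
        l = insert(l, key, p)
        if l[0] > tp:                  # rotate right to restore heap order
            lp, ll, lk, lr = l
            return (lp, ll, lk, (tp, lr, k, r))
        return (tp, l, k, r)
    if key > k:
        r = insert(r, key, p)
        if r[0] > tp:                  # rotate left to restore heap order
            rp, rl, rk, rr = r
            return (rp, (tp, l, k, rl), rk, rr)
        return (tp, l, k, r)
    return t                           # key already present

def inorder(t):
    return [] if t is E else inorder(t[1]) + [t[2]] + inorder(t[3])
```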
Treap Split
Want top-down split (it’s faster)
(less, x, gtr) = Split(root, k)
If root is E
(E, null, E)
Else If (root.k > k) // want to split left subtree
Let (l1, m, r1) = Split(root.left, k)
(l1, m, T(root.p, r1, root.k, root.right))
Else If (root.k < k) // want to split right subtree
Let (l1, m, r1) = Split(root.right, k)
(T(root.p, root.left, root.k, l1), m, r1)
Else // root.k = k
(root.left, root.k, root.right)
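A runnable Python version of this split, on (priority, left, key, right) tuples with E = None (a sketch; note that new nodes are built only along the search path):

```python
# Top-down persistent split.  Returns (less, m, gtr): the keys below k,
# the key equal to k (or None if absent), and the keys above k.
E = None

def split(t, k):
    if t is E:
        return (E, None, E)
    p, l, key, r = t
    if key > k:                        # split point lies in the left subtree
        l1, m, r1 = split(l, k)
        return (l1, m, (p, r1, key, r))
    if key < k:                        # split point lies in the right subtree
        l1, m, r1 = split(r, k)
        return ((p, l, key, l1), m, r1)
    return (l, key, r)                 # key == k
```

Splitting at a key that is present hands it back as m; otherwise m is None, playing the role of null on the slide.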
Treap Split Example
[Figure: Split(Tr,“V”). Before: the treap rooted at N,33. After: less holds E,9 H,20 K,7 M,14 N,33 P,8 S,12 T,17 U,2 (still rooted at N,33), and gtr holds W,6 Z,4 (rooted at W,6).]
Treap Split Persistence
These figures are deceptive.
Only 4 new nodes created
All on the search path to “V”
[Figure: the less and gtr treaps drawn over the original; they share every node off the search path, and only copies of N,33, T,17, W,6 and U,2 are new.]
Treap Join
Join(less, gtr) // less < x < gtr
Handle empty less or gtr
If (less.p > gtr.p)
T(less.p, less.left, less.k, Join(less.right, gtr))
Else
T(gtr.p, Join(less, gtr.left), gtr.k, gtr.right)
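The same join, runnable in Python on (priority, left, key, right) tuples with E = None; it assumes, per the comment above, that every key in less is smaller than every key in gtr:

```python
# Persistent join of two treaps whose key ranges do not overlap.
# The root with the higher priority wins; recurse down one spine.
E = None

def join(less, gtr):
    if less is E:
        return gtr
    if gtr is E:
        return less
    if less[0] > gtr[0]:               # less's root has the higher priority
        p, l, k, r = less
        return (p, l, k, join(r, gtr))
    p, l, k, r = gtr
    return (p, join(less, l), k, r)
```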
Treap Join Example
[Figure: Join(less,gtr). Before: the less and gtr treaps from the split. After: one treap rooted at N,33, with W,6 and its subtree reattached along the right spine.]
Treap Running Time
All expected O(lg n)
Also of note is Finger Search
Given a finger in a treap
Find the key that is d away in sorted order
Expected O(lg d) time
Requires parent pointers
Evil… wastes so much space
See Seidel and Aragon for details.
Treap Union
Treaps really shine in set operations.
Union(a,b)
Handle empty a or b
Suppose roots are (k1,p1), (k2,p2)
WLOG assume p1 > p2.
Let (less,x,gtr) = Split(b,k1).
T(p1, Union(a.left, less), k1,
Union(a.right, gtr))
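A runnable Python sketch of this union on (priority, left, key, right) tuples with E = None; split is repeated so the block stands alone:

```python
# Persistent union: the higher-priority root stays, the other treap is
# split at its key, and the halves are united recursively.
E = None

def split(t, k):
    if t is E:
        return (E, None, E)
    p, l, key, r = t
    if key > k:
        l1, m, r1 = split(l, k)
        return (l1, m, (p, r1, key, r))
    if key < k:
        l1, m, r1 = split(r, k)
        return ((p, l, key, l1), m, r1)
    return (l, key, r)

def union(a, b):
    if a is E:
        return b
    if b is E:
        return a
    if a[0] < b[0]:                    # WLOG a's root has the higher priority
        a, b = b, a
    p, l, k, r = a
    less, _, gtr = split(b, k)         # b's copy of k, if any, is discarded
    return (p, union(l, less), k, union(r, gtr))
```

The two recursive calls touch disjoint trees, which is exactly the divide-and-conquer parallelism the later slides run on separate CPUs.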
Treap Intersection
Inter(a,b)
Handle empty a or b (result is E)
Suppose roots are (k1,p1), (k2,p2); p1 > p2
Let (less,x,gtr) = Split(b,k1)
If x is null // k1 is not in b, sorry dude
Join(Inter(a.left, less), Inter(a.right, gtr))
Else
T(p1, Inter(a.left, less), k1, Inter(a.right, gtr))
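The same intersection in Python (split and join repeated so the block stands alone; tuples are (priority, left, key, right), E = None):

```python
# Persistent intersection: keep the higher-priority root's key only when
# the other treap also contains it; otherwise join the recursive halves.
E = None

def split(t, k):
    if t is E:
        return (E, None, E)
    p, l, key, r = t
    if key > k:
        l1, m, r1 = split(l, k)
        return (l1, m, (p, r1, key, r))
    if key < k:
        l1, m, r1 = split(r, k)
        return ((p, l, key, l1), m, r1)
    return (l, key, r)

def join(less, gtr):
    if less is E:
        return gtr
    if gtr is E:
        return less
    if less[0] > gtr[0]:
        p, l, k, r = less
        return (p, l, k, join(r, gtr))
    p, l, k, r = gtr
    return (p, join(less, l), k, r)

def inter(a, b):
    if a is E or b is E:
        return E
    if a[0] < b[0]:                    # WLOG a's root has the higher priority
        a, b = b, a
    p, l, k, r = a
    less, m, gtr = split(b, k)
    if m is None:                      # k is not in b: drop it, join the halves
        return join(inter(l, less), inter(r, gtr))
    return (p, inter(l, less), k, inter(r, gtr))
```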
Treap Difference
Similar to intersection
Change the logic a bit
Messier because it is not symmetric
Left as an exercise for the reader.
Points of Note
Persistence
Did you see a side effect? (assignments?)
Parallelization
Parallelizing without persistence is a pain.
Very natural divide-and-conquer
Run the two recursive calls on different CPUs
Running times…
Set Operation Running Time
For two sets of size m and n (m ≤ n)
Optimal is
Θ(m lg (n/m))
What’s known before this work
With AVL Trees, O(m lg(n/m))
Rather complicated algorithms
For the sake of your smooth digestion…
Compare this to O(m+n) or O(m lg n)
With Treaps
Can use Finger Search if we have parent pointers
Does not parallelize---multiple fingers???
Set Operation Running Time
What’s known after this work
No parent pointers
Parallelize naturally
Optimal expected running time
O(m lg (n/m))
Analysis available in Blelloch and Reid-Miller
Relatively simple algorithm
Experimental results
6.3-6.8 speedup on 8-processor SGI machine
4.1-4.4 speedup on 5-processor Sun machine
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
A Word on Splay Trees
Splay Trees are slow in practice!
Even a single simple search would require
O(lg n) pointer updates!
Skip Lists are way simpler and faster.
Let’s switch all Splay Trees to Skip Lists.
Danny???
Bruce said…
First find Danny.
Ditch Splay Trees---say they are slow.
Then praise Skip Lists.
Danny will
refute by quoting experimental studies.
Splay Trees are not much slower than Skip Lists
in practice.
ask who’s my advisor.
I wonder if that works. So I tried.
Current Issues on Treaps
Treaps are simpler than Splay Trees
No famous conjecture for my back pocket
Neat idea from Adam Kalai
Not self-adjusting
Access introduces more explicit changes
Adding data compression to Treaps
Finger search on Treaps
Work by Guy + Daniel Blandford
Adding Compression to Treaps
Search engines
Infrequent offline update (once a month)
Frequent online query and set operations
Keys are unique.
Keys can be huge and occur sparsely.
Let’s compress the keys!
Assume they are 64-bit integers.
We’ve got a problem!
I don’t know how to deploy data
compression to general data structures.
Begin with the simplest---Array
The naïve approach
Compress the whole array
When need to access an element
decompress the whole array
do the access
compress the whole array again
Isn’t that dumb?
Any suggestions?
Use chunking
Divide the array into blocks of size C.
Compress each block individually.
Now we are back to “constant” time!
Shh!!! That could be a trade secret.
Of course they use something better than
vanilla array.
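A minimal sketch of the chunking idea in Python, with zlib standing in for whatever compressor is really used, and an arbitrary block size C = 4 (all names here are mine):

```python
# Chunking, as on the slide: split the array into blocks of size C and
# compress each block on its own, so an access decompresses only one block
# instead of the whole array.
import zlib

C = 4   # block size; in practice tuned to the data and the cache

def compress_array(xs):
    # xs is a list of small ints (0..255), purely for illustration.
    return [zlib.compress(bytes(xs[i:i + C])) for i in range(0, len(xs), C)]

def access(blocks, i):
    block = zlib.decompress(blocks[i // C])   # decompress one block only
    return block[i % C]
```

A larger C compresses better but costs more per access, which is the same too-big/too-small tension the next slide raises for treap chunks.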
Chunking a Treap
A sub-tree is a chunk.
Desire consistent chunk size
But Treaps are usually not full.
Need better chunking rules
Chunks
Can’t be too big---hurts running time
Can’t be too small---hurts compression (space)
Vocab
Internal node and Leaf block
More precisely
datatype tblock =
Packed of int * key * key * key vector
| UnPacked of int * int * key vector
datatype trearray =
TE
| TB of tblock
| TN of trearray * key * trearray
All running times are in the expected case.
Idea 1 – Thresholds
Priority is in the range 1 to maxP
Invent a threshold Pth
e.g. maxP - log(maxP)
For n=(p,k)
If p > Pth, then n is an internal node.
Otherwise, n is in some leaf block.
The trick is applied when a key is inserted,
and maintained by the various operations.
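The threshold rule as a tiny Python sketch (the helper names are mine; maxP and the log threshold are from the slide, and I take log base 2):

```python
# Idea 1: a node is internal iff its priority beats the threshold
# Pth = maxP - log(maxP); everything else lives in some leaf block.
import math

def is_internal(p, maxP):
    p_th = maxP - math.log2(maxP)
    return p > p_th
```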
Idea 1 – Features
On average, a constant ratio of internal keys to keys in blocks.
With Pth = maxP - log(maxP), N keys
log N internal nodes
Height is log log N.
O(log N) “bottom” nodes, each with a block
Expect (N - log N) / O(log N) keys per block
Binary search in block takes O(log N)
Idea 1 – Running Time
Query is still O(log n).
Insert is also O(log n).
Join, Split both take O(log n).
Set operations rely on Join and Split’s
O(log n) running time.
Looking good…
Idea 1 – Problems
Asymptotic bound
Need to work out the constants
Exact analysis in progress
I now think even more highly of Knuth…
SML implementation
Make the idea as concrete as code
Can now do more experiments
Idea 1 – Questions
Do we really need to maintain
consistent priority across runs?
Make things simpler
But Union looks suspicious
What compression algorithm to use?
No general data compression
Take advantage of index distribution
Idea 2 – Small Blocks
Want a more-or-less constant block size
Small blocks are more realistic
Say 20
Processor specific---fit cache line size
How well can we compress 20 integers?
Leave for second stage investigation
Perhaps I can share
Writing down algorithm as code helps
Pseudocode is good for short algorithms.
Real code is more concrete.
Good for sloppy people like me.
Actual SML code
You find out you missed some cases.
Now if only SML had a debugger…
The space-time tradeoff is very real
Treap Finger Search
Daniel is working on it.
No parent pointers needed
Can mimic parent pointers by reversing
root-to-(last accessed leaf) path
Should probably leave this to him
Q&A / Suggestions
Work in progress, welcome suggestions
Danny, don’t kick me too hard…