Some Thoughts on Treaps


Ideas on Treaps
Maverick Woo
<[email protected]>
[Figure: an example treap; (key, priority) pairs with N,33 at the root and H,20; E,9; K,7; M,14; T,17; S,12; P,8; U,2; W,6; Z,4 below.]
Disclaimer
Articles of interest
- Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica 16 (1996), 464-497.
- Guy E. Blelloch and Margaret Reid-Miller. Fast Set Operations Using Treaps. In Proc. 10th Annual ACM SPAA, 1998.
Of course this is joint work with Guy.
- Hopefully Daniel will also show up.
Background
Very high level talk
- No analysis
- To make this a technical talk,
  - I will insert a math symbol: Σ
Some background
- Splay Trees (zig, zig zig, zig zig zig…)
- Treaps, if you still remember…
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
Data Structure Research
I am not qualified to say yet, but I do have some “feelings” about it.
Not that many high-level problems.
- Representing a set/ordering
- Supporting some operations
Some say it’s all about applications.
- Applications don’t have to be very specific.
- But they need to be specific enough that we can make assumptions.
What Operations?
Basic
- Insert, Membership
Intermediate
- Delete (e.g. Binomial vs. Fibonacci Heaps)
- Disjoint-Union (e.g. Union-Find)
Higher Level
- Union, Intersection, Difference
- Finger Search
Behavior Restrictions
Persistence
- “Functional”
- More later…
Architecture Independence
- Relatively new, a.k.a. “Cache-oblivious”
- Runs efficiently on hierarchical memory
- Avoids memory-specific parameterization
  - Forget data block size, cache line width, etc.
- Not my theme today
Why Persistence?
Many reasons for persistence
- It’s practical with good garbage collectors.
- Functional programming makes everyone’s life easier.
- For the theoretician
  - You don’t need to worry about side effects.
  - Better analysis is possible: NESL
- For the programmer
  - You don’t need to worry about side effects.
  - Fewer memory leaks, fewer dangling pointers
Real-life example 1
You have operations working on multiple instances.
- You index the web.
- You build your indices with your cool data structures.
- A conjunction query (AND) is an intersection.
- You do the intersection on two indices.
- Now one of the indices can get corrupted.
Real-life example 2
You are rich.
- Once upon a time, in a dot-com far away…
- You run a multi-processor machine.
- You learned that Splay Trees are cool.
- You even learned how to write multi-threaded programs.
- Thread 1 searches for x on SplayInstance42.
- Thread 2 searches for y on SplayInstance42.
- Both searches splay, i.e. both write to the tree…
- Real-world situation: search engines
Data Structure vs. Hacking
Examples
- To learn more about Splay Trees
  - Dial (412)-HACKERS.
  - Ask for Danny Sleator…
- OK, real example
  - (Persistent) FIFO Queues
  - Operations: IsEmpty(Q), Enqueue(Q,x), Dequeue(Q)
  - Need to grow, so let’s use a Linked List…
FIFO Queues
A Linked List is “bad” though
- Traversing to the tail takes linear time.
- Either Enqueue or Dequeue is going to be linear time.
- How about double-ended queues (deques)?
- With that much extra space, it may be faster to use a tree.
If one list is not good enough, use two.
- Suppose the queue is x1 x2 … xi yi+1 yi+2 … yn.
- Represent it as [x1, x2, …, xi] and [yn, yn-1, …, yi+1].
- You can figure out the details yourself (a sketch follows).
- In the end, isn’t this just a hack?
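For concreteness, the two-list queue could look like this minimal SML sketch; the names and the representation are illustrative, not from the talk:

  (* Persistent FIFO queue as two lists: front holds the head of *)
  (* the queue in order, back holds the tail in reverse.         *)
  datatype 'a queue = Q of 'a list * 'a list

  val empty = Q ([], [])

  fun isEmpty (Q ([], [])) = true
    | isEmpty _ = false

  fun enqueue (Q (front, back), x) = Q (front, x :: back)

  (* The back list is reversed only when the front runs out, so  *)
  (* each element is reversed at most once: amortized O(1) per   *)
  (* operation, at least in ephemeral (single-version) use.      *)
  fun dequeue (Q ([], [])) = NONE
    | dequeue (Q ([], back)) = dequeue (Q (rev back, []))
    | dequeue (Q (x :: front, back)) = SOME (x, Q (front, back))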
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
Treaps Refresher
A Treap is a recursive data structure.
  datatype 'a Treap =
    E | T of priority * 'a Treap * 'a * 'a Treap
Each node has a key and a priority.
- Assume all keys and priorities are unique.
Arrange keys in in-order and priorities in heap-order.
Priorities are chosen uniformly at random.
- 8-way independence suffices for the analysis.
- Priorities can be computed with hash functions:
  - no need to store the priority,
  - and a key’s priority can be made consistent across runs (a sketch follows).
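Hashed priorities might look like the sketch below; the int keys, the multiplicative-hash constant, and the 30-bit priority range are all illustrative assumptions. Later sketches in this transcript reuse this concrete datatype.

  (* A concrete instance of the Treap datatype with int keys.    *)
  (* Fields: priority, left subtree, key, right subtree.         *)
  datatype treap = E | T of int * treap * int * treap

  (* Priority derived from the key by multiplicative hashing, so *)
  (* it need not be stored and is consistent across runs.        *)
  fun prio (k : int) : int =
      Word32.toInt (Word32.andb (Word32.fromInt k * 0w2654435761,
                                 0wx3FFFFFFF))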
Treap Operations
Membership
- As in binary search trees
Insert
- Add as a leaf by key (in-order)
- Rotate up by priority (heap-order); see the sketch below
Delete
- Reverse what insert does
Find-min, etc.
- Walk on the left spine, etc.
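A minimal sketch of insert, assuming the datatype and prio from the earlier sketch: descend as in a binary search tree, attach the new leaf, and rotate it up while its priority beats its parent’s.

  fun insert (E, k) = T (prio k, E, k, E)
    | insert (t as T (p, l, k', r), k) =
        if k < k' then
          (case insert (l, k) of
             l' as T (pl, ll, kl, lr) =>
               (* rotate right if the child now outranks us *)
               if pl > p then T (pl, ll, kl, T (p, lr, k', r))
               else T (p, l', k', r)
           | E => t)   (* unreachable: insert never returns E *)
        else if k' < k then
          (case insert (r, k) of
             r' as T (pr, rl, kr, rr) =>
               (* rotate left if the child now outranks us *)
               if pr > p then T (pr, T (p, l, k', rl), kr, rr)
               else T (p, l, k', r')
           | E => t)   (* unreachable *)
        else t          (* key already present *)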
Treap Split
Want a top-down split (it’s faster).
(less, m, gtr) = Split(root, k)
- If root is empty: return (E, none, E)
- If root.k > k: // the split key lies in the left subtree
  - Let (l1, m, r1) = Split(root.left, k)
  - Return (l1, m, T(root.p, r1, root.k, root.right))
- If root.k < k: // the split key lies in the right subtree
  - Let (l1, m, r1) = Split(root.right, k)
  - Return (T(root.p, root.left, root.k, l1), m, r1)
- Else: // root.k = k
  - Return (root.left, root.k, root.right)
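In SML, under the same assumptions as the earlier sketches, split might be written as follows; the middle result is an option, NONE when k is absent:

  fun split (E, _) = (E, NONE, E)
    | split (T (p, l, k', r), k) =
        if k < k' then               (* split key in the left subtree  *)
          let val (l1, m, r1) = split (l, k)
          in (l1, m, T (p, r1, k', r)) end
        else if k' < k then          (* split key in the right subtree *)
          let val (l1, m, r1) = split (r, k)
          in (T (p, l, k', l1), m, r1) end
        else (l, SOME k', r)         (* k found at this node           *)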
Treap Split Example
Split(Tr, “V”)
[Figure: before, the example treap rooted at (N,33); after, “less” holds the keys up to U (still rooted at (N,33)) and “gtr” holds (W,6) with right child (Z,4).]
Treap Split Persistence
These figures are deceptive.
Only 4 new nodes created
All on the search path to “V”
[Figure: the split result again, with the four freshly created nodes (the copies of N, T, U, and W on the search path) highlighted; all other nodes are shared with the original treap.]
Treap Join
Join(less, gtr) // every key in less < every key in gtr
- Handle an empty less or gtr first.
- If less.p > gtr.p:
  - Return T(less.p, less.left, less.k, Join(less.right, gtr))
- Else:
  - Return T(gtr.p, Join(less, gtr.left), gtr.k, gtr.right)
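The corresponding SML sketch, with the same assumed datatype:

  (* Assumes every key in less is smaller than every key in gtr. *)
  fun join (E, gtr) = gtr
    | join (less, E) = less
    | join (less as T (p1, l1, k1, r1), gtr as T (p2, l2, k2, r2)) =
        if p1 > p2
        then T (p1, l1, k1, join (r1, gtr))    (* less’s root wins *)
        else T (p2, join (less, l2), k2, r2)   (* gtr’s root wins  *)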
Treap Join Example
Join(less, gtr)
[Figure: before, “less” rooted at (N,33) and “gtr” rooted at (W,6) with right child (Z,4); after, the original example treap, i.e. Join undoes the Split.]
Treap Running Time
All operations run in expected O(lg n) time.
Also of note is Finger Search:
- given a finger into a treap,
- find the key that is d away in sorted order,
- in expected O(lg d) time.
- Requires parent pointers.
- Evil… wastes so much space.
See Seidel and Aragon for details.
Treap Union
Treaps really shine in set operations.
Union(a, b)
- Suppose the roots are (k1,p1) and (k2,p2).
- WLOG assume p1 > p2.
- Let (less, x, gtr) = Split(b, k1). // b’s copy of k1, if any, is dropped
- Return T(p1, Union(a.left, less), k1, Union(a.right, gtr))
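An SML sketch, reusing split from the earlier sketch; the empty-treap cases and the WLOG swap are made explicit (assumed details, in the spirit of the pseudocode):

  fun union (E, b) = b
    | union (a, E) = a
    | union (a as T (p1, l1, k1, r1), b as T (p2, _, _, _)) =
        if p1 < p2 then union (b, a)  (* WLOG: higher-priority root first *)
        else
          let (* b’s copy of k1, if any, is dropped here *)
            val (less, _, gtr) = split (b, k1)
          in T (p1, union (l1, less), k1, union (r1, gtr)) end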
Treap Intersection
Inter(a, b)
- Suppose the roots are (k1,p1) and (k2,p2), with p1 > p2.
- Let (less, x, gtr) = Split(b, k1).
- If x is null: // k1 is not in b, sorry dude
  - Return Join(Inter(a.left, less), Inter(a.right, gtr))
- Else:
  - Return T(p1, Inter(a.left, less), k1, Inter(a.right, gtr))
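And in SML, under the same assumptions (split and join from the earlier sketches):

  fun inter (E, _) = E
    | inter (_, E) = E
    | inter (a as T (p1, l1, k1, r1), b as T (p2, _, _, _)) =
        if p1 < p2 then inter (b, a)   (* intersection is symmetric *)
        else
          let val (less, m, gtr) = split (b, k1)
          in
            case m of
              NONE =>   (* k1 is not in b: drop it *)
                join (inter (l1, less), inter (r1, gtr))
            | SOME _ => (* k1 is in both: keep it as the root *)
                T (p1, inter (l1, less), k1, inter (r1, gtr))
          end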
Treap Difference
Similar to intersection
- Change the logic a bit.
- Messier because it is not symmetric.
Left as an exercise for the reader.
Points of Note
Persistence
- Did you see a side effect? (Any assignments?)
Parallelization
- Parallelizing without persistence is a pain.
- Very natural divide-and-conquer:
  - run the two recursive calls on different CPUs.
Running times…
Set Operation Running Time
For two sets of size m and n (m ≤ n)
- Optimal is Θ(m lg(n/m)).
What was known before this work
- With AVL Trees: O(m lg(n/m))
  - Rather complicated algorithms
  - For the sake of your smooth digestion, compare this to O(m+n) or O(m lg n).
- With Treaps
  - Can use Finger Search if we have parent pointers.
  - Does not parallelize. Multiple fingers???
Set Operation Running Time
What is known after this work
- No parent pointers
- Parallelizes naturally
- Optimal expected running time: O(m lg(n/m))
- Analysis available in Blelloch and Reid-Miller
- Relatively simple algorithm
- Experimental results
  - 6.3-6.8x speedup on an 8-processor SGI machine
  - 4.1-4.4x speedup on a 5-processor Sun machine
Agenda
Data structure research overview
Treaps refresher
Some current issues on Treaps
A Word on Splay Trees
Splay Trees are slow in practice!
- Even a single simple search requires O(lg n) pointer updates!
Skip Lists are way simpler and faster.
Let’s switch all Splay Trees to Skip Lists.
Danny???
Bruce said…
First find Danny.
- Ditch Splay Trees: say they are slow.
- Then praise Skip Lists.
Danny will
- refute by quoting experimental studies:
  - Splay Trees are not much slower than Skip Lists in practice.
- ask who my advisor is.
I wondered if that would work. So I tried.
Current Issues on Treaps
Treaps are simpler than Splay Trees
- No famous conjecture for my back pocket
- Neat idea from Adam Kalai
- Not self-adjusting: adjusting on access would introduce more explicit changes.
Adding data compression to Treaps
Finger search on Treaps
- Work by Guy + Daniel Blandford
Adding Compression to Treaps
Search engines
- Infrequent offline updates (once a month)
- Frequent online queries and set operations
- Keys are unique.
- Keys can be huge and occur sparsely.
Let’s compress the keys!
Assume they are 64-bit integers.
We’ve got a problem!
I don’t know how to deploy data compression in general data structures.
Begin with the simplest: an Array.
The naïve approach
- Compress the whole array.
- When we need to access an element:
  - decompress the whole array,
  - do the access,
  - compress the whole array again.
Isn’t that dumb?
Any suggestions?
Use chunking (a sketch follows)
- Divide the array into blocks of size C.
- Compress each block individually.
- Now we are back to “constant” time!
Shh!!! That could be a trade secret.
- Of course they use something better than a vanilla array.
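Under illustrative assumptions (block size C = 64, int elements, and identity stand-ins where a real compressor would plug in), chunked access might look like:

  val C = 64                           (* block size; illustrative *)
  type block = int vector

  (* Stand-ins: a real compressor/decompressor would go here.    *)
  fun compress   (b : block) = b
  fun decompress (b : block) = b

  (* Cut the array into blocks of size C (the last one may be    *)
  (* shorter) and compress each block individually.              *)
  fun build (a : int vector) : block vector =
      Vector.tabulate ((Vector.length a + C - 1) div C, fn i =>
        compress (Vector.tabulate
          (Int.min (C, Vector.length a - i * C),
           fn j => Vector.sub (a, i * C + j))))

  (* An access decompresses a single block only: back to         *)
  (* "constant" time for fixed C.                                *)
  fun sub (blocks : block vector, i : int) =
      Vector.sub (decompress (Vector.sub (blocks, i div C)), i mod C)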
Chunking a Treap
A sub-tree is a chunk.
- We want a consistent chunk size.
But Treaps are usually not full.
- Need better chunking rules.
Chunks
- can’t be too big: that hurts running time;
- can’t be too small: that hurts compression (space).
Vocab
Internal node and Leaf block
More precisely:
  datatype tblock =
      Packed of int * key * key * key vector
    | UnPacked of int * int * key vector
  datatype trearray =
      TE
    | TB of tblock
    | TN of trearray * key * trearray
All running times below are expected case.
Idea 1 – Thresholds
Priorities are in the range 1 to maxP.
Invent a threshold Pth,
- e.g. Pth = maxP - log(maxP).
For a node n = (p, k):
- if p > Pth, then n is an internal node;
- otherwise, n lives in some leaf block.
The classification is done when a key is inserted,
- and maintained by the various operations. (A sketch follows.)
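A tiny sketch of the classification; the concrete maxP and the base-2 log are illustrative assumptions, not fixed by the slides:

  val maxP = 1073741823                         (* assumed priority range *)
  val pth  = maxP - Real.round (Math.ln (Real.fromInt maxP)
                                / Math.ln 2.0)  (* Pth = maxP - log maxP  *)

  fun isInternal (p : int) = p > pth   (* otherwise: in a leaf block *)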
Idea 1 – Features
On average, there is a constant ratio between internal keys and keys in blocks.
With Pth = maxP - log(maxP) and N keys:
- about log N internal nodes,
- whose height is log log N;
- O(log N) “bottom” nodes, each with a block;
- so expect (N - log N) / O(log N) keys per block;
- binary search within a block takes O(log N).
Idea 1 – Running Time
Query is still O(log n).
Insert is also O(log n).
Join, Split both take O(log n).
Set operations rely on Join and Split’s
O(log n) running time.
Looking good…
Idea 1 – Problems
The bound is asymptotic;
- need to work out the constants.
- Exact analysis in progress
  - I now think of Knuth even more highly…
- SML implementation
  - Makes the idea as concrete as code
  - Can now do more experiments
Idea 1 – Questions
Do we really need to maintain consistent priorities across runs?
- Dropping that would make things simpler,
- but Union looks suspicious.
What compression algorithm should we use?
- No general data compression;
- take advantage of the index distribution.
Idea 2 – Small Blocks
Want a more-or-less constant block size.
Small blocks are more realistic.
Say 20:
- processor-specific, to fit the cache line size;
- how well can we compress 20 integers?
Left for a second-stage investigation.
Perhaps I can share
Writing down an algorithm as code helps.
- Pseudocode is good for short algorithms.
- Real code is more concrete:
  - good for sloppy people like me.
- Actual SML code:
  - you can figure out that you missed some cases.
  - Now if only SML had a debugger…
The space-time tradeoff is very real.
Treap Finger Search
Daniel is working on it.
- No parent pointers needed:
- can mimic parent pointers by reversing the root-to-(last-accessed-leaf) path.
Should probably leave this to him.
Q&A / Suggestions
Work in progress, welcome suggestions
Danny, don’t kick me too hard…