augmented binary search tree

Download Report

Transcript augmented binary search tree

2IL05 Data Structures
Spring 2010
Lecture 9: Augmenting Data Structures
Data structures
 Data structures are used in many applications
 directly:
the user repeatedly queries the data structure
 indirectly: to make algorithms run faster
 In most cases a standard data structure is sufficient
(possibly provided by a software library)
 But sometimes one needs additional operations that aren’t
supported by any standard data structure
➨ need to design new data structure?
 Not always: often augmenting an existing structure is sufficient
Example
S set of elements, each with an unique key.
Operations
Search(S, k): return a pointer to an element x in S with key[x] = k, or
NIL if such an element does not exist.
OS-select(S, i): return a pointer to an element x in S with the ith
smallest key (the key with rank i)
Solution: sorted array
A 2 5 6 9 10 11 24 27 31 35 41 43 54 55 73
the key with rank i is stored in A[i]
Example
S set of elements, each with an unique key.
Operations
Search(S, k): return a pointer to an element x in S with key[x] = k, or
NIL if such an element does not exist.
OS-select(S, i): return a pointer to an element x in S with the ith
smallest key (the key with rank i)
Insert(S, x): inserts element x into S, that is, S ← S ⋃ {x}
Delete(S, x): remove element x from S
Solution?
Idea: Use red-black trees
OS-select(S, 3): report key with rank 3
Is the key with rank 3 in the
left subtree, in the right
subtree, or in the root?
10
2
18
NIL NIL
12
Idea 1: store the rank of each
node in the node
2
➨
2│1
NIL
50
17
NIL NIL
NIL NIL
Idea 1: Store the rank in each node
Problem: Insertion can change the rank of every node!
➨ worst case O(n)
10 │
10
2
22
│1
NIL NIL
18 │
18
5
12 │
12
3
NIL
17 │
17
4
NIL NIL
50 │
50
6
NIL NIL
Idea 1: Store the rank in each node
Problem: Insertion can change the rank of every node!
➨ worst case O(n)
10 │
10
2
22
│1
NIL NIL
Idea 2: store the size of the
subtree in each node
18 │
18
5
12 │
12
3
NIL
17 │
17
4
NIL NIL
50 │
50
6
NIL NIL
Idea 2: Store the size of the subtree
10 │
10
6
22
│1
NIL NIL
18 │
18
4
12 │
12
2
NIL
17 │
17
1
NIL NIL
50 │
50
1
NIL NIL
Idea 2: Store the size of the subtree
10 │
10
6
 store in each node x:
 left [x], right [x]
 parent [x]
22
│1
18 │
18
4
 key [x]
 color [x]
NIL NIL
 size [x] = number of keys in
subtree rooted at x
(size [NIL] = 0)
order-statistic tree
12 │
12
2
NIL
17 │
17
1
NIL NIL
50 │
50
1
NIL NIL
Order-statistic trees: OS-Select
OS-Select(x, i): return pointer to node containing the ith smallest key of
the subtree rooted at x
OS-Select(x , i)
1. r = size[left[x]] + 1
2. if i == r
3.
then return x
4. else if i < r
5.
then return OS-Select(left[x], i)
6. else return OS-Select(right[x], i)
Running time? O(log n)
i-r
OS-Select(x, 17)
r = 16
17
25
size = 24
16
15
x
Order-statistic trees: OS-Rank
OS-Rank(T, x): return the rank of x in the linear order determined by
an inorder walk of T
= 1 + number of keys smaller than x
x
Order-statistic trees: OS-Rank
OS-Rank(T, x): return the rank of x in the linear order determined by
an inorder walk of T
= 1 + number of keys smaller than x
OS-Rank(T, x)
1. r = size[left[x]] + 1
2. y = x
3. while y ≠ root [T]
4.
do if y == right[p[y]]
5.
then r = r + size [left[p[y]] + 1
6.
y = p[y]
7. return r
Running time? O(log n)
x
OS-Rank: Correctness
OS-Rank(T, x)
1. r = size[left[x]] + 1
2. y = x
3. while y ≠ root [T]
4.
do if y == right[p[y]]
5.
then r = r + size [left[p[y]] + 1
6.
y = p[y]
7. return r
Invariant
At the start of each iteration of the while loop, r = rank of key[x] in Ty
Initialization
r = rank of key[x] in Tx (y = x)
= number of keys smaller than key[x] in Tx +1
= size[left[x]] + 1 (binary-search-tree property)
subtree with root y
OS-Rank: Correctness
OS-Rank(T, x)
1. r = size[left[x]] + 1
2. y = x
3. while y ≠ root [T]
4.
do if y == right[p[y]]
5.
then r = r + size [left[p[y]] + 1
6.
y = p[y]
7. return r
Invariant
At the start of each iteration of the while loop, r = rank of key[x] in Ty
Termination
loop terminates when y = root[T]
➨ subtree rooted at y is entire tree
➨ r = rank of key[x] in entire tree
OS-Rank: Correctness
OS-Rank(T, x)
1. r = size[left[x]] + 1
2. y = x
3. while y ≠ root [T]
4.
do if y == right[p[y]]
5.
then r = r + size [left[p[y]] + 1
6.
y = p[y]
7. return r
Invariant
At the start of each iteration of the while loop, r = rank of key[x] in Ty
Maintenance
case i: y = right[p[y]]
➨ all keys Tleft[p[y]] and key[p[y]] smaller than key[x]
➨ rank key[x] in Tp[y] = rank key[x] in Ty + size [left[p[y]] + 1
OS-Rank: Correctness
OS-Rank(T, x)
1. r = size[left[x]] + 1
2. y = x
3. while y ≠ root [T]
4.
do if y == right[p[y]]
5.
then r = r + size [left[p[y]] + 1
6.
y = p[y]
7. return r
Invariant
At the start of each iteration of the while loop, r = rank of key[x] in Ty
Maintenance
case ii: y = left[p[y]]
➨ all keys Tright[p[y]] and key[p[y]] larger than key[x]
➨ rank key[x] in Tp[y] = rank key[x] in Ty
Order-statistic trees: Insertion and
deletion
Insertion and deletion
as in a regular red-black tree, but we have to update size[x] field
Red-black trees: Insertion
1. Do a regular binary search tree insertion
15
2. Fix the red-black properties
10 +1
Step1
 find the leave where the node
should be inserted
 replace the leave by a red node
that contains the key to be inserted
 size of the new node = 1
 increment size of each node on the
search path
2
23 +1
12 +1
50
17 +1
15 size[x] = 1
Red-black trees: Insertion
1.
a regular
tree2insertion
TheDo
new
node isbinary
red ➨search
Property
or 4 can be violated.
Remove
violationproperties
by rotations and recoloring.
2.
Fix thethe
red-black
10
Red-black properties
1.
2.
3.
4.
Every node is either red or black.
The root is black
Every leaf (nil[T]) is black.
If a node is red, then both its children are
black.
5. For each node, all paths from the node to
descendant leaves contain the same
number of black nodes.
23
2
12
50
17
15
Rotation
right rotation around y
y
x
x
y
left rotation around x
 A rotation affects only size[x] and size[y]
 We can determine the new values based on the size of children:
size[x] = size[left[x]] + size[right[x]] + 1
and the same for y …
Order-statistic trees
The operations Insert, Delete, Search, OS-Select, and OS-Rank
can be executed with an order-statistic tree in O(log n) time.
Augmenting data structures
Methodology for augmenting a data structure
OS tree
1. Choose an underlying data structure.
1. R-B tree
2. Determine additional information to maintain.
2. size[x]
3. Verify that we can maintain additional information
for existing data structure operations.
3. maintain size
during insert
and delete
4. Develop new operations.
4. OS-Select
and OS-Rank
 You don’t need to do these steps in strict order!
 Red-black trees are very well suited to augmentation …
Augmenting red-black trees
Theorem
Augment a R-B tree with field f, where f[x] depends only on
information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]).
Then can maintain values of f in all nodes during insert and delete
without affecting O(log n) performance.
When we alter information in x, changes propagate only upward on
the search path for x …
Augmenting red-black trees
Theorem
Augment a R-B tree with field f, where f[x] depends only on
information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]).
Then can maintain values of f in all nodes during insert and delete
without affecting O(log n) performance.
Proof (insert)
Step 1
Do a regular binary search tree insertion
 go up from inserted node and update f
 additional time: O(log n)
x
Augmenting red-black trees
Theorem
Augment a R-B tree with field f, where f[x] depends only on
information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]).
Then can maintain values of f in all nodes during insert and delete
without affecting O(log n) performance.
Proof (insert)
Step 2
y
x
Fix the red-black properties by rotations and recoloring
x
y
 update f for x, y, and their ancestors
 additional time per rotation: O(log n)
Example: Interval trees
Interval trees
S set of closed intervals
closed: endpoints are part of the interval
j
i
low[i]
high[i]
Operations
Interval-Insert(T, x): adds an interval x, whose int field is assumed
to contain an interval, to the interval tree T.
Interval-Delete(T, x): removes the element x from the interval tree T.
Interval-Search(T, j): returns pointer to a node x in T such that int[x]
overlaps j, or nil[T] if no such element exists.
Methodology
1. Choose an underlying data structure.
2. Determine additional information to maintain.
3. Verify that we can maintain additional information
for existing data structure operations.
4. Develop new operations.
Methodology
1. Choose an underlying data structure.
 use red-black trees
 each node x contains interval int[x]
 key is left endpoint low[int[x]]
inorder walk would list intervals sorted by low endpoint
2. Determine additional information to maintain.
3. Verify that we can maintain additional information
for existing data structure operations.
4. Develop new operations.
Methodology
1. Choose an underlying data structure.
✔
2. Determine additional information to maintain.
3. Verify that we can maintain additional information
for existing data structure operations.
4. Develop new operations.
Additional information for Interval-Search
case 1: i ∩ j ≠ Ø ➨ report i
j
i
low[.] ≤ low[i]
low[.] > low[i]
j
i
Additional information for Interval-Search
case 1: i ∩ j ≠ Ø ➨ report i
j
i
low[.] ≤ low[i]
case 2: j lies left of i
➨ j cannot overlap any
interval in the right subtree
low[.] > low[i]
j
i
Additional information for Interval-Search
case 1: i ∩ j ≠ Ø ➨ report i
j
i
low[.] ≤ low[i]
case 2: j lies left of i
➨ j cannot overlap any
interval in the right subtree
case 3: j lies right of i
➨ need additional information!
low[.] > low[i]
j
i
max[x] = max endpoint value in subtree rooted at x
= max{high[i] where i is stored in the subtree rooted at x}
Additional information for Interval-Search
case 1: i ∩ j ≠ Ø ➨ report i
j
i
low[.] ≤ low[i]
case 2: j lies left of i
➨ j cannot overlap any
interval in the right subtree
case 3: j lies right of i
➨ j overlaps interval in left
subtree if and only if
low[j] ≤ max[left[i]]
low[.] > low[i]
j
i
max[x] = max endpoint value in subtree rooted at x
= max{high[i] where i is stored in the subtree rooted at x}
Methodology
1. Choose an underlying data structure.
✔
2. Determine additional information to maintain.
✔
3. Verify that we can maintain additional information
for existing data structure operations.
4. Develop new operations.
Interval-Search
Interval-Search(T, j)
1. x = root [ T ]
2. while x ≠ nil[T] and j does not overlap int[x]
3.
do if left[x] ≠ NIL and max[left[x]] ≥ low[j]
4.
then x = left[x]
5.
else x = right[x]
6. return x
Correctness
Invariant
If tree T contains an interval that overlaps j, then there is such
an interval in the subtree rooted at x.
Running time? O(log n)
Methodology
1. Choose an underlying data structure.
✔
2. Determine additional information to maintain.
✔
3. Verify that we can maintain additional information
for existing data structure operations.
4. Develop new operations.
✔
Maintain additional information
Theorem
Augment a R-B tree with field f, where f[x] depends only on
information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]).
Then can maintain values of f in all nodes during insert and delete
without affecting O(log n) performance.
Additional information
max[x] = max endpoint value in subtree rooted at x
 max[x] depends only on




information in x: high[int[x]]
information in left[x]: max[left[x]]
information in right[x]: max[right[x]]
max[x] = max{high[int[x]], max[left[x]], max[right[x]]}
➨ insert and delete still run in O(log n) time
Search for all intervals
Interval-Search-All(T, j)
report all intervals that overlap j
Worst case running time? Θ(k log n) where k = # of reported intervals
How? See Assignment 8 …
 There is another type of interval tree that can answer this query in
Θ(k + log n) time.
Another interval tree
 use a red-black tree, key is left endpoint low[.]
 store each interval i є S in the highest node
x such that i contains low[x]
i1
Another interval tree
 use a red-black tree, key is left endpoint low[.]
{i2, i3, i1}
{i2, i1, i3}
 store each interval i є S in the highest node
x such that i contains low[x]
➨ each node can store several intervals
➨ use two sorted lists (one on low[.] and one
on high[.])
i2
i3
i1
Tutorials this week
 No small tutorials on Tuesday 1+2.
 Thursday 3+4 big tutorial.