Augmenting Data Structures 1

Download Report

Transcript Augmenting Data Structures 1

CS473-Algorithms I
Lecture X
Augmenting Data Structures
CS 473
Lecture X
1
Augmenting Data Structures
 When dealing with a new problem
•
Data structures play important role
•
Must design or addopt a data structure.
 Only in rare situtations
• We need to create an entirely new type of data
structure.
 More Often
• It suffices to augment a known data structure by
storing additional information.
• Then we can program new operations for the data
structure to support the desired application
CS 473
Lecture X
2
Augmenting Data Structures (2)
 Not Always Easy
• Because, added info must be updated and maintained
by the ordinary operations on the data structure.
 Operations
• Augmented data structure (ADS) has operations
inherited from underlying data structure (UDS).
• UDS Read/Query operations are not a problem. (ie.
Min-Heap Minimum Query)
• UDS Modify operations should update additional
information without adding too much cost. (ie. MinHeap Extract Min, Decrease Key)
CS 473
Lecture X
3
Dynamic Order Statistics
 Example problem;
• Dynamic Order Statistics, where we need two
operations;
OS-SELECT(x,i): returns ith smallest key in
subtree rooted at x
OS-RANK(T,x): returns rank (position) of x in
sorted (linear) order of tree T.
 Other operations
CS 473
•
Query: Search, Min, Max, Successor, Predecessor
•
Modify: Insert, Delete
Lecture X
4
Dynamic Order Statistics (2)
 Sorted or linear order of a binary search tree T is
determined by inorder tree walk of T.
IDEA:
• Use Red-Black (R-B) tree as the underlying data
structure.
• Keep subtree size in nodes as additional information.
CS 473
Lecture X
5
Dynamic Order Statistics (3)
 Relation Between Subtree Sizes;
size[x] = size[left[x]] + size[right[x]] + 1
The node
itself
 Note on implementation;
• For convenience use sentinel NIL[T] such that;
size[NIL[T]] = 0
• Since high level languages do not have operations
on NIL values. (ie. Java has NullPointerException)
CS 473
Lecture X
6
Dynamic Order Statistics Notation
KEY
SUBTREE
SIZE
 Node Structure:
• Key as with any Binary Search Tree (Tree is
indexed according to key)
• Subtree Size as additional Data on Node.
CS 473
Lecture X
7
Dynamic Order Statistics Example
10=7+2+1
M
10
C
7
A
1
P
2
F
5
D
1
CS 473
5=1+3+1
H
3
G
1
Q
1
1= size[NIL] + size[NIL] + 1
= 0+0+1
3=1+1+1
K
1
Lecture X
Linear Order
ACDFGHKMPQ
8
Retrieving an Element With a Given Rank
OS-SELECT(x, i)
r  size[left[x]] + 1
if i = r then
return x
elseif i < r then
return OS-SELECT( left[x], i)
else
return OS-SELECT(right[x], i)
CS 473
Lecture X
9
OS-SELECT Example
M
10
r>i
Go
Left
C
7
A
1
P
2
F
5
D
1
Q
1
H
3
G
1
CS 473
i=6
r=8
K
1
Lecture X
10
r<i
Go
Right
i=6
r=2
OS-SELECT Example
M
10
C
7
A
1
P
2
F
5
D
1
Q
1
H
3
G
1
CS 473
i=6
r=8
K
1
Lecture X
11
OS-SELECT Example
M
10
i=6
r=2
C
7
A
1
P
2
F
5
D
1
i=4
r=2
r<i
Go
Right
Q
1
H
3
G
1
CS 473
i=6
r=8
K
1
Lecture X
12
OS-SELECT Example
M
10
i=6
r=2
C
7
A
1
P
2
F
5
D
1
i=4
r=2
H
3
G
1
CS 473
i=6
r=8
Q
1
i=2
r=2
r=i
Found
!!!
K
1
Lecture X
13
OS-SELECT
 IDEA:
• Knowing size of left subtree (number of nodes smaller
than current) tells you which subtree the answer is in
 Running Time: O(lgn)
• Each recursive call goes down one level in the OSTREE
• Running time = O(d) where d = depth of the ith element
• Worst case running time is proportional to the height of
the tree
• Since tree is a R-B tree, its height is balanced and O(lgn)
CS 473
Lecture X
14
Determining the Rank of an Element
OS-RANK(T, x)
r  size[left[x]] + 1
yx
while y  root[T] do
if y = right[p[y]] then
r  r + size[left[p[y]]] + 1
yp[y]
return r
CS 473
Lecture X
15
Dynamic Order Statistics Example
M
10
C
7
A
1
P
2
F
5
D
1
H
3
G
1
CS 473
Q
1
init r=2
right child
r=2+1+1=4
K
1
Lecture X
16
Dynamic Order Statistics Example
M
10
C
7
A
1
P
2
F
5
D
1
Q
1
H
3
G
1
CS 473
right child
r=4+1+1=6
K
1
Lecture X
17
Dynamic Order Statistics Example
M
10
C
7
A
1
F
5
D
1
Q
1
H
3
G
1
CS 473
P
2
left child
r=6
K
1
Lecture X
18
Dynamic Order Statistics Example
M
10
Root and Answer!!!
r=6
C
7
A
1
F
5
D
1
Q
1
H
3
G
1
CS 473
P
2
K
1
Lecture X
19
OS-RANK
 IDEA:
• rank[x] = # of nodes preceeding x in an inorder tree walk
+ 1 (itself)
• Follow the simple upward path from node x to the root.
All left subtrees in this path contribute to rank[x]
 Running Time: O(lgn)
• Each iteration of the while-loop takes O(1) time
• y goes up one level in the tree with each iteration.
• Running time = O(dx), where dx is the depth of node x
• Running time, at worst proportional to the height of the
tree. (if x is a leaf node)
CS 473
Lecture X
20
Determining the Rank of an Element
r
z
Tz
y
x
Ty
 Follow the nodes y (y=x
initially) on the path from x to
root.
 Consider Subtree Tp[y] rooted at
p[y], where z is y’s sibling.
Important: r contains the number of nodes in Ty that
preceedes x.
CS 473
Lecture X
21
Determining the Rank of an Element
r=r+size[z]+1
z
Tz
y
x
Ty
Case 1:
If y is a right child  z is a left child then
• All nodes in Tz and p[y] precedes x
• Thus, must add size[z] + 1 to r
CS 473
Lecture X
22
Determining the Rank of an Element
r=r
y
z
Tz
x
Ty
Case 2:
If y is a left child  z is a right child then
• Neither p[y] nor any node in Tz precedes x.
• Don’t update r
CS 473
Lecture X
23
Maintaining Subtree Sizes
 OS-SELECT and OS-RANK works if we are able to
update the subtree sizes with modifications.
 Two operations INSERT and DELETE modifies the
contents of the Tree. We should try to update subtree
size without extra traversals.
 If not, would have to make a pass over the tree to set up
the sizes whenever the tree is modified, at cost (n)
CS 473
Lecture X
24
Red Black Tree Insertion
 Insertion is a two phase process;
• Phase 1: Insert node. Go down the tree from the
root. Inserting the new node as a child of an existing
node. (Search in O(lgn) time)
• Phase 2: Balance tree and correct colors. Go up the
tree changing colors and ultimately performing
rotations to maintain the R-B properties.
CS 473
Lecture X
25
Maintaining Subtree Sizes in an Insert Operation
 Phase 1:
• Increment size[x] for each node x on the downward path
from root to leaves
• The new added node gets the size of 1
• O(lgn) operation
 Phase 2:
• Only rotations cause structural changes
• At most two rotations
• Good News!!! Rotation is a local operation. Invalidates
only two size fields of the two nodes, around which the
rotation is performed
• O(1) time operation
CS 473
Lecture X
26
Maintaining Subtree Sizes in an Insert Operation
19
x
19
y
LEFT-ROTATE(T,x)
 7
y 11
 6
4
 6

12
 4
x
 7
After the rotation:
size[y]size[x]
size[x]size[left[x]] + size[right[x]] + 1
Note: only size fields of x and y are modified
CS 473
Lecture X
27
Red Black Tree Deletion
Deletion is a two phase process;
 Phase 1:
• Splice out one node y
 Phase 2:
• Performs at most 3 rotations
• Otherwise performs no structural changes
CS 473
Lecture X
28
Maintaining Subtree Sizes in an Delete Operation
 Phase 1:
• Traverse a path from node y up to the root. Decrementing
the size field of each node on the path.
• Length of this path = dy = O(lgn)
• O(lgn) Time Operation
 Phase 2:
• The O(1) Time Rotations can be handled in the same
manner as for insertion.
CS 473
Lecture X
29
Application: Counting the Number of Inversions
 Definition: Let A[1..n] be an array of n distinct numbers.
(i,j) is an inversion of A if i<j and A[i] > A[j]
Inversions of A = <12, 13, 18, 16, 11>



 
1
2 3 4
5
I(A) = {(1,5), (2,5), (3,4), (3,5), (4,5)}
CS 473
Lecture X
30
Application: Counting the Number of Inversions
 Question : What array with elements from the set {1...n}
has the most inversions?
• A = <n, n-1, ... 2, 1>
• number of inversions;
n(n  1)
| I(A) |   i 
2
i 1
n 1
CS 473
Lecture X
31
Application: Counting the Number of Inversions
 Question : What is the relationship between the running
time of insertion sort and the number of inversions
• |I(A)| = # of
inversions = # of element moves (shifts)
n
| I(A) |   i | I j ( A) | where;
i 2
I j (A)  {i : i  j & A[i ]  A[ j ]}  {i : (i, j )  I ( A) & i  j}
n
i.e; I(A)   I j ( A)
j 2
Let r(j) be the rank of A[j] in A[i...j], then;
| I j (A) |  j  r ( j )  r ( j )  j  | I j ( A) |
CS 473
Lecture X
32
Number of inversions
INVERSION(A)
sum0
T 
for j1 to n do
x MAKE-NEW-NODE()
left[x]right[x]p[x]NIL
key[x]A[j]
OS-INSERT(T,x)
rOS-RANK(T,x)
Ijj-r
sumsum+Ij
return sum
CS 473
Lecture X
33