Transcript search

Data Structures &
Algorithms
Binary Search Trees
Search
Long studied problem
Intrinsic problem in much of CS
Business applications
Information retrieval
Caching
Matching problems
Search
Goal: given a key, find an item(s) that is
associated with that key
If search is successful, called a hit.
If not, called a miss.
May also extend to approximate search,
range queries, etc.
Search
Defn. 12.1: A symbol table is a data
structure of items with keys that supports
two basic operations:
• insert an new item, and
• return an item with a given key.
Symbol table is otherwise known as:
• Dictionary
• Associative memory (contrast with
retrieving the item stored at a
particular address)
Search
Examples of symbol tables:
• Telephone book
• Encyclopedia
• Index (vs. table of contents)
• Compiler tables of identifiers
Huge advantage of computer-based
symbol tables:
• Easy to be dynamic
Search
Symbol table ADT interface:
Insert <new item>
Search <key>
Remove <item>
Select <int>
Sort
Join <Symbol table>
May also want construct, destroy,
empty test, copy, search&insert
Search
Symbol table ADT
template <class Item, class Key>
class ST
{
Private: // impl dependent stuff
Public:
ST(int);
int count();
Item search(Key);
void insert(Item);
void remove(Item);
Item select(int);
void show(ostream&);
}
Search
Obligations imposed by symbol table
ADT on Key:
key()
null()
operator==
operator<
Mods to rand, scan, show
Also – missing join()
Interpretations – e.g., remove()
Key-indexed Search
Key values distinct, small numbers
Can use an array implementation
Simply put item with key k in st[k]
To remove, replace with null item
Time = ?
Constant
Select, sort, count all linear
Skip over null items
Client has to deal with duplicate keys
Key-indexed Search
If there are no items, just keys
Use table of bits
Existence table
Bit map
Applications:
Allocation tables for memory
Blum filters
Key-indexed Search
Prop 12.1: If key values are positive
integers less than M and items have
distinct keys, then the symbol-table data
type can be implemented with keyindexed arrays of items such that insert,
search, and remove require constant
time, and initialize, select, and sort
require time proportional to M, whenever
any of the operations are performed on
an N-item table.
Sequential Search
If keys are over too large a range to
used key-indexed approach, can still use
a simple approach:
Store contiguously in an array, either
• In order
• Unordered
Trade-offs?
• Insertion time
• Search time
• Removal time
• Sort time
Sequential Search
If keys are over too large a range to
used key-indexed approach, can still use
a simple approach:
Could also store in linked list, either
In order
Unordered
Trade-offs?
Aside from those depending on
ordering, those from arrays vs.
linked lists: insert/remove/join
times vs. size
Sequential Search
Prop 12.2: Sequential search in a
symbol table with N items uses about
N/2 comparisons for search hits (on
average).
True for any of the four forms (array or
linked list, ordered or unordered).
Sequential Search
Prop 12.2: Sequential search in a
symbol table with N unordered items
uses a constant number of steps for
inserts and N comparisons for search
misses (always).
True for array or linked list.
Sequential Search
Prop 12.3: Sequential search in a
symbol table with N ordered items uses
about N/2 comparisons for insertions,
search hits, and search misses (on
average).
True for array or linked list.
Costs of Insertion and Search
Worst
Case
Worst
Case
Worst
Case
Avg
Case
Avg
Case
Avg
Case
Insert
Search
Select
Insert
Search
Hit
Search
Miss
Key-Indexed
Array
1
1
M
1
1
1
Ordered Array
N
N
1
N/2
N/2
N/2
Ordered
Linked List
N
N
N
N/2
N/2
N/2
Unordered
Array
1
N
N lg N
1
N/2
N
Unordered
Linked List
1
N
N lg N
1
N/2
N
N = number of items, M = size of container
Costs of Insertion and Search
Worst
Case
Worst
Case
Worst
Case
Avg
Case
Avg
Case
Avg
Case
Insert
Search
Select
Insert
Search
Hit
Search
Miss
Binary Search
N
lg N
1
N/2
lg N
lg N
Binary Search
Tree
N
N
N
lg N
lg N
lg N
lg N
lg N
lg N
lg N
lg N
lg N
Randomized
Tree
N*
N*
N*
lg N
lg N
lg N
Hashing
1
N*
N lg N
1
1
1
Red-Black
Tree
N = number of items, M = size of container
Binary Search
Binary search – compare to middle item
in sorted search table, then search
appropriate half.
k?
a
c
f
j
k
m n
p
q
r
s
t
v
x
y
a
c
f
j
k
m n
p
q
r
s
t
v
x
y
a
c
f
j
k
m n
p
q
r
s
t
v
x
y
a
c
f
j
k
m n
p
q
r
s
t
v
x
y
j
hit
k m n
p
q
r
s
t
v
x
y
a
c
f
Binary Search
Prop 12.4: Binary search in a symbol
table with N ordered items never uses
more than [lg N] + 1 comparisons for a
search (hit or miss).
Note that this is the same as the number
of bits in the representation of N.
Why?
Binary Search
Keeping table sorted
• Expensive using insertion sort
• May not matter if table is mostly static,
• or if the number of searches is huge
(relative to insertions).
Improvements:
• Interpolation search (try to guess
where key will fall in ST)
• assumes key values are evenly
distributed
Binary Search Trees
Use an explicit tree structure
Defn 12.2: A binary search tree (BST) is
a binary tree that has a key associated
with each internal node, and the key in a
node is >= the keys in its left subtree
and <= the keys in its right subtree
Binary Search Trees
Defn 12.2: A binary search tree (BST) is
a binary tree that has a key associated
with each internal node, and the key in a
node is >= the keys in its left subtree
and <= the keys in its right subtree
k
<=k
>=k
Binary Search Trees
BST search algorithm sketch
To search a BST T with root h for key k:
If h is null (T is empty) – return miss
If h.key == k – return hit
If k < h.key search left subtree h.left
else search right subtree h.right
Binary Search Trees
BST insert algorithm
To insert a new item X into a BST:
Essentially search, then attach new
node (and just as efficient as search):
If tree empty set X as root and return
If X.key < root.key
insert X into left subtree
Else insert X into right subtree
Binary Search Trees
BST sort algorithm
To sort a BST (i.e., output items stored in
BST in order of keys):
Inorder traversal of BST
If tree empty return
Show left subtree
Show root
Show right subtree
Binary Search Trees
Prop. 12.6: Search hits require about 2
ln N ≈ 1.39 lg N comparisons on
average, in a BST built from random
keys. (Treating successive < and =
operations as one comparison)
Prf: Search hit follows a path from root to
the node with the key; average is
internal path length over all nodes, + 1.
Binary Search Trees
Prop. 12.7: Insertions and search misses
require about 2 ln N ≈ 1.39 lg N
comparisons on average, in a BST built
from random keys. (Treating successive
< and = operations as one comparison)
Prf: Search hit follows a path from root to
an external node; average is external
path length over all nodes.
Binary Search Trees
Prop. 12.8: In the worst case, a search
in a BST with N keys can take N
comparisons.
Prf: The tree may only have one nonempty subtree per internal node.
Index Implementations
May not want to move actual items
around, and may want flexibility in item
contents.
Here, may use items stored in an array
(or parallel arrays), with key extraction,
then use parallel arrays for link (i.e.,
have left[i] and right[i] give indexes for
left and right child, resp.)
Appl: large or overlapping items (e.g.,
text strings in corpus)
BST Insertion
Usually, insertion into BST at bottom
Replace external node
What if need to insert at root, so new
nodes are near the top?
Examine this now, as it will come in
handy soon…..
BST Insertion at Root
Wlog, assume new item has a key larger
than that of the current root.
L
H
D
M
F
B
C
E
N
J
I
K
O
BST Insertion at Root
Tempting to make new node root, make
old root left child, and old right child the
right child of new root:
L
H
D
M
F
B
C
E
N
J
I
K
O
BST Insertion at Root
But if there are items in right subtree
with smaller keys, then this violates the
BST requirements!
L
H
D
M
F
B
C
E
N
J
I
K
O
BST Insertion at Root
Must somehow rearrange tree to reestablish BST property… moving all
nodes with smaller key to left subtree
looks costly!
L
H
D
M
F
B
C
E
N
J
I
K
O
BST Insertion at Root
Solution: ROTATION
L
H
D
M
F
B
C
E
N
J
I
K
O
BST Insertion at Root
Solution: ROTATION (here in BST)
A
N
L
K
R
M
>=N
<=L
>=L &
<=N
BST Insertion at Root
Right Rotation in BST: make left child
new root, make old root right child
A
N
L
K
R
M
>=N
<=L
>=L &
<=N
BST Insertion at Root
Right Rotation in BST: patch up links
from A, to “middle” subtree, betw/L & N
A
L
N
K
M
R
<=L
>=L &
<=N
>=N
BST Insertion at Root
Result is still BST, and requires a
constant number of local changes.
A
L
N
K
M
R
<=L
>=L &
<=N
>=N
BST Insertion at Root
Left rotation is mirror image of right
rotation (reverse the Rrot steps!)
A
L
N
K
M
R
<=L
>=L &
<=N
>=N
BST Insertion at Root
Right rotation is mirror image of left
rotation (reverse the Lrot steps!)
A
N
L
K
M
R
<=L
>=L &
<=N
>=N
BST Insertion at Root
Now we can insert at root by inserting at
leaf, …
L
H
D
M
F
B
C
E
N
J
I
K
O
BST Insertion at Root
Now we can insert at root by inserting at
leaf, then rotating up
H
D
M
F
B
C
E
N
J
I
O
K
L
BST Insertion at Root
Now we can insert at root by inserting at
leaf, then rotating up, and up
H
D
M
F
B
C
E
N
J
L
I
K
O
BST Insertion at Root
Now we can insert at root by inserting at
leaf, then rotating up, and up, and up
H
D
M
F
B
C
N
L
E
O
J
I
K
BST Insertion at Root
Now we can insert at root by inserting at
leaf, then rotating up, and up, and up,
and up
H
L
D
C
E
M
J
F
B
I
K
N
O
BST Insertion at Root
Q: How much does it cost?
A: path length
Q: Why do it?
L
H
M
J
D
N
I
F
B
C
E
K
O
BST Insertion at Root
Q: Why do it?
A: Often do searches for more recent
items – faster if near the root!
L
H
M
J
D
N
I
F
B
C
E
K
O
Other BST Functions
Select k – like quickSort version:
Keep count field with size of subtree
Recursively select in subtrees, adjusting
k if right subtree
L
13
H
9
J
3
D
5
F
2
B
2
C
1
M
3
E
1
I
1
K
1
N
2
O
1
Other BST Functions
Partition: put kth smallest node at root
Select, then rotate up
Remove: delete node and patch up tree
Search (which subtree?)
Replace subtree with one minus node
If root of subtree, delete root and
combine the two subsubtrees
Join: many possible ways, none all that
attractive in general
Other BST Functions
Remove: a lot of work, disrupts structure
No longer random, even if data were
Height tends to sqrt(N) instead of lg N
May just mark deleted nodes
Skip them on search
Use space to insert new items
Periodically remove/restructure to
avoid using up too much space/time
This lazy approch applies to any
structure, not just BSTs or symbol tables!
Balancing BSTs
Balanced trees perform well
lg N time
Near-balanced is good enough
How to provide performance guarantees?
Randomize
Amortize
Optimize
Performance
Randomize
Introduce random decision making
Reduces chance of worst-case … a lot!
Examples
QuickSort – random pivot
Randomized BSTs
Skip Lists
Performance
Amortize
Do extra (expensive) work from time to time
To avoid much more work later
With “cost” spread over all operations
A particular operation may take more time
But time for op sequence is always good
Examples
Array resizing
Splay BSTs
Performance
Optimize
Do work every time
To guarantee performance every time
Usually cumbersome to implement
Examples
AVL trees
Top-down 2-3-4 trees
Red-Black Trees
Randomized BSTs
Recall analysis of average case for BSTs
Assumed items inserted in random order
Thus each node equally likely to be root
But Hey!!
Don’t have to rely on random insertions!
Can get this by randomly choosing to
make new node the root with prob
1/(N+1)
Randomized BSTs
Prop 13.1: Building a randomized BST is
equivalent to building a standard BST
from a random initial permutation of
the keys. We use about 2 N ln N
comparisons to construct a rBST with
N items, and about 2 ln N
comparisons for search.
Note distinction in average-case
performance in rBSTs and BSTs
Randomized BSTs
Note distinction in average-case
performance in rBSTs and BSTs
Average case is about same (constant is
a little higher in rBSTs)
But assumption in BSTs that items are
inserted in random order
Which is often NOT the case
With rBSTs, this does not matter!