Transcript slides

Universal Hashing
• When attempting to foil a malicious
adversary, randomize the algorithm
• Universal hashing: pick a hash function
randomly when the algorithm begins
– Guarantees good performance on average, no
matter what keys adversary chooses
– Need a family of hash functions to choose from
– Think of quick-sort
Universal Hashing
• Let G be a (finite) collection of hash functions
– …that map a given universe U of keys…
– …into the range {0, 1, …, m - 1}.
• G is said to be universal if:
– for each pair of distinct keys x, y ∈ U,
the number of hash functions h ∈ G
for which h(x) = h(y) is |G|/m
– In other words:
• With a random hash function from G the chance of a collision
between x and y is exactly 1/m (x ≠ y)
Universal Hashing
• Theorem 11.3:
– Choose h from a universal family of hash functions
– Hash n keys into a table of m slots, n ≤ m
– Then the expected number of collisions involving a
particular key x is less than 1
– Proof:
• For each pair of keys y, z, let cyz = 1 if y and z collide, 0
otherwise
• E[cyz] = 1/m (by definition)
• Let Cx be total number of collisions involving key x
n 1
•
E[Cx ]   E[cxy ] 
yT
yx
• Since n  m, we have E[Cx] < 1
m
A Universal Hash Function
• Choose table size m to be prime
• Decompose key x into r+1 bytes, so that
x = {x0, x1, …, xr}
– Only requirement is that max value of byte < m
– Let a = {a0, a1, …, ar} denote a sequence of r+1
elements chosen randomly from {0, 1, …, m - 1}
– Define corresponding hash function ha ∈ G:
ha(x) = ( Σ_{i=0}^{r} ai xi ) mod m
– With this definition, G has m^(r+1) members
A Universal Hash Function
• G is a universal collection of hash functions
(Theorem 11.5)
• How to use:
– Pick r based on m and the range of keys in U
– Pick a hash function by (randomly) picking the
a’s
– Use that hash function on all keys
Example
• Let m = 5, and split each key into 2-bit strings
(binary). Note the maximum value of a string is 3,
which is less than m = 5
• a = 1,3, chosen at random from 0,1,2,3,4
• Example for x = 4 = 01,00 (note r = 1)
• ha(4) = (1 × (01) + 3 × (00)) mod 5 = 1
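
As a sanity check, here is a minimal Python sketch of this family (the helper name make_hash is illustrative, not from the slides); it reproduces the m = 5, a = (1, 3) example above:

import random

def make_hash(a, m):
    # h_a(x) = (sum over i of a_i * x_i) mod m, for a key given as a tuple of r+1 pieces
    def h(x):
        assert len(x) == len(a)          # key must decompose into r+1 pieces, each < m
        return sum(ai * xi for ai, xi in zip(a, x)) % m
    return h

m, r = 5, 1                              # prime table size; keys split into r+1 = 2 two-bit pieces
a = (1, 3)                               # the a's used in the example
h_a = make_hash(a, m)
print(h_a((1, 0)))                       # x = 4 = (01, 00): (1*1 + 3*0) mod 5 = 1, as above

# In practice the a's are drawn at random once, when the algorithm begins:
a_random = tuple(random.randrange(m) for _ in range(r + 1))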
Open Addressing
• Basic idea (details in Section 11.4):
– To insert: if slot is full, try another slot, …,
until an open slot is found (probing)
– To search, follow same sequence of probes as
would be used when inserting the element
• If reach element with correct key, return it
• If reach a NULL pointer, element is not in table
• Good for fixed sets (adding but no deletion)
• Table needn’t be much bigger than n
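
A rough Python sketch of one open-addressing scheme (linear probing: on a collision, try the next slot); the class name and the use of Python's built-in hash are assumptions for illustration, and there is no deletion, matching the fixed-set use case above:

class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m          # table need not be much bigger than n

    def _probe(self, key):
        # probe sequence: h(key), h(key)+1, ..., wrapping around the table
        start = hash(key) % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key, value):
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, value)   # found an open slot (or the same key)
                return
        raise RuntimeError("table is full")

    def search(self, key):
        # follow the same probe sequence used when inserting
        for j in self._probe(key):
            if self.slots[j] is None:    # reached an empty slot: key is not in the table
                return None
            if self.slots[j][0] == key:  # reached element with the correct key
                return self.slots[j][1]
        return None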
Lecture 11:
Binary Search Trees
Shang-Hua Teng
Data Format
[Diagram: an entry consists of keys plus satellite data]
Insertion and Deletion on
dynamic sets
• Insert(S,x)
– A modifying operation that augments the set S
with the element pointed to by x
• Delete(S,x)
– Given a pointer x to an element in the set S,
removes x from S
– Notice that this operation uses a pointer to an
element x, not a key value
Querying on dynamic sets
• Search(S,k)
– given a set S and a key value k, returns a pointer x to an element in S such
that key[x] = k, or NIL if no such element belongs to S
• Minimum(S)
– on a totally ordered set S, returns a pointer to the element of S with the
smallest key
• Maximum(S)
• Successor(S,x)
– Given an element x whose key is from a totally ordered set S, returns a
pointer to the next larger element in S, or NIL if x is the maximum
element
• Predecessor(S,x)
Trees
[Diagram: a tree with labeled parts – root, edge, node, parent, child, leaf, interior node, path, subtree; questions: degree? depth/level? height?]
Binary Tree
• Node
– data
– left child
– right child
– parent (optional)
• Tree
– root
[Diagram: a node with Parent, Data, Left, and Right pointers; a tree with its root]
Trees
• Full tree of height h
– all leaves present at level h
– all interior nodes full
– total number of nodes in a full binary tree?
• Complete tree of height h
Applications - Expression Trees
To represent infix expressions
[Diagram: expression tree for (5*3)+(8-4) – root +, left subtree * with children 5 and 3, right subtree - with children 8 and 4]
Applications - Parse Trees
Used in compilers to check syntax
[Diagram: parse tree for an if-then / if-then-else statement – statement nodes with cond, then, and else branches]
Application - Game Trees
Binary Search Trees
• Binary Search Trees (BSTs) are an
important data structure for dynamic sets
• In addition to satellite data, elements have:
– key: an identifying field inducing a total
ordering
– left: pointer to a left child (may be NULL)
– right: pointer to a right child (may be NULL)
– p: pointer to a parent node (NULL for root)
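
A minimal Python sketch of such a node (field names mirror the slides; satellite data is optional):

class Node:
    def __init__(self, key, data=None):
        self.key = key        # identifying field inducing a total ordering
        self.data = data      # satellite data
        self.left = None      # left child (may be None)
        self.right = None     # right child (may be None)
        self.p = None         # parent (None for the root)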
Binary Search Trees
• BST property:
key[leftSubtree(x)] ≤ key[x] ≤ key[rightSubtree(x)]
• Example:
[Diagram: example BST – root F; F's children B and H; B's children A and D; H's child K]
Traversals for a Binary Tree
• Pre order
– visit the node
– go left
– go right
• In order
– go left
– visit the node
– go right
• Post order
– go left
– go right
– visit the node
• Level order / breadth first
– for d = 0 to height
• visit nodes at level d
Traversal Examples
[Diagram: binary tree – root A; A's children B and C; B's left child D with children G and H; C's children E and F; F's right child I]
• Pre order: A B D G H C E F I
• In order: G D H B A E C F I
• Post order: G H D B E I F C A
• Level order: A B C D E F G H I
Traversal Implementation
• recursive implementation of preorder
– base case?
– self reference
• visit node
• pre-order(left child)
• pre-order(right child)
• What changes need to be made for in-order,
post-order?
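
A sketch of all three traversals in Python, using the Node sketch above; the base case is an empty subtree, and only the position of the "visit" line changes:

def preorder(x, visit=print):
    if x is None:                 # base case: empty subtree
        return
    visit(x.key)                  # visit the node
    preorder(x.left, visit)       # go left
    preorder(x.right, visit)      # go right

def inorder(x, visit=print):
    if x is None:
        return
    inorder(x.left, visit)        # go left
    visit(x.key)                  # visit the node
    inorder(x.right, visit)       # go right

def postorder(x, visit=print):
    if x is None:
        return
    postorder(x.left, visit)      # go left
    postorder(x.right, visit)     # go right
    visit(x.key)                  # visit the node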
Inorder Tree Walk
• In order
TreeWalk(x)
  if x ≠ NULL
    TreeWalk(left[x]);
    print(x);
    TreeWalk(right[x]);
• Prints elements in sorted (increasing) order
In order Tree Walk
• Example: [Diagram: the BST above (root F; children B and H; B's children A and D; H's child K) – the walk prints A B D F H K]
• How long will a tree walk take?
• In order walk prints in monotonically increasing
order. Why?
Evaluating an expression tree
• Walk the tree in postorder
• When visiting a node, use the results
of its children to evaluate it.
[Diagram: expression tree for (5*3)+(8-4)]
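
A small Python sketch of this postorder evaluation (reusing the Node sketch; here a leaf's key is a number and an interior node's key is an operator symbol):

import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def evaluate(x):
    if x.left is None and x.right is None:   # a leaf holds a number
        return x.key
    left_val = evaluate(x.left)              # postorder: children first...
    right_val = evaluate(x.right)
    return OPS[x.key](left_val, right_val)   # ...then combine them at the node

# The tree for (5*3)+(8-4):
root = Node('+')
root.left, root.right = Node('*'), Node('-')
root.left.left, root.left.right = Node(5), Node(3)
root.right.left, root.right.right = Node(8), Node(4)
print(evaluate(root))                        # 19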
Operations on BSTs: Search
• Given a key and a pointer to a node, returns an
element with that key or NULL:
TreeSearch(x, k)
if (x = NULL or k = key[x])
return x;
if (k < key[x])
return TreeSearch(left[x], k);
else
return TreeSearch(right[x], k);
BST Search: Example
• Search for D and C:
[Diagram: the example BST – root F; children B and H; B's children A and D; H's child K]
Operations of BSTs: Insert
• Adds an element x to the tree so that the binary
search tree property continues to hold
• The basic algorithm
– Like the search procedure above
– Insert x in place of NULL
– Use a “trailing pointer” to keep track of where you
came from (like inserting into singly linked list)
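
One possible Python sketch of that insert, with y as the trailing pointer (z is the new node; reuses the Node sketch from earlier):

def tree_insert(root, z):
    y = None                          # trailing pointer: will end up as z's parent
    x = root
    while x is not None:              # walk down as in TreeSearch
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y                           # z takes the place of the NULL we fell off at
    if y is None:
        return z                      # tree was empty: z is the new root
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root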
BST Insert: Example
• Example: Insert C
[Diagram: the example BST after inserting C as the left child of D – root F; children B and H; B's children A and D; D's left child C; H's child K]
BST Search/Insert: Running
Time
• What is the running time of TreeSearch() or
TreeInsert()?
– O(h), where h = height of tree
• What is the height of a binary search tree?
– worst case: h = O(n) when tree is just a linear string of
left or right children
• We’ll keep all analysis in terms of h for now
• Later we’ll see how to maintain h = O(lg n)
Sorting With Binary Search Trees
• Informal code for sorting array A of length n:
BSTSort(A)
for i=1 to n
TreeInsert(A[i]);
InorderTreeWalk(root);
• Argue that this is Ω(n lg n)
• What will be the running time in the
– Worst case?
– Average case? (hint: remind you of anything?)
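
The informal code above as a runnable Python sketch, reusing the Node, tree_insert, and inorder sketches from earlier slides:

def bst_sort(A):
    root = None
    for key in A:                     # n inserts, each O(h)
        root = tree_insert(root, Node(key))
    out = []
    inorder(root, out.append)         # inorder walk emits keys in sorted order
    return out

print(bst_sort([3, 1, 8, 2, 6, 7, 5]))   # [1, 2, 3, 5, 6, 7, 8]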
Sorting With BSTs
• Average case analysis
for i=1 to n
TreeInsert(A[i]);
InorderTreeWalk(root);
– It’s a form of quicksort!
[Diagram: BSTSort on 3 1 8 2 6 7 5 – 3 becomes the root, partitioning {1 2} from {8 6 7 5}; 1 then partitions off {2}; 8 partitions off {6 7 5}; 6 partitions {5} from {7} – exactly quicksort's partitions]
Sorting with BSTs
• Same partitions are done as with quicksort,
but in a different order
– In previous example
• Everything was compared to 3 once
• Then those items < 3 were compared to 1 once
• Etc.
– Same comparisons as quicksort, different order!
Sorting with BSTs
• Since run time is proportional to the number
of comparisons, same expected time as
quicksort: O(n lg n)
• Which do you think is better, quicksort or
BSTsort? Why?
Sorting with BSTs
• Since run time is proportional to the number of
comparisons, same time as quicksort: O(n lg n)
• Which do you think is better, quicksort or
BSTSort? Why?
• Answer: quicksort
– Better constants
– Sorts in place
– Doesn’t need to build data structure
More BST Operations
• BSTs are good for more than sorting. For
example, can implement a priority queue
• A priority queue supports
– Insert
– Minimum
– Extract-Min
BST Operations: Minimum
• How can we implement a Minimum()
query?
• What is the running time?
– O(h)
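
One possible sketch: keep following left pointers from the root, so the cost is O(h):

def tree_minimum(x):
    # the minimum is the leftmost node: it has no left child
    while x.left is not None:
        x = x.left
    return x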
BST Operations: Successor
• For deletion, we will need a Successor()
operation
• What is the successor of node 3? Node 15?
Node 13?
• What are the general rules for finding the
successor of node x? (hint: two cases)
BST Operations: Successor
• Two cases:
– x has a right subtree: successor is minimum
node in right subtree
– x has no right subtree: successor is first
ancestor of x whose left child is also ancestor of
x
• Intuition: As long as you move to the left up the tree,
you’re visiting smaller nodes.
• Predecessor: similar algorithm
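
A Python sketch covering both cases (assumes parent pointers p and the tree_minimum sketch above):

def tree_successor(x):
    if x.right is not None:              # case 1: minimum of the right subtree
        return tree_minimum(x.right)
    y = x.p                              # case 2: climb until we come up from a left child
    while y is not None and x is y.right:
        x = y
        y = y.p
    return y                             # None if x was the maximum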
BST Operations: Delete
• Deletion is a bit tricky
• 3 cases:
– x has no children:
• Remove x
– x has one child:
• Splice out x
– x has two children:
• Swap x with successor
• Perform case 1 or 2 to delete it
[Diagram: the example BST (root F; children B and H; B's children A and D; D's left child C; H's child K) – example: delete K, or H, or B]
BST Operations: Delete
• Why will case 2 always go to case 0 or case 1?
– because when x has 2 children, its successor is the
minimum in its right subtree
• Could we swap x with predecessor instead of
successor?
– Of course
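
A Python sketch of the three-case deletion described above (when z has two children, its successor has no left child, so splicing the successor out is always case 0 or 1):

def tree_delete(root, z):
    # y is the node actually spliced out: z itself, or z's successor
    if z.left is None or z.right is None:
        y = z
    else:
        y = tree_successor(z)
    x = y.left if y.left is not None else y.right
    if x is not None:
        x.p = y.p                        # link y's only child to y's parent
    if y.p is None:
        root = x                         # y was the root
    elif y is y.p.left:
        y.p.left = x
    else:
        y.p.right = x
    if y is not z:
        z.key, z.data = y.key, y.data    # move the successor's contents into z
    return root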
Next Lecture
• Up next: guaranteeing an O(lg n) height tree
Animation
• Animated Binary Tree