COMP171H Notes: Hashing
Download
Report
Transcript COMP171H Notes: Hashing
COMP171
Spring 2009
Course Review
Hashing / Slide 2
Elementary Data Structures
Linked lists
Types: singular, doubly, circular
Operations: insert, find, delete O(N)
Stack ADT (array or linked-list)
First-in-Last-out
Operations: push, pop
Queue ADT (array or linked-list)
First-in-First-out
Operations: enqueue, dequeue
Priority queue (heap)
Heap-order property
Operations: insert and deleteMin (both O(log N) using array)
Hashing / Slide 3
Analysis of Algorithms
Worst case analysis
time and memory space
depends on the input (size, special features) and algorithms
growth rate of the functions
Asymptotic notations
O(.): upper bound, “at most”, worst case, ≤
Ω(.): lower bound, “at least”, best case, ≥
Θ(.): tight bound, O(.) && Ω(.), asymptotically “≈” growth rate
Hashing / Slide 4
Comparison-based Sorting
Algorithms
Algorithm
s
Method
Worst time Average
time
Memor
y
Insertion
sort
A[1…p-1] are
sorted, then
insert A[p]
O(N²)
O(N +
#Inversions)
O(1)
Mergesort
Divide, conquer
and merge
O(N logN)
O(N logN)
O(N)
Quicksort
Pick pivot,
partition,
recursion
O(N²)
O(N logN)
O(logN)
Heapsort
Build heap,
deleteMin
O(N logN)
O(N logN)
O(1)
Hashing / Slide 5
Other Sorting Algorithms
Theorem: Any comparison based sorting algorithm
takes Ω(n logn) comparisons to sort a list of n distinct
elements.
Non-comparison sorting
Rely on some special features of the input
Counting sort: O(N + Range)
Radix sort: O(N × #digits)
Hashing / Slide 6
Overview of the Forrest
Binary Trees
Binary Search Trees
AVL trees
M-ary Trees
Terms:
B+-Trees
root, leaves, child & parent, siblings, internal nodes,
height, paths & length, depth,
subtrees
Main theme: search, insertion, deletion.
Hashing / Slide 7
Binary Search Trees
Traversal
Pre-order, in-order, post-order
Average depth O(log N), max depth O(N),
Operations:
Search, findMin, findMax
Insertion
Deletion: 3 cases, recursive
All takes O(height of the tree).
Hashing / Slide 8
AVL Trees
Balanced BST
Height of an AVL tree is O(log N).
Search O(log N)
Insertion: 2 types of rebalancing O(log N)
Trace from the new node to the root, locate the unbalanced
node, and at most one rotation.
Single rotation for “outside” insertion
Double rotation for “inside” insertion
Deletion O(log N)
Trace the deletion path, and more than one node may need
rotation.
Hashing / Slide 9
B+-Trees
M-ary trees
The root is either a leaf or has 2 to M children (1 to M-1 keys).
Each internal node, except the root, has between M/2 and M children ( M/2 - 1 and M - 1
keys).
Each leaf has between L/2 and L keys and corresponding data items.
The data items are stored at leaves
Search
Insertion: always insert to a leaf
Possibly splitting leaf (and internal nodes)
Update the key in a internal node if necessary
Deletion
“Borrow” a key from your siblings
Merge two leaves (or internal nodes) – opposite to splitting
Hashing / Slide 10
Graph Essentials
Graph = Vertices + Edges
Representation
Adjacency matrix: O(N2) space, O(1) access time
Adjacency list: O(M+N) space, O(N) access time
Keywords
subgraph, tree, acyclic graph
path, length, cycle, simple
directed, undirected
incident, degree, indegree / outdegree
connected components
Hashing / Slide 11
Graph Traversal: BFS
Connectivity and shortest path (from source)
Time: O(M+N)
Use visited table,
predecessor list,
and distance table
BFS tree
Hashing / Slide 12
Graph Traversal: DFS
Connectivity and cycle detection
Time: O(M+N)
Recursive function
DFS tree
Hashing / Slide 13
Topological Sort
Topological ordering on DAG and connectivity
Time: O(M+N)
Repeatedly remove
zero-degreed vertices
and the outgoing arcs
Linear ordering
Hashing / Slide 14
Hashing
Hashing is a function that maps a key (K) into an entry
in a table (hash table), T[h(K)]. It only supports
operations Find, insertion, deletion
The hashing function should be
Computationally fast, and efficient with O(1) complexity
Minimize the collisions, when T[h(K1)] = T[h)K2)] and K1 and
K2 are two distinctive keys
Collision Resolution 1: Separate chaining
Collision Resolution 2: Open Addressing
Linear Probing
Quadratic Probing
Double hashing