COMP171H Notes: Hashing

Download Report

Transcript COMP171H Notes: Hashing

COMP171
Spring 2009
Course Review
Hashing / Slide 2
Elementary Data Structures




Linked lists
 Types: singular, doubly, circular
 Operations: insert, find, delete O(N)
Stack ADT (array or linked-list)
 First-in-Last-out
 Operations: push, pop
Queue ADT (array or linked-list)
 First-in-First-out
 Operations: enqueue, dequeue
Priority queue (heap)
 Heap-order property
 Operations: insert and deleteMin (both O(log N) using array)
Hashing / Slide 3
Analysis of Algorithms

Worst case analysis




time and memory space
depends on the input (size, special features) and algorithms
growth rate of the functions
Asymptotic notations



O(.): upper bound, “at most”, worst case, ≤
Ω(.): lower bound, “at least”, best case, ≥
Θ(.): tight bound, O(.) && Ω(.), asymptotically “≈” growth rate
Hashing / Slide 4
Comparison-based Sorting
Algorithms
Algorithm
s
Method
Worst time Average
time
Memor
y
Insertion
sort
A[1…p-1] are
sorted, then
insert A[p]
O(N²)
O(N +
#Inversions)
O(1)
Mergesort
Divide, conquer
and merge
O(N logN)
O(N logN)
O(N)
Quicksort
Pick pivot,
partition,
recursion
O(N²)
O(N logN)
O(logN)
Heapsort
Build heap,
deleteMin
O(N logN)
O(N logN)
O(1)
Hashing / Slide 5
Other Sorting Algorithms

Theorem: Any comparison based sorting algorithm
takes Ω(n logn) comparisons to sort a list of n distinct
elements.

Non-comparison sorting



Rely on some special features of the input
Counting sort: O(N + Range)
Radix sort: O(N × #digits)
Hashing / Slide 6
Overview of the Forrest

Binary Trees

Binary Search Trees
AVL trees

M-ary Trees


Terms:




B+-Trees
root, leaves, child & parent, siblings, internal nodes,
height, paths & length, depth,
subtrees
Main theme: search, insertion, deletion.
Hashing / Slide 7
Binary Search Trees

Traversal

Pre-order, in-order, post-order
Average depth O(log N), max depth O(N),
 Operations:





Search, findMin, findMax
Insertion
Deletion: 3 cases, recursive
All takes O(height of the tree).
Hashing / Slide 8
AVL Trees

Balanced BST

Height of an AVL tree is O(log N).
Search O(log N)
 Insertion: 2 types of rebalancing O(log N)





Trace from the new node to the root, locate the unbalanced
node, and at most one rotation.
Single rotation for “outside” insertion
Double rotation for “inside” insertion
Deletion O(log N)

Trace the deletion path, and more than one node may need
rotation.
Hashing / Slide 9
B+-Trees
M-ary trees





The root is either a leaf or has 2 to M children (1 to M-1 keys).
Each internal node, except the root, has between M/2 and M children ( M/2 - 1 and M - 1
keys).
Each leaf has between L/2 and L keys and corresponding data items.
The data items are stored at leaves
Search
Insertion: always insert to a leaf




Possibly splitting leaf (and internal nodes)
Update the key in a internal node if necessary
Deletion



“Borrow” a key from your siblings
Merge two leaves (or internal nodes) – opposite to splitting
Hashing / Slide 10
Graph Essentials
Graph = Vertices + Edges
 Representation




Adjacency matrix: O(N2) space, O(1) access time
Adjacency list: O(M+N) space, O(N) access time
Keywords





subgraph, tree, acyclic graph
path, length, cycle, simple
directed, undirected
incident, degree, indegree / outdegree
connected components
Hashing / Slide 11
Graph Traversal: BFS
Connectivity and shortest path (from source)
 Time: O(M+N)
 Use visited table,
predecessor list,
and distance table
 BFS tree

Hashing / Slide 12
Graph Traversal: DFS
Connectivity and cycle detection
 Time: O(M+N)
 Recursive function
 DFS tree

Hashing / Slide 13
Topological Sort
Topological ordering on DAG and connectivity
 Time: O(M+N)
 Repeatedly remove
zero-degreed vertices
and the outgoing arcs
 Linear ordering

Hashing / Slide 14
Hashing

Hashing is a function that maps a key (K) into an entry
in a table (hash table), T[h(K)]. It only supports
operations Find, insertion, deletion

The hashing function should be


Computationally fast, and efficient with O(1) complexity
Minimize the collisions, when T[h(K1)] = T[h)K2)] and K1 and
K2 are two distinctive keys
Collision Resolution 1: Separate chaining
 Collision Resolution 2: Open Addressing




Linear Probing
Quadratic Probing
Double hashing