
Analysis of Algorithms
CS 477/677
Final Exam Review
Instructor: George Bebis
The Heap Data Structure
• Def: A heap is a nearly complete binary tree with
the following two properties:
– Structural property: all levels are full, except
possibly the last one, which is filled from left to right
– Order (heap) property: for any node x,
key[Parent(x)] ≥ key[x]
[Figure: a max-heap with root 8 and elements 8, 7, 5, 4, 2]
2
Array Representation of Heaps
• A heap can be stored as an array A.
– Root of tree is A[1]
– Parent of A[i] = A[⌊i/2⌋]
– Left child of A[i] = A[2i]
– Right child of A[i] = A[2i + 1]
– heap-size[A] ≤ length[A]
• The elements in the subarray
A[(⌊n/2⌋+1) .. n] are leaves
• The root is the max/min
element of the heap
A heap is a binary tree that is filled in level order, from left to right
3
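As a sketch (Python, not from the slides), the 1-indexed array layout above maps to these index formulas; A[0] is left unused so the root sits at A[1]:

```python
# 1-indexed heap navigation matching the slide's formulas.
def parent(i):
    return i // 2          # floor(i/2)

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

A = [None, 8, 7, 5, 4, 2]  # the heap from the slide; A[0] unused
n = len(A) - 1
# the leaves occupy A[floor(n/2)+1 .. n]
leaves = A[n // 2 + 1:]
```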
Operations on Heaps
(useful for sorting and priority queues)
– MAX-HEAPIFY: O(lgn)
– BUILD-MAX-HEAP: O(n)
– HEAP-SORT: O(nlgn)
– MAX-HEAP-INSERT: O(lgn)
– HEAP-EXTRACT-MAX: O(lgn)
– HEAP-INCREASE-KEY: O(lgn)
– HEAP-MAXIMUM: O(1)
– You should be able to show how these algorithms
perform on a given heap, and tell their running time
4
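A minimal Python sketch of the first two operations above (1-indexed array with A[0] unused; illustrative, not the course's reference code):

```python
def max_heapify(A, i, heap_size):
    # Float A[i] down until the max-heap property holds below it: O(lg n).
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

def build_max_heap(A):
    # Leaves are already heaps, so start at the last internal node: O(n).
    heap_size = len(A) - 1
    for i in range(heap_size // 2, 0, -1):
        max_heapify(A, i, heap_size)

A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(A)   # A[1] is now the maximum, 16
```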
Lower Bound for Comparison Sorts
Theorem: Any comparison sort algorithm requires
Ω(nlgn) comparisons in the worst case.
Proof: How many leaves does the decision tree have?
– At least n! (each of the n! permutations of the input appears as
some leaf) ⇒ ≥ n! leaves
– At most 2^h leaves, for a tree of height h
⇒ n! ≤ 2^h
⇒ h ≥ lg(n!) = Ω(nlgn)
5
Linear Time Sorting
• Any comparison sort will take at least Ω(nlgn) time to sort
an array of n numbers
• We can achieve a better running time for sorting if we
can make certain assumptions on the input data:
– Counting sort: each of the n input elements is an integer in the
range [0, r] and r=O(n)
– Radix sort: the elements in the input are integers represented
as d-digit numbers in some base k, where d = Θ(1) and k = O(n)
– Bucket sort: the numbers in the input are uniformly distributed
over the interval [0, 1)
6
Analysis of Counting Sort
Alg.: COUNTING-SORT(A, B, n, r)
1.  for i ← 0 to r                        Θ(r)
2.    do C[i] ← 0
3.  for j ← 1 to n                        Θ(n)
4.    do C[A[j]] ← C[A[j]] + 1
5.  ► C[i] contains the number of elements equal to i
6.  for i ← 1 to r                        Θ(r)
7.    do C[i] ← C[i] + C[i - 1]
8.  ► C[i] contains the number of elements ≤ i
9.  for j ← n downto 1                    Θ(n)
10.   do B[C[A[j]]] ← A[j]
11.      C[A[j]] ← C[A[j]] - 1
Overall time: Θ(n + r)
7
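A runnable Python sketch of the pseudocode above (0-indexed output array, hence the `- 1`; placing elements right-to-left keeps the sort stable):

```python
def counting_sort(A, r):
    # A holds integers in [0, r]; runs in Theta(n + r) and is stable.
    n = len(A)
    C = [0] * (r + 1)
    for x in A:                  # C[i] = number of elements equal to i
        C[x] += 1
    for i in range(1, r + 1):    # C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [0] * n
    for x in reversed(A):        # right-to-left placement preserves stability
        B[C[x] - 1] = x          # 0-indexed version of B[C[A[j]]] <- A[j]
        C[x] -= 1
    return B

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]
```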
RADIX-SORT
Alg.: RADIX-SORT(A, d)
for i ← 1 to d
  do use a stable sort to sort array A on digit i
• 1 is the lowest-order digit, d is the highest-order digit
Running time: Θ(d(n + k))
8
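A Python sketch of radix sort using a stable counting sort on each digit, least significant first (base k = 10 here as an illustration):

```python
def radix_sort(A, d, k=10):
    # Sort d-digit base-k integers: Theta(d(n + k)) overall.
    for i in range(d):                   # digit i, least significant first
        C = [0] * k
        for x in A:
            C[(x // k**i) % k] += 1
        for j in range(1, k):            # prefix sums
            C[j] += C[j - 1]
        B = [0] * len(A)
        for x in reversed(A):            # stable placement, right to left
            digit = (x // k**i) % k
            B[C[digit] - 1] = x
            C[digit] -= 1
        A = B
    return A

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
```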
Analysis of Bucket Sort
Alg.: BUCKET-SORT(A, n)
for i ← 1 to n                                    O(n)
  do insert A[i] into list B[⌊n·A[i]⌋]
for i ← 0 to n - 1                                Θ(n)
  do sort list B[i] with quicksort
concatenate lists B[0], B[1], . . . , B[n - 1]
together in order                                 O(n)
return the concatenated lists
Overall time (average case): Θ(n)
9
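A Python sketch of the algorithm above; it uses the built-in sort on each bucket (any comparison sort works, since the buckets are small on average):

```python
def bucket_sort(A):
    # Assumes inputs uniformly distributed over [0, 1): Theta(n) on average.
    n = len(A)
    B = [[] for _ in range(n)]
    for x in A:
        B[int(n * x)].append(x)    # bucket index floor(n * A[i])
    for bucket in B:
        bucket.sort()              # each bucket holds O(1) elements on average
    return [x for bucket in B for x in bucket]

data = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
print(bucket_sort(data))
```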
Hash Tables
Direct addressing (advantages/disadvantages)
Hashing
– Use a function h to compute the slot for each key
– Store the element (or a pointer to it) in slot h(k)
Advantages of hashing
– Can reduce storage requirements to Θ(|K|)
– Can still get O(1) search time in the average case
10
Hashing with Chaining
• What is the main idea?
• Practical issues?
• Analysis of INSERT, DELETE
• Analysis of SEARCH
– Worst case
– Average case: Θ(1 + α)
(both successful and unsuccessful)
11
Designing Hash Functions
• The division method
h(k) = k mod m
Advantage: fast, requires only
one operation
Disadvantage: certain values
of m are bad (e.g., powers of 2)
• The multiplication method
h(k) = ⌊m (k A mod 1)⌋
Disadvantage: slower than the
division method
Advantage: value of m is not
critical: typically 2^p
• Universal hashing
– Select a hash function at random,
from a carefully designed class of
functions
Advantage: provides good
results on average,
independently of the keys
to be stored
12
Open Addressing
• Main idea
• Different implementations
– Linear probing
– Quadratic probing
– Double hashing
• Know how each one of them works and their
main advantages/disadvantages
– How do you insert/delete?
– How do you search?
– Analysis of searching
13
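The three probe sequences above differ only in how the i-th probe is computed; this Python sketch makes that concrete (the quadratic constants c1 = c2 = 1 and the generator interface are illustrative assumptions, not from the slides):

```python
def probe_sequence(k, m, h, kind="linear", h2=None):
    # Yield the slots to examine for key k in a table of size m.
    for i in range(m):
        if kind == "linear":
            yield (h(k) + i) % m              # h(k), h(k)+1, h(k)+2, ...
        elif kind == "quadratic":
            yield (h(k) + i + i * i) % m      # with c1 = c2 = 1
        else:
            yield (h(k) + i * h2(k)) % m      # double hashing

linear = list(probe_sequence(5, 8, lambda k: k % 8))
```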
Binary Search Tree
• Tree representation:
– A linked data structure in which
each node is an object
• Binary search tree property:
– If y is in left subtree of x,
then key[y] ≤ key[x]
– If y is in right subtree of x,
then key[y] ≥ key[x]
[Figure: a binary search tree with root 5, children 3 and 7, and leaves 2, 5, and 9]
14
Operations on Binary Search Trees
– SEARCH: O(h)
– PREDECESSOR: O(h)
– SUCCESSOR: O(h)
– MINIMUM: O(h)
– MAXIMUM: O(h)
– INSERT: O(h)
– DELETE: O(h)
– You should be able to show how these algorithms
perform on a given binary search tree, and tell their
running time
15
Red-Black-Trees Properties
• Binary search trees with additional properties:
1. Every node is either red or black
2. The root is black
3. Every leaf (NIL) is black
4. If a node is red, then both its children are black
5. For each node, all paths from the node to
descendant leaves contain the same number of
black nodes
16
Properties of Red-Black-Trees
• Any node with height h has black-height ≥ h/2
• The subtree rooted at any node x contains
at least 2^bh(x) - 1 internal nodes
• No path is more than twice as long as any
other path ⇒ the tree is balanced
– Longest path: h ≤ 2·bh(root)
– Shortest path: bh(root)
17
Upper bound on the height of
Red-Black-Trees
Lemma: A red-black tree with n internal nodes has
height at most 2lg(n + 1).
Proof: Let h = height(root) and b = bh(root).
The number n of internal nodes satisfies
n ≥ 2^b - 1 ≥ 2^(h/2) - 1, since b ≥ h/2
• Add 1 to both sides and then take logs:
n + 1 ≥ 2^b ≥ 2^(h/2)
lg(n + 1) ≥ h/2 ⇒
h ≤ 2lg(n + 1)
18
Operations on Red-Black Trees
– SEARCH: O(h)
– PREDECESSOR: O(h)
– SUCCESSOR: O(h)
– MINIMUM: O(h)
– MAXIMUM: O(h)
– INSERT: O(h)
– DELETE: O(h)
• Red-black trees guarantee that the height of the
tree will be O(lgn)
• You should be able to show how these algorithms perform on a given
red-black tree (except for delete), and tell their running time
19
Adj. List - Adj. Matrix Comparison
Graph representation: adjacency list, adjacency matrix
Comparison                          Better
Faster to test if (x, y) exists?    matrices
Faster to find vertex degree?       lists
Less memory on sparse graphs?       lists: Θ(m+n) vs. Θ(n²)
Faster to traverse the graph?       lists: Θ(m+n) vs. Θ(n²)
Adjacency list representation is better for most applications
20
Minimum Spanning Trees
Given:
• A connected, undirected, weighted graph G = (V, E)
A minimum spanning tree T:
1. T connects all vertices
2. w(T) = Σ_{(u,v)∈T} w(u, v) is minimized
[Figure: a weighted graph on vertices a–i with edge weights 1, 2, 4, 6, 7, 8, 9, 10, 11, 14]
21
Correctness of MST Algorithms
(Prim’s and Kruskal’s)
• Let A be a subset of some MST (i.e., T), (S, V - S) be a
cut that respects A, and (u, v) be a light edge crossing
(S, V - S). Then (u, v) is safe for A.
Proof:
• Let T be an MST that includes A
– edges in A are shaded
• Case 1: If T includes (u, v), then
(u, v) is clearly safe for A
• Case 2: Suppose T does not include
the edge (u, v)
• Idea: construct another MST T’
that includes A ∪ {(u, v)}
[Figure: the cut (S, V - S), with the light edge (u, v) crossing from S to V - S]
22
PRIM(V, E, w, r)
1.  Q ← ∅
2.  for each u ∈ V                               O(V) if Q is implemented
3.    do key[u] ← ∞                              as a min-heap
4.       π[u] ← NIL
5.       INSERT(Q, u)
6.  DECREASE-KEY(Q, r, 0)                        ► key[r] ← 0, takes O(lgV)
7.  while Q ≠ ∅                                  executed |V| times
8.    do u ← EXTRACT-MIN(Q)                      takes O(lgV) ⇒ O(VlgV)
9.       for each v ∈ Adj[u]                     executed O(E) times in total
10.        do if v ∈ Q and w(u, v) < key[v]      constant
11.           then π[v] ← u
12.                DECREASE-KEY(Q, v, w(u, v))   takes O(lgV) ⇒ O(ElgV)
Total time: O(VlgV + ElgV) = O(ElgV)
23
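A Python sketch of PRIM using the standard-library heap; since `heapq` has no DECREASE-KEY, stale heap entries are skipped on pop (lazy deletion), which gives the same O(ElgV) bound. The adjacency-dict interface is an assumption for illustration:

```python
import heapq

def prim(adj, r):
    # adj: {u: [(v, w), ...]} for an undirected graph; returns the parent
    # map pi of an MST rooted at r.
    key = {u: float("inf") for u in adj}
    pi = {u: None for u in adj}
    key[r] = 0
    in_q = set(adj)              # vertices not yet extracted
    heap = [(0, r)]
    while heap:
        k, u = heapq.heappop(heap)
        if u not in in_q or k > key[u]:
            continue             # stale entry: stand-in for DECREASE-KEY
        in_q.remove(u)           # EXTRACT-MIN
        for v, w in adj[u]:
            if v in in_q and w < key[v]:
                key[v] = w
                pi[v] = u
                heapq.heappush(heap, (w, v))
    return pi
```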
KRUSKAL(V, E, w)
1.  A ← ∅
2.  for each vertex v ∈ V                        O(V)
3.    do MAKE-SET(v)
4.  sort E into non-decreasing order by w        O(ElgE)
5.  for each (u, v) taken from the sorted list   O(E)
6.    do if FIND-SET(u) ≠ FIND-SET(v)            O(lgV)
7.      then A ← A ∪ {(u, v)}
8.           UNION(u, v)
9.  return A
Running time: O(V + ElgE + ElgV) = O(ElgE) – dependent on
the implementation of the disjoint-set data structure
24
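A self-contained Python sketch of KRUSKAL with a union-by-rank, path-compressing disjoint-set (the edge-list format `(w, u, v)` is an illustrative assumption):

```python
def kruskal(vertices, edges):
    # edges: list of (w, u, v) tuples; returns the MST edge set A.
    parent = {v: v for v in vertices}
    rank = {v: 0 for v in vertices}

    def find(x):                            # FIND-SET with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    A = []
    for w, u, v in sorted(edges):           # O(E lg E) sort by weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # no cycle: the edge is safe
            A.append((u, v, w))
            if rank[ru] < rank[rv]:         # UNION by rank
                ru, rv = rv, ru
            parent[rv] = ru
            if rank[ru] == rank[rv]:
                rank[ru] += 1
    return A
```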
Shortest Paths Problem
• Variants of shortest paths problem
• Effect of negative weights/cycles
• Notation
– d[v]: estimate
– δ(s, v): shortest-path weight
• Properties
– Optimal substructure theorem
– Triangle inequality
– Upper-bound property
– Convergence property
– Path relaxation property
25
Relaxation
• Relaxing an edge (u, v) = testing whether we
can improve the shortest path to v found so far
by going through u
If d[v] > d[u] + w(u, v)
we can improve the shortest path to v
⇒ update d[v] and π[v]
[Figure: two RELAX(u, v, w) examples with d[u] = 5 and w(u, v) = 2:
if d[v] = 9, relaxation updates d[v] to 7; if d[v] = 6, nothing changes]
26
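The relaxation test d[v] > d[u] + w(u, v) translates directly into Python (a sketch; the dict-based d, π, and edge-weight representation are assumptions for illustration):

```python
INF = float("inf")

def relax(u, v, w, d, pi):
    # Improve the shortest-path estimate for v via u, if possible.
    if d[v] > d[u] + w[(u, v)]:
        d[v] = d[u] + w[(u, v)]
        pi[v] = u

# The example from the slide: d[u] = 5, w(u, v) = 2, d[v] = 9.
d = {"u": 5, "v": 9}
pi = {"u": None, "v": None}
relax("u", "v", {("u", "v"): 2}, d, pi)   # 9 > 5 + 2, so d[v] becomes 7
```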
Single Source Shortest Paths
• Bellman-Ford Algorithm
– Allows negative edge weights
– Returns TRUE if no negative-weight cycles are reachable from
the source s, and FALSE otherwise
– Traverse all the edges |V| - 1 times, every time
performing a relaxation step on each edge
• Dijkstra’s Algorithm
– No negative-weight edges
– Repeatedly select the vertex with the minimum
shortest-path estimate d[v] – uses a min-priority queue, in which
the keys are d[v]
27
BELLMAN-FORD(V, E, w, s)
1.  INITIALIZE-SINGLE-SOURCE(V, s)               Θ(V)
2.  for i ← 1 to |V| - 1                         O(V)
3.    do for each edge (u, v) ∈ E                O(E)
4.      do RELAX(u, v, w)                        ⇒ O(VE)
5.  for each edge (u, v) ∈ E                     O(E)
6.    do if d[v] > d[u] + w(u, v)
7.      then return FALSE
8.  return TRUE
Running time: O(V + VE + E) = O(VE)
28
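A Python sketch of the pseudocode above, with relaxation inlined (edge-list format `(u, v, w)` is an illustrative assumption):

```python
def bellman_ford(vertices, edges, s):
    # Returns (ok, d): ok is False iff a negative-weight cycle is
    # reachable from s. O(VE) overall.
    d = {v: float("inf") for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):    # |V| - 1 passes over all edges
        for u, v, w in edges:
            if d[u] + w < d[v]:           # RELAX(u, v, w)
                d[v] = d[u] + w
    for u, v, w in edges:                 # detect reachable negative cycles
        if d[u] + w < d[v]:
            return False, d
    return True, d
```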
Dijkstra (G, w, s)
1.  INITIALIZE-SINGLE-SOURCE(V, s)               Θ(V)
2.  S ← ∅
3.  Q ← V[G]                                     O(V) build min-heap
4.  while Q ≠ ∅                                  executed O(V) times
5.    do u ← EXTRACT-MIN(Q)                      O(lgV) ⇒ O(VlgV)
6.       S ← S ∪ {u}
7.       for each vertex v ∈ Adj[u]              O(E) times (total)
8.         do RELAX(u, v, w)
9.            update Q (DECREASE-KEY)            O(lgV) ⇒ O(ElgV)
Running time: O(VlgV + ElgV) = O(ElgV)
29
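A Python sketch of Dijkstra with the standard-library heap; as with Prim's, lazy deletion of stale heap entries replaces DECREASE-KEY (the adjacency-dict interface is an assumption for illustration):

```python
import heapq

def dijkstra(adj, s):
    # adj: {u: [(v, w), ...]} with non-negative weights; returns the
    # shortest-path weights d. O(ElgV) with a binary min-heap.
    d = {u: float("inf") for u in adj}
    d[s] = 0
    S = set()                              # settled vertices
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)        # EXTRACT-MIN
        if u in S:
            continue                       # stale heap entry
        S.add(u)
        for v, w in adj[u]:
            if d[u] + w < d[v]:            # RELAX(u, v, w)
                d[v] = d[u] + w
                heapq.heappush(heap, (d[v], v))
    return d
```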
Correctness
• Bellman-Ford’s Algorithm: Show that d[v] = δ(s, v),
for every v, after |V| - 1 passes.
• Dijkstra’s Algorithm: For each vertex u  V, we
have d[u] = δ(s, u) at the time when u is added
to S.
30
NP-completeness
• Algorithmic vs problem complexity
• Class of “P” problems
• Tractable/intractable/unsolvable problems
• NP algorithms and NP problems
• P = NP?
• Reductions and their implication
• NP-completeness and examples of problems
• How do we prove a problem NP-complete?
• Satisfiability problem and its variations
31