
CSCE350 Algorithms and Data Structures
Lecture 17
Jianjun Hu
Department of Computer Science and Engineering
University of South Carolina
2009.11.
Midterm 2
Average: 84
12 students scored above 90
B-Trees
Organize data for fast queries
Index for fast search
For datasets of structured records, B-tree indexing is used
Motivation (cont.)
Assume that we use an AVL tree to store about 20 million records.
We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so this takes about 0.2 seconds.
We know we can't improve on the log n lower bound on search for a binary tree.
But the solution is to use more branches and thus reduce the height of the tree!
• As branching increases, depth decreases
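For example, if each node holds about 100 keys (an illustrative page size, not a figure from the slide), the height grows like log base 100 of n, and log100 20,000,000 ≈ 3.7, so a search would need only about 4 node (disk page) accesses instead of 24.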
Constructing a B-tree
[Figure: step-by-step construction of a B-tree by inserting the keys 1, 2, 3, 5, 6, 7, 8, 12, 14, 16, 17, 25, 26, 28, 29, 45, 48, 52, 53, 55, 68]
Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:
1. the number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree
2. all leaves are on the same level
3. all non-leaf nodes except the root have at least ⌈m/2⌉ children
4. the root is either a leaf node, or it has from two to m children
5. a leaf node contains no more than m – 1 keys
The number m should always be odd
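As a rough illustration of this definition, here is a minimal sketch (in Python, with names chosen only for illustration) of a node structure and the search routine it supports; insertion and node splitting are omitted:

class BTreeNode:
    # keys are kept sorted; in a non-leaf node, children[i] leads to keys
    # smaller than keys[i] and children[-1] to keys larger than keys[-1]
    def __init__(self, keys=None, children=None):
        self.keys = keys or []           # at most m - 1 keys
        self.children = children or []   # empty for a leaf, else len(keys) + 1

def btree_search(node, key):
    # find the first key in this node that is >= the search key
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True                      # found in this node
    if not node.children:
        return False                     # reached a leaf without finding the key
    return btree_search(node.children[i], key)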
Comparing Trees
Binary trees
• Can become unbalanced and lose their good time complexity (big O)
• AVL trees are strict binary trees that overcome the balance problem
• Heaps remain balanced but only prioritise (not order) the keys
Multi-way trees
• B-Trees can be m-way; they can have any (odd) number of children
• One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a permanently balanced binary tree, exchanging the AVL tree's balancing operations for insertion and (more complex) deletion operations
Chapter 8: Dynamic Programming
Dynamic Programming is a general algorithm design technique
Invented by American mathematician Richard Bellman in the 1950s to solve optimization problems
“Programming” here means “planning”
Main idea:
• solve several smaller (overlapping) subproblems
• record solutions in a table so that each subproblem is only solved once
• the final state of the table will be (or contain) the solution
Example: Fibonacci numbers
Recall definition of Fibonacci numbers:
f (0) = 0
f (1) = 1
f (n) = f (n-1) + f (n-2)
Computing the nth Fibonacci number recursively (top-down):
[Figure: recursion tree for f(n); f(n) branches into f(n-1) and f(n-2), f(n-1) into f(n-2) and f(n-3), and so on, so the same subproblems such as f(n-2) and f(n-3) are recomputed many times]
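A direct Python transcription of this top-down recursion (a sketch, not from the slide) shows why it is slow: the same subproblems are recomputed over and over, giving exponential running time.

def fib_recursive(n):
    # naive top-down recursion: f(n-2) is computed both inside f(n-1)
    # and again directly, so the number of calls grows exponentially
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)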
Example: Fibonacci numbers
Computing the nth Fibonacci number using bottom-up iteration:
f(0) = 0
f(1) = 1
f(2) = 0+1 = 1
f(3) = 1+1 = 2
f(4) = 1+2 = 3
f(5) = 2+3 = 5
…
f(n-2) =
f(n-1) =
f(n) = f(n-1) + f(n-2)
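A bottom-up sketch in Python that fills the table in exactly this order, keeping only the two most recent values; each f(i) is computed once, so the running time is linear in n.

def fib_bottom_up(n):
    # iterative dynamic programming over the table f(0), f(1), ..., f(n)
    if n < 2:
        return n
    prev, curr = 0, 1                    # f(0), f(1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr   # advance the window to the next f(i)
    return curr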
Examples of Dynamic Programming Algorithms
Computing binomial coefficients
Optimal chain matrix multiplication
Constructing an optimal binary search tree
Warshall’s algorithm for transitive closure
Floyd’s algorithm for all-pairs shortest paths
Some instances of difficult discrete optimization problems:
• travelling salesman
• knapsack
The problem: check whether every pair of vertices is connected by a path
[Figure: a small directed graph on vertices a, b, c, d]
You can run depth-first search from each vertex…
But can you do better?
Warshall’s Algorithm: Transitive Closure
• Computes the transitive closure of a relation
• (Alternatively: all paths in a directed graph)
• Example of transitive closure:
[Figure: a directed graph on four vertices shown with its adjacency matrix, and the same graph shown with its transitive closure matrix]
Warshall’s Algorithm
Main idea: a path exists between two vertices i and j iff
• there is an edge from i to j; or
• there is a path from i to j going through vertex 1; or
• there is a path from i to j going through vertex 1 and/or 2; or
• there is a path from i to j going through vertex 1, 2, and/or 3; or
• ...
• there is a path from i to j going through any of the other vertices
Warshall’s Algorithm
[Figure: application of Warshall’s algorithm to a 4-vertex digraph, showing the boolean matrices R(0), R(1), R(2), R(3), R(4); at stage k, new 1s record pairs that are connected via intermediate vertices numbered at most k]
Warshall’s Algorithm
In the kth stage, determine whether a path exists between two vertices i and j using just vertices among 1, …, k:

R(k)[i,j] = R(k-1)[i,j]                          (a path using just 1, …, k-1)
            or (R(k-1)[i,k] and R(k-1)[k,j])     (a path from i to k and from k to j, each using just 1, …, k-1)

[Figure: at the kth stage, vertex k may be used as an intermediate point on a path from i to j]
Pseudocode of Warshall’s Algorithm
ALGORITHM Warshall(A[1..n, 1..n])
    R(0) ← A
    for k ← 1 to n do
        for i ← 1 to n do
            for j ← 1 to n do
                R(k)[i, j] ← R(k-1)[i, j] or (R(k-1)[i, k] and R(k-1)[k, j])
    return R(n)
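A runnable sketch of the same pseudocode in Python (function and variable names are my own); it updates a single boolean matrix in place, which is safe because row k and column k do not change during stage k.

def warshall(adjacency):
    # adjacency: n x n matrix of 0/1 entries; returns the transitive closure
    n = len(adjacency)
    r = [row[:] for row in adjacency]    # R(0) = A, copied so the input is untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # a path i -> j exists iff it already existed, or it goes through k
                r[i][j] = r[i][j] or (r[i][k] and r[k][j])
    return r

# example (an illustrative 4-vertex digraph, not the one from the figure):
# warshall([[0, 1, 0, 0],
#           [0, 0, 0, 1],
#           [0, 0, 0, 0],
#           [1, 0, 1, 0]])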
Example
[Figure: a small directed graph on vertices a, b, c, d for tracing Warshall’s algorithm]
Time efficiency?
Intuitive understanding of Warshall’s algorithm
Can the above Warshall algorithm find all possible paths?
Does the Warshall algorithm avoid duplicate updates?
Consider the path i = 6 → 9 → 12 → 3 → 7 → 5 = j; it is assembled from the pieces
6 → 9, 9 → 12, 12 → 3, 3 → 7, 7 → 5 = j
Floyd’s Algorithm: All-Pairs Shortest Paths
In a weighted graph, find the shortest paths between every pair of vertices.
Same idea: construct the solution through a series of matrices D(0), D(1), …, using an initial subset of the vertices as intermediaries.
[Figure: a weighted digraph on vertices 1–4, its weight matrix D(0) (with ∞ for missing edges), and the final all-pairs shortest-distance matrix]
Similar to Warshall’s Algorithm
dij(k) in D(k) is equal to the length of the shortest path among all paths from the ith vertex to the jth vertex with each intermediate vertex, if any, numbered not higher than k.
[Figure: a path from vi to vj going through vk, with segment lengths dik(k-1) and dkj(k-1), compared with the direct distance dij(k-1)]
dij(k) = min{ dij(k-1), dik(k-1) + dkj(k-1) } for k ≥ 1;  dij(0) = wij
Pseudocode of Floyd’s Algorithm
The next matrix in sequence can be written over its predecessor
ALGORITHM Floyd(W[1..n, 1..n])
    D ← W
    for k ← 1 to n do
        for i ← 1 to n do
            for j ← 1 to n do
                D[i, j] ← min{ D[i, j], D[i, k] + D[k, j] }
    return D
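A runnable Python sketch of this pseudocode (names are my own), using float('inf') for missing edges and overwriting a single matrix as the slide notes; it assumes the graph has no negative cycles.

def floyd(weights):
    # weights: n x n matrix with 0 on the diagonal, edge weights elsewhere,
    # and float('inf') where there is no edge; returns all-pairs distances
    n = len(weights)
    d = [row[:] for row in weights]      # D = W, copied so the input is untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # keep the shorter of: the current path, or the path through k
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d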
Example
[Figure: a small weighted digraph on vertices 1–4 for tracing Floyd’s algorithm]
Time efficiency?
Principle of Optimality
An optimal solution to any instance of an optimization problem is composed of optimal solutions to its subinstances.
e.g.
Take a shortest path between a and b; then the segment between any two points S and T on that path must itself be a shortest path between S and T.
Why?
Otherwise, we could replace that segment with a shorter path from S to T and obtain a shorter path from a to b, a contradiction.
Key issues in DP algorithm design
Derive a recurrence relationship that expresses a solution to
an instance of the problem in terms of solutions to its smaller
subinstances
Design an Algorithm for the Sequence Alignment Problem using DP
Sequence 1: ACTGGAGGCCA
Sequence 2: AGCTAGAGAAG
Score = sum of the per-position scores si
• match: +3
• mismatch: -2
• gap aligned with any character X: -1
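A sketch of the corresponding DP recurrence in Python (Needleman-Wunsch-style global alignment), using the scoring values from the slide; it returns only the best score, and the traceback that recovers the actual alignment is omitted.

MATCH, MISMATCH, GAP = 3, -2, -1         # scores from the slide

def alignment_score(s, t):
    # score[i][j] = best score for aligning the prefixes s[:i] and t[:j]
    m, n = len(s), len(t)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * GAP            # s[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        score[0][j] = j * GAP            # t[:j] aligned entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = MATCH if s[i - 1] == t[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + diag,   # match or mismatch
                              score[i - 1][j] + GAP,        # gap in t
                              score[i][j - 1] + GAP)        # gap in s
    return score[m][n]

# example with the slide's sequences:
# alignment_score("ACTGGAGGCCA", "AGCTAGAGAAG")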