- Cal State LA - Instructional Web Server
Download
Report
Transcript - Cal State LA - Instructional Web Server
Space Efficient Alignment
Algorithms
Dr. Nancy Warter-Perez
June 24, 2005
Outline
Algorithm complexity
Complexity of dynamic programming
alignment algorithms
Hirschberg’s Divide and Conquer algorithm
June 24, 2005
Space Efficient Alignment Algorithms
2
Algorithm Complexity
Indicates the space and time (computational) efficiency
of a program
Generally written in Big-O notation
O represents the complexity (order)
n represents the size of the data set
Examples
Space complexity refers to how much memory is required to
execute the algorithm
Time complexity refers to how long it will take to execute
(compute) the algorithm
O(n) – “order n”, linear complexity
O(n2) – “order n squared”, quadratic complexity
Constants and lower orders ignored
O(2n) = O(n) and O(n2 + n + 1) = O(n2)
June 24, 2005
Space Efficient Alignment Algorithms
3
Complexity of Dynamic Programming
Algorithms for Global/Local Alignment
Time complexity – O(m*n)
For each cell in the score matrix, perform 3
operations
Space complexity – O(m*n)
Compute Up, Left, and Diagonal scores
O(3*m*n) = O(m*n)
Size of scoring matrix = m*n
Size of trace back matrix = m*n
O(2*m*n) = O(m*n)
Where, m and n are the lengths of the
sequences being aligned.
Since m n, O(n2 ) – quadratic complexity!
June 24, 2005
Space Efficient Alignment Algorithms
4
Memory Requirements
For a sequence of 200-500 amino acids
or nucleotides
O(n2) = 5002 = 250,000
If store each score as a 32-bit value = 4
bytes, it requires 1,000,000 bytes to
represent the scoring matrix!
If store each trace back symbol as a
character (8-bit value), it requires 250,000
bytes to represent the trace back matrix
June 24, 2005
Space Efficient Alignment Algorithms
5
Simple Improvement for
Scoring Matrix
In reality, the space complexity of the
scoring matrix is only linear, i.e.,
O(2*min(m,n)) = O(min(m,n))
O(min(m,n)) O(n) for sequences of
comparable lengths
2,000 bytes (instead of 1 million)
But, trace back still quadratic space
complexity
June 24, 2005
Space Efficient Alignment Algorithms
6
Hirschberg’s “Divide and Conquer”
Space Efficient Algorithm
Compute the score matrix(s)
between the source (0,0) and
(n, m/2). Save m/2 column of
s. Compute the reverse score
matrix (sreverse) between the
sink (n, m) and (0,m/2). Save
the m/2 column of sreverse.
Find middle (i, m/2) satisfies
max 0 in {s(i, m/2) + sreverse(ni, m/2)}
Recursively partition problem
into 2 subproblems
Source
(0,0)
i
m/2
n
(0,0)
(0,0)
m/2
m
middle
(n,m)
Sink
m/2
m
n
(0,0)
(n,m)
m
middle
middle
middle
n
(0,0)
n
June 24, 2005
m
Space Efficient Alignment Algorithms
(n,m)
m
n
(0,0)
(n,m)
n
(n,m)
m
(n,m)
7
Pseudo Code of SpaceEfficient Alignment Algorithm
Path (source, sink)
If source and sink are in consecutive columns
output the longest path from the source to the
sink
Else
middle middle vertex between source and sink
Path (source, middle)
Path (middle, sink)
June 24, 2005
Space Efficient Alignment Algorithms
8
Complexity of Space-Efficient
Alignment Algorithm
Time complexity
Equal to the sum of the areas of the rectangles
Area + ½ Area + ¼ Area + … 2*Area
where, Area = n*m
O(2n*m) = O(n*m)
Quadratic time/computation complexity (same as before)
Space complexity
Need to save a column of s and sreverse for each computation
(but can discard after computing middle)
O(min(n,m)) – if m < n, switch the sequences (or save a
row of s and sreverse instead)
Linear space complexity!!
June 24, 2005
Space Efficient Alignment Algorithms
9
Project Teams and Presentation
Assignments (Revised)
Base Project (Global Alignment):
Miguel and Joseph
Extension 0 (Global Alignment - all alignments with max score):
Mario
Extension 1 (Ends-Free Global Alignment):
Sherri and Dhonam
Extension 2 (Local Alignment):
Kung-Hua, and Swapna
Extension 3 (Affine Gap Penalty):
Cory and Nam
Extension 4 (Database):
Melinda and Dana
Extension 5 (Space Efficient Algorithm):
Michael and Sarah
June 24, 2005
Space Efficient Alignment Algorithms
10
Workshop
Work on Sequence Alignment project
Email me a progress report on Tuesday, June
28th
Specify the implementation status for each module
List each function within a module and specify it’s status
Date written
Date testing completed
Author
Include functions in the list that are not completed (I.e.,
not written yet or fully tested). For these cases, write
TBD (to be determined) in the respective date field.
Only one report per group, but cc your partner on
your e-mail!
June 24, 2005
Space Efficient Alignment Algorithms
11