- Cal State LA - Instructional Web Server

Download Report

Transcript - Cal State LA - Instructional Web Server

Space Efficient Alignment
Algorithms
Dr. Nancy Warter-Perez
June 24, 2005
Outline



Algorithm complexity
Complexity of dynamic programming
alignment algorithms
Hirschberg’s Divide and Conquer algorithm
June 24, 2005
Space Efficient Alignment Algorithms
2
Algorithm Complexity

Indicates the space and time (computational) efficiency
of a program



Generally written in Big-O notation



O represents the complexity (order)
n represents the size of the data set
Examples



Space complexity refers to how much memory is required to
execute the algorithm
Time complexity refers to how long it will take to execute
(compute) the algorithm
O(n) – “order n”, linear complexity
O(n2) – “order n squared”, quadratic complexity
Constants and lower orders ignored

O(2n) = O(n) and O(n2 + n + 1) = O(n2)
June 24, 2005
Space Efficient Alignment Algorithms
3
Complexity of Dynamic Programming
Algorithms for Global/Local Alignment

Time complexity – O(m*n)

For each cell in the score matrix, perform 3
operations



Space complexity – O(m*n)




Compute Up, Left, and Diagonal scores
O(3*m*n) = O(m*n)
Size of scoring matrix = m*n
Size of trace back matrix = m*n
O(2*m*n) = O(m*n)
Where, m and n are the lengths of the
sequences being aligned.

Since m  n, O(n2 ) – quadratic complexity!
June 24, 2005
Space Efficient Alignment Algorithms
4
Memory Requirements

For a sequence of 200-500 amino acids
or nucleotides



O(n2) = 5002 = 250,000
If store each score as a 32-bit value = 4
bytes, it requires 1,000,000 bytes to
represent the scoring matrix!
If store each trace back symbol as a
character (8-bit value), it requires 250,000
bytes to represent the trace back matrix
June 24, 2005
Space Efficient Alignment Algorithms
5
Simple Improvement for
Scoring Matrix

In reality, the space complexity of the
scoring matrix is only linear, i.e.,
O(2*min(m,n)) = O(min(m,n))



O(min(m,n))  O(n) for sequences of
comparable lengths
2,000 bytes (instead of 1 million)
But, trace back still quadratic space
complexity
June 24, 2005
Space Efficient Alignment Algorithms
6
Hirschberg’s “Divide and Conquer”
Space Efficient Algorithm



Compute the score matrix(s)
between the source (0,0) and
(n, m/2). Save m/2 column of
s. Compute the reverse score
matrix (sreverse) between the
sink (n, m) and (0,m/2). Save
the m/2 column of sreverse.
Find middle (i, m/2) satisfies
max 0 in {s(i, m/2) + sreverse(ni, m/2)}
Recursively partition problem
into 2 subproblems
Source
(0,0)
i
m/2
n
(0,0)
(0,0)
m/2
m
middle
(n,m)
Sink
m/2
m
n
(0,0)
(n,m)
m
middle
middle
middle
n
(0,0)
n
June 24, 2005
m
Space Efficient Alignment Algorithms
(n,m)
m
n
(0,0)
(n,m)
n
(n,m)
m
(n,m)
7
Pseudo Code of SpaceEfficient Alignment Algorithm
Path (source, sink)
If source and sink are in consecutive columns
output the longest path from the source to the
sink
Else
middle middle vertex between source and sink
Path (source, middle)
Path (middle, sink)
June 24, 2005
Space Efficient Alignment Algorithms
8
Complexity of Space-Efficient
Alignment Algorithm

Time complexity

Equal to the sum of the areas of the rectangles
Area + ½ Area + ¼ Area + …  2*Area
where, Area = n*m



O(2n*m) = O(n*m)
Quadratic time/computation complexity (same as before)
Space complexity



Need to save a column of s and sreverse for each computation
(but can discard after computing middle)
O(min(n,m)) – if m < n, switch the sequences (or save a
row of s and sreverse instead)
Linear space complexity!!
June 24, 2005
Space Efficient Alignment Algorithms
9
Project Teams and Presentation
Assignments (Revised)

Base Project (Global Alignment):
Miguel and Joseph

Extension 0 (Global Alignment - all alignments with max score):
Mario

Extension 1 (Ends-Free Global Alignment):
Sherri and Dhonam

Extension 2 (Local Alignment):
Kung-Hua, and Swapna

Extension 3 (Affine Gap Penalty):
Cory and Nam

Extension 4 (Database):
Melinda and Dana

Extension 5 (Space Efficient Algorithm):
Michael and Sarah
June 24, 2005
Space Efficient Alignment Algorithms
10
Workshop


Work on Sequence Alignment project
Email me a progress report on Tuesday, June
28th

Specify the implementation status for each module

List each function within a module and specify it’s status





Date written
Date testing completed
Author
Include functions in the list that are not completed (I.e.,
not written yet or fully tested). For these cases, write
TBD (to be determined) in the respective date field.
Only one report per group, but cc your partner on
your e-mail!
June 24, 2005
Space Efficient Alignment Algorithms
11