- Cal State LA - Instructional Web Server

Download Report

Transcript - Cal State LA - Instructional Web Server

Space Efficient Alignment
Algorithms and Affine Gap
Penalties
Dr. Nancy Warter-Perez
Outline





Algorithm complexity
Complexity of dynamic programming
alignment algorithms
Memory efficient algorithms
Hirschberg’s Divide and Conquer algorithm
Affine gap penalty
Space Efficient Alignment Algorithms
2
Algorithm Complexity

Indicates the space and time (computational) efficiency
of a program



Generally written in Big-O notation



O represents the complexity (order)
n represents the size of the data set
Examples



Space complexity refers to how much memory is required to
execute the algorithm
Time complexity refers to how long it will take to execute
(compute) the algorithm
O(n) – “order n”, linear complexity
O(n2) – “order n squared”, quadratic complexity
Constants and lower orders ignored

O(2n) = O(n) and O(n2 + n + 1) = O(n2)
Space Efficient Alignment Algorithms
3
Complexity of Dynamic Programming
Algorithms for Global/Local Alignment

Time complexity – O(m*n)

For each cell in the score matrix, perform 3
operations



Space complexity – O(m*n)




Compute Up, Left, and Diagonal scores
O(3*m*n) = O(m*n)
Size of scoring matrix = m*n
Size of trace back matrix = m*n
O(2*m*n) = O(m*n)
Where, m and n are the lengths of the
sequences being aligned.

Since m  n, O(n2 ) – quadratic complexity!
Space Efficient Alignment Algorithms
4
Memory Requirements

For a sequence of 200-500 amino acids
or nucleotides



O(n2) = 5002 = 250,000
If store each score as a 32-bit value = 4
bytes, it requires 1,000,000 bytes to
represent the scoring matrix!
If store each trace back symbol as a
character (8-bit value), it requires 250,000
bytes to represent the trace back matrix
Space Efficient Alignment Algorithms
5
Simple Improvement for
Scoring Matrix

In reality, the space complexity of the
scoring matrix is only linear, i.e.,
O(2*min(m,n)) = O(min(m,n))



O(min(m,n))  O(n) for sequences of
comparable lengths
2,000 bytes (instead of 1 million)
But, trace back still quadratic space
complexity
Space Efficient Alignment Algorithms
6
Hirschberg’s “Divide and Conquer”
Space Efficient Algorithm



Compute the score matrix(s)
between the source (0,0) and
(n, m/2). Save m/2 column of
s. Compute the reverse score
matrix (sreverse) between the
sink (n, m) and (0,m/2). Save
the m/2 column of sreverse.
Find middle (i, m/2) satisfies
max 0 in {s(i, m/2) + sreverse(ni, m/2)}
Recursively partition problem
into 2 subproblems
Source
(0,0)
i
m/2
m
m/2
m
middle
n
(0,0)
(0,0)
(n,m)
Sink
m/2
m
n
(0,0)
(n,m)
m
middle
middle
middle
n
(0,0)
n
Space Efficient Alignment Algorithms
(n,m)
m
n
(0,0)
(n,m)
n
(n,m)
m
(n,m)
7
Pseudo Code of SpaceEfficient Alignment Algorithm
Path (source, sink)
If source and sink are in consecutive columns
output the longest path from the source to the
sink
Else
middle middle vertex between source and sink
Path (source, middle)
Path (middle, sink)
Space Efficient Alignment Algorithms
8
Complexity of Space-Efficient
Alignment Algorithm

Time complexity

Equal to the sum of the areas of the rectangles
Area + ½ Area + ¼ Area + …  2*Area
where, Area = n*m



Space complexity




O(2n*m) = O(n*m)
Quadratic time/computation complexity (same as before)
Need to save a column of s and sreverse for each computation (but can
discard after computing middle)
O(min(n,m)) – if m < n, switch the sequences (or save a row of s
and sreverse instead)
Linear space complexity!!
Reference:
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/
Space Efficient Alignment Algorithms
9
Gap Penalties

Gap penalties account for the introduction of
a gap - on the evolutionary model, an
insertion or deletion mutation - in both
nucleotide and protein sequences, and
therefore the penalty values should be
proportional to the expected rate of such
mutations.
http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance
Space Efficient Alignment Algorithms
10
Space Efficient Alignment Algorithms
11
Source: http://www.apl.jhu.edu/~przytyck/Lect03_2005.pdf
Space Efficient Alignment Algorithms
12
Space Efficient Alignment Algorithms
13
Space Efficient Alignment Algorithms
14
Space Efficient Alignment Algorithms
15
Space Efficient Alignment Algorithms
16
Space Efficient Alignment Algorithms
17
Space Efficient Alignment Algorithms
18
Project Verification - Use EMBOSS
Pairwise Alignment Tool
http://www.ebi.ac.uk/Tools/emboss/align/index.html
Space Efficient Alignment Algorithms
19
Project Verification – LALIGN
http://www.ch.embnet.org/software/LALIGN_form.html
Space Efficient Alignment Algorithms
20