I am implementing my own version of the Smith-Waterman
algorithm.
This works perfectly fine with 2 genomes of 100 chars each. However, I want to run this pairwise sequence alignment
algorithm on the full-length genomes (10,000 * 10,000). I get a MemoryError
.
Code:
m, n = len(seqA), len(seqB) # let m, n denote the lengths of two sequences
pointer_matrix = tasks.empty_matrix(m + 1, n + 1) # stores traceback path (same dimensions)
score_matrix = tasks.empty_matrix(m+1, n+1) # '+1' - far-left col & top row represent indexes of sequences' chars
max_score = 0 # Initial maximum score obtainable
# Contents & Declare Pointers
for i in range(1, m+1):
for j in range(1, n+1):
above_score = score_matrix[i][j-1] + gap_points # ERROR
left_score = score_matrix[i-1][j] + gap_points
above_left_score = score_matrix[i-1][j-1] + tasks.match(seqA[i-1], seqB[j-1], points_scheme)
score_matrix[i][j] = max(above_score, left_score, above_left_score, 0)
Memory Error:
Traceback (most recent call last):
File "...\main.py", line 72, in <module>
SmithWaterman.SmithWaterman("DNA", ["genome1", "genome2"], **points)
File "...\SmithWaterman.py", line 26, in SmithWaterman
above_score = score_matrix[i][j-1] + gap_points
MemoryError
Please stop deleting posts after they have gotten responses. It is poor etiquette.