3.9 years ago
carlos_marchi
Hi!
I have a list of pairwise permutations of DNA sequences, and I compute an alignment score for each pair. I don't know why this process causes a memory leak when the permutation list is large. Here is an example of the score calculation:
for sequence1, sequence2 in sequence_permutation:
    score = self.__calculate_sequence_similarity(sequence1, sequence2)
    alignments[sequence1].append(sequence2)
save_aligments(alignments)

def __calculate_score_alignment(self, sequence1, sequence2):
    from Bio.Align import substitution_matrices
    from Bio import Align
    from Bio.SubsMat import MatrixInfo
    aligner = Align.PairwiseAligner()
    aligner.mode = 'local'
    aligner.substitution_matrix = substitution_matrices.load('BLOSUM62')
    return aligner.score(sequence1, sequence2)

def __calculate_sequence_similarity(self, sequence1: str, sequence2: str) -> float:
    if not sequence1 and not sequence2:
        return -1
    score = self.__calculate_score_alignment(sequence1, sequence2)
    score1 = self.__calculate_score_alignment(sequence1, sequence1)
    score2 = self.__calculate_score_alignment(sequence2, sequence2)
    return score / (math.sqrt(score1) * math.sqrt(score2))
A memory leak is a software bug. If it originates not in your code but in a library you're using, you should report it to the library's authors. Make sure, though, that it really is a memory leak and not simply high memory usage caused by a large data set. Also note that many scripting languages, Python included, may not return all freed memory to the operating system until the script has exited. So if your script creates a data structure that uses half the available RAM, most of that memory will stay associated with the script's process even after the data structure has been destroyed.
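One way to tell a real leak from ordinary high usage is to measure whether allocations keep growing across repeated calls. Below is a minimal sketch using the standard-library `tracemalloc` module. The helper name `allocation_growth` and its parameters are made up for illustration, and `tracemalloc` only sees memory allocated through Python's own allocators, so growth inside a C extension that calls `malloc` directly would not show up here.

```python
import tracemalloc


def allocation_growth(step, iterations=1000, top=5):
    """Call step() repeatedly and report where traced memory grew the most.

    `step` is any zero-argument callable (hypothetical; e.g. one scoring call).
    Returns up to `top` tracemalloc.StatisticDiff entries, largest change first.
    """
    tracemalloc.start()
    try:
        step()  # warm-up, so one-time allocations (imports, caches) are excluded
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            step()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Group allocations by source line and diff the two snapshots.
    return after.compare_to(before, "lineno")[:top]
```

Printing each returned entry shows the file, line, and size difference. If the top entries stay near zero as `iterations` grows, the memory is probably just being held by the process rather than leaked by Python-level objects.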
The program's memory usage increases with each iteration, so it is not due to the dataset size. It may be some object that has been destroyed but whose memory was not released, as you wrote. The Aligner object is created in each iteration, so I can't see an error in that code.
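Since the aligner is rebuilt on every call, one thing worth trying is to build it once and reuse it, and to cache each sequence's self-score instead of recomputing it for every pair. This is only a sketch and is not confirmed to make the growth go away. `SimilarityScorer` and `make_biopython_aligner` are hypothetical names, and the empty-sequence guard uses `or` here as an assumption (the original uses `and`). The Biopython calls are the same ones used in the question.

```python
import math


def make_biopython_aligner():
    """Build the aligner once, with the same settings as in the question."""
    from Bio import Align
    from Bio.Align import substitution_matrices

    aligner = Align.PairwiseAligner()
    aligner.mode = "local"
    aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
    return aligner


class SimilarityScorer:
    """Normalized similarity using one shared aligner (hypothetical helper)."""

    def __init__(self, aligner):
        self._aligner = aligner   # any object with a score(a, b) method
        self._self_scores = {}    # sequence -> score(sequence, sequence)

    def _self_score(self, seq):
        # Compute each self-alignment score only once per distinct sequence.
        if seq not in self._self_scores:
            self._self_scores[seq] = self._aligner.score(seq, seq)
        return self._self_scores[seq]

    def similarity(self, seq1, seq2):
        # Assumption: treat a pair with any empty sequence as invalid.
        if not seq1 or not seq2:
            return -1.0
        score = self._aligner.score(seq1, seq2)
        norm = math.sqrt(self._self_score(seq1)) * math.sqrt(self._self_score(seq2))
        return score / norm
```

With Biopython installed, `scorer = SimilarityScorer(make_biopython_aligner())` followed by `scorer.similarity(sequence1, sequence2)` inside the loop would replace the per-call construction. If memory still grows with a single aligner, that narrows the problem down to `aligner.score` itself.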
This is something to report as an issue on the Biopython GitHub repository if you are confident it's a real problem with the library.