Hello, i am trying to write a program using biopython that will align some sequences from a fasta file (for the test that i will present 5 of them) against a fasta file containing a genome. For each, gene-genome alignment i want the score of the alignment. However, when i initially used pairwise2 it game an error "MemoryError: Out of memory" and suggested that i should use Bio.Align.PairwiseAligner. When i tried the Bio.Align.PairwiseAligner the code just ran but gave me no output and the terminal said it was Killed. here is the code:
from Bio import SeqIO
from Bio.Align import PairwiseAligner
genome_file = "CP000033.3[1..1993560].fasta"
genome_sequence = next(SeqIO.parse(genome_file, "fasta")).seq
gene_file = "genes.fasta"
gene_sequences = list(SeqIO.parse(gene_file, "fasta"))
aligner = PairwiseAligner()
for gene_seq in gene_sequences:
alignment = aligner.align(genome_sequence, gene_seq.seq)
print("Gene:", gene_seq.id)
print("Alignment Score:", alignment.score)
print()
Also, here is a screenshot from the terminal and the VScode:
Thank you in advance, for your kind recommendations!
Edit: Could it be that the genome consumes memory or exceed the time limit?
Well my laptop has currently 6gb of ram (I know...). The genome fasta is L. Acidophilus's NCFM genome and the fasta containing the genes has for this example 5 seqs approximately 900-1500kb. The final goal is to scan like 100 genes against the genome provided by the user. So you think if this program runs to a server will it produce the desired result? And if so, is there any chance to do it locally without the server?
if you want this kind of alignment you should use a computer with a bit more resources otherwise, you can use minimap2 to map your genes vs a reference genome, which consumes fewer resources.
Great i will talk with my lab because they will eventually use it and not me and also i will try and use the minimap2 tool. Thank you very much!