Segmentation fault Biopython pairwise alignment
18 months ago

Hi everybody!

I'm writing my own pairwise sequence alignment program in Python, using the pairwise2.align functions from Biopython. It works on small sequences. The code is below (2 for a match, -2 for a mismatch, -3 to open a gap and -1 to extend a gap):

from Bio import pairwise2
target_seq = "ATGCNTGA"
query_seq = "ATTGGCCATTN"
alignments = pairwise2.align.globalms(target_seq, query_seq, 2, -2, -3, -1)

However, when I used two huge sequences (HHV8 consensus sequences from Illumina sequencing), I got this error:

segmentation fault (core dumped)

I used the same code.

The sizes of the sequences are:

cat ../Results/1G_S15/1G_S15.fasta | grep -v ">" | wc -c
140280
cat ../Results/8G_S12/1G_S12.fasta | grep -v ">" | wc -c
140272

Do you think the huge sequence size could be the cause of this error? If so, do you have a trick to avoid it?

Best regards,

Antoine

biopython alignment • 1.1k views

I don't think the pairwise aligner, as implemented in Biopython, could possibly align 140K-long sequences. From your example we can't tell whether a single sequence is 140K long or whether that is the total for all sequences together. But since you are talking about "huge" sequences, I assumed the former.

It was not designed for sequences of that size. You would need to use a different tool in my opinion.
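To put rough numbers on "sequences of that size", here is a back-of-envelope sketch. It assumes the aligner fills a full dynamic-programming table with one cell per pair of positions, as textbook global alignment does, and it assumes 8 bytes per cell, which is only a guess and not a measured figure for pairwise2. The helper name dp_table_cost is made up for this illustration. The lengths are the character counts from the question.

```python
def dp_table_cost(len_a, len_b, bytes_per_cell=8):
    """Return (cells, GiB) for a full (len_a + 1) x (len_b + 1) table.

    bytes_per_cell is an assumption for illustration; the real cost per
    cell depends on how the aligner stores scores and traceback data.
    """
    cells = (len_a + 1) * (len_b + 1)
    gib = cells * bytes_per_cell / 2**30
    return cells, gib


# The two toy sequences from the question (8 and 11 letters).
small_cells, small_gib = dp_table_cost(8, 11)

# The two consensus sequences, using the reported character counts.
big_cells, big_gib = dp_table_cost(140280, 140272)

print(f"small: {small_cells} cells")
print(f"large: {big_cells:,} cells, roughly {big_gib:.0f} GiB at 8 bytes per cell")
```

Even if the real per-cell cost is several times smaller than assumed here, a table with tens of billions of cells is far beyond what a typical workstation can hold, which fits the advice to use a different tool.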

If you have multiple sequences, then you need to show the code you use, because from your example one cannot tell how you are using it.


Yep, in a case like this a segfault usually means there is too much data to hold in memory. As Istvan said, this is not really what pairwise2 is for.

Moreover, alignment of very long sequences is still a tricky task. It's made a bit easier when it is just a pairwise alignment, and for that I'd suggest MUMmer.

If you need to do multiple alignment, you'll struggle, but LASTZ is at least capable of it in my experience.


Is it a multiline FASTA? This may be useful:

https://github.com/biopython/biopython/issues/3387


Indeed, my files are multiline FASTA. I tried reading them with the readlines method and then removing the "\n", but I got the same error. I think Istvan Albert is right: my sequences are too large to be used by pairwise2.align.
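For anyone who lands here with the multiline FASTA question, a minimal sketch of joining the wrapped lines of a single record into one string is shown below. The function name read_single_fasta and the demo record are invented for illustration. In Biopython, Bio.SeqIO.read(path, "fasta") does the same job for a file holding one record. As noted above, this only rules out a parsing problem; it does not make the alignment fit in memory.

```python
import io


def read_single_fasta(handle):
    """Return (header, sequence) for the first record in a FASTA handle.

    Wrapped sequence lines are joined and newlines are dropped.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)


# Hypothetical three-line record standing in for a real consensus file.
example = io.StringIO(">demo_record\nATGCNTGA\nATTGGCC\nATTN\n")
name, seq = read_single_fasta(example)
print(name, len(seq))
```

Note also that `wc -c` counts newline characters, so the lengths reported in the question are slightly larger than the true sequence lengths of multiline files.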
