Hello there,
I am performing pairwise global alignment with the EMBOSS Needleman-Wunsch program (needle) from a Python script. The script runs well with shorter sequences, but it throws an error for one pair of proteins (the longest protein, titin). I am trying to align Ensembl protein ENSP00000343764 with Swiss-Prot protein Q8WZ42. The two sequences differ in length, so I am interested in seeing the alignment. The code I used is:
from Bio.Emboss.Applications import NeedleCommandline
from Bio import AlignIO
needle_cline = NeedleCommandline(
    asequence="Q8WZ42.fa",
    bsequence="ENSP00000343764.fa",
    outfile="ENSP00000343764.needle",
    gapopen=10,
    gapextend=0.5,
)
stdout, stderr = needle_cline()
This generates an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/pymodules/python2.7/Bio/Application/__init__.py", line 437, in __call__
stdout_str, stderr_str)
Bio.Application.ApplicationError: Command 'needle -outfile=ENSP00000343764.needle -asequence=Q8WZ42.fa -bsequence=ENSP00000343764.fa -gapopen=10 -gapextend=0.5' returned non-zero exit status 1, 'Needleman-Wunsch global alignment of two sequences'
If I use only a small fragment (say 5,000 amino acids) of either sequence, the script works and generates an alignment file. I am not sure whether the error is caused by the length of the proteins. Can anyone explain the likely reason for this error and how to fix it? I could align fragments of the sequences instead, but that is not a good solution when the script has to run over a large number of proteins. Do you have any idea how I can do this?
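One way to find out why needle exits with status 1 is to run it directly and keep everything it writes to standard error, since the traceback above shows only the program banner. The following is a minimal Python 3 sketch using only the standard library. The helper names build_needle_command and run_needle are hypothetical, and it assumes the needle executable is on PATH. The flags are the ones that appear in the failing command above.

```python
import subprocess


def build_needle_command(aseq, bseq, outfile, gapopen=10, gapextend=0.5):
    """Return the needle command line as an argument list (hypothetical helper)."""
    return [
        "needle",
        "-outfile=" + outfile,
        "-asequence=" + aseq,
        "-bsequence=" + bseq,
        "-gapopen=" + str(gapopen),
        "-gapextend=" + str(gapextend),
    ]


def run_needle(aseq, bseq, outfile):
    """Run needle and return (exit_status, full_stderr_text)."""
    cmd = build_needle_command(aseq, bseq, outfile)
    # capture_output keeps the complete stderr instead of only its first line
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr
```

Calling run_needle("Q8WZ42.fa", "ENSP00000343764.fa", "ENSP00000343764.needle") and printing the second element of the result should show the complete message from needle, which may help tell a memory problem apart from a problem with the input files.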
Thanks in advance!
Thanks @brentp. This module seems to be faster, but it did not solve my problem. Both strings are approximately 35,000 characters long, so I got a MemoryError. I should probably split one sequence into smaller fragments and align each fragment against the other sequence.
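A rough estimate shows why two sequences of this length can exhaust memory while a 5,000-residue fragment does not. The sketch below assumes the aligner stores one full dynamic-programming matrix with 4 bytes per cell. Both figures are assumptions for illustration only; a real implementation may keep several matrices or use a different cell size.

```python
def full_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Bytes needed for one full (len_a + 1) x (len_b + 1) scoring matrix.

    bytes_per_cell=4 is an assumed cell size, not a property of any
    particular aligner.
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


for n in (5000, 35000):
    gib = full_matrix_bytes(n, n) / 2**30
    print(f"{n} x {n} residues: about {gib:.2f} GiB per matrix")
```

Memory grows with the product of the two lengths, so a sevenfold increase in the length of both sequences needs roughly 49 times as much memory.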
You can either split them or move to a machine with more memory. Are you sure you want to do global sequence alignment on sequences of roughly 35,000 residues?
Well, I am doing this for a large number of sequences, and I want the process to be automated.
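If splitting turns out to be the way to go, the fragmenting step itself is easy to automate. Here is a minimal sketch in plain Python; the function name split_into_fragments and the size and overlap parameters are hypothetical, and suitable values would have to be chosen by experiment. Note that aligning fragments one at a time is not equivalent to a single global alignment of the full sequences, so the results near fragment boundaries should be treated with care.

```python
def split_into_fragments(sequence, size, overlap=0):
    """Split a sequence into (start, fragment) pairs of at most `size` residues.

    Consecutive fragments share `overlap` residues so that a region of
    interest is less likely to be cut exactly at a boundary.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    fragments = []
    start = 0
    while start < len(sequence):
        fragments.append((start, sequence[start:start + size]))
        if start + size >= len(sequence):
            break  # this fragment already reaches the end of the sequence
        start += step
    return fragments
```

Each start offset can be kept alongside its fragment so that positions in a fragment alignment can be mapped back to the full-length protein.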