I am trying to get sequence alignments with biopython but I am not getting what I think should be the correct result.
I know there are countless ways to compute alignments, can someone suggest me any tool (using biopyhon or other python libs would be preferred) that could give me the expected result?
Here an example:
from Bio import pairwise2
from Bio.pairwise2 import format_alignment
r = 'ATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTT'
c = 'ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT'
alignment = pairwise2.align.globalxx(r, c)
print(format_alignment(*alignment[0]))
ATGGAGAAAA-AAATCACTGGATATACCACCGTTGATA----TATCC-CAATGGCATCGTAAAGAACATTT
|||||| ||| ||||||||||||||||||||||||||| ||| | |||||||||||||||||||||||
ATGGAG-AAATAAATCACTGGATATACCACCGTTGATAAAAATAT-CGCAATGGCATCGTAAAGAACATTT
Score=63
and here what I would like the result to be:
ATGGAGAAAAAAATCACTGGATATACCACCGTTGATA----TATCCCAATGGCATCGTAAAGAACATTT
|||||||||*||||||||||||||||||||||||||| ||||*|||||||||||||||||||||||
ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT
Score=??
What version of BioPython are you using? Pairwise2 was rewritten in later versions and now provides much more realistic results without as many spurious gaps.
I am using Biopython version 1.72, it should be the last one available in conda.
In that case, Bastien's answer is probably your best bet.