Question

Sequence alignment with biopython

5.7 years ago

alec_djinn ▴ 390

I am trying to compute sequence alignments with Biopython, but I am not getting what I think should be the correct result.

I know there are countless ways to compute alignments. Can someone suggest a tool (Biopython or another Python library would be preferred) that gives me the expected result?

Here is an example:

from Bio import pairwise2
from Bio.pairwise2 import format_alignment

r = 'ATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACATTT'
c = 'ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT'
alignment = pairwise2.align.globalxx(r, c)
print(format_alignment(*alignment[0]))

ATGGAGAAAA-AAATCACTGGATATACCACCGTTGATA----TATCC-CAATGGCATCGTAAAGAACATTT
|||||| ||| |||||||||||||||||||||||||||    ||| | |||||||||||||||||||||||
ATGGAG-AAATAAATCACTGGATATACCACCGTTGATAAAAATAT-CGCAATGGCATCGTAAAGAACATTT
  Score=63

And here is what I would like the result to be:

ATGGAGAAAAAAATCACTGGATATACCACCGTTGATA----TATCCCAATGGCATCGTAAAGAACATTT
|||||||||*|||||||||||||||||||||||||||    ||||*|||||||||||||||||||||||
ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT
  Score=??

sequence alignment • 3.0k views

ADD COMMENT • link updated 5.7 years ago by Bastien Hervé 6.0k • written 5.7 years ago by alec_djinn ▴ 390

What version of Biopython are you using? pairwise2 was rewritten in later versions and now gives much more realistic results, with fewer spurious gaps.

ADD REPLY • link 5.7 years ago by Joe 21k

I am using Biopython 1.72, which should be the latest version available in conda.

ADD REPLY • link 5.7 years ago by alec_djinn ▴ 390

In that case, Bastien's answer is probably your best bet.

ADD REPLY • link 5.7 years ago by Joe 21k

score 2 · Accepted Answer · 2019-04-19

You can increase the gap penalties, both the open and the extend values, by using globalms instead of globalxx (which applies no gap penalties at all):

# Arguments after the sequences: match, mismatch, gap open, gap extend
alignment = pairwise2.align.globalms(r, c, 2, -1, -1, -0.5)
print(format_alignment(*alignment[0]))

ATGGAGAAAAAAATCACTGGATATACCACCGTTGAT----ATATCCCAATGGCATCGTAAAGAACATTT
|||||||||.||||||||||||||||||||||||||    |||||.|||||||||||||||||||||||
ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT
   Score=121.5

Identical characters are given 2 points, 1 point is deducted for each non-identical character, 1 point is deducted when opening a gap, and 0.5 points are deducted when extending it.
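
To make the scoring rule above concrete, here is a minimal sketch in plain Python (no Biopython needed) that re-scores the alignment printed above, column by column. The helper name affine_score is hypothetical. It assumes that the first column of a gap costs only the open penalty and each further column costs the extend penalty, which I believe is the pairwise2 default. It also assumes a gap that switches from one row to the other counts as a new opening.

```python
def affine_score(top, bottom, match=2.0, mismatch=-1.0,
                 gap_open=-1.0, gap_extend=-0.5):
    """Score two already-aligned, equal-length strings (hypothetical helper).

    Assumption: the first column of a gap costs gap_open only, and every
    following column of the same gap costs gap_extend.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0.0
    gap_row = None  # which row ("top"/"bottom") currently has an open gap
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # Extend if we are continuing a gap in the same row, else open
            score += gap_extend if gap_row == row else gap_open
            gap_row = row
        else:
            score += match if a == b else mismatch
            gap_row = None
    return score


# The two aligned rows from the globalms output above
top = "ATGGAGAAAAAAATCACTGGATATACCACCGTTGAT----ATATCCCAATGGCATCGTAAAGAACATTT"
bottom = "ATGGAGAAATAAATCACTGGATATACCACCGTTGATAAAAATATCGCAATGGCATCGTAAAGAACATTT"

# 63 matches * 2, 2 mismatches * -1, one gap: -1 + 3 * -0.5
print(affine_score(top, bottom))  # 121.5

# Same alignment under globalxx-style scoring (match=1, everything else 0)
print(affine_score(top, bottom, match=1.0, mismatch=0.0,
                   gap_open=0.0, gap_extend=0.0))  # 63.0
```

The second number equals the Score=63 reported for the gappier globalxx result in the question. Without gap or mismatch penalties both alignments tie, so globalxx has no reason to prefer the cleaner one.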

See the docs