Pairwise2 - Biopython - weird local alignment results
1
3
Entering edit mode
9.5 years ago
mbk0asis ▴ 700

Hi,

I used 'pairwise2' in python to find where a oligo sequences came from.

I did 'local' alignment and got weird results.

Example seq.fa and primer.fa are below. '>primer' sequence is the first 14 bases in '>seq'

> seq
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
>primer
CCTCAACCTTCCAG

The code is

from itertools import product
from Bio import SeqIO
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

seqs1 = SeqIO.to_dict(SeqIO.parse(open('./seq.fa'),'fasta'))
seqs2 = SeqIO.to_dict(SeqIO.parse(open('./primer.fa'),'fasta'))

result = open("./result.txt","w")
for sr1, sr2 in product(seqs1,seqs2):
    for a in pairwise2.align.localxx(str(seqs1[sr1].seq), str(seqs2[sr2].seq)):
           result.write(format_alignment(*a))

and results are

CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTC-A------A----C----C-T--T-------------C-C--A--G
  Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCA-------A----C----C-T--T-------------C-C--A--G
  Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCAA-----------C----C-T--T-------------C-C--A--G
  Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTC-A------A--C------C-T--T-------------C-C--A--G
  Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCA-------A--C------C-T--T-------------C-C--A--G
  Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCAA---------C------C-T--T-------------C-C--A--G
  Score=14

Can somebody tell me what went wrong?

Thank you!

pairwise2 local-alignment biopython • 7.0k views
ADD COMMENT
1
Entering edit mode

Just addressing the concept, shouldn't semi-global alignment be used to align a primer to a seq, where gaps are not penalized at the start of the primer sequence but are heavily penalized within?

ADD REPLY
8
Entering edit mode
9.5 years ago
Felix_Sim ▴ 260

Try the following:

for sr1, sr2 in product(seqs1,seqs2):
    for a in pairwise2.align.localms(str(seqs1[sr1].seq), str(seqs2[sr2].seq), 2, -1, -.5, -.1):
        result.write(format_alignment(*a))

I have added the penalty system to your alignment. This is taken from the Bio.pairwise2.

# Same as above, except now 0.5 points are deducted when opening a
# gap, and 0.1 points are deducted when extending it.

When you include penalties, your alignment should be local and work out. At least in the example you have provided above.

# Your input sequences.
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||
CCTCAACCTTCCAG------------------------------------
  Score=28 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# Added Cs at the beginning to see whether local alignment still works
CCCCCCCCCCCCCCCCCCCCCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
                   ||||||||||||||
-------------------CCTCAACCTTCCAG------------------------------------
  Score=28
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# Including an almost identical primer still aligns correctly

CCTCAACCTTCCAAGCTCCCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
                  ||||||||||||||
------------------CCTCAACCTTCCAG------------------------------------
  Score=28

Hope this solves your problem.

ADD COMMENT
0
Entering edit mode

Thank you, Felix! It works perfectly.

ADD REPLY

Login before adding your answer.

Traffic: 1204 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6