9.5 years ago
mbk0asis
Hi,
I used 'pairwise2' in Python to find where an oligo sequence came from.
I ran a 'local' alignment and got weird results.
Example seq.fa and primer.fa files are below. The '>primer' sequence is the first 14 bases of '>seq'.
>seq
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
>primer
CCTCAACCTTCCAG
The code is:
from itertools import product
from Bio import SeqIO
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

seqs1 = SeqIO.to_dict(SeqIO.parse('./seq.fa', 'fasta'))
seqs2 = SeqIO.to_dict(SeqIO.parse('./primer.fa', 'fasta'))

# Align every sequence against every primer and write all reported alignments
with open('./result.txt', 'w') as result:
    for sr1, sr2 in product(seqs1, seqs2):
        for a in pairwise2.align.localxx(str(seqs1[sr1].seq), str(seqs2[sr2].seq)):
            result.write(format_alignment(*a))
and the results are:
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTC-A------A----C----C-T--T-------------C-C--A--G
Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCA-------A----C----C-T--T-------------C-C--A--G
Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCAA-----------C----C-T--T-------------C-C--A--G
Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTC-A------A--C------C-T--T-------------C-C--A--G
Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCA-------A--C------C-T--T-------------C-C--A--G
Score=14
CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG
||||||||||||||||||||||||||||||||||||||||||||||||||
CCTCAA---------C------C-T--T-------------C-C--A--G
Score=14
Can somebody tell me what went wrong?
Thank you!
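For reference: in pairwise2's naming scheme, the two letters after 'local' select the match and gap parameters, and 'xx' means identical characters score 1, mismatches score 0, and gaps cost nothing. Under that scoring, any alignment that pairs all 14 primer bases with identical bases scores 14, however many gaps it opens, so the scattered alignments above tie with the contiguous one. The sketch below re-scores the first reported alignment by hand to show this. score_alignment is a hypothetical helper written for illustration (a simplified re-implementation, not Biopython code), and the penalized values (match 2, mismatch -1, gap open -5, gap extend -2) are arbitrary assumptions.

```python
# Hypothetical helper (not part of Biopython): scores two aligned rows of
# equal length, where '-' marks a gap. A run of k gap columns in the same
# row scores gap_open + (k - 1) * gap_extend (affine gaps).
def score_alignment(top, bottom, match=1, mismatch=0, gap_open=0, gap_extend=0):
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_row = None  # row holding the current gap run: "top", "bottom" or None
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "bottom" if b == "-" else "top"
            # first column of a run pays the opening score, later ones the extension
            score += gap_extend if gap_row == row else gap_open
            gap_row = row
        else:
            score += match if a == b else mismatch
            gap_row = None
    return score


SEQ = "CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG"
PRIMER = "CCTCAACCTTCCAG"
# First alignment reported by localxx in the question
GAPPED = "CCTC-A------A----C----C-T--T-------------C-C--A--G"

# localxx-style scoring (match 1, mismatch 0, free gaps): both alignments tie
print(score_alignment(SEQ, GAPPED))       # 14
print(score_alignment(SEQ[:14], PRIMER))  # 14

# Assumed penalized scoring: the scattered alignment now loses badly
print(score_alignment(SEQ, GAPPED, 2, -1, -5, -2))       # -74
print(score_alignment(SEQ[:14], PRIMER, 2, -1, -5, -2))  # 28
```

pairwise2 also provides variants such as 'localms' that take explicit match, mismatch, gap-open and gap-extend scores; with any real gap penalty the scattered alignments should no longer tie with the contiguous one.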
Just addressing the concept: shouldn't a semi-global alignment be used to align a primer to a sequence, where gaps before and after the primer (the flanking parts of seq) are not penalized, but gaps within the primer are heavily penalized?
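A minimal pure-Python sketch of the semi-global (sometimes called "fitting") alignment described above: every primer base must be aligned, bases of seq before and after the primer are skipped for free, and gaps inside the alignment cost a fixed penalty. The function name fit_primer and the scores (match 2, mismatch -1, linear gap -5) are assumptions for illustration; a production tool would more likely use affine gap penalties.

```python
# Hypothetical helper: semi-global ("fitting") alignment of a short primer
# into a longer sequence, using a linear gap penalty.
def fit_primer(seq, primer, match=2, mismatch=-1, gap=-5):
    """Return (score, start, end) where seq[start:end] is the best fit for primer."""
    m, n = len(primer), len(seq)

    def sub(i, j):
        # substitution score for primer[i-1] against seq[j-1]
        return match if primer[i - 1] == seq[j - 1] else mismatch

    # score[i][j]: best score aligning all of primer[:i], ending at seq position j.
    # Row 0 stays 0, so skipping any leading part of seq is free.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap  # primer bases before seq starts must be gapped
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # primer base aligned to seq base
                score[i - 1][j] + gap,            # primer base aligned to a gap
                score[i][j - 1] + gap,            # internal seq base skipped (penalized)
            )

    # Skipping the trailing part of seq is free: take the best cell in the last row.
    end = max(range(n + 1), key=lambda j: score[m][j])

    # Trace back to recover where the alignment starts in seq.
    i, j = m, end
    while i > 0:
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[m][end], j, end


SEQ = "CCTCAACCTTCCAGGCTCGAGACATCCTCCCACCCCAGCCTCCCTAATAG"
print(fit_primer(SEQ, "CCTCAACCTTCCAG"))  # (28, 0, 14)
print(fit_primer(SEQ, "GACATCCTCCCACC"))  # (28, 20, 34)
```

Because each internal gap costs 5 while a match earns only 2, spreading the primer across seq the way the localxx output does would score far below the contiguous placement.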