I am trying to align the following two sequences with Python. I can obtain the best-scoring alignment, but not the alignments with lower scores. Is there an efficient way to do this?
# Import pairwise2 module
from Bio import pairwise2
# Import format_alignment method
from Bio.pairwise2 import format_alignment
# Define two sequences to be aligned
X = "ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGTGGATCGGAATATCACTATTCTGCTCGAACTTCCATAG"
Y = "ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGTGAATATCACTATTCTGCTCGAACTTCCATAGAGATCG"
# Get a list of the global alignments between X and Y under the given scoring
# A match score is the score of identical chars, else mismatch score.
# Same open and extend gap penalties for both sequences.
# Note: score_only=True would return only the score (a float) instead of a list of alignments, so it is omitted here.
alignments = pairwise2.align.globalms(X, Y, 1, -1, -1, -0.1, force_generic=True)
# Use format_alignment method to format the alignments in the list
for a in alignments:
    print(format_alignment(*a))
Here is the result I get with the above code.
ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGTGGATCGGAATATCACTATTCTGCTCGAACTTCCATA------G
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||| |
ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGT------GAATATCACTATTCTGCTCGAACTTCCATAGAGATCG
Score=141
What I want is alignments with lower scores such as 130, 100, etc. I know there could be many such examples but I need the alignments with the lower scores. I tried looking at the source code for Biopython but it seems to only give the best score.
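For reference, the score of any candidate alignment, optimal or not, can be recomputed by hand under the same parameters. Below is a minimal sketch, assuming the first column of a gap costs the open penalty and each further column costs the extend penalty. The helper name score_alignment and the short test strings are hypothetical and are not part of Biopython.

```python
def score_alignment(a, b, match=1.0, mismatch=-1.0, gap_open=-1.0, gap_extend=-0.1):
    """Score two equal-length gapped strings column by column (affine gaps)."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    total = 0.0
    prev_gap = None  # row that had a gap in the previous column: "a", "b" or None
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            # Continuing a gap in the same row is an extension, otherwise a new gap opens.
            total += gap_extend if prev_gap == row else gap_open
            prev_gap = row
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total


# Two different alignments of the same pair of toy sequences.
best = score_alignment("ACGGGT", "ACG---")   # one gap of three columns
worse = score_alignment("ACGGGT", "AC-G--")  # two separate gaps
print(best, worse)
```

With these parameters the second alignment scores lower than the first, which is the kind of alignment a best-score search does not report.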
As far as I'm aware, the alignments object should be a list of all alignments scoring above the specified thresholds. If you're only getting one, that would suggest there's only one alignment that meets the criteria.
You could look at the new PairwiseAligner() module/method though, which is intended to replace pairwise2. This might give you the output you want.
Biopython PairwiseAligner sorting on scores
Align.PairwiseAligner() implements the Needleman-Wunsch, Smith-Waterman, Gotoh, and Waterman-Smith-Beyer global and local pairwise alignment algorithms to find the best-scoring alignment between two sequences. The PairwiseAligner object automatically chooses the appropriate alignment algorithm. You can try a different penalization score to get better results and different scores.
That is what was written in that post. What I want is alignments with different scores for the same match, mismatch and gap penalties.
PairwiseAligner() implements all the same functionality as pairwise2 as far as I'm aware; it's essentially a wrapper. You should still be able to get multiple scoring alignments back.
But as I said, the alignments object in your code above should already have multiple scoring alignments if there is more than one that meets the criteria. I suspect the issue is just that your test data doesn't produce multiple alignments.
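To make the PairwiseAligner suggestion concrete, here is a minimal sketch that uses the same match, mismatch and gap scores as the question. The short toy sequences are an assumption chosen so that more than one alignment can tie for the best score. Note that the alignments returned all share the optimal score; lower-scoring ones are not included.

```python
from Bio import Align

# Configure a global aligner with the scoring used in the question.
aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1
aligner.mismatch_score = -1
aligner.open_gap_score = -1
aligner.extend_gap_score = -0.1

# Toy sequences (not the ones from the question).
alignments = aligner.align("AAT", "AT")
print("Best score:", alignments.score)

# Every alignment yielded here has the same (optimal) score.
scores = []
for alignment in alignments:
    scores.append(alignment.score)
    print(alignment)
```

Changing the scoring parameters changes which alignments count as optimal, but each call still reports only the top-scoring ones.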