Getting alignments with lower scores on Biopython
2
1
Entering edit mode
3.9 years ago

I am trying to align the following sequence using a python code. I am able to obtain the best score results but not the alignments with lower scores. Is there an efficient way to do this?

# Import pairwise2 module
from Bio import pairwise2

# Import format_alignment method
from Bio.pairwise2 import format_alignment

# Define two sequences to be aligned
X = "ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGTGGATCGGAATATCACTATTCTGCTCGAACTTCCATAG"
Y = "ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGTGAATATCACTATTCTGCTCGAACTTCCATAGAGATCG"

# Get a list of the global alignments between the two sequences ACGGGT and ACG satisfying the given scoring
# A match score is the score of identical chars, else mismatch score.
# Same open and extend gap penalties for both sequences.
alignments = pairwise2.align.globalms(X, Y, 1, -1, -1, -0.1, force_generic=True, score_only = True)

# Use format_alignment method to format the alignments in the list
for a in alignments:
    print(format_alignment(*a))

Here is the result I get with the above code.

ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGTGGATCGGAATATCACTATTCTGCTCGAACTTCCATA------G
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||      ||||||||||||||||||||||||||||||      |
ACAGCCAGCGGGGGAAATAGAGTATTTTTTTCGAGGAGGATAGGTTACTCCTCTGGTCAAAGTTGGATGCGTAATATCACACGATGCGTTGACCGCGGAGAGTGACATTATGT------GAATATCACTATTCTGCTCGAACTTCCATAGAGATCG
  Score=141

What I want is alignments with lower scores such as 130, 100, etc. I know there could be many such examples but I need the alignments with the lower scores. I tried looking at the source code for Biopython but it seems to only give the best score.

genome python Biopython sequence • 2.8k views
ADD COMMENT
1
Entering edit mode

As far as I'm aware, the alignments object should be a list of all alignments scoring above the specified thresholds. If you're only getting one, that would suggest there's only one alignment that meets the criteria.

You could look at the new PairwiseAligner() module/method though which is intended to replace pairwise2. This might give you the output you want.

Biopython PairwiseAligner sorting on scores

ADD REPLY
0
Entering edit mode

Align.PairwiseAligner() implements the Needleman-Wunsch, Smith-Waterman, Gotoh, and Waterman-Smith-Beyer global and local pairwise alignment algorithms to find the best-scoring between two sequences. The PairwiseAligner object automatically chooses the appropriate alignment algorithm. You can try a different penalization score to get better results and different scores.

This is what was written on that post. What I want is different scores for the same match, mismatch and gap penalties.

ADD REPLY
0
Entering edit mode

PairwiseAligner() implements all the same functionality as pairwise2 as far as I'm aware, its essentially a wrapper. You should still be able to get multiple scoring alignments back.

But as I said, the alignments object in your code above should already have multiple scoring alignments if there are more than one that meets the criteria. I suspect the issue is just that your test data doesn't produce multiple alignments.

ADD REPLY
2
Entering edit mode
3.9 years ago
Mensur Dlakic ★ 28k

At the end of this paper there is a short program for global alignment by dynamic programming. If you know how to modify C code, it may help in your experiments.

What I want is different scores for the same match, mismatch and gap penalties.

There is only one global alignment for the same match, mismatch and gap penalties. If you want to get multiple alignments from identical parameters, you will have to go with local alignments - and even in that case there may be only one. These algorithms are meant to go through all possible alignments, but the shortest backward path requirement means that they will output only one alignment. Look through the code I referenced above, and you may be able to modify it so it outputs a fixed number of alignments according to decreasing scores.

ADD COMMENT
0
Entering edit mode

So essentially I have to find all the paths from the right bottom corner to the top left corner? (Without moving to the right in any particular position)

ADD REPLY
0
Entering edit mode

Yes. Beware that most of them will be completely meaningless. For example, you can create an alignment where you follow the optimal path, and at any point instead of following the diagonal go up once followed by left once. That will make an insertion position in the query, followed by an insertion in the subject sequence. After that you get back to the optimal alignment which will give you an artificially smaller score than the optimal path, but it should be obvious that is a bogus alignment.

ADD REPLY
0
Entering edit mode

There is only one global alignment for the same match, mismatch and gap penalties

I don't think this is correct.

There might only be one score, but there can be a number of different alignments. Your clarification below would be more correct.

ADD REPLY
2
Entering edit mode
3.9 years ago
Mensur Dlakic ★ 28k

In case it wasn't clear what I meant above, here is how making artificial gaps would look. Real alignment first:

ACAGCCAGCGGGGGAAATAGA
|||||||||||||||||||||
ACAGCCAGCGGGGGAAATAGA

Now the alignment with one artificial move in up and left directions during the backtracking:

ACAG-CCAGCGGGGGAAATAGA
|||  |||||||||||||||||
ACA-GCCAGCGGGGGAAATAGA

This alignment will have a worse score, but it should also be clear that it is meaningless. Depending on the length of the two sequences, one can create literally millions of these substandard and meaningless alignments, especially if the gap is allowed to be longer.

This is different from legitimately different alignments of this kind, which all have the same score:

AGGGCT
| ||||
A-GGCA

AGGGCT
|| |||
AG-GCA

AGGGCT
||| ||
AGG-CA
ADD COMMENT
0
Entering edit mode

So do you mean that all(or most) alignments which have a lower score are meaning-less(say we fix the length of the sequence)? Lets say even if its less by 1 point?

I understand what you are saying but I am not 100% convinced.

ADD REPLY
0
Entering edit mode

For global alignments of highly similar sequences, most sub-optimal alignments will likely be meaningless. I think the same will be true even if you try local alignments. I think that sub-optimal alignments will become potentially interesting only for sequences that are not highly similar.

ADD REPLY
0
Entering edit mode

Is there a way to find alignments of lower scores apart from the method I mentioned above which also gives the probability of each assignment?

Or is the modified Needleman - Wunsch the only way?

ADD REPLY

Login before adding your answer.

Traffic: 2250 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6