Extracting gapped sequences from Bio.Align.PairwiseAligner
1
1
Entering edit mode
22 months ago
biomarco ▴ 50

Hi everyone, I noticed that the Bio.pairwise2 module has been deprecated, so I wanted to switch to Bio.Align.PairwiseAligner in my scripts, as suggested by the warning message.

I used to do the following to calculate sequence identity and query coverage:

aln1, aln2, _, start_aln, end_aln = pairwise2.align.localds(x_seq, y_seq, blosum_matrix, -11, -1)[0]
identities = len([i for i,k in zip(aln1[start_aln:end_aln], aln2[start_aln:end_aln]) if i == k and i != '-'])
perc_ident = round(100 * (identities / (end_aln - start_aln)), 1)
query_cov = round(100 * ((end_aln - start_aln) / len(aln1)), 1)

With PairwiseAligner it looks like you can just print the aligned sequences, and I could not find a way to access the query and target gapped sequences (aln1 and aln2 variables above), so there is no way for me to calculate the sequence identity.

Can somebody help?

pairwise2 PairwiseAligner BioPython • 2.7k views
ADD COMMENT
0
Entering edit mode

I think this point of view is different among generations: I never update a program or an operating system unless there is a very good reason to do so. To be clear, in my book having a newer version is not a good reason to update.

So while you are waiting for someone to answer your question, you can always go back to the BioPython version that worked for you.

ADD REPLY
0
Entering edit mode

I get your point, but if I can get rid of warning messages without silencing or ignoring them, I'm very happy to do so, especially if a module I'm using has been deprecated (I assume they had a good reason to do so). Anyway the current BioPython version is still working fine, I just get a warning message when I import pairwise2 suggesting to use something else and to let the developers know if you really still need that module. That's why I'm wondering whether it is possible to do the same I used to do with PairwiseAligner instead of pairwise2.

ADD REPLY
1
Entering edit mode

Do not delete posts that have received feedback. Redacting content is extremely bad etiquette and if done again, will get your user account suspended.

I've restored the content from Google cache.

ADD REPLY
0
Entering edit mode

FWIW, there are good reasons beyond it simply being a new version to update here. Pairwise in particular has been overhauled quite a bit in more recent versions of Biopython.

ADD REPLY
3
Entering edit mode
22 months ago
biomarco ▴ 50

Sorry for deleting my question, I was not sure anymore it was a question worthy to be asked. Won't do that again! I will also post the solution I found, for anyone stumbling across the same problem:

from Bio.Align import substitution_matrices, PairwiseAligner

aligner_local = PairwiseAligner(
    mode = 'local',
    open_gap_score = -11,
    extend_gap_score = -1,
    substitution_matrix = substitution_matrices.load("BLOSUM62")
    )

def get_pairwise_identity(query: str, subject: str, aligner: PairwiseAligner):
    # NOTE using local aligner is recommended!
    alignment = aligner.align(query, subject)[0]  # looks just at the first alignment
    aln1, aln2 = alignment  ###### <--- THIS LINE ANSWERS MY INITIAL QUESTION! ######
    identities = len([i for i, k in zip(aln1, aln2) if i == k and i != '-'])
    identity = 100 * identities / len(aln1)
    query_coverage = 100 * (alignment.coordinates[0][-1] - alignment.coordinates[0][0]) / len(query)
    subject_coverage = 100 * (alignment.coordinates[1][-1] - alignment.coordinates[1][0]) / len(subject)
    return round(identity, 2), round(query_coverage, 2), round(subject_coverage, 2)

identity, q_cov, s_cov = get_pairwise_identity(seq1, seq2, aligner_local)
ADD COMMENT
1
Entering edit mode

Thank you for posting the solution - that is the right way to go!

ADD REPLY
1
Entering edit mode

You can accept your own answer to resolve the post.

ADD REPLY

Login before adding your answer.

Traffic: 2901 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6