Blast-XML and Biopython: extract query-anchored results (i.e. MSA)
7.1 years ago
LeWöps • 0

Dear all,

I am having trouble getting BLAST (command-line) and Biopython to work together.

I want a pipeline that runs a single protein query against a blast database, and returns a multiple sequence alignment of hits into python, where I can do further analysis on it.

In tblastn, I can choose the output format. My desired format would be "3 = flat query-anchored, show identities", which returns a multiple sequence alignment. Unfortunately, as far as I understand, this format is not really compatible with Biopython.

Biopython instead accepts e.g. a BLAST XML file as input, which can be selected as an output format in tblastn. However, from the XML I am only able to extract pairwise alignments in Biopython; the MSA seems to be lost. The BLAST record object in Bio.Blast.Record even has a multiple_alignment attribute, but it is always None.

I hope my problem is understandable - does anyone have experience in how to get the 'query-anchored' output from blast into (bio)python?

multiple-sequence-alignment biopython blast xml • 2.3k views
7.1 years ago
Peter 6.0k

There is indeed a .multiple_alignment attribute in the BLAST record object, and there is code in Bio.Blast.NCBIStandalone (the plain text BLAST parser) which should populate it. There is an example of sorts in one test within Tests/test_NCBITextParser.py which might be useful. As a self-contained example based on that, try something like this:

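from Bio.Blast.NCBIStandalone import BlastParser
from Bio.Alphabet import IUPAC

# Parse a plain-text BLAST report (this path is relative to Biopython's Tests/ directory)
parser = BlastParser()
with open("Blast/text_2010L_blastp_006.txt") as handle:
    record = parser.parse(handle)

# Convert the parsed query-anchored block into a generic alignment object
generic_align = record.multiple_alignment.to_generic(IUPAC.protein)
test_seq = generic_align[0].seq
assert test_seq.alphabet == IUPAC.protein
# The first 60 columns of the first row should match the raw parsed text
assert str(test_seq[:60]) == record.multiple_alignment.alignment[0][2]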

It might be nice to add this to Bio.AlignIO, but I have never used the BLAST text MSA output myself, and I don't think it works within Bio.SearchIO.
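
If staying with the XML output is preferable, one possible workaround is to rebuild a query-anchored alignment from the pairwise HSPs yourself. The following is only a minimal sketch under a few assumptions: the function name query_anchored_rows and the toy data are made up for illustration, and each HSP is reduced to a (query_start, aligned_query, aligned_subject) tuple, which as far as I know corresponds to hsp.query_start, hsp.query and hsp.sbjct on the HSP objects returned by Bio.Blast.NCBIXML. Columns where the query has a gap (insertions in the subject) are dropped so that every row has the length of the query; this is a simplification and not a reproduction of BLAST's own layout.

```python
def query_anchored_rows(query_len, hsps, gap="-"):
    """Project pairwise HSPs onto query coordinates.

    hsps: iterable of (query_start, query_aln, sbjct_aln) tuples, where
    query_start is 1-based and the two strings are gapped and equally long.
    Returns one string of length query_len per HSP.
    """
    rows = []
    for query_start, query_aln, sbjct_aln in hsps:
        if len(query_aln) != len(sbjct_aln):
            raise ValueError("aligned strings differ in length")
        row = [gap] * query_len
        pos = query_start - 1  # 0-based index into the ungapped query
        for q_char, s_char in zip(query_aln, sbjct_aln):
            if q_char == "-":
                continue  # insertion relative to the query: no column for it
            row[pos] = s_char
            pos += 1
        rows.append("".join(row))
    return rows


# Toy data (hypothetical) for a query "MKTAYIAK" of length 8
hsps = [
    (1, "MKT-AY", "MKTLAY"),  # subject has an extra L after position 3
    (3, "TAYIA", "TAF-A"),    # subject has a deletion at query position 6
]
rows = query_anchored_rows(8, hsps)
for row in rows:
    print(row)
```

Each row can then be wrapped in whatever alignment object the downstream analysis needs. Note that a gap inside a hit and a region not covered by the hit both end up as "-" here, so use a different fill character if that distinction matters.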
