Python 3.4, Biopython 1.65: Writing to file both subject and query IDs from parsed BLAST output
1
2
Entering edit mode
9.6 years ago

I've performed a nucleotide BLAST of a multi-fasta file against itself in order to find sequences which share highly similar flanking sequence, indicative of a duplication event. Given the .xml file produced from this what I need to do is determine cases where significant BLAST hits come not from the same sequence (query and subject are the exact same) but from different sequences. Essentially what I'm trying to do is parse the .xml output and get it to print details about the HSPs as well as IDs for the subject and query in each of those HSPs. I tried playing around with f.write(header.query_id) and f.write(parameters.query_id) but those just spit back an error. I suspect those need to be included in a separate for statement but right now I'm very lost.

I'm going off code based on section 7.4 of the Biopython 1.65 manual:

from Bio.Blast import NCBIXML
import sys

#>python screener.py inputfile.xml outputfile.txt
input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)
#screens an XML file based on the specification below and prints those results to a file
with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect< 0.01: 
                    f.write('****Alignment****' + "\n")
                    f.write('sequence:'+ str(alignment.title) +"\n")
                    f.write('length:'+ str(alignment.length) +"\n")
                    f.write('e value:'+ str(hsp.expect) +"\n")
                    f.write(hsp.query[0:75] + '...'+"\n")
                    f.write(hsp.match[0:75] + '...'+"\n")
                    f.write(hsp.sbjct[0:75] + '...'+"\n")

Any suggestions would be, as always, greatly appreciated.

BLAST Biopython Python • 2.9k views
ADD COMMENT
0
Entering edit mode

Have you tried blast_record.query_id?

ADD REPLY
0
Entering edit mode

Just gave that a shot. The output it gives is in the form:

Query_2 or Query_593, etc.

So not very helpful

ADD REPLY
1
Entering edit mode
9.6 years ago

Ok, played around some more and solved the problem. What was needed was:

f.write(str(blast_record.query))

So the full code looks like this:

from Bio.Blast import NCBIXML
import sys

input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)
#screens an XML file based on the specification below and prints those results to a file
with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect< 0.01:
                    f.write('****Alignment****' + "\n")
                    f.write('sequence:'+ str(alignment.title) +"\n")
                    f.write('ID:' + str(blast_record.query_id) + "\n") #also added
                    f.write("Query Sequence: " + str(blast_record.query) + "\n")
                    f.write('length:'+ str(alignment.length) +"\n")
                    f.write('e value:'+ str(hsp.expect) +"\n")
                    f.write(hsp.query[0:75] + '...'+"\n")
                    f.write(hsp.match[0:75] + '...'+"\n")
                    f.write(hsp.sbjct[0:75] + '...'+"\n")
ADD COMMENT

Login before adding your answer.

Traffic: 2542 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6