Question

Python 3.4, Biopython 1.65: Writing to file both subject and query IDs from parsed BLAST output

2

Entering edit mode

9.6 years ago

tyleraelliott ▴ 50

I've performed a nucleotide BLAST of a multi-fasta file against itself in order to find sequences which share highly similar flanking sequence, indicative of a duplication event. Given the .xml file produced from this what I need to do is determine cases where significant BLAST hits come not from the same sequence (query and subject are the exact same) but from different sequences. Essentially what I'm trying to do is parse the .xml output and get it to print details about the HSPs as well as IDs for the subject and query in each of those HSPs. I tried playing around with f.write(header.query_id) and f.write(parameters.query_id) but those just spit back an error. I suspect those need to be included in a separate for statement but right now I'm very lost.

I'm going off code based on section 7.4 of the Biopython 1.65 manual:

from Bio.Blast import NCBIXML
import sys

#>python screener.py inputfile.xml outputfile.txt
input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)
#screens an XML file based on the specification below and prints those results to a file
with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect< 0.01: 
                    f.write('****Alignment****' + "\n")
                    f.write('sequence:'+ str(alignment.title) +"\n")
                    f.write('length:'+ str(alignment.length) +"\n")
                    f.write('e value:'+ str(hsp.expect) +"\n")
                    f.write(hsp.query[0:75] + '...'+"\n")
                    f.write(hsp.match[0:75] + '...'+"\n")
                    f.write(hsp.sbjct[0:75] + '...'+"\n")

Any suggestions would be, as always, greatly appreciated.

BLAST Biopython Python • 2.9k views

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by tyleraelliott ▴ 50

0

Entering edit mode

Have you tried blast_record.query_id?

ADD REPLY • link 9.6 years ago by Ram 44k

0

Entering edit mode

Just gave that a shot. The output it gives is in the form:

Query_2 or Query_593, etc.

So not very helpful

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by tyleraelliott ▴ 50

Ram · Accepted Answer · 2015-04-24

Ok, played around some more and solved the problem. What was needed was:

f.write(str(blast_record.query))

So the full code looks like this:

from Bio.Blast import NCBIXML
import sys

input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)
#screens an XML file based on the specification below and prints those results to a file
with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect< 0.01:
                    f.write('****Alignment****' + "\n")
                    f.write('sequence:'+ str(alignment.title) +"\n")
                    f.write('ID:' + str(blast_record.query_id) + "\n") #also added
                    f.write("Query Sequence: " + str(blast_record.query) + "\n")
                    f.write('length:'+ str(alignment.length) +"\n")
                    f.write('e value:'+ str(hsp.expect) +"\n")
                    f.write(hsp.query[0:75] + '...'+"\n")
                    f.write(hsp.match[0:75] + '...'+"\n")
                    f.write(hsp.sbjct[0:75] + '...'+"\n")