Question

Python 3.4, Biopython 1.65:Screening out HSPs in .xml BLAST output where the subject and query are the same

0

Entering edit mode

10.3 years ago

tyleraelliott ▴ 50

I have BLAST output in .xml format that I am parsing for certain features and writing them to a file. Here is the code:

from Bio.Blast import NCBIXML

import sys

input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
e_value_thresh=float(sys.argv[3])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)

with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect< e_value_thresh:
                    f.write('****Alignment****' + "\n")
                    f.write('Subject Sequence:'+ str(alignment.title) +"\n")
                    f.write("Query Sequence:  " + str(blast_record.query) + "\n") #also added
                    f.write('length:'+ str(alignment.length) +"\n")
                    f.write('e value:'+ str(hsp.expect) +"\n")
                    f.write(hsp.query[0:75] + '...'+"\n")
                    f.write(hsp.match[0:75] + '...'+"\n")
                    f.write(hsp.sbjct[0:75] + '...'+"\n")

What I need to do is also screen out HSPs where the subject and query are the same sequence, since my BLAST query and database files are identical. I thought of changing:

if hsp.expect< e_value_thresh:

to

if hsp.expect< e_value_thresh and str(alignment.title) is not str(blast_record.query):

but this generated an error instead.

I would also like to write where the HSP match begins for the query and subject but

f.write('Subject Start:' + str(hsp.subjct_start) + "\n")

just gave an error saying that subjct_start was not an attribute of hsp

I'm still very new to Biopython and Python in general so I'm sure there is something very simple that I'm missing here, and I apologize if that's the case

BLAST Biopython Python • 3.9k views

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by tyleraelliott ▴ 50

0

Entering edit mode

Please read through the documentation here: http://biopython.org/DIST/docs/api/Bio.Blast.Record-module.html This will give you all modules, classes and methods available, and their context.

ADD REPLY • link 3.1 years ago by Ram 45k

0

Entering edit mode

Right, that fixed my Start and End issue(dumb spelling mistake on my part, thanks!). I was viewing this document before and it did solve some other problems I was having but it wasn't able to answer how to screen out HSPs with the exact same query and subject.

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by tyleraelliott ▴ 50

0

Entering edit mode

Maybe compare gi from the names?

ADD REPLY • link 10.3 years ago by Ram 45k

Ram · Accepted Answer · 2015-05-20

I have solved it for anyone who was interested. It was a stupid mistake in how I wrote the inequality operator:

from Bio.Blast import NCBIXML
import sys

input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
e_value_thresh=float(sys.argv[3])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)

with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            if str(alignment.hit_id) != str(blast_record.query): #this line eliminates HSPs with the same query and subject
                for hsp in alignment.hsps:
                    if (hsp.expect< e_value_thresh):
                        f.write('****Alignment****' + "\n")
                        f.write('Subject hit id: ' + str(alignment.hit_id) + "\n")
                        f.write('Subject Start:' + str(hsp.sbjct_start) + "\n")
                        f.write('Subject End:' +str(hsp.sbjct_end) + "\n")
                        f.write("Query Sequence:  " + str(blast_record.query) + "\n")
                        f.write("Query Start:" + str(hsp.query_start) + "\n")
                        f.write("Query End:" + str(hsp.query_end) + "\n")
                        f.write('length:'+ str(alignment.length) +"\n")
                        f.write('e value:'+ str(hsp.expect) +"\n")
                        f.write(hsp.query[0:75] + '...'+"\n")
                        f.write(hsp.match[0:75] + '...'+"\n")
                        f.write(hsp.sbjct[0:75] + '...'+"\n")