Python 3.4, Biopython 1.65:Screening out HSPs in .xml BLAST output where the subject and query are the same
1
0
Entering edit mode
9.6 years ago

I have BLAST output in .xml format that I am parsing for certain features and writing them to a file. Here is the code:

from Bio.Blast import NCBIXML

import sys

input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
e_value_thresh=float(sys.argv[3])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)

with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect< e_value_thresh:
                    f.write('****Alignment****' + "\n")
                    f.write('Subject Sequence:'+ str(alignment.title) +"\n")
                    f.write("Query Sequence:  " + str(blast_record.query) + "\n") #also added
                    f.write('length:'+ str(alignment.length) +"\n")
                    f.write('e value:'+ str(hsp.expect) +"\n")
                    f.write(hsp.query[0:75] + '...'+"\n")
                    f.write(hsp.match[0:75] + '...'+"\n")
                    f.write(hsp.sbjct[0:75] + '...'+"\n")

What I need to do is also screen out HSPs where the subject and query are the same sequence, since my BLAST query and database files are identical. I thought of changing:

if hsp.expect< e_value_thresh:

to

if hsp.expect< e_value_thresh and str(alignment.title) is not str(blast_record.query):

but this generated an error instead.

I would also like to write where the HSP match begins for the query and subject but

f.write('Subject Start:' + str(hsp.subjct_start) + "\n")

just gave an error saying that subjct_start was not an attribute of hsp

I'm still very new to Biopython and Python in general so I'm sure there is something very simple that I'm missing here, and I apologize if that's the case

BLAST Biopython Python • 3.6k views
ADD COMMENT
0
Entering edit mode

Please read through the documentation here: http://biopython.org/DIST/docs/api/Bio.Blast.Record-module.html This will give you all modules, classes and methods available, and their context.

ADD REPLY
0
Entering edit mode

Right, that fixed my Start and End issue(dumb spelling mistake on my part, thanks!). I was viewing this document before and it did solve some other problems I was having but it wasn't able to answer how to screen out HSPs with the exact same query and subject.

ADD REPLY
0
Entering edit mode

Maybe compare gi from the names?

ADD REPLY
1
Entering edit mode
9.5 years ago

I have solved it for anyone who was interested. It was a stupid mistake in how I wrote the inequality operator:

from Bio.Blast import NCBIXML
import sys

input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
e_value_thresh=float(sys.argv[3])
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)

with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            if str(alignment.hit_id) != str(blast_record.query): #this line eliminates HSPs with the same query and subject
                for hsp in alignment.hsps:
                    if (hsp.expect< e_value_thresh):
                        f.write('****Alignment****' + "\n")
                        f.write('Subject hit id: ' + str(alignment.hit_id) + "\n")
                        f.write('Subject Start:' + str(hsp.sbjct_start) + "\n")
                        f.write('Subject End:' +str(hsp.sbjct_end) + "\n")
                        f.write("Query Sequence:  " + str(blast_record.query) + "\n")
                        f.write("Query Start:" + str(hsp.query_start) + "\n")
                        f.write("Query End:" + str(hsp.query_end) + "\n")
                        f.write('length:'+ str(alignment.length) +"\n")
                        f.write('e value:'+ str(hsp.expect) +"\n")
                        f.write(hsp.query[0:75] + '...'+"\n")
                        f.write(hsp.match[0:75] + '...'+"\n")
                        f.write(hsp.sbjct[0:75] + '...'+"\n")
ADD COMMENT

Login before adding your answer.

Traffic: 2542 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6