Parsing results from NCBI tblastn and writing them to a text file in Biopython
8.9 years ago

I am trying to write a script using Biopython to parse the Hit_def identifier from NCBI tblastn XML output, but I am not sure how to read Hit_def in Python. The XML file and my Python code are below.

https://www.dropbox.com/s/89qazoghvfn6yjp/conesnail.xml?dl=0

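from Bio.Blast import NCBIXML

E_VALUE_THRESH = 0.01

with open("/Users/XCX/conesnail.xml") as result_handle:
    # One record per query; each record holds alignments (hits), each with HSPs
    for blast_record in NCBIXML.parse(result_handle):
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < E_VALUE_THRESH:
                    # hit_def holds the Hit_def text from the XML
                    print('Identifier:', alignment.hit_def)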
BLAST ncbi python biopython • 3.0k views

It's great that you've provided a sample input file, but what exactly do you want as the output?


The target ID may be what I want. I have a list of peptides converted from the FASTA reference file, and each has an ID that matches the "Hit_def" element in the XML file output by tblastn. I want to get that element and print it to a text file so that the ID can be compared to an Excel spreadsheet containing all the peptides.

8.9 years ago
Jon ▴ 360

You can use SearchIO to parse the results. The docs for BLAST XML are here: http://biopython.org/DIST/docs/api/Bio.SearchIO.BlastIO-module.html

Something like this will allow you to get the query id and then the target id (if that is what you want?):

from Bio import SearchIO

E_VALUE_THRES = 0.01

# Plain 'r' mode: the old 'rU' mode was removed in Python 3.11
with open('conesnail.xml', 'r') as handle:
    for qresult in SearchIO.parse(handle, "blast-xml"):
        hits = qresult.hits
        query_id = qresult.id
        if len(hits) > 0:
            # Only the top hit and its first HSP are checked here
            target_id = hits[0].id
            evalue = hits[0].hsps[0].evalue
            if evalue < E_VALUE_THRES:
                print("%s\t%s" % (query_id, target_id))
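Since the goal is to get the Hit_def values into a text file, here is a minimal sketch that reads them straight from the BLAST XML with Python's standard-library xml.etree.ElementTree rather than Biopython. The function name write_hit_defs and the output file name are made up for illustration. The element names (Iteration, Iteration_query-def, Hit, Hit_def, Hsp_evalue) are the ones used in BLAST XML output. It writes one tab-separated line per hit that has at least one HSP below the E-value cutoff.

```python
import xml.etree.ElementTree as ET

E_VALUE_THRESH = 0.01


def write_hit_defs(xml_path, out_path, max_evalue=E_VALUE_THRESH):
    """Write 'query<TAB>Hit_def' for every hit with an HSP below max_evalue.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    written = 0
    tree = ET.parse(xml_path)
    with open(out_path, "w") as out:
        # One <Iteration> element per query sequence.
        for iteration in tree.iter("Iteration"):
            query = iteration.findtext("Iteration_query-def", default="")
            for hit in iteration.iter("Hit"):
                hit_def = hit.findtext("Hit_def", default="")
                # Collect the E-values of all HSPs belonging to this hit.
                evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
                if evalues and min(evalues) < max_evalue:
                    out.write("%s\t%s\n" % (query, hit_def))
                    written += 1
    return written
```

Calling write_hit_defs("conesnail.xml", "hit_defs.txt") should leave a file that can be opened in Excel or compared against the peptide list.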

Last thing, how would I also parse the protein alignment?
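One possible way to get at the alignment, sketched with the standard library rather than SearchIO: in BLAST XML each Hsp element stores the aligned query, the match line, and the aligned subject in Hsp_qseq, Hsp_midline, and Hsp_hseq. For tblastn the query is the peptide and Hsp_hseq is the translated nucleotide hit. The function name iter_alignments is hypothetical, and the 0.01 cutoff is simply carried over from the code above.

```python
import xml.etree.ElementTree as ET


def iter_alignments(xml_path, max_evalue=0.01):
    """Yield (hit_def, evalue, query_seq, midline, hit_seq) for each HSP.

    Hypothetical helper for illustration; reads standard BLAST XML tags.
    """
    for hit in ET.parse(xml_path).iter("Hit"):
        hit_def = hit.findtext("Hit_def", default="")
        for hsp in hit.iter("Hsp"):
            evalue = float(hsp.findtext("Hsp_evalue"))
            # Keep only HSPs that pass the E-value cutoff.
            if evalue < max_evalue:
                yield (
                    hit_def,
                    evalue,
                    hsp.findtext("Hsp_qseq", default=""),
                    hsp.findtext("Hsp_midline", default=""),
                    hsp.findtext("Hsp_hseq", default=""),
                )
```

Looping over iter_alignments("conesnail.xml") and printing the three sequence strings on consecutive lines gives the familiar stacked BLAST alignment view.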
