Question

Parsing result from NCBI tblastn + to text file in biopython


9.6 years ago

onspotproductions ▴ 150

I am trying to write a Biopython script that extracts the Hit_def identifier from NCBI tblastn XML output, but I am not sure how to read Hit_def into Python. The XML file and my Python code so far are below.

https://www.dropbox.com/s/89qazoghvfn6yjp/conesnail.xml?dl=0

result_handle = open("/Users/XCX/conesnail.xml")
from Bio.Blast import NCBIXML
blast_records = list(NCBIXML.parse(result_handle))
E_VALUE_THRESH = 0.01
for hit in "hit_def:"
    if hsp.expect < E_VALUE_THRESH:
        print('Identifier:', alignment.title)

BLAST ncbi python biopython • 3.3k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.6 years ago by onspotproductions ▴ 150


It's great that you've provided a sample input file, but what exactly do you want as the output?

ADD REPLY • link 9.6 years ago by Peter 6.0k


The target ID may be what I want. I have a list of peptides converted from the FASTA reference file, and each has an ID that matches the "Hit_def" element in the XML file output by tblastn. I want to extract that value and write it to a text file so the ID can be compared against an Excel spreadsheet containing all the peptides.

ADD REPLY • link 9.6 years ago by onspotproductions ▴ 150

Ram · Answer 1 · 2015-12-22


9.6 years ago

Jon ▴ 360

You can use SearchIO to parse the results. The docs for BLAST XML are here: http://biopython.org/DIST/docs/api/Bio.SearchIO.BlastIO-module.html

Something like this prints the query ID and the target ID of the top hit for each query, if that is what you want:

from Bio import SearchIO

E_VALUE_THRES = 0.01

# Open in default text mode: the old 'rU' mode was removed in Python 3.11.
with open('conesnail.xml') as handle:
    for qresult in SearchIO.parse(handle, "blast-xml"):
        hits = qresult.hits
        query_id = qresult.id
        if len(hits) > 0:
            # Only the top hit and its first HSP are checked here.
            target_id = hits[0].id
            evalue = hits[0].hsps[0].evalue
            if evalue < E_VALUE_THRES:
                print("%s\t%s" % (query_id, target_id))
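Since the goal is a text file of Hit_def values rather than printed output, here is a minimal sketch that extends the idea above to every hit, not just the first. The function names `iter_hit_rows` and `write_hit_table` and the output path are made up for illustration. It also assumes that SearchIO exposes the Hit_def text through `hit.id` and `hit.description` (for custom databases the parser appears to split Hit_def into those two fields), so check a few rows against your XML before relying on it.

```python
def iter_hit_rows(qresults, max_evalue=0.01):
    """Yield (query_id, hit_id, hit_description) for hits passing the cutoff.

    `qresults` is any iterable of SearchIO QueryResult-like objects.
    A hit is kept when its best (lowest) HSP e-value is below `max_evalue`.
    """
    for qresult in qresults:
        for hit in qresult:
            if not hit.hsps:
                continue
            best_evalue = min(hsp.evalue for hsp in hit.hsps)
            if best_evalue < max_evalue:
                yield qresult.id, hit.id, hit.description


def write_hit_table(xml_path, out_path, max_evalue=0.01):
    """Write one tab-separated line per passing hit to `out_path`."""
    # Imported here so iter_hit_rows can be used without Biopython installed.
    from Bio import SearchIO

    with open(out_path, "w") as out:
        qresults = SearchIO.parse(xml_path, "blast-xml")
        for row in iter_hit_rows(qresults, max_evalue):
            out.write("\t".join(row) + "\n")
```

Calling `write_hit_table("conesnail.xml", "hit_defs.txt")` would then produce a tab-separated file that can be opened in Excel or matched against the peptide list.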

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 9.6 years ago by Jon ▴ 360


One last thing: how would I also parse the protein alignment?

ADD REPLY • link 9.6 years ago by onspotproductions ▴ 150
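One possible way to get at the alignment, sketched under the assumption that the file is parsed with SearchIO as above: each HSP carries the aligned query and hit sequences as `hsp.query` and `hsp.hit`, and for BLAST XML the match line should be available under the `'similarity'` key of `hsp.aln_annotation`. The helper names `format_hsp_alignment` and `print_alignments` are hypothetical.

```python
def format_hsp_alignment(hsp):
    """Return the aligned query, match line and hit of one HSP, one per line."""
    # Fall back to an empty match line if the annotation is missing.
    midline = hsp.aln_annotation.get("similarity", "")
    return "\n".join([str(hsp.query.seq), midline, str(hsp.hit.seq)])


def print_alignments(xml_path, max_evalue=0.01):
    """Print the alignment of every HSP whose e-value is below the cutoff."""
    # Imported here so format_hsp_alignment works without Biopython installed.
    from Bio import SearchIO

    for qresult in SearchIO.parse(xml_path, "blast-xml"):
        for hit in qresult:
            for hsp in hit.hsps:
                if hsp.evalue < max_evalue:
                    print(">%s vs %s (e-value %g)" % (qresult.id, hit.id, hsp.evalue))
                    print(format_hsp_alignment(hsp))
```

For tblastn output the hit line is the translated subject sequence, so both rows are protein.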