Hi everyone,
I am trying to parse a BLAST result produced using the outfmt 6 option.
I have made several attempts, with iterators, without iterators... but each time it fails to parse my file.
Here is some code that I tried:
import argparse
from Bio.Blast import NCBIStandalone

parser = argparse.ArgumentParser()
parser.add_argument("blast_file", help="The path of the file containing blast result in xml format")
args = parser.parse_args()
results = open(args.blast_file, "r")
blast_parser = NCBIStandalone.BlastParser()
blast_records = blast_parser.parse(results)
for blast_record in blast_records:
    E_VALUE_THRESH = 0.0004
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print('****Alignment****')
                print('sequence:', alignment.title)
                print('length:', alignment.length)
                print('e value:', hsp.expect)
                if len(hsp.query) > 75:
                    dots = '...'
                else:
                    dots = ''
                print(hsp.query[0:75] + dots)
                print(hsp.match[0:75] + dots)
                print(hsp.sbjct[0:75] + dots)
But then it showed this error:
python parse_last_hit.py /media/loutre/SUZUKII/assembly/duplication_removal/2017/Blast/Contig_37_orf.txt
/usr/lib/python2.7/dist-packages/Bio/Blast/NCBIStandalone.py:57: BiopythonDeprecationWarning: This module has been deprecated. Consider Bio.SearchIO for parsing BLAST output instead.
  "parsing BLAST output instead.", BiopythonDeprecationWarning)
/usr/lib/python2.7/dist-packages/Bio/ParserSupport.py:29: BiopythonDeprecationWarning: Bio.ParserSupport is now deprecated will be removed in a future release of Biopython.
  "future release of Biopython.", BiopythonDeprecationWarning)
Traceback (most recent call last):
  File "parse_last_hit.py", line 14, in <module>
    blast_records = blast_parser.parse(results)
  File "/usr/lib/python2.7/dist-packages/Bio/Blast/NCBIStandalone.py", line 836, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.7/dist-packages/Bio/Blast/NCBIStandalone.py", line 118, in feed
    read_and_call_until(uhandle, consumer.noevent, contains='BLAST')
  File "/usr/lib/python2.7/dist-packages/Bio/ParserSupport.py", line 320, in read_and_call_until
    line = safe_readline(uhandle)
  File "/usr/lib/python2.7/dist-packages/Bio/ParserSupport.py", line 400, in safe_readline
    raise ValueError("Unexpected end of stream.")
ValueError: Unexpected end of stream.
By googling it, I found that it may be a BLAST output format issue (see "Problems With Biopython When Running The Ncbistandalone.Py Program").
I really don't want to go through XML. I can't afford it because I have a lot of BLAST runs with huge sequences to do, and producing XML takes too much time and too much storage.
Does anyone know a way to parse a BLAST result in tabular format using Biopython? Thanks a lot for your answers!
Have you tried using SearchIO with the BLAST tabular output? It's pretty straightforward.
Also, you can treat outfmt 6 output as just a plain delimited text file, so you don't necessarily need a specialised parser at all (e.g. https://github.com/MicroInfect/bioinfx/blob/master/blastfilterer.py).
SearchIO is a nice suggestion, though my BLAST result will contain a lot of rows, and the read function only works with BLAST output containing exactly one result.
But SearchIO also has a parse function, which handles BLAST files with multiple results. When I tried it, though, it didn't work. I'm trying again so you can see the error and maybe understand it.
Ah right, I remember: as far as I could tell, parse from SearchIO supports only XML, so that's why I'm not happy with it.
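For what it's worth, the Biopython documentation lists "blast-tab" among the formats that SearchIO.parse accepts, so a multi-query tabular file should be readable without going through XML. Below is a minimal sketch, assuming a reasonably recent Biopython and a file written with the default 12 outfmt 6 columns. The names significant_hits and SAMPLE are made up for illustration, and the sample rows are invented data, not taken from the file in the question.

```python
import io

# Two invented rows in default outfmt 6 layout (tab-separated):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
SAMPLE = (
    "q1\ts1\t98.50\t200\t3\t0\t1\t200\t1\t200\t5e-43\t370\n"
    "q2\ts2\t35.00\t40\t26\t0\t1\t40\t1\t40\t0.5\t25.0\n"
)


def significant_hits(handle, evalue_max=0.0004):
    """Yield (query_id, hit_id, evalue) for every HSP below evalue_max.

    Hypothetical helper; assumes Biopython is installed and the input
    uses the default outfmt 6 columns.
    """
    # Imported inside the function so the Biopython dependency is explicit.
    from Bio import SearchIO

    # parse() yields one QueryResult per query, so multi-query files are fine.
    for qresult in SearchIO.parse(handle, "blast-tab"):
        for hit in qresult:          # one Hit per subject sequence
            for hsp in hit:          # one HSP per tabular row
                if hsp.evalue < evalue_max:
                    yield qresult.id, hit.id, hsp.evalue
```

With a real file you would pass an open handle instead of io.StringIO(SAMPLE), for example inside a with open(path) as handle: block. If the file was produced with outfmt 7 (commented tabular), the blast-tab parser is documented to take comments=True as an extra keyword argument.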
As for writing the parser myself without Biopython, I already thought about that, but I got stuck at the part where I had to read numbers in exponent notation with Python...
For example, when you have something like 5e-43, how do you store it in Python? It's been a while since I last worked in Python...
You can just use float("5e-43") to get your values converted.
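Putting that together with the plain delimited-file idea, here is a rough sketch of a standard-library-only filter. It assumes the file uses the default outfmt 6 column order. COLUMNS, filter_hits and EXAMPLE are names invented for this example, and the rows are made-up data.

```python
import csv
import io

# Default column order of BLAST+ "-outfmt 6" (an assumption: adjust this
# list if the file was produced with a custom column specification).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Invented example rows, tab-separated.
EXAMPLE = (
    "q1\ts1\t98.50\t200\t3\t0\t1\t200\t1\t200\t5e-43\t370\n"
    "q1\ts2\t35.00\t40\t26\t0\t1\t40\t1\t40\t0.5\t25.0\n"
)


def filter_hits(handle, evalue_max=0.0004):
    """Yield each row (as a dict) whose e-value is below evalue_max."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # float() understands exponent notation such as "5e-43".
        evalue = float(row["evalue"])
        if evalue < evalue_max:
            row["evalue"] = evalue
            yield row


kept = list(filter_hits(io.StringIO(EXAMPLE)))
for row in kept:
    print(row["qseqid"], row["sseqid"], row["evalue"])
```

Because the rows are read one at a time, this should cope with large result files without loading everything into memory.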
Really? Oh damn, I was stuck on this question for so long... Thanks a lot, I'm going to try it! It sounds dumb now.