Hello people!
I have a tabular blast outputs in different files (many files) and i want to extract the best hit according to their bitscore and alignment length. I thought of using awk but I got confused on how the regular expression will be since the outputs are in different files and of different ranges. Then I decided to use a python script from bioman for Blast best hit and I am getting this error
Traceback (most recent call last):
File "BlastBesthit.py", line 17, in <module>
maxscore = bestHit.split()[11]
IndexError: list index out of range
Below is the code:
import string
import sys
import re
import os
blastdir = "/mnt/chisom/chisom/multifasta_files/multi_fasta/Blast_output"
os.chdir(blastdir)
readblastfile = os.listdir(blastdir)
for blastres in readblastfile:
#print blastres
Hits = open(blastres, 'r')
bestHit = Hits.read()
print bestHit
print len(bestHit)
maxscore = bestHit.split()[11]
query = bestHit.split()[0]
#print query
hits = Hits.readlines()
for hit in hits:
hitSplit = hit.split()
if query is not hitSplit[0]:
print bestHit
query = hitSplit[0]
maxscore = 0
bestHit = hit
elif maxscore < hitSplit[11]:
bestHit = hit
maxscore = hitSplit[11]
Hits.close()
A content of one file called blastres_4.2.3.52 is
4.2.3.52 Manes.12G148700.1 32.27 595 366 10 36 629 76 1752 2e-90 295
4.2.3.52 Manes.12G149100.1 32.50 597 362 12 36 629 76 1752 2e-90 295
4.2.3.52 Manes.12G076600.1 32.23 574 343 9 61 628 178 1779 2e-86 285
And the result I want to get is (since it is the same bitscore but different length)
4.2.3.52 Manes.12G148700.1 32.27 595 366 10 36 629 76 1752 2e-90 295
4.2.3.52 Manes.12G149100.1 32.50 597 362 12 36 629 76 1752 2e-90 295
I need all the best hits extracted to be in one file. please any help, correction and suggestion will be appreciated. Thanks
I am unable to find the script that you have provided the link to but from looking at the code it looks like it's the blastdir issue as it is hard-coded in the script. Either you can change that location to where the blast results are on your computer or you could try https://github.com/alunem/bioman/blob/master/blast/bmn-BlastBestHit.py code instead, where you will have to provide the location to the blast directory as command line argument.
Hello Sej, I hard-coded the directory into the script because in the directory contains many files which i have to loop through to get the best hits.
Does the directory only contain blast files, and nothing else? Otherwise, you will have problems.
Yes it contains only blast result files
You could create a new directory with all BLAST results file and change the path in the script to that folder as blastdir.