I have BLAST output in tabular format with comment lines. I need to find the total number of hits. I would appreciate any help with a Python script.
Here's your Python script:
from itertools import groupby

queries_no = 0
hits_no = 0
with open('BLASTED.txt') as fh, open('BLASTED.txt.out', 'w') as oh:
    # Consecutive lines that share the same first column (query id) form one group.
    for qid, hsps in groupby(fh, lambda l: l.split()[0]):
        if qid.startswith('#'):
            continue  # skip comment lines
        # Count each subject id once, even if it has several HSPs.
        hits = len(set(l.split()[1] for l in hsps))
        hits_no += hits
        queries_no += 1
        oh.write('{0}\t{1}\n'.format(qid, hits))
print('Total queries : {0}'.format(queries_no))
print('Total hits : {0}'.format(hits_no))
print('Averaged hits : {0}'.format(float(hits_no) / queries_no))
The average number of hits is of very limited value, especially if the number of hits to display was restricted in the BLAST search.
In practice this is very simple for tabular output: each line represents a hit, so you can count the lines for each unique query id (for example, sequence_A has 5) and divide the total by the number of distinct query ids (here, 1).
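A minimal sketch of that line counting, assuming the query id is the first tab-separated column and comment lines start with '#'. The function name count_hits_per_query and the example rows are made up for illustration and are not taken from the original file.

```python
from collections import Counter

def count_hits_per_query(lines):
    """Count data rows per query id, skipping comment and blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        counts[line.split('\t')[0]] += 1
    return counts

# Made-up example rows (query id, subject id), not real BLAST output.
example = [
    "# comment line",
    "sequence_A\tsubj_1",
    "sequence_A\tsubj_1",
    "sequence_A\tsubj_2",
    "sequence_A\tsubj_3",
    "sequence_A\tsubj_4",
]

counts = count_hits_per_query(example)
total = sum(counts.values())
# Guard against an empty file so the division cannot fail.
average = float(total) / len(counts) if counts else 0.0
print(counts, average)
```

Note that this counts every row, so a subject with several HSPs is counted several times. The script above counts each subject id only once per query.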
Thank you very much a.zielezinski. The script works perfectly!
I have to consider one hit per subject sequence. All of this using the BLASTED txt file.
Please make yourself familiar with the EXACT meaning of the columns in your BLAST output file. The main problem with BLAST output is deciding whether a hit (a row in your table) is relevant for your project. For example, the second row in your table indicates a short sequence repeat within 'sequence_A'. Is this relevant or not?
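One way to make that decision explicit is to name the columns and filter on them. This is only a sketch: it assumes the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so check it against the '# Fields:' comment line in your own file. The is_relevant function, its thresholds, and the example rows are hypothetical placeholders, not a recommendation.

```python
# Column names assume the default 12-column tabular layout; adjust if yours differs.
COLUMNS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
           'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']

def parse_row(line):
    """Turn one tab-separated data row into a dict keyed by column name."""
    return dict(zip(COLUMNS, line.rstrip('\n').split('\t')))

def is_relevant(row, max_evalue=1e-5, skip_self_hits=True):
    """Hypothetical filter: drop self-hits and rows above an e-value cutoff."""
    if skip_self_hits and row['qseqid'] == row['sseqid']:
        return False
    return float(row['evalue']) <= max_evalue

# Made-up rows for illustration only.
rows = [
    "sequence_A\tsubj_1\t98.50\t500\t7\t0\t1\t500\t1\t500\t0.0\t900",
    "sequence_A\tsequence_A\t100.00\t30\t0\t0\t10\t39\t200\t229\t1e-10\t56.5",
    "sequence_A\tsubj_2\t85.00\t40\t6\t0\t5\t44\t9\t48\t0.5\t30.1",
]

kept = [r['sseqid'] for r in map(parse_row, rows) if is_relevant(r)]
print(kept)
```

Whether self-hits or weak hits should be dropped depends on the project, which is why both choices are parameters here rather than fixed rules.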