Question

Blastx No hit query?

1

Entering edit mode

8.4 years ago

Buffo ★ 2.4k

Hi guys, I`m doing blastx on linux command line, my output is on outmft 6 because i want to annotate the results, the problem is that in the current release of blastx the querys whithout hit does not appear on the table result, somebody know how get that sequences in other file? I have read the manual and it looks like there is no option for doing that, somebody have a perl or python script for that? or some ideas? Thanks everybody!!

blastx outfmt6 nohit • 2.1k views

ADD COMMENT • link 8.4 years ago by Buffo ★ 2.4k

1

Entering edit mode

use comm with your input+output to find the missing queries

ADD REPLY • link 8.4 years ago by Pierre Lindenbaum 164k

1

Entering edit mode

Mmm.. I`m not sure if that works with tabular format :-S

ADD REPLY • link 8.4 years ago by Buffo ★ 2.4k

1

Entering edit mode

extract the query name from both files. sort. compute the intersection with comm. And, yes, it works.

ADD REPLY • link 8.4 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

More complicated solution, but works perfect too :-) I posted a python script in other comment to do that. Thanks Pierre :-)

ADD REPLY • link 8.4 years ago by Buffo ★ 2.4k

score 2 · Answer 1 · 2016-07-13

2

Entering edit mode

8.4 years ago

Damian Kao 16k

Try this python script:

import sys
from Bio import SeqIO
faFile = open(sys.argv[1],'r')
queries = set([strrecord.id) for record in SeqIO.parse(faFile,'fasta')])
hits = set([x.split()[0] for x in open(sys.argv[2],'r').read().strip().split('\n')])
noHits = queries - hits
print '\n'.join(noHits)

Save as noHits.py and run by:

python noHits.py queries.fasta blastx.tab.output > noHits.ids

This script basically just gets a list of query ids and list of hit ids. Then it prints out ids of any query id that's not in the hit id list.

ADD COMMENT • link 8.4 years ago by Damian Kao 16k

0

Entering edit mode

Thank you very much Damian, there is a mistake:

queries = set([strrecord.id) for record in seqIO.parse(faFile,'fasta')]) ^ SyntaxError: invalid syntax

ADD REPLY • link 8.4 years ago by Buffo ★ 2.4k

0

Entering edit mode

the 's' in seqIO should be uppercase. I've edited the original post.

ADD REPLY • link 8.4 years ago by Damian Kao 16k

0

Entering edit mode

Still wrong, I posted new version that works in other comment, but thank you very much Damian :-)

ADD REPLY • link 8.4 years ago by Buffo ★ 2.4k

score 0 · Answer 2 · 2016-07-14

It's basically like this:

export LC_ALL=C; comm -3 <(grep "^>" file.fasta | sort) <(cut -f1 blast.tsv | sort)

It might not work because blast may report shorter query sequence IDs than full headers. In this case, I would need to see a few headers and blast output lines so the first process substitution could be modified accordingly.

GenoMax · Answer 3 · 2016-07-14

It works perfect;

ussage python python_script.py file1 file2

from __future__ import division
import sys
from Bio import SeqIO
import os

file1 = open(sys.argv[1],'r')
lista1 = []

for line in file1:
        line = line.rstrip('\n')
        lista1.append(line)
file1.close()

file2 = open(sys.argv[2],'r')
lista2 = []

for line in file2:
        line = line.rstrip('\n')
        lista2.append(line)
file2.close()

same = set(lista1).intersection(set(lista2))

file_out = open('some_output_file.txt', 'w')
for line in same:
    file_out.write(line)

file_out.close()