Commandline BLAST - errors?
1
0
Entering edit mode
3.3 years ago
bry.th • 0

Hi,

I'm running command line blastx and blastp against a number of databases. However, running the exact same script on the exact same input files against the exact same databases occasionally seems to output different filesizes.

I can only assume that this is because the task is failing - but how can I QC the outputs to tell which output file was successful? I know I can look at the log files, but some of these I am inheriting and I want to be able to double check if they were successful.

The outputs tend to be outfmt6 or asn then converted to asn (I'm following a specific methodology, this isn't my choice)

blastx \
-query /input-file.fa \
-db nr \
-num_threads 24 \
-max_target_seqs 10 \
-outfmt 11 \
-out output-file.asn

blast_formatter \
-archive output-file.asn \
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore" > \
output-file.outfmt6

Thanks!

line BLAST RNA-seq command • 1.7k views
ADD COMMENT
1
Entering edit mode

May be you can fetch first column and uniq it to see how many sequences have got hits in both the files. Like this:

cut -f1 output-file.outfmt6 | sort | uniq | wc -l

Further, if you need, you can take it's output in file and can compare both files using "diff" command to check for the difference in sequence IDs covered in both the files.

ADD REPLY
0
Entering edit mode

Hi,

Thanks for this but I don't understand how it helps (that's an indictment of me not your help!)

You say I should use the code to see how many get hits in both the files, but there's only one input file. Assuming one file comes out with more, is that the file I should be using?

ADD REPLY
1
Entering edit mode

I might be wrong here, but what I mean is when you run blastp for 2 times and getting different file size, there might be the possibility that, in one of the file, one or more sequences might have missed/skipped. In this case you can use the above command and select file having more number of sequence IDs.

Also, you can convert .asn file to fmt 5 or 16, to see whether all the sequences given as input where processed by blast or not as these format gives information of all the sequences irrespective of whether there is hit obtained or "No hits found". For eg. in case of fm5, you can use:

grep -c "<Iteration_query-def>"  filename.xml

Or if you want to just know the sequences having hits each time you do blast, you can use below command for fmt 5:

 grep -c "<Hit_num>1</Hit_num>"  filename.xml 

In this way you can know that whenever you do blast, number of sequences processed and having hits remains same or not.

ADD REPLY
1
Entering edit mode
3.3 years ago

You could check the exit status of blast, at least blastx and blast_formatter report proper exit codes.

blastx \
-query /input-file.fa \
-db nr \
-num_threads 24 \
-max_target_seqs 10 \
-outfmt 11 \
-out output-file.asn 2> blastx_STDERR

return_value=$?
if [ ${return_value} -ne 0 ]; then
    echo "Error in blastx run!"
    echo "Error message was:"
    cat STDERR
fi

blast_formatter \
-archive output-file.asn \
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore" > \
output-file.outfmt6 2> STDERR

return_value=$?
if [ ${return_value} -ne 0 ]; then
    echo "Error in blast_formatter run!"
    echo "Error message was:"
    cat STDERR
fi
ADD COMMENT
0
Entering edit mode

Thanks I'll give this a try

ADD REPLY

Login before adding your answer.

Traffic: 1815 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6