Script to run blast locally with multiple files in a directory as queries
7.5 years ago
anicet.ebou ▴ 170

Hi everyone,

I have been looking for a script that lets me run BLAST locally on multiple FASTA files contained in a directory. I found this one-line bash command, but it gives me a warning when it runs:

find . -type f -exec blastp -query '{}' -db swissprot -out '{}'_blastp.fas \;

Warning: [blastp] Query is Empty !

I would like a way to avoid this warning. I'm working on Linux 16.04 and running BLAST from the terminal.

Thanks in advance.

blast • 10k views
ADD COMMENT
7.5 years ago

I love GNU parallel for such things. Something like

ls *.fasta | parallel -a - blastp -query {} -db swissprot -out {.}.out

since it lets you run many jobs in parallel.

ADD COMMENT

I got this output when running your code:

parallel: invalid option -- 'a'
parallel [OPTIONS] command -- arguments
    for each argument, run command with argument, in parallel
parallel [OPTIONS] -- commands
    run specified commands in parallel
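For anyone hitting the same message: that usage text looks like the one printed by the parallel that ships in the moreutils package rather than by GNU parallel, which would explain why -a is rejected. A minimal sketch for checking which one is first on your PATH. The function name which_parallel is made up for this example, and it assumes GNU parallel identifies itself in its --version output:

```shell
# Report which "parallel" implementation is first on PATH.
# which_parallel is a hypothetical helper name, not a standard tool.
which_parallel() {
    if ! command -v parallel >/dev/null 2>&1; then
        echo "none"     # no parallel installed at all
    elif parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        echo "gnu"      # GNU parallel announces itself in --version
    else
        echo "other"    # e.g. the moreutils implementation
    fi
}

which_parallel
```

If this prints "other", installing GNU parallel (see the replies that follow) should make the -a option available.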

ADD REPLY

+1, thanks, this is indeed an interesting alternative. Can you please tell me how to get it on CentOS? Is it built in, or do I need to do a yum install?

ADD REPLY

Please refer to https://www.gnu.org/software/parallel/parallel_tutorial.html

(wget -O - pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3) | bash

Usually it is packaged by your distribution, and I have seen it on CentOS as well.

ADD REPLY
7.5 years ago

A simple for loop should be enough!

# assuming your query files have the extension ".fasta"

for i in *.fasta; do
    # keep the part of the file name before the first "."
    name=$(echo "$i" | awk -F "." '{print $1}')
    blastp -query "$i" -db swissprot -out "${name}.out"
done
ADD COMMENT

Can we use cut -f 1 -d "." instead of awk -F "." '{print $1}'?

name=$(echo "$i" | cut -f 1 -d ".")
ADD REPLY

Why not? It does the same thing!
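To illustrate, here is a small sketch comparing the awk and cut versions with plain shell parameter expansion, which avoids starting an extra process. The file name is one of those listed later in this thread, and the variable names are only for the example:

```shell
i="Allpep_subset_00.fas"    # example file name taken from this thread

name_awk=$(echo "$i" | awk -F "." '{print $1}')   # field before the first "."
name_cut=$(echo "$i" | cut -f 1 -d ".")           # same field, using cut
name_exp=${i%%.*}                                 # strip from the first "." onwards

echo "$name_awk $name_cut $name_exp"
```

All three keep only the text before the first dot, so a name such as sample.v2.fas would become sample. If you only want to drop the last extension, ${i%.*} gives sample.v2 instead.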

ADD REPLY

Your script does not seem to work the way I want. It runs only one file in my directory as a query, and either the output file name is not formatted as needed or the output file does not appear at all!

I want to make clear that my own command works perfectly but throws warnings. My goal is just to get a new script, or to find a way to change my command, so that these warnings go away. Thanks @Vijay Lakhujani

ADD REPLY

Change the following line in Vijay's code from

blastp -query "$i" -db swissprot -out "${name}.out"

to

blastp -query "$i" -db swissprot -out "${name}_blastp.fas"

Run the modified code and let us know whether it does what you need. By the way, how many FASTA files do you have in your directory (i.e., files with the .fasta extension)? What is the extension of the FASTA files in your directory (.fa or .fasta), or are they zipped?

ADD REPLY

This code gives the same output as Vijay's code. I have 50 FASTA files with the .fas extension.

ADD REPLY

I cannot see why this would not work. This is a very basic and routine task. As cpad0112 asked, let me know the file extension of your FASTA files.

Also, run the ls command and share the output so that we can see the files you have in your directory, and share the error message, if any.

Last but not least, we are assuming that blastp is on your path and that the swissprot database can be found.
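A quick way to check the first of those assumptions is to ask the shell where it finds the BLAST+ programs. This is only a sketch, and have_cmd is a made-up helper name:

```shell
# have_cmd is a hypothetical helper: succeeds if the command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for tool in blastp blastdbcmd; do
    if have_cmd "$tool"; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If blastdbcmd is found, running blastdbcmd -db swissprot -info should print a short summary of the database when BLAST can locate it.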

ADD REPLY

I get no error message, but the output is not what I need.

 ediman@ediman-HP-Notebook:~/all pep$ ls

Allpep_subset_00.fas  Allpep_subset_18.fas  Allpep_subset_36.fas
Allpep_subset_01.fas  Allpep_subset_19.fas  Allpep_subset_37.fas
Allpep_subset_02.fas  Allpep_subset_20.fas  Allpep_subset_38.fas
Allpep_subset_03.fas  Allpep_subset_21.fas  Allpep_subset_39.fas
Allpep_subset_04.fas  Allpep_subset_22.fas  Allpep_subset_40.fas
Allpep_subset_05.fas  Allpep_subset_23.fas  Allpep_subset_41.fas
Allpep_subset_06.fas  Allpep_subset_24.fas  Allpep_subset_42.fas
Allpep_subset_07.fas  Allpep_subset_25.fas  Allpep_subset_43.fas
Allpep_subset_08.fas  Allpep_subset_26.fas  Allpep_subset_44.fas
Allpep_subset_09.fas  Allpep_subset_27.fas  Allpep_subset_45.fas
Allpep_subset_10.fas  Allpep_subset_28.fas  Allpep_subset_46.fas
Allpep_subset_11.fas  Allpep_subset_29.fas  Allpep_subset_47.fas
Allpep_subset_12.fas  Allpep_subset_30.fas  Allpep_subset_48.fas
Allpep_subset_13.fas  Allpep_subset_31.fas  Allpep_subset_49.fas
Allpep_subset_14.fas  Allpep_subset_32.fas  Allpep_subset_50.fas
Allpep_subset_15.fas  Allpep_subset_33.fas  Allpep_subset_51.fas
Allpep_subset_16.fas  Allpep_subset_34.fas  Allpep_subset_52.fas
Allpep_subset_17.fas  Allpep_subset_35.fas

ADD REPLY

In Vijay's code, change from:

for i in *.fasta; do

to

for i in *.fas; do

Run the code and let us know.

Or modify Petr's code from

ls *.fasta | parallel -a - blastp -query {} -db swissprot -out {.}.out

to

ls *.fas | parallel -a - blastp -query {} -db swissprot -out {.}.out
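
Some background on why the extension matters: when a pattern such as *.fasta matches nothing, the shell leaves it unexpanded, so the loop body runs once with the literal text *.fasta as the query. That would fit the "only one file" behaviour reported above. Likewise, find . -type f hands every file to blastp, including empty or non-FASTA ones, which is one plausible source of the "Query is Empty" warning. Here is a sketch of a loop that guards against both. The helper name run_per_fasta and the _blastp.out suffix are choices made for this example, not anything BLAST requires:

```shell
# run_per_fasta DIR CMD [ARGS...]
# Hypothetical helper: runs CMD once per non-empty *.fas file in DIR,
# appending "-query FILE -out FILE_STEM_blastp.out" to the given arguments.
run_per_fasta() {
    dir=$1
    shift
    for f in "$dir"/*.fas; do
        # -s is false for empty files and for the unexpanded pattern
        # that is left behind when nothing matches, so both are skipped.
        [ -s "$f" ] || continue
        "$@" -query "$f" -out "${f%.fas}_blastp.out"
    done
}

# Dry run: print the commands instead of executing them.
# Remove "echo" to actually run blastp.
run_per_fasta . echo blastp -db swissprot
```

The output names deliberately do not end in .fas, so the results are not picked up as queries if the loop is run again in the same directory.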
ADD REPLY
