Blast script over multiple databases
5.7 years ago
dllopezr ▴ 130

Hi community

I have a folder with multiple BLAST databases. I want to run blastn against every database and produce one output file per database.

I'm trying something like this:

for i in $(find . -name 'name_of_database'); do
  time blastn -db "$i" -query Sondas_100.fasta -out "$i".out -outfmt 7 -num_threads 16 -dust yes -ungapped
done

But this searches for a file name, and the BLAST databases are aliases rather than single files.

Any help with that?

Thank you so much


Do you need one output per DB, or would a single output over all DBs do as well?

What exactly do you mean by 'aliases'?


Hi Lieven

When I say "aliases", I mean that a BLAST database is not a single file but a name that stands for a set of files.

Example: makeblastdb produces three files, T1P1T0.nhr, T1P1T0.nsq and T1P1T0.nal, but the database name to pass to blastn is just T1P1T0, without any extension.
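A small sketch of that relationship, assuming all database files sit directly in one folder and use the extensions from the example (.nhr, .nsq, .nal) and that each database is single-volume: stripping the extension from every file and de-duplicating gives the names that -db expects. `db_names` is a hypothetical helper name, not part of BLAST+.

```shell
#!/usr/bin/env bash
# Print each BLAST database name found in a folder, once.
# Assumptions: extensions .nhr/.nsq/.nal as in the example above and
# single-volume databases; db_names is a hypothetical helper name.
db_names() {
  local dir="${1:-.}"
  local f
  for f in "$dir"/*.nhr "$dir"/*.nsq "$dir"/*.nal; do
    [ -e "$f" ] || continue   # skip a pattern that matched nothing
    f="${f##*/}"              # drop the directory part
    printf '%s\n' "${f%.*}"   # drop the extension: T1P1T0.nhr -> T1P1T0
  done | sort -u              # one line per database, not per file
}
```

For a folder containing T1P1T0.nhr, T1P1T0.nsq and T1P1T0.nal, `db_names /path/to/folder` prints T1P1T0 once.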

And yes, I want one output for each database.

5.7 years ago
dllopezr ▴ 130

I already solved it.

My databases follow the naming pattern TxPx_Tx_R1, where each x is a small number (1 to 4 for the first two positions, 0 to 3 for the last one).

So I built all the names in nested loops and passed each one to the blastn command:

#!/bin/bash

db_dir=/vault2/homehpc/jmalagont/dllopezr/Shotgun_Seq/Trimmed_Seqs/FastaSeqs12

for T in $(seq 1 4); do
    for P in $(seq 1 4); do
        for t in $(seq 0 3); do
            # Database names follow the pattern T<T>P<P>_T<t>_R1
            name="T${T}P${P}_T${t}"
            time blastn -db "$db_dir/${name}_R1" -query Sondas_100.fasta -out "${name}_R2.out" -outfmt 7 -num_threads 16 -dust yes -ungapped
        done
    done
done
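The three nested seq loops can also be written with bash brace expansion, which generates the same 64 names (4 x 4 x 4) in the same order. A minimal sketch that only prints the commands as a dry run, reusing the folder, query file and blastn options from the script above; `print_blast_commands` is a hypothetical name, and removing the `echo` would actually run the searches.

```shell
#!/usr/bin/env bash
# Sketch: same 64 databases as the nested seq loops, via brace expansion.
# print_blast_commands is a hypothetical helper; it only prints each
# blastn command (a dry run) so the list can be checked first.
db_dir=/vault2/homehpc/jmalagont/dllopezr/Shotgun_Seq/Trimmed_Seqs/FastaSeqs12

print_blast_commands() {
  local name
  # Expands left to right: T1P1_T0 T1P1_T1 ... T4P4_T3
  for name in T{1..4}P{1..4}_T{0..3}; do
    echo blastn -db "$db_dir/${name}_R1" -query Sondas_100.fasta \
      -out "${name}_R2.out" -outfmt 7 -num_threads 16 -dust yes -ungapped
  done
}

print_blast_commands
```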

Yes, this will work (and congrats on solving it), but do consider jrj.healey's approach, as it is much more broadly applicable!

5.7 years ago
Joe 21k

All you need to do is strip the extension off the result of your find command:

e.g.

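for i in $(find . -name '*.nhr') ; do
  # strip the extension, e.g. ./T1P1T0.nhr -> ./T1P1T0
  database="${i%.*}"
  # name the output after the database, not after the .nhr file
  time blastn -db "$database" -query Sondas_100.fasta -out "$database".out -outfmt 7 -num_threads 16 -dust yes -ungapped
done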

Put the extension in the find pattern so that each database is found only once rather than once per related file, then strip the extension off and pass the resulting path, which should correspond to the base name of the database.
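`for i in $(find ...)` splits on whitespace, so a path containing a space would break the loop. A sketch of a whitespace-safe variant of the same idea, assuming single-volume databases (exactly one .nhr file per database) and a `find` that supports `-print0`; `run_blast_per_db` and the `BLASTN` override are hypothetical names, not BLAST+ features.

```shell
#!/usr/bin/env bash
# Sketch: one blastn run (and one output file) per database under a folder.
# Assumptions: single-volume databases, i.e. exactly one .nhr file each;
# run_blast_per_db and BLASTN are hypothetical names.
run_blast_per_db() {
  local dir="${1:-.}" query="$2"
  local blastn_cmd="${BLASTN:-blastn}"  # e.g. BLASTN=echo for a dry run
  local nhr database
  # -print0 with read -d '' keeps paths containing spaces intact
  find "$dir" -name '*.nhr' -print0 |
    while IFS= read -r -d '' nhr; do
      database="${nhr%.nhr}"  # strip the extension to get the -db name
      # </dev/null stops the command from consuming the loop's input
      "$blastn_cmd" -db "$database" -query "$query" -out "$database.out" \
        -outfmt 7 -num_threads 16 -dust yes -ungapped < /dev/null
    done
}
```

Usage: `run_blast_per_db . Sondas_100.fasta` writes one `.out` file next to each database; `BLASTN=echo run_blast_per_db . Sondas_100.fasta` only prints the commands.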

*Not tested
