BLAST multiple databases against each other
6.2 years ago

Hi,

I am trying to run a BLAST analysis. I have 9 different databases (each database has a total size of 4 GB), and I want to BLAST them against one another to find the best hit from each. I tried standalone BLAST, but I have not obtained any output because it is still running (~20 days).

Can anyone suggest another tool or software to solve this problem?

Any help is much appreciated.

Thanks

alignment • 3.1k views
What kind of BLAST search is being performed? If it is BLASTP or BLASTX, you can use DIAMOND, which is much faster than BLAST.

Thanks. I am trying to do blastn.

Please give details. What sort of databases are you talking about? What is the query?

They are all different miRNA databases. I am trying to find the best hits among all of them with respect to each other.

If you're running this on a single core, I'm not surprised.

But do provide details, as both Sej Modha and Antonio R. Franco point out.

Thanks, but I am not using a single core. It is 100 cores on a server.

From a post below, we have already determined that this is not the case and that your job effectively runs on a single core. To run multi-core BLAST you need to specify -num_threads on the command line (if you don't, you get the default, which is 1).
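
As a rough sketch of the point above (not the poster's exact setup): blastn threads share memory, so a single blastn process can only use cores on one node, and only as many as -num_threads allows. Inside a SLURM job the allocated count is available as SLURM_CPUS_PER_TASK when --cpus-per-task is set. The helper name blastn_cmd and the file names are hypothetical; the function only prints the command so that it can be checked before it is run.

```shell
#!/bin/bash
#SBATCH --nodes=1            # blastn threads cannot span several nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10   # cores available to the single blastn process

# Hypothetical helper: print a blastn command whose thread count follows the
# SLURM allocation (falls back to 1 thread outside a SLURM job).
blastn_cmd() {
    query=$1; db=$2; out=$3
    threads="${SLURM_CPUS_PER_TASK:-1}"
    echo "blastn -query $query -db $db -num_threads $threads -outfmt 6 -out $out"
}

# Dry run with placeholder names; run the printed command once it looks sane.
blastn_cmd query.fa 5_db hits.tsv
```

Requesting 10 nodes with 10 tasks each, as in a generic MPI template, would not speed up a single blastn process under these assumptions.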

6.2 years ago
GenoMax 147k

You can use blastdb_aliastool to create a single blast database alias from all 9 databases, which you can then use for blast.

As, I tried standalone blast but I am unable to get any output because still its running (~20 days)

If you don't see any output from the search it is likely that the process is hung. Is there an output file that you see which is growing in size or is it empty?

Thanks, but the output file is empty and I think it is still running.

If it has not produced a single line of output after 20 days (with 100 cores), it is unlikely that blast is running (or at least not productively). You should stop and restart the search. How long are your query sequences and what exactly is in your target databases?

As others have pointed out, you should in any case get at least a few lines of output (the BLAST output header and some other info) within the first minutes of running BLAST; if you don't, then something is indeed wrong.

Can you post the blast cmdline you are trying to execute?

I am trying to run this command on the server:

#!/bin/bash
# Sample Slurm Script for use with OpenMPI on Plato
# Begin Slurm directives with #SBATCH
#SBATCH --job-name=multidb_test2_$1
#SBATCH --nodes=10
#SBATCH --tasks-per-node=10
#SBATCH --cpus-per-task=1
#SBATCH --time=480:00:00
#SBATCH --mem=16G
#SBATCH --output=multidb_test2.%J.out
#SBATCH --error=multidb_test2.%J.err

#for i in $(seq 1 8); do ../makeblastdb -in $i\_*.fa -input_type fasta -dbtype nucl -title $i\_db -out $i\_db; done

for i in $(seq 6 8); do ../blastn -query $i\_*serial.fa -db 5_db -qcov_hsp_perc 90 -outfmt 6 | sort -k1,1 -k12,12nr -k11,11n | sort -u -k1,1 --merge > $i\_5db_hits.blastn; done

Please let me know how to get it done perfectly.

Thanks

I will have to look in detail, but at first sight the use of the * wildcard will not work in this command line. BLAST expects a single FASTA file as input (for both makeblastdb and blastn). Why are you using it? What do you expect to achieve with it?

Moreover, I think you are making this overcomplicated.

Thanks for your reply. I tried "*" in makeblastdb to make multiple databases at a time and it worked. Regarding the complicated analysis: yes, it is, which is why I was trying to put it in a loop to get the result. As that did not seem to work, I split the FASTA files of all databases and added them to the loop, and it is working.

I can't understand why it was not working with the big files.

Have you tested to make sure the databases have actually been properly made? Just because the command completed (did you check the log files?), there is no guarantee that all is OK. Before you jump into a large job like this, it is always best to test with a file or two to see if things are working. It is also a bad idea to put a for loop inside a SLURM job. Here is one example of how you may run these jobs.

for i in $(ls *serial.fa | sed 's/\.fa$//'); do echo sbatch -n 10 -N 1 --time=480:00:00 --mem=16G --output=multidb_test2.%J.out --error=multidb_test2.%J.err --wrap="../blastn -num_threads 10 -query $i.fa -db 5_db -qcov_hsp_perc 90 -outfmt 6 | sort -k1,1 -k12,12nr -k11,11n | sort -u -k1,1 --merge > ${i}_5db_hits.blastn"; done

If the commands look sane then remove the word echo to actually submit the jobs.
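
The advice to test on a small input first can be followed by cutting a few records out of a query file. A minimal sketch, assuming plain FASTA input; the helper name fasta_head and the demo file names are made up for illustration.

```shell
# Hypothetical helper: print the first N records of a FASTA file.
fasta_head() {
    # $1 = FASTA file, $2 = number of records to keep
    awk -v n="$2" '/^>/ { if (++seen > n) exit } { print }' "$1"
}

# Example with a tiny made-up file: keep the first two of three records.
printf '>a\nACGT\n>b\nGGCC\nTTAA\n>c\nCCCC\n' > demo_queries.fa
fasta_head demo_queries.fa 2 > demo_subset.fa
cat demo_subset.fa
```

The small file could then be run interactively against each database (for example with blastn -query demo_subset.fa -db 5_db -outfmt 6) to confirm that hits come back before the full job is submitted; blastdbcmd -db 5_db -info should also print a short summary if the database is readable.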

Thanks for the insights on the SLURM part, GenoMax; I'm not familiar with SLURM myself.

And well spotted that OP forgot to add -num_threads 10; the BLAST was thus running on a single core, even though the SLURM job requested 100. That still does not explain, however, why no output was produced.
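
One possible contributor to the empty output file, given the posted command: the blastn output is piped into sort, and sort cannot print anything until it has read all of its input, so the redirected file stays empty until blastn has finished. Writing the raw hits to a file with -out and sorting afterwards should make progress visible. The sketch below assumes tabular output (-outfmt 6); the helper name blast_progress and the file names are hypothetical. Queries without any hit never appear in the table, so the count is only a lower bound.

```shell
# Hypothetical helper: count how many queries already have at least one hit
# in a tabular (-outfmt 6) file that blastn may still be writing to.
blast_progress() {
    # $1 = query FASTA, $2 = tabular hits file
    total=$(grep -c '^>' "$1")
    with_hits=$(cut -f1 "$2" | sort -u | wc -l | tr -d ' ')
    echo "$with_hits of $total queries have hits so far"
}

# Example with made-up data: three queries, hits for two of them.
printf '>q1\nACGT\n>q2\nGGCC\n>q3\nTTAA\n' > demo_q.fa
printf 'q1\tsA\t100\nq1\tsB\t95\nq2\tsC\t100\n' > demo_hits.tsv
blast_progress demo_q.fa demo_hits.tsv
```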

My hunch is that the databases have not been made properly hence there is no output (besides other problems with the command line you noted).

Thanks, but I already tested with small sequences before putting them in the loop.

Thanks, yes, I tested all databases before putting them in this big for loop. So far I can only think of a memory issue, because for small sequences I am getting results.

Thanks all for your suggestions

For the use of the "*": I tested it myself and it only works if exactly one file matches; if that is the case, then it's fine. Along the same lines: will there be only a single file matching $i\_*serial.fa?

I'm still struggling with the loop for doing the BLAST itself. Why are you doing all those sorts of the output? Also, the given command line will only report the results of 3 input files against a database called 5_db; is that the complete command line you execute?
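
For reference, the two sort calls in the posted command appear intended to keep one line per query: the first orders hits by query ID, then by bit score (column 12, descending), then by e-value (column 11, ascending), and the second keeps the first line for each query. A sketch of the same idea on made-up data follows. It assumes the default -outfmt 6 column order, uses -g instead of -n because e-values are written in scientific notation (which -n does not parse), and swaps the second sort for a short awk filter. The function name best_hit_per_query is hypothetical.

```shell
# Hypothetical helper: keep the top-scoring line per query from -outfmt 6 input.
best_hit_per_query() {
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -k1,1 -k12,12gr -k11,11g | awk -F '\t' '!seen[$1]++'
}

# Made-up hits, default columns: qseqid sseqid pident length mismatch gapopen
#                                qstart qend sstart send evalue bitscore
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    q1 sB 95.455 22 1 0 1 22 1 22 1e-04 35.6 \
    q2 sD 100.000 21 0 0 1 21 9 29 2e-05 37.4 \
    q1 sA 100.000 22 0 0 1 22 5 26 2e-06 41.0 \
    q2 sC 100.000 21 0 0 1 21 3 23 7e-06 39.2 > demo_hits6.tsv

# Prints one line per query: the sA hit for q1 and the sC hit for q2.
best_hit_per_query < demo_hits6.tsv
```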

Yes, only one database is matched against the other 8, one by one.

I'm still not convinced that what you are doing is all correct.

As has been pointed out before, you can only gain by using the database alias approach. A command line like the following would already achieve that:

blastdb_aliastool -dblist "1_db 2_db 3_db 4_db 5_db 6_db 7_db 8_db" -dbtype nucl -out all_db -title "all subDB"

This will give you a single DB to use in your BLAST command line, so there is no need anymore to loop over all your DBs. Moreover, it is generally not a good idea to split up your DB, and the way you are running it makes it nearly impossible to compare the scores of hits between the different DBs, as they are specific to each query-DB search.

You can split up the input query file of your BLAST runs; that's totally fine.

It also seems to me that there is no point in sorting the BLAST output in your case, as this is more or less the already sorted tabular output that BLAST provides.
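
Splitting the query file, as suggested above, can be done with a few lines of awk. This is only a sketch: the function name split_fasta, the chunk naming scheme and the demo file are assumptions. Each resulting chunk would then be searched against the same (alias) database, so that the scores stay comparable.

```shell
# Hypothetical helper: split a FASTA file into chunks of N records each.
split_fasta() {
    # $1 = input FASTA, $2 = records per chunk, $3 = output prefix
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            if (count % n == 0) {            # start a new chunk file
                if (out != "") close(out)
                out = sprintf("%s.%03d.fa", prefix, ++chunk)
            }
            count++
        }
        out != "" { print > out }
    ' "$1"
}

# Example: five made-up records, two per chunk, giving three chunk files.
printf '>a\nAC\n>b\nGT\n>c\nAA\n>d\nCC\n>e\nGG\n' > demo_all.fa
split_fasta demo_all.fa 2 demo_chunk
ls demo_chunk.*.fa
```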

Sorry, maybe I was unable to explain, but I split the query file, not the DB. I have already tried blastdb_aliastool but am still waiting for the result.

I am sorting the BLAST result to get only the exactly matching sequences.

Thanks for your valuable suggestions.

You totally lost me now...

Didn't you mention that you created (or had) different DBs? What part of blastdb_aliastool are you waiting for (that part itself should run instantaneously), or is it the BLAST itself?

Dear, I think I already mentioned in my question that I have 9 different databases. Things I have already tried:

1. I created 9 different databases using makeblastdb and tried a loop to BLAST the sequences of one database against the others (ran for ~20 days but gave no result).
2. I tried blastdb_aliastool as well and have been waiting for some output for the last few days.
3. I split the query file and ran BLAST against a single database; I am getting results.

Please suggest something suitable for this query.

Thanks
