There are several pieces of information we are missing here. Did you specify that all 96 threads be used? If so, are all of them being used when you look at the system load? What type of disk do you have: a regular hard disk or a solid state disk? Is it internal or external?
The last two questions are about the speed of reading and writing, which may be a limiting factor here. If you have a solid state disk and, let's say, only half the threads are engaged, splitting your query file in two might speed things up. I caution you against splitting into too many parts, though: even with the fastest disk, and especially with 5+ read and write operations running simultaneously, the speed of reading and writing will become the choking point.
It also matters whether you are writing full alignments or just scores in tabular format, the latter being much faster. It also matters how many top scores are being recorded - it takes longer to calculate and write 500 alignments or 500 tabular scores than only 5 of them.
Here is something you should keep in mind. Let's say that an average search time is 1 second per query. A sequence that is 20 residues long might take less, but a sequence >10,000 residues will take much longer. Under that assumption it would take you 1000 hours to do this search, which means that your search will conclude at the end of June. Even if it takes only half a second per sequence, it will still take another two weeks.
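The 1000-hour figure implies roughly 3.6 million queries (1000 h × 3600 s/h at 1 s each); the arithmetic can be checked with a quick shell sketch, where the query count is an assumption inferred from that figure rather than something stated in the thread:

```shell
# Back-of-envelope runtime estimate. The query count is an assumption
# implied by the 1000-hour figure, not a number taken from the thread.
queries=3600000
sec_per_query=1
hours=$(( queries * sec_per_query / 3600 ))
days=$(( hours / 24 ))
echo "${hours} hours, about ${days} days"   # 1000 hours, about 41 days
```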
Several suggestions:

1) Create a non-redundant set of your queries at 90% identity, which might cut the size almost in half.

2) Do the same for your target database.

3) Get access to a cluster where you can run this search on multiple nodes; there you can split your query file into 10 parts, because the parts will be using different threads and different disks.

4) Request only the minimal informative number of scores/alignments in the output (if you are interested only in the top 5 hits, then ask for only the top 5 to be displayed).
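Suggestions 1 and 2 can be done with a clustering tool such as CD-HIT; a minimal sketch, with file names and resource limits as placeholders:

```shell
# Collapse protein sequences that are >=90% identical into one
# representative per cluster. File names are illustrative.
# -c 0.9 : identity threshold; -n 5 : recommended word size for 0.9;
# -T : threads; -M : memory limit in MB.
cd-hit -i queries.fasta -o queries_nr90.fasta -c 0.9 -n 5 -T 8 -M 16000
```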
If you can't do any of the above, I suggest you find something else to occupy your attention for the next month or so.
Thank you for your quick response.
Sorry for the lack of explanation. Here are some supplementary details regarding the parameters of Diamond blastp. We've specified the following options:
--threads 96 --query-cover 70 --min-score 100 --max-target-seqs 1 --outfmt 6
Furthermore, the job was submitted using Slurm with sbatch -c 96.
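For reference, those options correspond to a submission script along these lines; the database, input, and output file names are assumptions, not taken from the thread:

```shell
#!/bin/bash
#SBATCH -c 96
# Sketch of the run as described above; file names are placeholders.
diamond blastp \
    --db target.dmnd \
    --query queries.fasta \
    --threads 96 \
    --query-cover 70 \
    --min-score 100 \
    --max-target-seqs 1 \
    --outfmt 6 \
    --out hits.tsv
```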
I used seqkit split to divide the query FASTA file into 10 parts and then ran diamond blastp on each file using 44 threads. The process finished in about 2 hours. I'm not entirely sure why it was so fast, but I wanted to share this. (Perhaps there was some issue when handling a large number of queries at once previously?) Thank you.
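For anyone repeating this, the split-and-run approach can be sketched as follows; the file names and per-part Slurm submission are assumptions based on the description above, not the exact commands used:

```shell
# Split the query FASTA into 10 parts (written to queries.fasta.split/).
seqkit split -p 10 queries.fasta

# Submit one diamond blastp job per part, 44 threads each.
# Database and output names are placeholders.
for part in queries.fasta.split/queries.part_*.fasta; do
    sbatch -c 44 --wrap "diamond blastp --db target.dmnd --query $part \
        --threads 44 --query-cover 70 --min-score 100 \
        --max-target-seqs 1 --outfmt 6 --out ${part%.fasta}.tsv"
done
```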