I am hosting a local blastx (v2.2.31+) server on a 20-core machine with 128 GB of RAM running CentOS 6. I am running a relatively large series of jobs (e.g. one job may have ~10^6 samples averaging ~100 nucleotides each, with some samples up to 3000 nucleotides). Unsurprisingly, these runs take on the order of days to complete. I am blasting against the "nr" database.
I submit my job to the machine via SGE and specify num_threads = 20. When I watch the memory usage of blastx, it only ever seems to use 4-10 GB of memory, varying between those two limits. The "nr" database is ~70 GB, so I would expect blastx to use on the order of 70 GB, not 4-10 GB.
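For reference, my submission looks roughly like this (the script, file names, and the "smp" parallel environment name are placeholders, not my actual setup):

    #!/bin/bash
    #$ -N blastx_nr
    #$ -pe smp 20      # request all 20 cores on the node
    #$ -cwd

    # queries.fasta and results.tsv are placeholder paths
    blastx -query queries.fasta -db nr -num_threads 20 -outfmt 6 -out results.tsv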
This makes me think that blastx is paging the database in and out of memory as it runs, which is time-consuming and inefficient.
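For what it's worth, a fuller picture of the cache and disk behavior than plain per-process memory could presumably come from something like this while a job runs (iostat is part of the sysstat package):

    free -g                        # how much RAM is free vs. used as page cache
    iostat -x 5                    # sustained device reads would suggest the database is being re-read
    top -p $(pgrep -d, blastx)     # resident memory of the running blastx processes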
QUESTION: Is there a way to force blastx to load the entire database into memory (since I have the RAM to do so)? Or am I misunderstanding the problem?
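To illustrate what I mean by loading the database: as far as I understand, BLAST+ memory-maps the database files, so I imagine pre-reading the volumes into the OS page cache might be enough, e.g. (the database path is a placeholder):

    # Hypothetical path; reading every nr volume once should pull it into the page cache
    cat /path/to/blastdb/nr.* > /dev/null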
ulimit -a shows that max memory size is unlimited, and BATCH_SIZE is not set. Thanks for the links, I will be sure to look at them.
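Concretely, the checks above were along these lines:

    ulimit -a | grep -i memory                  # "max memory size" reports unlimited
    echo "BATCH_SIZE=${BATCH_SIZE:-<unset>}"    # confirms BATCH_SIZE is not set in the environment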