This may be really simple, but I'm having memory issues running the tr2aacds.pl EvidentialGene pipeline. I am running it through a PBS queue on the server. My node has 256 GB RAM and 64 CPUs. I keep getting an error saying PBS killed my job because the memory allocation was exhausted. The script I am running is below.
As you can see, there is a MAXMEM setting (the flag is in MB) and also an NCPU setting. I have set these to low values as a buffer so the job isn't killed again, as I was previously trying to use all 250 GB and 60 cores of my node. Despite reducing the memory and cores as in the script above, it still chews up all the memory and dies, apparently during the BLAST stage of the pipeline. Any ideas where I'm going wrong, or what's happening?
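For reference, a minimal sketch of the kind of PBS submission script in question; the -NCPU and -MAXMEM options are the ones discussed in this thread, while the PBS directives, paths, filenames, and the -cdnaseq input flag are illustrative assumptions that may differ by PBS flavour and Evigene version:

    #!/bin/bash
    #PBS -l select=1:ncpus=16:mem=100gb    # resource request (PBS Pro syntax; Torque differs)
    #PBS -l walltime=24:00:00
    cd $PBS_O_WORKDIR
    # -MAXMEM is given in MB; -NCPU sets how many parallel worker processes are started
    $HOME/evigene/scripts/prot/tr2aacds.pl -cdnaseq mytranscripts.tr \
        -NCPU 16 -MAXMEM 100000 -logfile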
Evigene uses NCBI blastn with ncpu processes for an all-CDS x all-CDS blastn run. That can eat up more memory than you might expect if your transcript set is large (many millions, which is the recommended way) and has many near-identical coding sequences. The fix for this problem is to reduce ncpu until it runs to completion. tr2aacds is efficient in its data reduction and will generally finish large sets of ~10M transcripts in a few hours with 12 to 24 cores, on machines with 64 to 128 GB of memory. More recent Evigene tr2aacds versions also check for failed blastn parts and rerun them (if not too many failed).
The most recent public version is evigene16mar20.tar, at http://arthropods.eugenes.org/EvidentialGene/other/evigene_old/
This part of Evigene also runs cd-hit-est prior to blastn, and cd-hit-est does respect the MAXMEM parameter. The blastn portion, however, has no memory setting, only the -NCPU setting to tune it to your system, so divide the available memory by ncpu to estimate the amount each blastn process can use; per-process needs increase with the size of your transcript set and the near-identity of its transcripts.
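As a rough illustration of that divide-by-ncpu sizing (the per-process figure below is an assumption; real usage depends on the size and redundancy of your transcript set):

    TOTAL_MEM_GB=250        # memory actually available on the node
    PER_BLAST_GB=16         # guessed peak per blastn process for a large, redundant set
    NCPU=$(( TOTAL_MEM_GB / PER_BLAST_GB ))
    echo "try -NCPU $NCPU"  # here: 15 parallel blastn processes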
If you have 250 GB available, why are you setting a ~5 GB limit in your actual command? If all those cores try to start blast jobs in parallel, it's no wonder you are running out of RAM. You may want to try a more conservative limit, say 20 cores.
I originally set MAXMEM to 250 GB and used 60 CPUs, but PBS killed the job, which is why I went to the lower limit of 5 GB. Apologies for being ignorant on the matter.
So that I understand it right: say we use 250 GB; that 250 GB is ultimately divided across the total cores, i.e. 250/X per core? What makes you think I could get away with, say, 20 cores but 250 GB of memory? Again, apologies for the dumbness.
Do you have any additional details about why PBS killed the job when you had allocated 250G to it?
I have not used this program, but by setting -MAXMEM 5000 you are restricting it to run within that much memory (no matter how much RAM your machine has); at least that would be the logical interpretation. Is there any guidance in the program's help about how much RAM you should allocate and how many cores you should use?
It said that memory use had exceeded the 250 GB allocation.
Not much guidance; the web page is lacking, but the program is widely used for combining assemblies: http://arthropods.eugenes.org/EvidentialGene/trassembly.html . It needs a bit of a revamp IMO.
The example used 50 GB on 32 cores, but then again they could have more resources on their clusters. I have re-run it with 250 GB over 15 cores. Fingers crossed this works, because I've run out of ideas. Will repost as soon as I know.
Genomax, 15 cores @ 200 GB seemed to do the trick. Thanks for the advice. Perhaps this will help someone else down the line. The pipeline doesn't seem to like too many CPUs.
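For anyone landing here later, the configuration that worked would correspond to roughly the following request and command (same assumed flag names, paths, and PBS syntax as in the sketch above; MAXMEM is in MB):

    #PBS -l select=1:ncpus=15:mem=200gb
    $HOME/evigene/scripts/prot/tr2aacds.pl -cdnaseq mytranscripts.tr \
        -NCPU 15 -MAXMEM 200000 -logfile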
Glad to hear that did the trick. In some instances, more is not necessarily better :-)