I have developed a Python-based bioinformatics pipeline which uses BLASTx to BLAST a DNA sequence against a local copy of the nr database. Previously I was running the pipeline on Windows without problems, but I have recently switched over to Ubuntu due to the Linux requirements of some of the tools I am using.
However, since making the switch to Ubuntu, the pipeline has slowed down significantly at the BLASTx steps. Investigating further, I found that even with 2 BLASTx processes running in parallel, each allocated 12 threads, the maximum amount of RAM my system uses is 2-3 GB out of the 128 GB available on my workstation, as reported by top -H -p pid and free -h.
I am running 64-bit Ubuntu 22.04 and 64-bit BLAST v2.14.0. Does anyone have any ideas, or has anyone seen this before?
Cheers!
How did you come to that conclusion? Just in terms of the time needed to complete an identical job (with exactly the same data), e.g. it used to run in X time on Windows and now takes Y (Y > X)?
Are there hardware differences between the two machines in terms of CPU type, speed or generation?
If your job is running to completion, there is not much else you can change. Since we are talking about two different OSes, there could be many differences in the libraries being used, low-level hardware code, etc.
Hey, thanks for the reply. It is the same machine; I am currently dual-booting it, so the specs are the same. As for how I came to this conclusion, jobs that used to take 3-5 minutes now take 15 minutes or more. I will do a direct comparison of the run times this afternoon and update you.
I am just baffled as to why the system never seems to go above 2-3 GB of RAM used at any time. On Windows this pipeline almost always had RAM usage sitting at 80-120 GB, so I am a little puzzled.
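For the direct comparison mentioned above, a small timing wrapper may help keep the measurement identical on both OSes. This is only a sketch: the helper names build_blastx_command and time_command are made up for illustration, and the query, database and output values passed in are placeholders to be replaced with whatever the pipeline actually uses.

```python
import subprocess
import time


def build_blastx_command(query, db, threads, out):
    """Build a blastx argument list (all argument values are placeholders)."""
    return [
        "blastx",
        "-query", str(query),
        "-db", str(db),
        "-num_threads", str(threads),
        "-out", str(out),
    ]


def time_command(cmd):
    """Run cmd (a list of arguments) and return (wall_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # report the exit code instead of raising
    )
    return time.perf_counter() - start, completed.returncode
```

Running time_command(build_blastx_command(...)) with the same query and database under Windows and under Ubuntu gives two wall-clock numbers that can be compared directly. The first run after a reboot will likely be slower on either OS because the database has to be read from disk, so it is worth timing a second run as well.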
The difference in RAM usage may just reflect how the two OSes handle memory management. Perhaps Linux is a bit more aggressive in reclaiming memory.
Since the hardware remains the same, it would indicate that the BLAST code on Windows is more efficient (?). I guess that is possible if they are using low-level hardware-based enhancements (SIMD etc.) there but not on Linux.
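One way to check the memory-management explanation: on Linux, file data that has been read or memory-mapped is held in the kernel page cache, which free -h lists under buff/cache rather than under used. Assuming BLAST memory-maps its database volumes (my understanding, not verified for this setup), most of the nr data could be sitting in cache while the process itself looks small. The sketch below parses /proc/meminfo to show the cached figure next to the free one; the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])  # sizes are reported in kB
    return values


def summarize_gib(meminfo):
    """Convert a few key fields from kB to GiB (missing fields count as 0)."""
    kib_per_gib = 1024 ** 2
    return {
        "total": meminfo.get("MemTotal", 0) / kib_per_gib,
        "free": meminfo.get("MemFree", 0) / kib_per_gib,
        "available": meminfo.get("MemAvailable", 0) / kib_per_gib,
        "page_cache": meminfo.get("Cached", 0) / kib_per_gib,
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory statistics (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling summarize_gib(read_meminfo()) while BLASTx is running shows whether page_cache grows towards the size of the database. If it does, the RAM is being used, just not counted against the process.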
I vaguely recall a BLAST bug with multithreading in some past versions.
So make sure it is the same BLAST version in both cases.
See also some posts on "better" multithreading:
https://ncbiinsights.ncbi.nlm.nih.gov/2021/07/09/blast-2-12-0/
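To confirm that both installations really are the same version, the pipeline could record what blastx -version reports on each OS. This is a sketch under assumptions: it expects the first dotted number in the output to be the version (for example a line like "blastx: 2.14.0+"), and both function names are made up for illustration.

```python
import re
import subprocess


def parse_blast_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no X.Y.Z version number found in output")
    return tuple(int(part) for part in match.groups())


def installed_blastx_version(executable="blastx"):
    """Run '<executable> -version' and parse the version it reports."""
    result = subprocess.run(
        [executable, "-version"],
        capture_output=True,
        text=True,
        check=True,  # raise if the executable exits with an error
    )
    return parse_blast_version(result.stdout)
```

Because the versions come back as tuples, they compare naturally, so a check such as installed_blastx_version() >= (2, 12, 0) can tell whether the install includes the threading changes described in the linked post.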