A short question about running BLAST locally. In its multithreaded mode, is it better (faster) to run the program on a cluster of several nodes (let's say 4 nodes, each with a 16-thread processor) or on a single multiprocessor motherboard (for example, also 4 processors of the same type)? I understand from various discussions that programs like GNU parallel can fully parallelize the work in the latter case (a single multiprocessor board), even when BLAST itself is not running in multithreaded mode, and can therefore speed things up substantially.
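For context, the usual way to parallelize BLAST beyond its own threading (and what GNU parallel does in its --pipe mode) is to split the query FASTA into chunks and search each chunk as an independent job. Here is a minimal Python sketch of the splitting step, assuming a plain multi-FASTA query as text; the function name `split_fasta` is hypothetical, not part of BLAST or GNU parallel:

```python
def split_fasta(text: str, n_chunks: int) -> list[str]:
    """Split multi-FASTA text into n_chunks strings, never cutting a record.

    Records are dealt round-robin, so chunks get similar numbers of
    sequences (not necessarily similar total sequence length).
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    records: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # A '>' at the start of a line begins a new record
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")

    chunks: list[list[str]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return ["".join(c) for c in chunks]
```

Each chunk would then be written to its own file and passed to a separate BLAST process, on one board or on different nodes.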
All your opinions are very welcome.
Agreed, but change that 8 down to one or two. BLAST doesn't really parallelize well, because it's bottlenecked on memory bandwidth and I/O. You can launch multiple sub-jobs manually and see how much speedup your system actually achieves.
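One way to run that experiment is a small timing harness: launch the same set of sub-jobs with 1, 2, 4, ... running at once and compare wall-clock times. A sketch, assuming the query has already been split into files and that BLAST+ `blastn` is on the PATH; the file names `chunk_*.fa`, `hits_*.tsv` and the database name `nt_local` are hypothetical placeholders:

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def blastn_cmd(query: str, db: str, out: str, threads: int = 1) -> list[str]:
    """Build a BLAST+ blastn command line (tabular output, -outfmt 6)."""
    return ["blastn", "-query", query, "-db", db, "-out", out,
            "-outfmt", "6", "-num_threads", str(threads)]


def wall_time(cmds: list[list[str]], concurrency: int) -> float:
    """Run all commands, at most `concurrency` at once; return elapsed seconds."""
    start = time.perf_counter()
    # Threads only wait on child processes, so the GIL is not a bottleneck
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces completion and re-raises any failed command
        list(pool.map(lambda c: subprocess.run(c, check=True), cmds))
    return time.perf_counter() - start


def speedup_table(cmds: list[list[str]], levels=(1, 2, 4, 8)) -> dict:
    """Return {concurrency: (seconds, speedup relative to one job at a time)}."""
    base = wall_time(cmds, 1)
    table = {1: (base, 1.0)}
    for k in levels:
        if k == 1:
            continue
        t = wall_time(cmds, k)
        table[k] = (t, base / t)
    return table


if __name__ == "__main__":
    # Hypothetical inputs: 8 query chunks searched against a local database
    cmds = [blastn_cmd(f"chunk_{i}.fa", "nt_local", f"hits_{i}.tsv")
            for i in range(8)]
    for k, (secs, s) in speedup_table(cmds).items():
        print(f"{k} concurrent jobs: {secs:.1f} s, speedup {s:.2f}x")
```

If the speedup flattens out after one or two concurrent jobs, that points to the memory and I/O bottleneck rather than a lack of cores. Note that the first run may read the database from disk while later runs hit the OS page cache, so a warm-up run before timing gives a fairer comparison.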
Thank you. So a multiprocessor motherboard, with each processor having its own RAM, does not necessarily outperform a small cluster of a few nodes for BLAST. In theory, I pictured a motherboard with 4 processors (each with, say, 8-16 GB of RAM) as something like a "cluster" of 4 nodes (each with 1 CPU and 8-16 GB of RAM) to which jobs could be submitted in parallel (using the right software), with all the communication being faster because it stays on one board. Was I completely off?
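On Linux, the "each socket as its own node" idea can be approximated with `numactl`, which can bind a process to one NUMA node's CPUs (`--cpunodebind`) and memory (`--membind`). A sketch that prefixes each sub-job with such a binding, assuming a 4-socket machine where each socket is one NUMA node and `numactl` is installed; `numa_pinned_cmds` is a hypothetical helper. This does not remove the shared disk I/O, so whether it helps for a given database is something to measure:

```python
def numa_pinned_cmds(job_cmds: list[list[str]], n_sockets: int) -> list[list[str]]:
    """Prefix each job with numactl so it runs on one socket and its local RAM.

    Jobs are assigned to sockets round-robin, so with n_sockets=4 the board
    is used roughly like 4 small nodes, each with its own memory.
    """
    pinned = []
    for i, cmd in enumerate(job_cmds):
        node = i % n_sockets  # NUMA node (socket) for this job
        pinned.append(["numactl", f"--cpunodebind={node}",
                       f"--membind={node}", *cmd])
    return pinned


# Hypothetical usage: 8 BLAST sub-jobs spread over 4 sockets
jobs = [["blastn", "-query", f"chunk_{i}.fa", "-db", "nt_local"] for i in range(8)]
pinned = numa_pinned_cmds(jobs, n_sockets=4)
```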
You're right that communication could be faster, but these BLAST jobs don't communicate with each other, so whether this setup helps is problem-specific. If the subtasks needed to communicate, you would gain from keeping them local. Here, though, the BLAST jobs are entirely independent and instead interfere with each other's memory accesses, so each node can usefully support only a few jobs. I've seen the best results using 2 cores on each of 20 nodes, versus poor results using 20 cores on each of 2 nodes. That may also be related to reading and writing large files.
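The 20 × 2 versus 2 × 20 observation fits a simple saturation model. A toy sketch, assuming (hypothetically) that a node's memory bus lets about 4 BLAST jobs run at full speed and that extra jobs on the same node add nothing; the parameter values are illustrative, not measurements:

```python
def cluster_throughput(nodes: int, jobs_per_node: int,
                       rate_per_job: float = 1.0,
                       saturation_jobs: float = 4.0) -> float:
    """Toy model of aggregate throughput for independent BLAST jobs.

    A node's throughput grows linearly with its job count until the memory
    bus saturates at `saturation_jobs`, then stays flat.
    """
    per_node = rate_per_job * min(jobs_per_node, saturation_jobs)
    return nodes * per_node


# Same 40 cores in total, spread differently (hypothetical saturation of 4 jobs/node)
print(cluster_throughput(20, 2))  # 40.0
print(cluster_throughput(2, 20))  # 8.0
```

Under this model, spreading jobs thinly across many nodes wins because no single memory bus is oversubscribed; the real saturation point depends on the hardware and database, which is what the timing experiment measures.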