As I understand, without explicitly passing the `-p`/`--threads` option, salmon tries to use all available hardware threads. Is there a way to find out how many threads were actually available to and used by salmon?
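For context, this is roughly what I am doing now; a minimal sketch (the index and file names are placeholders):

```bash
# Number of hardware threads the OS reports on this node -- what salmon
# would grab by default when -p/--threads is omitted
nproc

# Pinning the thread count explicitly instead
salmon quant -i salmon_index -l A \
  -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
  -p 8 -o quants/sample
```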
Also, what happens when salmon is run on a cluster and gets assigned to multiple host nodes? For example, I allocated 16 cores and added `-p 16` to the parameters; the run was extremely slow (more than 40 minutes for a single sample, compared to 4 minutes when I ran it from the login node without specifying `-p`). Our admin explained that this may be because the allocated cores were spread across different nodes (e.g., 12 cores on one node and 4 on another). But what actually happens to salmon in that case? Does it effectively run single-threaded, or something else?
Overall, is it correct to say that salmon is multi-threaded but not MPI-capable?
I think you are overcomplicating things for such a simple job as RNA-seq quantification. The tool is blazingly fast. Simply use a job scheduler, or something like GNU parallel, to launch one job per fastq file, and set `-p` explicitly to a reasonable number like 8. The speed gain beyond that is probably small due to I/O limitations. If you have a standard node with, say, 72 cores, you can easily run about 8 jobs in parallel, depending on available memory. Even large datasets will be quantified in a few hours. No need for MPI here.
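For instance, something along these lines (assuming paired-end files named `<sample>_R1.fastq.gz`/`<sample>_R2.fastq.gz` and a prebuilt index in `salmon_index/`; all names are placeholders):

```bash
# Derive sample basenames from the R1 files and quantify 8 samples
# at a time, 8 threads each (64 threads total on a ~72-core node)
ls *_R1.fastq.gz | sed 's/_R1.fastq.gz//' \
  | parallel -j 8 'salmon quant -i salmon_index -l A \
      -1 {}_R1.fastq.gz -2 {}_R2.fastq.gz -p 8 -o quants/{}'
```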
Thanks for the input, Alexander! But honestly I can't agree that this is not worth discussing: as I said, in my example the quantification of just one sample took 40 minutes, and I have a hundred samples. Simply running with `-p 8` would not be enough, because depending on node availability those 8 cores could still land on separate nodes. I had to explicitly request that the core allocation be limited to a single node, and that helped; but I wanted to understand what is going on there, which is why I posted this question.

I did not say it was not serious. I said you are, imho, overcomplicating things by requesting cores from different nodes. Personally I like to keep things as simple as possible: book a single node and use all available cores, split over multiple jobs with GNU parallel or job arrays. That is very simple yet effective. The slowdown you experienced is most likely (as you already guessed) because salmon is multi-threaded but not MPI-capable, so it cannot make use of cores on other nodes.
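As an example, a minimal LSF job-array sketch (assuming an LSF scheduler and a `samples.txt` file listing one sample basename per line; all names are placeholders):

```bash
#!/bin/bash
#BSUB -J quant[1-100]        # one array element per sample
#BSUB -n 8                   # 8 cores per element
#BSUB -R "span[hosts=1]"     # keep each element's cores on one host

# Pick this element's sample from the (assumed) list of basenames
SAMPLE=$(sed -n "${LSB_JOBINDEX}p" samples.txt)

salmon quant -i salmon_index -l A \
  -1 "${SAMPLE}_R1.fastq.gz" -2 "${SAMPLE}_R2.fastq.gz" \
  -p 8 -o "quants/${SAMPLE}"
```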
Well, initially I was not requesting different nodes; it's the scheduler that assigns my task that way by default. If I want to prevent this behavior and keep my task on a single node, I have to say so explicitly with a separate parameter (in my case, a separate line `span[hosts=1]` in the job file).
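For anyone hitting the same issue, a minimal sketch of such a job file (LSF syntax; index and sample names are placeholders):

```bash
#!/bin/bash
#BSUB -n 16                  # request 16 cores...
#BSUB -R "span[hosts=1]"     # ...all on a single host

salmon quant -i salmon_index -l A \
  -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
  -p 16 -o quants/sample
```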