Hi everyone, I have 417 samples from 4 groups, and each sample contains the expression of 500 genes (my data is a 500x417 matrix). I want to do differential expression analysis on it.
When I run DESeq in normal mode (parallel=FALSE), it takes ~137 seconds to finish.
When I run DESeq in parallel mode (parallel=TRUE) after register(SnowParam()) with 28 workers using BiocParallel, it takes ~406 seconds to finish.
When I run DESeq in parallel mode (parallel=TRUE) after register(MulticoreParam()) with 28 workers using BiocParallel, it takes ~405 seconds to finish.
Why is DESeq slower in parallel mode?
I'm not sure whether this is expected. Is it OK?
So the overhead of simply calling 28 workers keeps you from achieving a speedup of 28; instead you get a speedup of 8 for the toy example of sleeping for five seconds. This might be ameliorated as the task time increases, but with real data you also have to split up the data and send it to each worker. I'd try DESeq2 with a smaller number of workers, and if you are working on a cluster, you can make sure that the cores are on the same node. The details of the backend make a difference.
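If it helps, here is a rough sketch of how one might time DESeq with a few smaller worker counts. It is only an illustration under some assumptions: the counts are simulated with makeExampleDESeqDataSet (not your data, two conditions rather than four, and only 40 samples so it finishes quickly), and the worker counts 1, 2, 4 and 8 are arbitrary picks. Swap in your own DESeqDataSet for dds to get numbers that mean something for your case. MulticoreParam relies on forking, so on Windows SnowParam would be the one to try instead.

```r
library(DESeq2)
library(BiocParallel)

# Simulated data for illustration only (assumption: NOT the 500x417 matrix
# from the question; 500 genes, 40 samples, two conditions).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 40)

# Arbitrary worker counts to compare; 1 means a plain serial run.
worker_counts <- c(1L, 2L, 4L, 8L)

# Time one full DESeq run per worker count and keep the elapsed seconds.
timings <- sapply(worker_counts, function(w) {
  system.time(
    DESeq(dds,
          parallel = w > 1L,                      # serial when w == 1
          BPPARAM  = MulticoreParam(workers = w), # ignored when parallel = FALSE
          quiet    = TRUE)
  )[["elapsed"]]
})
names(timings) <- worker_counts

print(timings)
```

If the elapsed time stops dropping (or starts rising) as workers are added, that is the point where the cost of splitting the data and shipping it to the workers outweighs the gain.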
Thanks for your help.