Hello Biostars,
I would like to ask about the current state of parallelization in R.
I have searched a bit through various articles/posts on Biostars and StackOverflow but most of them are quite old.
Has anyone benchmarked the latest parallelization packages in R for versatility, usability, and speed?
If someone wants to parallelize on a laptop, is there a package that works better than the others? What if they later need to run the same code on a cluster?
Background
Most of the time I work in Bioconductor Docker containers, otherwise on a Windows laptop.
In my project I work with GenomicRanges (GR) objects, so I think a "per chromosome" approach should be the most suitable, although sometimes I need to slice the GR further so that the work fits into the laptop's RAM.
I also use packages such as bumphunter and derfinder.
Thank you for your time,
Konstantinos
This is not a bioinformatics question unless you provide context. What is best also depends on the task and the hardware, i.e. not every task benefits equally from a given parallelization approach.
Can you give some background on what you want to parallelize and which operating system you are on?
@ATpoint Most of the time I work in Bioconductor Docker containers, otherwise on a Windows laptop. In my project I work with GenomicRanges objects, so I think a "per chromosome" approach should be the most suitable. I also use packages such as bumphunter and derfinder.
@Jean-Karim Heriche Should I move it to StackOverflow?
Just edit your question to give bioinformatics context.
If RAM on a laptop is the concern, then it seems the issue is not parallelization but how to split the GRanges. How would you benefit from parallelization then?
In cases where I have more than 10,000 GRs per chromosome, I split them into chunks and run the parallelized downstream functions over these chunks instead of over whole chromosomes.
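Roughly, the chunking looks like the sketch below. It is only a minimal illustration using the base `parallel` package: the data frame is a toy stand-in for a GRanges object, and `chunk_size` and `summarise_chunk` are hypothetical placeholders for the real threshold and downstream function.

```r
library(parallel)

# Toy stand-in for a GRanges object: one row per range (made-up data).
set.seed(1)
ranges <- data.frame(
  chr   = rep(c("chr1", "chr2"), times = c(25000, 8000)),
  start = sample.int(1e6, 33000, replace = TRUE)
)
ranges$end <- ranges$start + 100L

chunk_size <- 10000  # assumed maximum number of ranges per chunk

# Number the chunks within each chromosome, then split the data by
# chromosome and chunk number, so large chromosomes yield several chunks.
chunk_no <- ave(seq_len(nrow(ranges)), ranges$chr,
                FUN = function(i) ceiling(seq_along(i) / chunk_size))
chunks <- split(ranges, paste(ranges$chr, chunk_no, sep = "_"))

# Hypothetical per-chunk function; replace with the real downstream analysis.
summarise_chunk <- function(chunk) {
  sum(chunk$end - chunk$start)
}

# A socket (PSOCK) cluster also works on Windows, where forking
# (as used by mclapply) is not available.
cl <- makeCluster(2)
res <- parLapply(cl, chunks, summarise_chunk)
stopCluster(cl)

total <- sum(unlist(res))
```

Each worker only receives the chunks it processes rather than the full object, which is what keeps the memory use per process down in this sketch.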