Hi y'all, newb question here. Within bioinformatics there is a decent selection of tools to enhance reproducibility, such as workflow managers like Snakemake, package managers like Conda, containerization software like Docker, VMs, cloud services, etc. I've been trying to find research papers within bioinformatics that actually demonstrate an improvement in reproducibility when using one or a combination of these tools (e.g., identical results among 3 independent runs, across different OSes or clusters, or before and after adding the tools to a pipeline). Does anyone know of a paper like that? This may seem like an easy task, but maybe I'm searching with the wrong keywords? Eventually I want to adapt my pipeline to use these tools, but I want to see evidence of their efficacy before finalizing my 'meta-pipeline.'
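For context, the kind of setup I have in mind is roughly this: a Snakemake rule that pins its software through both a Conda environment file and a container image, so the same command runs in the same environment on any machine. This is only a minimal sketch; the file names, the bwa/samtools commands, the env path, and the container image are placeholders I made up for illustration, not part of any published pipeline.

    rule align:
        input:
            "data/sample.fastq.gz"       # placeholder input
        output:
            "results/sample.bam"         # placeholder output
        conda:
            "envs/align.yaml"            # pinned conda environment (hypothetical path)
        container:
            "docker://biocontainers/bwa:v0.7.17_cv1"   # container image (hypothetical tag)
        shell:
            "bwa mem ref.fa {input} | samtools sort -o {output}"

As I understand it, you'd then run something like `snakemake --use-conda --use-singularity` (the exact flags depend on the Snakemake version) so the environment/container is actually used instead of whatever happens to be on the host. What I'm after is published evidence of how much difference that kind of pinning makes in practice.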
There was a paper on exactly that, but I can't remember it now.
In the meantime, maybe take a look at these:
Ten Simple Rules for Reproducible Computational Research
Reproducible bioinformatics project: a community for reproducible bioinformatics analysis pipelines
Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics
Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics