I am in the typical situation of needing a resequencing pipeline (i.e., FastQC, read preprocessing, FastQC again, alignment with BWA, variant calling). On the one hand, I need a stable pipeline with stable tools for the standard cases (e.g., "single-donor WES variant calling", "trio WES variant calling", and "tumor/normal WES variant calling with somatic filtration"). On the other hand, I sometimes need more specialized functionality or more extensive downstream analysis.
I want to use Docker to isolate my tools from the uncertain, changing, and sadly often unversioned world of bioinformatics software (I'm looking at you, vt and vcflib, but I'm still very grateful that you are around). What would you recommend as best practice here:
- one Docker image for everything, adding tools as I go
- one Docker image for each pipeline step (e.g., combining BWA-MEM, samtools, and samblaster for the alignment so I can use piping in a front-end script)
- one Docker image for the standard stuff, then maybe some images for each additional step
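To make the second option concrete, a front-end script for the alignment step could look roughly like the Python sketch below. It only builds the `docker run` command for a single container that pipes BWA-MEM into samblaster and samtools. The image name `example/align:0.1.0`, the `/data` mount point, and the function name `alignment_command` are made up for illustration, and the sketch assumes the image ships bash, bwa, samblaster, and a samtools version that accepts `sort -o`.

```python
import shlex

# Hypothetical image name and tag (an assumption, not a published image).
# Pinning an exact tag is what keeps the step stable over time.
ALIGN_IMAGE = "example/align:0.1.0"


def alignment_command(ref, fq1, fq2, out_bam, data_dir, threads=4, image=ALIGN_IMAGE):
    """Build a `docker run` argv that pipes bwa mem | samblaster | samtools sort
    inside a single container. Paths are relative to data_dir on the host."""
    q = shlex.quote
    # The whole pipe runs in one shell inside the container, so no
    # intermediate SAM file is written to disk.
    pipeline = (
        "set -o pipefail; "
        f"bwa mem -t {int(threads)} {q(ref)} {q(fq1)} {q(fq2)} "
        "| samblaster "
        f"| samtools sort -o {q(out_bam)} -"
    )
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/data",  # mount the host data directory
        "-w", "/data",              # run inside the mounted directory
        image,
        "bash", "-c", pipeline,
    ]


if __name__ == "__main__":
    cmd = alignment_command(
        "ref.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
        "sample.bam", "/path/to/data",
    )
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the command looks right. With one image per step, each step would get its own pinned tag and its own small builder function like this one.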
Does anyone know of a person or organization that has published their Dockerized pipelines, in a blog post or elsewhere, in a form that goes beyond toy examples or "here is a Dockerfile for the tool that I wrote/published"?
Cheers!
From looking at the NGSeasy README and source files, it is not yet clear to me how this pipeline can be run in a distributed computing environment (e.g., a Slurm cluster). Any comments on this?