Hi folks.
I'm constructing a simple container that will accept a pair of fastq files and run the following tools:
- cutadapt
- minimap2
- Strelka2
- SNPEff
- BASH / SnpSift to count up some of the SNPEff results.
The container is specifically required to operate on a single pair of fastqs at a time. In other words, if someone wants to run more than one sample they can run one instance of the container per sample.
My initial plan was simply to put the BASH commands that execute the series of tools into the container recipe, so that anyone who runs the container is just running the commands I wrote inside the container environment.
What I'm trying to wrap my head around now is whether it is advisable/recommended/preferable to have the container use something like snakemake or nextflow to manage the simple linear workflow I have listed above. There is always some chance that more analyses will be added to the container.
Are there any thoughts on whether my container should employ a workflow manager on the inside, or is it better to keep things simple with plain BASH commands?
So far all I can really say is that managing the stderr/stdout messages from each of the tools is making my list of BASH commands look ugly with lots of capture and redirection. Can't say that is much of a problem though. Am I missing something?
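For what it's worth, here is a minimal sketch of the kind of wrapper that keeps the capture and redirection in one place instead of repeating it on every line. The names `run_step` and `LOG_DIR` are made up for illustration, and the `echo`/`sh -c` calls are stand-ins for the real tool invocations, not actual cutadapt or minimap2 command lines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical log directory; override by exporting LOG_DIR before running.
LOG_DIR="${LOG_DIR:-logs}"
mkdir -p "$LOG_DIR"

# run_step NAME CMD [ARGS...]
# Runs CMD with stdout and stderr sent to separate per-step log files,
# and returns the command's exit code so that `set -e` stops the pipeline.
run_step() {
    local name="$1"
    shift
    echo "[$(date '+%F %T')] start:  $name" >&2
    if "$@" >"$LOG_DIR/$name.out" 2>"$LOG_DIR/$name.err"; then
        echo "[$(date '+%F %T')] done:   $name" >&2
    else
        local status=$?
        echo "[$(date '+%F %T')] FAILED: $name (exit $status), see $LOG_DIR/$name.err" >&2
        return "$status"
    fi
}

# Stand-in commands only; replace with the real tool calls.
run_step trim  echo "trimming would happen here"
run_step align sh -c 'echo "aligned"; echo "a warning" >&2'
```

Each step then becomes a single `run_step <name> <command...>` line. A tool that writes its actual results to stdout would need different handling, since this sketch treats stdout as a log.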
thanks Richard
IMO yes, using a workflow manager is advisable: 100%, no question, definitely. The time/effort to learn one of those will be equal to writing your own implementation, only yours will not be as reproducible, and many of the issues you have (or haven't yet encountered/thought about) have already been solved well.