The answer to this depends on what you mean by storing/version controlling a pipeline.
If you mean the code of the pipeline - the script, CWL definition file or makefile - then absolutely this should be version controlled using git, as mentioned by the others. As for where we store code: we divide pipelines into production pipelines, which we expect to use across multiple projects, and project-specific pipelines. We keep clones of the production pipelines either in the group's shared storage or in people's home directories. For project-specific pipelines, the code will live in a src sub-directory of the project directory.
But another meaning of the question could be "Do you use version control for pipeline runs? Where do you store pipeline runs?" By "runs", I mean the collection of input files, configuration files, intermediate files and output files that arise from running a pipeline on some input files.
Such collections of files are often (very) large and binary, and unsuitable for traditional version control, which works best for shortish text files. However, the beauty of pipelines should be that data + code + configuration + cpu time = results.
Thus, for us, the ideal is (and I'm not saying we always manage to live up to this):
- For each pipeline run, a git repo is initiated in the pipeline run directory, or the pipeline run directory is added to a project repo.
- In the repo we put: the pipeline configuration file, the pipeline log file, and an automated script that will generate/link/copy input data files from raw data stores (our own /Raw_data dir, or GEO etc).
- In reality, because of the limitations of our HPC system, we run pipelines on the Lustre filesystem attached to the HPC. This can only be used temporarily, so the config, log etc. files are actually created on the long-term file store and then linked to the fast, short-term storage.
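The linking step in the last point could look roughly like the sketch below. This is only an illustration, not our actual script: the function name `stage_run`, the file names `pipeline.yml` and `pipeline.log`, and the directory layout are all made up for the example. The small files are created on long-term storage (where `git init` would then be run) and symlinked into the scratch directory where the pipeline executes.

```python
import os
import tempfile
from pathlib import Path


def stage_run(long_term_dir, scratch_dir, filenames=("pipeline.yml", "pipeline.log")):
    """Create small tracked files on long-term storage and symlink them into scratch.

    All names here are hypothetical; adapt them to your own pipeline.
    """
    long_term = Path(long_term_dir)
    scratch = Path(scratch_dir)
    long_term.mkdir(parents=True, exist_ok=True)
    scratch.mkdir(parents=True, exist_ok=True)

    links = []
    for name in filenames:
        target = long_term / name
        target.touch(exist_ok=True)  # the real file lives on long-term storage
        link = scratch / name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale entry left by an earlier run
        link.symlink_to(target.resolve())
        links.append(link)
    return links


if __name__ == "__main__":
    # Demonstrate with throwaway directories instead of real storage paths.
    with tempfile.TemporaryDirectory() as tmp:
        run_links = stage_run(Path(tmp) / "project" / "run1", Path(tmp) / "scratch" / "run1")
        for link in run_links:
            print(link, "->", os.readlink(link))
```

Anything the pipeline writes to the linked log on scratch ends up in the long-term copy, so it survives when the scratch space is purged.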
git git git
In your version control system you can use any workflow manager. See here for a few examples: https://github.com/pditommaso/awesome-pipeline
See an example of Nextflow pipelines hosted in github here: https://github.com/NBISweden/pipelines-nextflow
Then, for better reproducibility, your pipeline can use conda or, even better, containers (Docker/Singularity). The example above uses both: a parameter selects whether the pipeline runs in a conda environment or in Docker/Singularity containers.
Additionally, I would suggest using an isolated environment, like Conda, which tracks the versions of the tools you used.
Q1) conda is a good option - you can set up a conda environment for each pipeline (or shared environments, as applicable), and then 'activate' the appropriate conda environment for a given pipeline
Q2) git and/or confluence
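To illustrate the one-environment-per-pipeline idea from Q1, here is a minimal sketch. The pipeline names, environment names and the `pipeline.py` script are invented for the example; the only real thing assumed is that `conda run -n <env> <command>` is available in your conda installation.

```python
import shlex

# Hypothetical mapping from pipeline name to its dedicated conda environment.
PIPELINE_ENVS = {
    "rnaseq": "rnaseq-env",
    "chipseq": "chipseq-env",
}


def conda_command(pipeline, command):
    """Wrap a command so that it runs inside the pipeline's own conda environment."""
    env = PIPELINE_ENVS[pipeline]  # KeyError if no environment is registered
    return ["conda", "run", "-n", env, *command]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(shlex.join(conda_command("rnaseq", ["python", "pipeline.py"])))
```

Keeping the mapping in a small file under git means the choice of environment is versioned together with the pipeline code.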