Our HPC cluster currently uses SGE, but we'll be moving to SLURM eventually. I've been looking into ways to abstract the job scheduler interface out of our pipelines, and I've seen DRMAA mentioned:
https://en.wikipedia.org/wiki/DRMAA
There is a Python wrapper here: https://github.com/pygridtools/drmaa-python
http://drmaa-python.readthedocs.io/en/latest/tutorials.html#starting-and-stopping-a-session
Does anyone have more experience using this? Is it best used as a drop-in replacement for hard-coded qsub
commands for job submission and monitoring? And are there public bioinformatics workflows anywhere (e.g. ChIP-Seq, RNA-Seq, etc.) that use it? I haven't been able to find much on Google besides passing mentions of it here.
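For reference, the basic submit-and-wait pattern from the drmaa-python tutorial looks roughly like this. It's a minimal sketch: the import is guarded because drmaa-python needs a native libdrmaa (located via DRMAA_LIBRARY_PATH) at import time, which is exactly the scheduler-specific piece DRMAA hides behind a common API:

```python
# Minimal drmaa-python sketch following the tutorial's session pattern.
# The import is guarded because drmaa-python requires a native libdrmaa
# (e.g. SGE's or SLURM's) to be present when the module is imported.
try:
    import drmaa
    HAVE_DRMAA = True
except (ImportError, RuntimeError, OSError):
    HAVE_DRMAA = False  # no libdrmaa on this machine

def submit_and_wait(command, args):
    """Submit one job and block until it finishes; return the JobInfo."""
    with drmaa.Session() as session:       # initializes and cleans up the session
        jt = session.createJobTemplate()
        jt.remoteCommand = command          # e.g. '/bin/sleep'
        jt.args = args                      # e.g. ['10']
        job_id = session.runJob(jt)
        info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        session.deleteJobTemplate(jt)
        return info

if HAVE_DRMAA:
    info = submit_and_wait('/bin/sleep', ['10'])
    print('job', info.jobId, 'finished with exit status', info.exitStatus)
```

The same script should run unmodified on SGE or SLURM as long as the matching libdrmaa is installed, which is the main appeal over hard-coded qsub calls.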
I actually did write my own wrapper library for SGE here, but the task is non-trivial, and we'll soon be moving to SLURM as well, which would need yet another wrapper, plus another module to abstract out the SGE/SLURM-specific parts so I don't have to write separate pipelines per cluster... essentially reimplementing DRMAA. I was looking at Nextflow recently for other reasons and was really impressed with its cluster integration. However, I also have more general tools that need to interface with the cluster, and I'm not sure Nextflow would be the most appropriate for them. For example, I'm trying to build a web app that runs a program that must execute on the HPC cluster.
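For what it's worth, the "module to abstract out the SGE/SLURM-specific parts" can stay fairly small if it only translates a generic job description into scheduler-native submit commands. A hypothetical sketch (JobSpec and build_submit_cmd are names I made up, not from any library, and the SGE parallel environment and resource names are site-specific):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class JobSpec:
    """Scheduler-agnostic description of a single batch job."""
    name: str
    script: str              # path to the batch script to submit
    threads: int = 1
    mem_gb: int = 4
    queue: Optional[str] = None

def build_submit_cmd(job: JobSpec, scheduler: str) -> List[str]:
    """Translate a JobSpec into a qsub (SGE) or sbatch (SLURM) argv list."""
    if scheduler == 'sge':
        cmd = ['qsub', '-N', job.name,
               '-pe', 'threaded', str(job.threads),   # PE name is site-specific
               '-l', f'mem_free={job.mem_gb}G']       # resource name is site-specific
        if job.queue:
            cmd += ['-q', job.queue]
    elif scheduler == 'slurm':
        cmd = ['sbatch', f'--job-name={job.name}',
               f'--cpus-per-task={job.threads}',
               f'--mem={job.mem_gb}G']
        if job.queue:
            cmd.append(f'--partition={job.queue}')
    else:
        raise ValueError(f'unknown scheduler: {scheduler}')
    return cmd + [job.script]

job = JobSpec(name='align', script='align.sh', threads=8, mem_gb=16)
print(build_submit_cmd(job, 'sge'))
print(build_submit_cmd(job, 'slurm'))
```

The pipeline code only ever touches JobSpec; swapping clusters means changing one string. Of course, job monitoring and status parsing is where this gets painful, which is why DRMAA (or Nextflow) exists.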
As a follow-up to this, I ended up going with Nextflow and it's been great. It solves this entire problem very well, since it uses the native scheduler directives in a dynamically generated bash submission script. This makes it very easy to customize the pipeline to match the intended usage of the HPC as set out by the admins, and to get their help when HPC issues arise, since they can see the exact commands being issued. Nextflow also makes it very easy to maintain configurations for multiple systems with different schedulers without turning your pipeline into a mess.
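To illustrate the multi-system point, Nextflow profiles let one pipeline carry per-cluster settings in a single nextflow.config; something along these lines (the queue and parallel-environment names here are placeholders, but 'sge' and 'slurm' are the actual executor identifiers):

```groovy
// nextflow.config: one pipeline, one profile per cluster.
// Select at runtime with: nextflow run main.nf -profile sge_cluster
profiles {
    sge_cluster {
        process.executor = 'sge'
        process.queue    = 'all.q'      // site-specific queue name
        process.penv     = 'threaded'   // site-specific parallel environment
    }
    slurm_cluster {
        process.executor = 'slurm'
        process.queue    = 'general'    // site-specific partition name
    }
}
```

The pipeline script itself never mentions the scheduler; the executor choice lives entirely in the profile.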