Our HPC cluster currently uses SGE, but we'll be moving to SLURM eventually. I've been looking into ways to abstract the job scheduler interface out of our pipelines, and I've seen DRMAA mentioned:
https://en.wikipedia.org/wiki/DRMAA
There is a Python wrapper here: https://github.com/pygridtools/drmaa-python
http://drmaa-python.readthedocs.io/en/latest/tutorials.html#starting-and-stopping-a-session
Does anyone have more experience using this? Is it best used as a drop-in replacement for hard-coded qsub
commands for job submission and monitoring? And are there public bioinformatics workflows anywhere (e.g. ChIP-Seq, RNA-Seq, etc.) that use it? I haven't been able to find much on Google besides passing mentions of it here.
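For reference, the basic submit-and-wait pattern from the drmaa-python tutorial looks roughly like this. It's a minimal sketch: the import is guarded because drmaa-python needs a native libdrmaa (located via DRMAA_LIBRARY_PATH) at import time, which is exactly the scheduler-specific piece DRMAA hides behind a common API:

```python
# Minimal drmaa-python sketch following the tutorial's session pattern.
# The import is guarded because drmaa-python requires a native libdrmaa
# (e.g. SGE's or SLURM's) to be present when the module is imported.
try:
    import drmaa
    HAVE_DRMAA = True
except (ImportError, RuntimeError, OSError):
    HAVE_DRMAA = False  # no libdrmaa on this machine

def submit_and_wait(command, args):
    """Submit one job and block until it finishes; return the JobInfo."""
    with drmaa.Session() as session:       # initializes and cleans up the session
        jt = session.createJobTemplate()
        jt.remoteCommand = command          # e.g. '/bin/sleep'
        jt.args = args                      # e.g. ['10']
        job_id = session.runJob(jt)
        info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        session.deleteJobTemplate(jt)
        return info

if HAVE_DRMAA:
    info = submit_and_wait('/bin/sleep', ['10'])
    print('job', info.jobId, 'finished with exit status', info.exitStatus)
```

The same script should run unmodified on SGE or SLURM as long as the matching libdrmaa is installed, which is the main appeal over hard-coded qsub calls.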
I actually did write my own wrapper library for SGE here, but the task is non-trivial, and we'll soon be moving to SLURM as well, which would need yet another wrapper, plus another module to abstract out the SGE/SLURM-specific parts so I don't have to write separate pipelines per cluster... essentially reimplementing DRMAA. I was looking at Nextflow recently for other reasons and was really impressed with its cluster integration. However, I also have more general tools that need to interface with the cluster, and I'm not sure Nextflow would be the most appropriate for them. For example, I'm trying to build a web app that runs a program that must execute on the HPC cluster.
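For what it's worth, the "module to abstract out the SGE/SLURM-specific parts" can stay fairly small if it only translates a generic job description into scheduler-native submit commands. A hypothetical sketch (JobSpec and build_submit_cmd are names I made up, not from any library, and the SGE parallel environment and resource names are site-specific):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class JobSpec:
    """Scheduler-agnostic description of a single batch job."""
    name: str
    script: str              # path to the batch script to submit
    threads: int = 1
    mem_gb: int = 4
    queue: Optional[str] = None

def build_submit_cmd(job: JobSpec, scheduler: str) -> List[str]:
    """Translate a JobSpec into a qsub (SGE) or sbatch (SLURM) argv list."""
    if scheduler == 'sge':
        cmd = ['qsub', '-N', job.name,
               '-pe', 'threaded', str(job.threads),   # PE name is site-specific
               '-l', f'mem_free={job.mem_gb}G']       # resource name is site-specific
        if job.queue:
            cmd += ['-q', job.queue]
    elif scheduler == 'slurm':
        cmd = ['sbatch', f'--job-name={job.name}',
               f'--cpus-per-task={job.threads}',
               f'--mem={job.mem_gb}G']
        if job.queue:
            cmd.append(f'--partition={job.queue}')
    else:
        raise ValueError(f'unknown scheduler: {scheduler}')
    return cmd + [job.script]

job = JobSpec(name='align', script='align.sh', threads=8, mem_gb=16)
print(build_submit_cmd(job, 'sge'))
print(build_submit_cmd(job, 'slurm'))
```

The pipeline code only ever touches JobSpec; swapping clusters means changing one string. Of course, job monitoring and status parsing is where this gets painful, which is why DRMAA (or Nextflow) exists.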
As a follow-up to this, I ended up going with Nextflow and it's been great. It solves this entire problem very well, since it uses the native scheduler directives in a dynamically generated bash submission script. This makes it very easy to customize the pipeline to match the intended usage of the HPC as set out by the admins, and to get their help when HPC issues arise, since they can see the exact commands being issued. Nextflow also makes it very easy to maintain configurations for multiple systems with different schedulers without turning your pipeline into a mess.
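To illustrate the multi-system point, Nextflow profiles let one pipeline carry per-cluster settings in a single nextflow.config; something along these lines (the queue and parallel-environment names here are placeholders, but 'sge' and 'slurm' are the actual executor identifiers):

```groovy
// nextflow.config: one pipeline, one profile per cluster.
// Select at runtime with: nextflow run main.nf -profile sge_cluster
profiles {
    sge_cluster {
        process.executor = 'sge'
        process.queue    = 'all.q'      // site-specific queue name
        process.penv     = 'threaded'   // site-specific parallel environment
    }
    slurm_cluster {
        process.executor = 'slurm'
        process.queue    = 'general'    // site-specific partition name
    }
}
```

The pipeline script itself never mentions the scheduler; the executor choice lives entirely in the profile.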