CellRanger in cluster mode with slurm template
4.8 years ago
sidwell ▴ 10

If someone has already managed to run CellRanger with Slurm, maybe you can help me:

Until now, I was running CellRanger on a cluster, on a single node with 512 GB of RAM.
For a dataset of 6.7k cells and 90k reads/cell, the cellranger count function takes 7h30.
I have 5 nodes with 512 GB of RAM each at my disposal.

I tried the Slurm template proposed by 10x (even though it is not officially supported); jobs are submitted by Martian:

#!/usr/bin/env bash
#SBATCH -J __MRO_JOB_NAME__
#SBATCH -p big
#SBATCH --export=ALL
#SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__
#SBATCH --signal=2
#SBATCH --no-requeue
### Alternatively: --ntasks=1 --cpus-per-task=__MRO_THREADS__
###   Consult with your cluster administrators to find the combination that
###   works best for single-node, multi-threaded applications on your system.
#SBATCH --mem=__MRO_MEM_GB__G
#SBATCH -o __MRO_STDOUT__
#SBATCH -e __MRO_STDERR__

__MRO_CMD__

When I run the same count function on the same dataset, but with the Slurm template, as follows:

cellranger count --transcriptome=refdata-cellranger-mm10-3.0.0 --fastqs=./indepth_C07_MissingLibrary_1_HL5G3BBXX,./indepth_C07_MissingLibrary_1_HNNWNBBXX --jobmode=./martian-cs/v3.2.3/jobmanagers/slurm.template

I checked the submitted jobs: Martian submits jobs to the cluster (to Slurm) in batches of 64, and one job per node is running (this is how the cluster works: I can run only one job per node because each job takes all the CPUs of the node = 16 CPUs). So instead of having 1 node busy, I parallelize on 5 nodes. BUT:
it takes 12h42 instead of 7h30.

I checked the running processes and the number of CPUs used: when I use a single node, the first process, called read_chunks, uses 1-4 CPUs, and the second process, python (I don't know what it is doing), uses 16 CPUs, so all CPUs.
With parallelization on 5 nodes, read_chunks takes 1-4 CPUs and python ONLY ONE CPU on each node instead of 16 CPUs. I guess that's why it takes so long!

Is it because Slurm is not officially supported?
Do you think I can modify something in the template to change that?

RNA-Seq • 5.6k views
4.8 years ago
GenoMax 148k

cellranger on SLURM does not use all cores all the time. Your cluster does need to allow SLURM jobs to submit sub-jobs for this to work correctly. It sounds like you have the process working but it is taking longer?

We generally create a cellranger script (for mkfastq or count) and add these parameters, --jobmode=slurm --maxjobs=24 --mempercore=16000 (adjust as needed for your cluster), to the normal cellranger command line. The actual job submission is a simple sbatch -p partition -t time_needed --wrap="cellranger_script.sh".
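A minimal sketch of that workflow, reusing the count arguments from the question. The --id value, the partition name big, the time limit and the file name cellranger_script.sh are placeholders, not settings confirmed anywhere in this thread. The script only writes the wrapper and prints the sbatch line, so you can review both before submitting anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the wrapper that the main Slurm job will execute.
# The three cluster-mode flags are copied from the answer above; check the unit
# that --mempercore expects in your cellranger version before reusing the value.
cat > cellranger_script.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
cellranger count \
  --id=irradiated \
  --transcriptome=refdata-cellranger-mm10-3.0.0 \
  --fastqs=./indepth_C07_MissingLibrary_1_HL5G3BBXX,./indepth_C07_MissingLibrary_1_HNNWNBBXX \
  --jobmode=slurm \
  --maxjobs=24 \
  --mempercore=16000
EOF
chmod +x cellranger_script.sh

# Print the submission command instead of running it
# (partition "big" and the 24 h limit are assumptions).
submit_cmd='sbatch -p big -t 24:00:00 --wrap="./cellranger_script.sh"'
echo "$submit_cmd"
```

The main job started this way only runs the pipeline manager; the stages themselves are submitted as separate Slurm jobs, so the main job does not need many cores.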

Thanks for your answer.

Your cluster does need to allow SLURM jobs to submit sub-jobs

Do you mean allowing multiple jobs per node? Or allowing jobs to be submitted from a compute node?

I will try your way.

Allow jobs to be submitted by a job that is already running on a compute node. The main job that you submit takes care of running sub-jobs as needed.
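One quick way to check this is to submit a job whose only task is to submit another job. The sketch below just builds and prints such a command. It assumes the default partition accepts both jobs; add -p and -t options if your cluster requires them.

```shell
#!/usr/bin/env bash
# Inner job: only reports which node ran it.
inner_cmd='sbatch --wrap="hostname"'
# Outer job: runs on a compute node and tries to submit the inner job from there.
outer_cmd="sbatch --wrap='${inner_cmd}'"
echo "$outer_cmd"
```

After running the printed command, look at the outer job's slurm-<jobid>.out file: a line like "Submitted batch job N" means submission from a compute node works, while an sbatch error there means it is blocked.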

After checking, my cluster does allow jobs to submit sub-jobs. I tried your way (writing the cellranger command in a bash script and wrapping it); the only difference I see is that the mrp process (the process that submits all the jobs) runs on a compute node instead of on the head node.
But the computing time is still the same: it takes much longer when I use 5 nodes instead of one.
Have you already tried to run cellranger on a single node? Is it slower for you?

Since you have a large dataset, there may not be much you can do to speed this process up. You could try assigning more cores, which may help with the alignment steps (STAR), but otherwise the steps that run on single cores will always be the bottleneck.

OK, but that still doesn't explain why it is faster in job submission mode on 1 compute node and slower in cluster mode on 5 compute nodes, with the same parameters. It should be faster in cluster mode on 5 nodes. This is what the documentation says:

Cell Ranger can be run in cluster mode [...] This allows highly parallelizable stages to utilize hundreds or thousands of cores concurrently, dramatically reducing time to solution.

For me, using cluster mode is dramatically increasing time to solution.
Steps that run on single cores in cluster mode should also run on single cores when I submit on a single node, but that is not the case.
As a result, I will just submit on a single node instead of using cluster mode, which is too bad.

We are not using #SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__. That may be limiting python to one core per node. We control the total number of jobs started by cellranger by using --maxjobs=24. If I wanted to have more cores, I would increase this number. For us it has not been a big issue, since the largest NovaSeq runs we have done have taken less than 24 h to complete.

If you are willing, I would take out that directive and then increase --maxjobs=24 to a higher number to see if that helps. You may also want to take out #SBATCH --no-requeue unless it is needed for your cluster.
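A sketch of that edit, assuming you work on a copy of the template rather than the one shipped with Martian. The heredoc stands in for the template posted in the question, and the output file name slurm.custom.template is made up for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in copy of the template from the question; in practice, copy the
# slurm.template from your Martian jobmanagers directory instead.
cat > slurm.template <<'EOF'
#!/usr/bin/env bash
#SBATCH -J __MRO_JOB_NAME__
#SBATCH -p big
#SBATCH --export=ALL
#SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__
#SBATCH --signal=2
#SBATCH --no-requeue
### Alternatively: --ntasks=1 --cpus-per-task=__MRO_THREADS__
#SBATCH --mem=__MRO_MEM_GB__G
#SBATCH -o __MRO_STDOUT__
#SBATCH -e __MRO_STDERR__

__MRO_CMD__
EOF

# Delete the two directives discussed above and keep everything else unchanged.
sed -e '/^#SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__$/d' \
    -e '/^#SBATCH --no-requeue$/d' \
    slurm.template > slurm.custom.template

cat slurm.custom.template
```

The edited copy can then be passed by path, as in the question: --jobmode=./slurm.custom.template. Whether dropping the directive changes how many cores each stage gets depends on the cluster defaults, so treat it as something to test, not as a known fix.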

I tried without #SBATCH --nodes=1 --ntasks-per-node=__MRO_THREADS__.
I am using the example data for cellranger:

This guide outlines how to perform the analysis, and what results 10x assays and software produce using data from a recent Nature publication “Single-cell transcriptomes of the regenerating intestine reveal a revival stem cell” (2019; doi: 10.1038/s41586-019-1154-y).

It is not a large dataset and it takes less than 24 h to complete: 7 h with job submission mode on 1 compute node and 12 h with cluster mode on 5 compute nodes. I would like to get 7 h or less, but in cluster mode.

I tried with --maxjobs=24 and also with --maxjobs=100; it doesn't change anything. I always have only 5 jobs running (one per node) and the rest waiting in the queue. I can't run multiple jobs on a node, because each job is supposed to use all the CPUs.
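To see why only one job per node starts, it can help to ask Slurm for the pending reason of each queued job and for the partition's sharing policy. A sketch, assuming the partition is called big as in the template; it uses only standard squeue and scontrol calls and does nothing if the Slurm client tools are not on the PATH.

```shell
#!/usr/bin/env bash
PARTITION="${PARTITION:-big}"   # partition name taken from the template; adjust as needed

report_slurm_state() {
  if ! command -v squeue >/dev/null 2>&1; then
    echo "Slurm client tools not found; run this on a login or compute node."
    return 0
  fi
  # Job ID, name, state, CPU count, and pending reason (or node list) for your jobs.
  squeue -u "${USER:-$(id -un)}" -o "%.10i %.30j %.8T %.5C %R"
  # Partition settings; the OverSubscribe field shows whether nodes can be shared.
  scontrol show partition "$PARTITION"
}

report_slurm_state
```

If the running jobs show 1 CPU each while the pending ones wait on resources, the nodes are probably being allocated exclusively, which would leave 15 of the 16 cores idle for every single-threaded stage.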

Is that data from 10x's site? I may try it out if I find the time. Every cluster is set up differently, and it is possible that something on your cluster is causing this. Have you tried working with your cluster admins to see if they can help?

Thanks for your time. It would be great if you could try it out, but I understand if you don't find the time. I will check my Slurm configuration in detail.

I found all the links on this 10x page: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/tutorials/gex-analysis-nature-publication
The data I am using: https://sra-pub-src-1.s3.amazonaws.com/SRR7611048/C07.bam.1
I had to run bamtofastq C07.bam.1 irradiated
The reference: http://cf.10xgenomics.com/supp/cell-exp/refdata-cellranger-mm10-3.0.0.tar.gz
And then: cellranger count --id=irradiated --transcriptome=/path/to/refdata-cellranger-mm10-3.0.0 --fastqs=./indepth_C07_MissingLibrary_1_HL5G3BBXX,indepth_C07_MissingLibrary_1_HNNWNBBXX

We don't really have cluster admins; we administer the cluster ourselves (there are 2 of us on the platform), so we have a standard Slurm configuration and, so far, no particular issues with it.

It took about 5.5 h for the jobs to finish. I did not see a huge difference from throwing more cores into the pool: jobs using 44 cores instead of 24 finished in more or less the same time. Disclaimer: our cluster stays busy, so it is possible some sub-jobs may have been pending for some time.
