Hi everyone, I am running a Seurat analysis on an object that is 2.4 GB in size. The job runs on a Slurm cluster where my allocation is capped at 64 GB of RAM and 10 cores. The HPC administrator advises against multi-node jobs and recommends staying on a single node. Given these constraints, how can I make the best use of the resources and avoid memory problems or interruptions? I am considering future.batchtools with the options below.
My sbatch script:
#!/bin/bash
#SBATCH --job-name=seurat_analysis
#SBATCH --output=seurat_analysis_%j.log
#SBATCH --error=seurat_analysis_%j.err
#SBATCH --time=168:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=64G
Rscript /home/scrna/script.R
My R script:
library(future)
library(future.batchtools)  # provides batchtools_slurm (loading batchtools alone is not enough)
library(Seurat)
plan(list(
  # first level: each top-level future is submitted as its own Slurm job
  tweak(batchtools_slurm, resources = list(
    job.name = "seurat_analysis",
    log.file = "seurat_analysis.log",
    walltime = 168 * 60 * 60,  # seconds
    memory   = 64 * 1024,      # MB (exact meaning depends on the site's batchtools template)
    ncpus    = 10
  )),
  # second level: inside each job, fan out over the 10 cores;
  # multiprocess is deprecated in recent versions of future, so multisession is used instead
  tweak(multisession, workers = 10)
))
gc()
Exp <- CreateSeuratObject()
# ......... Etc
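For comparison, below is the simpler single-node setup I have also been weighing. It stays entirely inside the 10-core / 64 GB allocation requested in the sbatch script, with no nested Slurm submissions, and it raises future.globals.maxSize because the default of roughly 500 MiB is smaller than my 2.4 GB object; the 50 GB value is only a rough guess on my part to leave headroom under the 64 GB limit.
library(future)
library(Seurat)

# Raise the limit on globals exported to parallel workers
# (the default of ~500 MiB would reject my 2.4 GB Seurat object).
# 50 GB is an assumption chosen to stay well under the 64 GB job limit.
options(future.globals.maxSize = 50 * 1024^3)

# Use only the 10 cores already allocated by sbatch; no extra Slurm jobs are submitted.
plan(multisession, workers = 10)
My main doubt is whether this single-node plan or the future.batchtools plan above is the better fit for these constraints.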
Can you recommend anything? Thank you very much for your support.