How to call an R variable in a loop with Slurm?
pablo ▴ 310 · 5.5 years ago

Hello,

I have an R script (RHO_COR.R) and I would like to create a loop in order to split the jobs across several nodes.

Below is the part of the script where I would like to create the loop.

res <- foreach(i = seq_len(nrow(combs))) %dopar% {
  G1 <- split[[combs[i, 1]]]              # first group of columns
  G2 <- split[[combs[i, 2]]]              # second group of columns
  dat.i <- cbind(data[, G1], data[, G2])  # data for this pair of groups
  rho.i <- cor_rho(dat.i)                 # correlation submatrix for the pair
}

The different results in res (which correspond to submatrices of correlation between OTUs) are stored in several files. combs is a two-column matrix which looks like this (but it can change, according to my input file):

> combs
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5
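
For reference, a matrix of exactly this shape falls out of combn; a minimal sketch, assuming combs is built from the number of groups in split:

n_groups <- length(split)       # 5 in this example
combs <- t(combn(n_groups, 2))  # all unordered pairs of groups, one pair per row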

I would like to send each row of combs, i.e. each value of seq_len(nrow(combs)), to a separate node.

This is my Slurm script:

#!/bin/bash
#SBATCH -o job-%A_task.out
#SBATCH --job-name=paral_cor
#SBATCH --partition=normal
#SBATCH --time=1-00:00:00
#SBATCH --mem=126G
#SBATCH --cpus-per-task=32

# Set up whatever packages we need to run with

module load gcc/8.1.0 openblas/0.3.3 R

# SET UP DIRECTORIES

OUTPUT="$HOME"/$(date +"%Y%m%d")_parallel_nodes_test
mkdir -p "$OUTPUT"

export FILENAME=~/RHO_COR.R

#Run the program

Rscript "$FILENAME" > "$OUTPUT"/RHO_COR.out  # redirect to a file inside the output directory, not to the directory itself

I do not want to use job arrays. I wonder if creating an argument that takes the values of seq_len(nrow(combs)) could be a solution?

for i in my_argument
 do Rscript $FILENAME -i > "$OUTPUT"
done
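
On the R side, a minimal sketch of how the script could pick up such an index (commandArgs is base R; the positional-argument convention here is an assumption, not what RHO_COR.R currently does):

args <- commandArgs(trailingOnly = TRUE)  # e.g. c("3") from 'Rscript RHO_COR.R 3'
i <- as.integer(args[1])                  # row of combs handled by this job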

Thanks

(I asked on Stack Overflow but I haven't received an answer yet.)

You'll need an srun in there and $i rather than -i.

Thanks for your reply. But can I combine srun and Rscript in the same loop? The other point: I don't know how to pass my R variable as an argument into this loop.

Edit: I saved my variable to a file that I read in bash.

And I use:

res <- foreach(i = opt$subset) %dopar% {
  G1 <- split[[combs[i, 1]]]
  G2 <- split[[combs[i, 2]]]
  dat.i <- cbind(data[, G1], data[, G2])
  rho.i <- cor_rho(dat.i)
}
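
For opt$subset to exist, the script has to parse a command-line option first; a minimal sketch using optparse, which is an assumption about the package behind opt (the --subset name matches the srun call further down):

library(optparse)

# declare a --subset option holding the row of combs this job should process
opt_list <- list(
  make_option("--subset", type = "integer",
              help = "row index of combs to process")
)
opt <- parse_args(OptionParser(option_list = opt_list))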

Slurm part:

var=$(wc -l < ~/my_file.tsv)  # one line per value of the subset
subset=$(seq "$var")

I still struggle to find a way to execute the jobs on several nodes. The loop is executed on only one node, and I can't find the problem with srun...

If you're going to use %dopar%, then run it in parallel directly in R and don't bother submitting multiple jobs. You'll have to figure out how to do that on your local cluster, of course. Otherwise, just use an array job, or create a loop in your sbatch script calling srun for each value of i.
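
For the first option, a minimal sketch of running %dopar% on the cores of one node with doParallel; the core count is an assumption and should match --cpus-per-task:

library(foreach)
library(doParallel)

# register a backend, otherwise %dopar% falls back to
# sequential execution (with a warning)
registerDoParallel(cores = 32)  # match #SBATCH --cpus-per-task=32

res <- foreach(i = seq_len(nrow(combs))) %dopar% {
  G1 <- split[[combs[i, 1]]]
  G2 <- split[[combs[i, 2]]]
  cor_rho(cbind(data[, G1], data[, G2]))
}

Spreading %dopar% itself across several nodes would need a cluster-aware backend such as doMPI; that is the part that depends on the local cluster.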

That's what I would like to do: call srun for each value of i.

I tried :

for i in $subset
do
  srun Rscript my_script.R --subset $i
done

But it is still executed on one node...
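
For what it's worth, the usual cause is that srun steps run one after another unless they are backgrounded, so everything lands in the first step's allocation. A sketch of a loop that asks for several nodes and launches one single-task step per value of i (the node and task counts are assumptions matching the 10 rows of combs; $subset is defined as above):

#!/bin/bash
#SBATCH --job-name=paral_cor
#SBATCH --nodes=10   # one node per row of combs (assumption)
#SBATCH --ntasks=10  # one task per srun step

for i in $subset
do
  # -N1 -n1 gives each step one node and one task; --exclusive keeps
  # steps off each other's CPUs; '&' backgrounds the step so the loop
  # does not wait for it before launching the next one
  srun -N1 -n1 --exclusive Rscript my_script.R --subset "$i" &
done
wait  # block until all backgrounded steps have finished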
