Distribute proportionality calculations on many nodes (SLURM)
5.6 years ago
pablo ▴ 310

Hello,

Here is what I did (in R):

1 - I have a matrix of OTU abundances (more than 817,000 columns). I need to compute the proportionality between these OTUs. For the moment, I can split the matrix into submatrices, compute the proportionality between each pair of submatrices, and then assemble the final matrix.

library(foreach)
library(doParallel)     # provides a parallel backend for %dopar%
library(compositions)   # assumption: clr() comes from the 'compositions' package

data <- matrix(runif(10000), ncol = 1000)   # random example matrix
data <- clr(data)
ncol <- ncol(data)

rest   <- ncol %% 100    # leftover columns (100 columns per submatrix)
blocks <- ncol %/% 100   # number of full submatrices (10 here)

ngroup <- rep(1:blocks, each = 100)
# if (rest > 0) ngroup <- c(ngroup, rep(blocks + 1, rest))
split <- split(1:ncol, ngroup)

# I get all the combinations between my submatrices
combs <- expand.grid(1:length(split), 1:length(split))
combs <- t(apply(combs, 1, sort))
combs <- unique(combs)
combs <- combs[combs[, 1] != combs[, 2], ]

registerDoParallel()    # without a registered backend, %dopar% runs sequentially

res <- foreach(i = seq_len(nrow(combs))) %dopar% {
  G1 <- split[[combs[i, 1]]]
  G2 <- split[[combs[i, 2]]]
  dat.i <- cbind(data[, G1], data[, G2])
  cor_rho(dat.i)   # cor_rho() is my own function to compute the proportionality
}

And then I assemble the final matrix:

resMAT <- matrix(0, ncol(data), ncol(data))

for (i in 1:nrow(combs)) {
  batch1  <- split[[combs[i, 1]]]
  batch2  <- split[[combs[i, 2]]]
  patch.i <- c(batch1, batch2)
  resMAT[patch.i, patch.i] <- res[[i]]   # copy this block's rho values into the full matrix
}

2 - I work with SLURM on a cluster with several nodes. I know that on one node (256 GB RAM and 32 CPUs) I can compute the proportionality between 60,000 columns in one day. So I need about 817,000 / 60,000 ≈ 14 submatrices, which gives (14*13)/2 = 91 combinations (= 91 jobs, ideally one per node).

3 - I don't know how to write a SLURM script that distributes the calculation for each combination to its own node.

Any advice?

Best, Vincent

correlation matrix slurm parallelization • 2.2k views

Is there a reason you haven't asked your cluster admin for the appropriate syntax for your cluster?

Actually, he gave me some advice, but I'm still struggling with the SLURM syntax.

Couldn't you just submit the job 14 times (with different input files being your 14 different submatrices)? This would require 14 different SLURM submission scripts, which would probably take less time to write than a Python or Perl script to automatically generate the 14 scripts.
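
For instance, a small driver script along these lines could generate and submit one job per combination; run_pair.sh and compute_pair.R are hypothetical names for a batch wrapper and for a version of the R script that computes a single row of combs:

#!/bin/bash
# Submit one SLURM job per row of the `combs` table (91 pairs for 14 submatrices).
# run_pair.sh is a hypothetical batch wrapper that ends with: Rscript compute_pair.R $1
for i in $(seq 1 91); do
    sbatch --nodes=1 --ntasks=32 --mem=250G run_pair.sh "$i"
done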

I would like to keep my R code and incorporate it into a SLURM script, if that is possible?

resMAT[patch.i, patch.i] <- res[[i]] - you're only going to update entries on the diagonal of resMAT

But does that mean my final matrix is wrong?

ARG sorry, my mistake

This is not really a bioinformatics question but a programming one, so it might be better addressed on StackOverflow.

I am probably missing something, but you may not need to split your matrix into blocks; you could use something like this in R:

library(foreach)
library(doSNOW)

cluster <- makeCluster(number_of_workers, type = "SOCK")   # number_of_workers = parallel R workers
registerDoSNOW(cluster)

# A typical way to parallelize computation of a distance/similarity matrix;
# n is the number of columns (OTUs), and cor_rho() is assumed to accept a pair of columns
result_matrix <- foreach(j = 1:n, .combine = 'cbind') %:%
  foreach(i = 1:n, .combine = 'c') %dopar% {
    cor_rho(data[, i], data[, j])
  }

stopCluster(cluster)

The bash script submitted to SLURM would look like this:

#!/bin/bash

#SBATCH --mail-type=FAIL,END
#SBATCH -N number_of_nodes      # -N / --nodes: number of nodes
#SBATCH -n number_of_tasks      # -n / --ntasks: number of tasks (cores)
#SBATCH --mem=memory_required

Rscript my_script.R
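
For example, with one 32-core, 256 GB node per job (as described in the question), a filled-in version might look like this; the module load R line is an assumption about how R is provided on your cluster:

#!/bin/bash
#SBATCH --mail-type=FAIL,END
#SBATCH -N 1                # one node
#SBATCH -n 32               # 32 tasks/cores on that node
#SBATCH --mem=250G

module load R               # assumption: R is available as an environment module
Rscript my_script.R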

To avoid repeating calculations, maybe the foreach loops should be:

foreach(j = 1:(n - 1), .combine = 'cbind') %:%
  foreach(i = (j + 1):n, .combine = 'c') %dopar% { ...

Thanks for your answer. I'll try your solution. When you write 1:n, does n mean ncol(data)?

However, if I execute my R script from a SLURM script, will SLURM "know" how to distribute each combination to different nodes? (if I specify:

#SBATCH --mail-type=FAIL,END
#SBATCH -N 32
#SBATCH -n 91
#SBATCH --mem=250G )

Is the default value for -c (--cpus-per-task) set to 1 CPU per task on your cluster? If so, then you are asking for -n 91 cores/tasks from a node that only has 32 cores.

I generally use -n together with -N to specify the number of tasks/cores and the number of nodes. If you only have 32 cores per node, then you may need to specify -n 32 along with -N 91 (nodes) if you want them all to run at the same time. I am not sure whether you can divide your R job so that it submits SLURM sub-jobs; using job arrays may be an option then (see the sketch below).
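
A minimal job-array sketch, assuming the R script is adapted to read the combination index (a row of the combs table) from commandArgs(); compute_pair.R and the resource values are placeholders:

#!/bin/bash
#SBATCH --array=1-91        # one array task per combination of submatrices
#SBATCH -N 1                # each task gets one node
#SBATCH -n 32
#SBATCH --mem=250G

# SLURM_ARRAY_TASK_ID tells this task which row of `combs` to compute
Rscript compute_pair.R ${SLURM_ARRAY_TASK_ID}

Each array task would then write its block of rho values to its own output file, and the final matrix can be assembled afterwards with the loop from the question.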

Yes, it is set to 1 CPU by default.

Actually, I would like to distribute each combination to one node, like this:

submatrix 1 vs submatrix 2   -> node 1
submatrix 2 vs submatrix 3   -> node 2
...
submatrix 13 vs submatrix 14 -> node 91

I don't know if that is possible?

It should be possible as long as you submit individual jobs. See changes I made to my comment above.

Actually, it is not a requirement for me that they all run at the same time. Could it be necessary to wait for some jobs to end before others begin?

Could it be necessary to wait for some jobs to end before others begin?

If that is the case, then you would need to look into the --dependency=type:jobid option of sbatch to make sure those tasks don't start until the job (jobid above) they depend on finishes successfully.
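
For example (a sketch; step1.sh and step2.sh are placeholder script names):

jid=$(sbatch --parsable step1.sh)             # --parsable prints only the job id
sbatch --dependency=afterok:${jid} step2.sh   # starts only after step1 finishes successfully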

Thanks. But is it normal that when I run, for example (if I have only 10 combinations -> 10 nodes), sbatch --partition normal --mem-per-cpu=4G --cpus-per-task=16 --nodes=10 my_R_script.sh, it only creates one job and not the 10 I expected?

Yes, it is normal, because as far as SLURM is concerned the only job it has been asked to run is my_R_script.sh. Unless that script spawns sub-jobs from within itself with multiple sbatch --blah do_this commands, SLURM can't run them as separate jobs.

Note: some clusters may be configured not to allow an existing job to spawn sub-jobs. In that case your only option would be the one below.

The other way would be to start the 10 computations independently (if that is possible) by doing:

sbatch --blah my_R_script1.sh
sbatch --blah my_R_script2.sh
..
sbatch --blah my_R_script10.sh
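
Each of those scripts would just be a small batch wrapper around one combination; for instance, my_R_script1.sh might look like this (a sketch, where compute_pair.R is a hypothetical version of your R script that takes the row index of combs as a command-line argument):

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 32
#SBATCH --mem=250G

Rscript compute_pair.R 1    # row 1 of combs, e.g. submatrix 1 vs submatrix 2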

Actually, I just use my_R_script.sh to execute the R script I posted at the top of the topic, so it doesn't create any sub-jobs.

If I want to create the 91 sub-jobs I need, do I keep my R script as it is and write a SLURM script that creates those sub-jobs?
