Hi guys,
Here is my problem. I want to assess the scalability of an algorithm on my data set (thousands of rows). To do this, I want to subset the data set, increasing the size of each subset by 500 rows (1st subset 500 rows, 2nd subset 1000 rows, 3rd subset 1500 rows, ...).
I will use SLURM and the SLURM_ARRAY_TASK_ID environment variable to do this. This is my R code:
# load packages
library(SpiecEasi)
library(optparse)

# get options
option_list = list(
  make_option(c("-s", "--subset"), type="integer", default=NULL,
              help="Number of rows to keep in the subset")
);
opt_parser = OptionParser(usage = "Usage: %prog -s [N_ROWS]", option_list=option_list,
                          description= "Description:")
opt = parse_args(opt_parser)

# main code
print('Load matrix')
data <- read.table("/home/vipailler/PROJET_M2/raw/truelength2.prok2.uniref2.rares.tsv",
                   h=T, row.names=1, sep="\t")

print('Subset matrix')
data <- data[1:opt$subset, ]
#print(data)

print('Transpose')
data <- t(data)
#print(data)

se_gl <- spiec.easi(data, method='glasso', lambda.min.ratio=1e-2, nlambda=20)
size <- format(object.size(se_gl), units="Gb")
print(size)

######!!!!######
save(se_gl, file="/home/vipailler/PROJET_M2/data/se_gl.RData")
My problem is this: if I use 5 array tasks to assess the scalability of the spiec.easi algorithm (so, from 500 to 2500 rows), I would like it to create 5 different se_gl outputs. As it stands, my last command line only saves the last result (2500 rows) and overwrites the 4 others.
So, how can I save 5 different outputs from the same se_gl variable? I know that with SLURM this code will be executed 5 times (if I set up 5 array tasks), but the problem is my last command line...
Some help?
Best
This is not really a bioinformatics question and may be better directed at Stack Overflow. Anyway, one option would be to pass the output file name as an argument to the script so that each run gets a different one, for example based on the job array index.
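For example, a minimal sketch (untested, assuming the job is launched with something like sbatch --array=1-5): read SLURM_ARRAY_TASK_ID from the environment inside the R script and derive both the subset size and the output file name from it. The /home/vipailler/PROJET_M2/data directory and the 500-row step are taken from your post; the file naming scheme is just illustrative.

# minimal sketch: derive subset size and output file from the SLURM array index
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))
n_rows  <- task_id * 500                 # 500, 1000, 1500, ... rows

data <- data[1:n_rows, ]

# ... run spiec.easi as before ...

# one output file per array task, so nothing gets overwritten
out_file <- file.path("/home/vipailler/PROJET_M2/data",
                      paste0("se_gl_", n_rows, "rows.RData"))
save(se_gl, file=out_file)

Alternatively, keep your --subset option and add a second option for the output file, then build both values in your sbatch script from ${SLURM_ARRAY_TASK_ID}.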