Hi guys,
Here is my problem. I want to assess the scalability of an algorithm on my data set (thousands of rows). To do this, I want to subset the data set, increasing the size of each subset by 500 rows (1st subset 500 rows, 2nd subset 1000 rows, 3rd subset 1500 rows, ...).
I will use SLURM and the SLURM_ARRAY_TASK_ID environment variable to do this. This is my R code:
# load packages
library(SpiecEasi)
library(optparse)

# get options
option_list = list(
  make_option(c("-s", "--subset"), type="integer", default=NULL,
              help="Number of rows to keep in the subset")
);
opt_parser = OptionParser(usage = "Usage: %prog -s [N_ROWS]", option_list=option_list,
                          description= "Description:")
opt = parse_args(opt_parser)

# main code
print('Load matrix')
data <- read.table("/home/vipailler/PROJET_M2/raw/truelength2.prok2.uniref2.rares.tsv",
                   h=T, row.names=1, sep="\t")

print('Subset matrix')
data <- data[1:opt$subset, ]
#print(data)

print('Transpose')
data <- t(data)
#print(data)

se_gl <- spiec.easi(data, method='glasso', lambda.min.ratio=1e-2, nlambda=20)
size <- format(object.size(se_gl), units="Gb")
print(size)

######!!!!######
save(se_gl, file="/home/vipailler/PROJET_M2/data/se_gl.RData")
My problem is this: if I use 5 array tasks to assess the scalability of the spiec.easi algorithm (so, from 500 to 2500 rows), I would like it to create 5 different se_gl outputs. As it stands, my last command line only saves the last result (2500 rows) and overwrites the 4 others.
So, how can I save 5 different outputs from the same se_gl variable? I know that with SLURM this code will be executed 5 times (if I set up 5 array tasks), but the problem is my last command line...
Some help?
Best
This is not really a bioinformatics question and may be better directed at Stack Overflow. Anyway, one option would be to pass the output file name as an argument to the script so that each run gets a different one, for example based on the job array index.
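For example, a minimal sketch (untested, assuming the job is launched with something like sbatch --array=1-5): read SLURM_ARRAY_TASK_ID from the environment inside the R script and derive both the subset size and the output file name from it. The /home/vipailler/PROJET_M2/data directory and the 500-row step are taken from your post; the file naming scheme is just illustrative.

# minimal sketch: derive subset size and output file from the SLURM array index
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))
n_rows  <- task_id * 500                 # 500, 1000, 1500, ... rows

data <- data[1:n_rows, ]

# ... run spiec.easi as before ...

# one output file per array task, so nothing gets overwritten
out_file <- file.path("/home/vipailler/PROJET_M2/data",
                      paste0("se_gl_", n_rows, "rows.RData"))
save(se_gl, file=out_file)

Alternatively, keep your --subset option and add a second option for the output file, then build both values in your sbatch script from ${SLURM_ARRAY_TASK_ID}.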