I am using a SLURM HPC to run jobs and have run into issues with storage. I have 3 TB of storage and want to run over 1000 publicly available RNA-seq datasets through my pipeline, which includes aligning with STAR. Obviously I must download the data in sections and run the pipeline multiple times.
Does anybody know any clever tricks to streamline this process?
Is there any way to configure a Snakemake/SLURM pipeline to, let's say, run for 30 files, with all files except the counts being temp files, then once completed, run again for the next 30 files in a download list, and so on?
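To make the idea concrete, here is a rough sketch of the kind of driver script I have in mind: split the download list into batches of 30 and invoke the pipeline once per batch, relying on Snakemake's temp() flag on the FASTQ/BAM outputs so that everything except the counts is deleted before the next batch starts. The file names, config key, and snakemake invocation below are placeholders, not a working setup.

```python
# Hypothetical batch driver: chunk a download list into groups of 30
# accessions and run the Snakemake pipeline once per group. The file
# names ("download_list.txt", "batch_N.txt"), the "samples=" config
# key, and the core count are illustrative assumptions.
import subprocess
from pathlib import Path

BATCH_SIZE = 30

def batches(items, size):
    """Yield successive chunks of at most `size` items from `items`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_all(list_file="download_list.txt", dry_run=True):
    accessions = Path(list_file).read_text().split()
    for n, batch in enumerate(batches(accessions, BATCH_SIZE), start=1):
        batch_file = Path(f"batch_{n}.txt")
        batch_file.write_text("\n".join(batch))
        cmd = ["snakemake", "--config", f"samples={batch_file}",
               "--cores", "8"]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Each run downloads, aligns, and writes count files;
            # intermediates marked temp() in the Snakefile are removed
            # by Snakemake, freeing space before the next batch.
            subprocess.run(cmd, check=True)
```

Each batch only occupies disk space for its own intermediates, so peak storage is bounded by 30 samples' worth of FASTQs and BAMs plus the accumulated count files.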
Any advice or guidance would be greatly appreciated!
Do you need the alignments or just counts?
I only need the count files. I have maybe 6 rules, a few of which involve STAR alignment steps.
Then look at recount3: http://rna.recount.bio/
Maybe you don't have to align anything yourself. And even if you do, use a fast, lightweight aligner such as Salmon, which takes a fraction of the time and memory compared to STAR. But really, use recount.

Salmon isn't appropriate for the analysis I am trying to do, unfortunately. Neither are already-processed counts. Very interesting resource though, thanks for the tip!