I'm in a project where many samples are being analyzed for single-cell gene expression and TCR (via cellranger multi with scRNAseq and vdj-t).
This has led to a situation where more than a third of my disk is filled by that project's data (it is not that big, 1 TB). There are currently 37 samples, but we expect 60 once all the data is collected. I do have another disk available, but it is usually half full with the raw data of different projects plus the genomes and indexes of several projects. I don't have access to a server with more storage available.
I don't think I need to keep every intermediate file, as I record the genome version and the cellranger version to ease reproducibility. There are some files that I can safely remove without compromising other (downstream) analyses: for example, the copy of the vdj-t library that each sample I analyze has, or the .bam files (I don't check or use them).
I found that using cellranger aggr leads to a smaller folder/data size, but 1) the project is still adding more samples and 2) it cannot append a new sample to an existing aggregation, so the usefulness of the tool is very limited.
Do you have recommendations for further reducing disk space while working on this?
If you are ultimately planning to use cellranger aggr, then there is not much you can do but keep things around until you do a final aggr run.
I'm not really planning to use it, but it seems the best way to reduce the disk space requirements on the computer. I have deleted some files such as summaries, analysis and the ones I mentioned (and more).
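In case it helps anyone in the same spot, here is a minimal sketch of how the removable files could be tallied before deleting anything. It is only an illustration: the function names (find_removable, reclaim) and the suffix list are my own assumptions, not anything provided by cellranger, and it does not rely on a particular output folder layout. It simply walks a directory, lists files ending in .bam or .bam.bai with their sizes, and deletes them only when explicitly asked to.

```python
from pathlib import Path

# Assumption: alignments and their indexes are the files considered safe to drop.
# Adjust this tuple to whatever you have decided you do not need downstream.
REMOVABLE_SUFFIXES = (".bam", ".bam.bai")


def find_removable(root, suffixes=REMOVABLE_SUFFIXES):
    """Return (path, size_in_bytes) for each file under root ending in one of suffixes."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.name.endswith(suffixes):
            hits.append((path, path.stat().st_size))
    return hits


def reclaim(root, delete=False):
    """Print what could be reclaimed under root; remove files only if delete=True."""
    hits = find_removable(root)
    total = sum(size for _, size in hits)
    for path, size in hits:
        print(f"{size / 1e9:8.2f} GB  {path}")
        if delete:
            path.unlink()
    print(f"total: {total / 1e9:.2f} GB in {len(hits)} files")
    return total
```

Calling reclaim("path/to/project") first gives a dry-run report; reclaim("path/to/project", delete=True) actually removes the files, so it is worth checking the listing before the second call.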