Data management of cellranger output
3 months ago
Lluís R. ★ 1.2k

I'm on a project where many samples are being analyzed for single-cell gene expression and TCR (via cellranger multi, scRNA-seq and vdj-t). This has led to a situation where more than a third of my disk is filled by that project's data (the disk is not that big: 1 TB). We currently have 37 samples, but we expect 60 once all the data is collected. I have another disk available, but it is usually half full with the raw data of different projects and the genomes and indexes of several projects. I don't have access to a server with more storage.

I don't think I need to keep every intermediate file, as I record the genome version and the cellranger version to ease reproducibility. There are some files that I can safely remove without compromising other (downstream) analyses: for example, each sample I analyze has its own copy of the vdj-t library, and there are the .bam files (I don't check or use them).
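To see how much space such files actually take before deleting anything, a small report script can help. This is only a minimal sketch: the function name `candidate_sizes` and the suffix list are my own assumptions, not anything cellranger provides, and the list should be adjusted to whatever you have confirmed you don't need downstream. It only reads file sizes and never deletes anything.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical list of suffixes considered removable; adjust to your pipeline.
CANDIDATE_SUFFIXES = (".bam", ".bam.bai")


def candidate_sizes(root, suffixes=CANDIDATE_SUFFIXES):
    """Return {suffix: total_bytes} for files under root whose name ends with a suffix."""
    totals = defaultdict(int)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        for suffix in suffixes:
            if path.name.endswith(suffix):
                totals[suffix] += path.stat().st_size
                break  # count each file once
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Project directory holding the per-sample output folders (assumed layout).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for suffix, size in sorted(candidate_sizes(root).items()):
        print(f"{suffix}\t{size / 1e9:.2f} GB")
```

Running it against the project folder gives a per-suffix total, which makes it easier to judge whether removing a given file type is worth it.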

I found that using cellranger aggr leads to a smaller folder/data size, but 1) the project is still adding more samples, and 2) it cannot append a new sample to an existing aggregation, so the usefulness of the tool is very limited.

Do you have recommendations for further reducing disk space while working on this?

cellranger • 384 views

If you are ultimately planning to use cellranger aggr, then there is not much you can do but keep things around until you do a final aggr run.


I'm not really planning to use it, but it seems to be the best way to reduce the disk space required on the computer. I have deleted some files, such as summaries, analysis, and the ones I mentioned (and more).
