BCL compression
2.1 years ago
garcesj ▴ 50

Hi,

I'm trying to archive some runs generated by different Illumina sequencers. My idea is to compress each run so that all of its files stay together (reducing the size is secondary). However, because of the large number of files and the overall size, which can be enormous (~120 GB!), I'm having some problems with the compression.

find . -type f -not -path "./*fastq*/*" > mytar.list
tar -czvf - -T mytar.list | pigz -0 -p 32 > ../210413_VH00461_11_AAACYVNHV.tar.gz

Does anyone have experience with this? Any alternatives?

Just out of curiosity: I guess all the files within the BCL folder are essential, but could I delete some of them? Which files are absolutely necessary for demultiplexing?

Thanks in advance.

compression demultiplexing illumina BCL • 1.4k views

I just noticed that you are using

pigz -0

that means no compression - but then you also have tar czvf which also compresses the files.

So you see the conundrum here. You are probably starting with already compressed files (BCL files) then you compress those with tar z.

Then you pipe the output into another compressor, pigz -0 -p 32, which has no effect other than wrapping another gzip layer around a gzip of already compressed files. So the data passes through three layers of compression at this point. If anything, this process will make your archive larger than the original files.
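One way to avoid the stacked layers is to let tar only bundle the files (no -z) and compress the stream exactly once. The following is a minimal sketch, not a tested production script: archive_run is a hypothetical helper name, the default of 4 threads and the compression level -1 are arbitrary assumptions, and it falls back to plain gzip when pigz is not installed. It assumes file names contain no newlines.

```shell
# Sketch: bundle a run folder into one archive, compressing only once.
# archive_run is a hypothetical helper, not part of any Illumina tool.
archive_run() {
    run_dir=$1            # run folder to archive
    out=$2                # output file, e.g. ../myrun.tar.gz
    threads=${3:-4}       # assumed default; adjust to your machine

    # Prefer pigz (parallel gzip); fall back to single-threaded gzip.
    if command -v pigz >/dev/null 2>&1; then
        compress="pigz -1 -p $threads"
    else
        compress="gzip -1"
    fi

    # Same selection as in the question: skip files under *fastq* folders.
    # tar runs WITHOUT -z, so the only compression step is $compress.
    ( cd "$run_dir" && find . -type f ! -path "./*fastq*/*" | tar -cf - -T - ) \
        | $compress > "$out"
}
```

For example, archive_run /path/to/run ../run.tar.gz 32 would write a single gzip layer using up to 32 threads. Since the BCL files are already compressed, a low level such as -1 keeps the run time down without giving up much.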

2.1 years ago

Usually the BCL files are deleted after the conversion to FASTQ - once it has been established that the conversion has been successful.

In the past, people occasionally reran a conversion with different parameters to get better results, but nowadays you'd only rerun it if the conversion was done incorrectly, for example because the sample sheet was wrong.

Thus, in general, one would not keep the BCL files around long term.

2.1 years ago
GenoMax 147k

If you belong to a core facility, you may be required to store a copy of the run, depending on the service agreement you have with your customers about keeping the data available. In that case, keeping a tar archive of the data folder (or at least of the FASTQ files) would be the way to go. BCL files from newer/larger sequencers are already compressed, so further compression will gain little, if anything.
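To check whether further compression is worth it for a given run, one could gzip a few representative files and compare sizes. This is only a rough sketch: compressed_percent is a hypothetical helper name, and gzip at its default level is used as a stand-in for whatever compressor you actually plan to use.

```shell
# Sketch: print the gzip-compressed size of a file as an integer
# percentage of its original size (100 or more means no gain).
# compressed_percent is a hypothetical helper used for illustration.
compressed_percent() {
    file=$1
    # $(( ... )) strips the leading whitespace some wc versions print.
    orig=$(( $(wc -c < "$file") ))
    packed=$(( $(gzip -c "$file" | wc -c) ))

    # Guard against division by zero for empty files.
    if [ "$orig" -eq 0 ]; then
        echo 100
        return
    fi
    echo $(( packed * 100 / orig ))
}
```

Running it on a handful of BCL files from the run gives a quick estimate. Values close to 100 suggest bundling with plain tar (no compression) is enough.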

Which files are absolutely necessary for demultiplexing?

The entire raw data folder will be required if you wish to demultiplex again using Illumina's bcl-convert or bcl2fastq.
