Any alternatives to BBMap's clumpify.sh program to optimize gzip compression?
2.9 years ago
O.rka

I've had some difficulty using it in pipelines because it randomly fails sometimes.

Are there any other programs that can be used in its stead?

fastq genomics rnaseq

The only alternative I know of is Picard MarkDuplicates. However, it requires aligned data, and it has its own set of requirements (read groups, for one).

2.9 years ago
GenoMax

"...because it randomly fails sometimes."

clumpify is not the program doing the compression. BBTools programs use pigz for parallel compression when it is available. If you do not want to use it, add pigz=f to your commands to use the system gzip program instead.

That said, your server may be using an older version of pigz, which you may want to update. If pigz is not installed at all, try installing it.

The in-line help has a section of compression options you can experiment with.

If you are using multiple threads, you need a sufficiently fast storage system for the reads/writes to keep up. If you don't have access to one, reduce the number of threads.
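A minimal sketch of how a pipeline could apply the two suggestions above: add pigz=f when no pigz executable is found on PATH, and cap the thread count. It only builds the argument list; in=, out=, threads= and pigz=f are standard BBTools options, while the function name build_clumpify_cmd and the default of 4 threads are hypothetical choices for illustration.

```python
import shutil
from typing import List, Optional


def build_clumpify_cmd(in_path: str, out_path: str, threads: int = 4,
                       pigz_available: Optional[bool] = None) -> List[str]:
    """Assemble a clumpify.sh argument list (hypothetical helper, a sketch only)."""
    if pigz_available is None:
        # Detect whether a pigz executable is on PATH.
        pigz_available = shutil.which("pigz") is not None
    cmd = [
        "clumpify.sh",
        f"in={in_path}",
        f"out={out_path}",
        f"threads={threads}",  # fewer threads eases pressure on slow storage
    ]
    if not pigz_available:
        # Tell BBTools not to use pigz, as suggested above.
        cmd.append("pigz=f")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_clumpify_cmd("reads.fq.gz", "clumped.fq.gz", threads=2)))
```

The resulting list can be passed directly to subprocess.run(cmd, check=True). Passing pigz_available explicitly makes the behavior deterministic, which is handy when the pipeline runs on nodes with different software installed.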


My understanding was that clumpify reorders the sequence file to optimize compression.

reorder=f           Reorder clumps for additional compression.

Reordering is off by default, so you must be turning it on explicitly in your jobs. Reordering facilitates compression, but it does not perform the compression itself. The thread title and text made it sound like your jobs were failing during the compression step itself.
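To illustrate why reordering "facilitates" compression without doing it: gzip (DEFLATE) can only point back to repeats within the previous 32 KB, so overlapping reads compress much better when they sit next to each other. The toy sketch below simulates error-free reads from a random genome and compares gzip sizes for random order and for reads sorted by their true start position. Sorting by known position is an idealized stand-in (an assumption) for clumpify's k-mer-based grouping, which does not know read positions.

```python
import gzip
import random


def simulate_reads(genome_len=20_000, read_len=100, n_reads=2_000, seed=0):
    """Sample error-free reads, with their start positions, from a random genome."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    starts = [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]


def gzip_size(seqs):
    """Size in bytes of the newline-joined sequences after gzip compression."""
    return len(gzip.compress("\n".join(seqs).encode()))


reads = simulate_reads()
shuffled = [seq for _, seq in reads]         # sampling order is already random
clumped = [seq for _, seq in sorted(reads)]  # overlapping reads end up adjacent

shuffled_size = gzip_size(shuffled)
clumped_size = gzip_size(clumped)
print(f"random order: {shuffled_size} bytes, clumped order: {clumped_size} bytes")
```

Both lists contain exactly the same reads; only the order differs, so any size difference comes from how many repeats gzip can find within its window.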

On large datasets, clumpify can use hundreds of GB of RAM (it needs to keep a large amount of sequence data in memory), so I would check whether the failures you are seeing happen because the job runs out of RAM.
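One way to check that in a pipeline is to classify each failed run from its exit code and stderr. This is a heuristic sketch, not a BBTools feature: the helper classify_failure and its labels are hypothetical. It relies on two general facts. A Java heap exhaustion prints java.lang.OutOfMemoryError. A process killed by SIGKILL (often the kernel OOM killer or a scheduler's memory limit) shows up as -9 in Python's subprocess returncode, or 137 when reported by a shell.

```python
SIGKILL = 9  # POSIX signal number; assumes a Linux/Unix host


def classify_failure(returncode: int, stderr: str) -> str:
    """Rough guess at why a clumpify.sh run failed (hypothetical heuristic)."""
    if returncode == 0:
        return "ok"
    if "java.lang.OutOfMemoryError" in stderr:
        # The JVM ran out of heap: the memory given to Java was too small.
        return "java_heap_exhausted"
    if returncode in (-SIGKILL, 128 + SIGKILL):
        # Killed outright: frequently the OOM killer or a job memory limit.
        return "killed_possibly_oom"
    return "other"
```

Usage: run the command with proc = subprocess.run(cmd, capture_output=True, text=True) and call classify_failure(proc.returncode, proc.stderr). If heap exhaustion shows up, giving Java more memory (BBTools scripts accept an -Xmx setting) or requesting a larger-memory node is the next thing to try.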


Oh OK, it's likely a RAM issue. Does memory usage scale with the number of threads used?


To some extent, but it will most likely depend more on the data.
