I've had some difficulties implementing this in pipelines because it randomly fails sometimes.
Are there any other programs that can be used in its stead?
"because it randomly fails sometimes."

clumpify is not the program that does the compression. BBTools programs use the pigz program for parallel compression when it is available. If you do not want to use it, simply add pigz=f to your commands to fall back to the system gzip program.
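As a minimal sketch (input/output filenames are placeholders), disabling pigz for both compression and decompression might look like:

```shell
# Hypothetical example: pigz=f makes clumpify.sh use the system gzip for output;
# unpigz=f does the same for reading gzipped input.
# in.fq.gz and out.fq.gz are placeholder filenames.
clumpify.sh in=in.fq.gz out=out.fq.gz pigz=f unpigz=f
```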
That said, your server may be using an older version of pigz (which you may want to update), or pigz may not be installed at all, in which case try installing it.
There is a section of options for compression that you can play with. Check the in-line help.
If you are using multiple threads, the storage system needs to be fast enough for the reads and writes to keep up. If you don't have access to fast storage, reducing the number of threads would be another suggestion.
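A sketch of a reduced-thread run, assuming the BBTools t= parameter (which caps the number of worker threads; filenames are placeholders):

```shell
# Hypothetical example: cap clumpify.sh at 4 threads so slower storage
# can keep up with the read/write load.
clumpify.sh in=in.fq.gz out=out.fq.gz t=4
```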
reorder=f Reorder clumps for additional compression.
Reordering is off by default, so you must actually be turning it on in your jobs. Reordering facilitates compression but does not itself perform the compression. Your thread title and text made it sound like your jobs were being affected by the compression step itself.
On large datasets clumpify can take hundreds of GB of RAM (since it needs to keep a large amount of sequence data in memory), so I would check whether the failures you are seeing are caused by the job running out of RAM.
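If memory is the issue, BBTools wrapper scripts accept a -Xmx flag that is passed through to the JVM; a sketch, with the heap size and filenames as placeholders to adjust for your node:

```shell
# Hypothetical example: give the JVM an explicit 100 GB heap rather than
# relying on auto-detection; raise or lower to match your node's memory.
clumpify.sh -Xmx100g in=in.fq.gz out=out.fq.gz
```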
The only alternative that I know of is Picard MarkDuplicates. That will need aligned data, for one, and then has its own set of requirements (read groups, for example).
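A sketch of the Picard route, assuming an aligned, coordinate-sorted BAM with read groups already set (all filenames are placeholders):

```shell
# Hypothetical example: Picard MarkDuplicates on an aligned, sorted BAM.
# Requires read groups in the input; metrics.txt receives the duplication stats.
java -jar picard.jar MarkDuplicates \
    I=aligned.sorted.bam \
    O=marked.bam \
    M=metrics.txt
```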