2.8 years ago
kstangline
The following is in reference to this post: Genozip: A new compression tool for FASTQ, BAM, VCF and more
I'm curious whether anyone knows of open-source alternatives to the [paid] Genozip compression tool referenced above.
I've found an open-source tool called DSRC (~37% additional compression over gzip), but it's nowhere near as good as Genozip.
DSRC:
https://github.com/refresh-bio/DSRC
It appears that Genozip is the best compression tool currently on the market.
So I assume that does not apply in your case?
I'm looking at using a compression tool in my company's pipeline, so I wouldn't be able to use the free version.
If you represent a company, and the company values your time and its storage costs, then the company should pay the Genozip license fee. This is simply a cost of doing business, and it supports the developer in the long term.
Not to mention that the Genozip license costs about 2K per year; as far as licenses go, that sounds affordable.
It is difficult to pin down exactly what cloud storage and transfer cost per TB, but I would expect somewhere around 100 to 200 per TB per year, so above roughly 10 TB of data the license would pay for itself.
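The break-even arithmetic above can be sketched in a few lines. This is a rough model, not a pricing tool: the license fee (about 2K per year) and the storage cost (100 to 200 per TB per year) are only the rough estimates quoted in this thread, and `fraction_saved` is a hypothetical parameter added here. The "about 10 TB" figure implicitly assumes compression removes the entire storage bill (`fraction_saved = 1.0`); if the tool only makes files 2-fold smaller than what you store today, you save half the cost and the break-even volume doubles.

```python
def fraction_saved_from_ratio(fold: float) -> float:
    """Share of current storage cost saved when files become `fold` times smaller.

    Hypothetical helper: a 2-fold reduction saves 1 - 1/2 = 50% of the bill.
    """
    if fold < 1:
        raise ValueError("fold must be >= 1")
    return 1.0 - 1.0 / fold


def break_even_tb(license_per_year: float,
                  cost_per_tb_year: float,
                  fraction_saved: float = 1.0) -> float:
    """TB of stored data at which yearly storage savings equal the license fee.

    Assumption: savings scale linearly with stored TB; transfer and compute
    costs are ignored.
    """
    if cost_per_tb_year <= 0 or not 0 < fraction_saved <= 1:
        raise ValueError("need a positive cost and 0 < fraction_saved <= 1")
    return license_per_year / (cost_per_tb_year * fraction_saved)


if __name__ == "__main__":
    # Thread's rough numbers, assuming the whole storage cost disappears.
    print(break_even_tb(2000, 200))  # 10.0 TB
    print(break_even_tb(2000, 100))  # 20.0 TB
    # Same license, but only a 2-fold size reduction: break-even doubles.
    print(break_even_tb(2000, 200, fraction_saved_from_ratio(2.0)))  # 20.0 TB
```

Plugging in your own measured compression ratio and actual per-TB price gives a more realistic break-even than the back-of-the-envelope figure.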
The licensing fees for Genozip are all set up wrong, IMHO:
they feel expensive for someone storing little data, who is therefore a little price-sensitive,
yet the same cost is negligibly small for a company that stores lots of data (and would benefit far more).
CRAM can compress BAM roughly 2-fold. How many-fold compression are you looking for? You can also try https://github.com/Genetalks/gtz/. It has two serious issues: the license expires every six months or so, and installation silently changes the user's bash environment.
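Since the thread mixes "fold" figures (CRAM at ~2-fold) with percentage figures (DSRC at ~37% over gzip), a small conversion helper may make them easier to compare. This is a sketch under stated assumptions: the function names and the 100 GB / 50 GB sizes are hypothetical, and reading "37% additional compression" as "37% smaller than the gzip file" is only one interpretation of that claim.

```python
def fold_compression(original_bytes: float, compressed_bytes: float) -> float:
    """How many times smaller the compressed file is (e.g. 2.0 means 2-fold)."""
    return original_bytes / compressed_bytes


def percent_reduction(original_bytes: float, compressed_bytes: float) -> float:
    """Size reduction as a percentage of the original file."""
    return 100.0 * (1.0 - compressed_bytes / original_bytes)


def fold_from_percent(reduction_pct: float) -> float:
    """Convert a percentage size reduction into an equivalent fold ratio."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical sizes: a 100 GB BAM that becomes a 50 GB CRAM.
    print(fold_compression(100, 50))   # 2.0 (2-fold)
    print(percent_reduction(100, 50))  # 50.0 (% smaller)
    # If "37% additional compression" means 37% smaller than gzip output:
    print(round(fold_from_percent(37.0), 2))  # 1.59-fold over gzip
```

Note that fold ratios multiply while percentages do not add: a 2-fold step followed by another 2-fold step is 4-fold overall (75% smaller), not 100% smaller.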
CRAM format
ORA: https://support.illumina.com/sequencing/sequencing_software/DRAGENDecompression.html
An ORA compression license is included only with newer sequencer hardware (on the attached computer) and in BaseSpace (on-site and in the cloud), so it can't be applied after the fact or to all sequence data. Only the decompressor is free.