Question

Is There An Implementation Of The DNABIT Compress Algorithm?

1

13.3 years ago

Martyix ▴ 120

I'm looking at:

The sample implementation of the algorithm seems to be broken (for starters, I see missing closing brackets).

Does anyone have a working implementation? Thank you!

dna • 3.2k views

ADD COMMENT • link updated 11.6 years ago by Biostar 20 • written 13.3 years ago by Martyix ▴ 120

3

Well, frankly, when the sample implementation does not work, I usually move on.

ADD REPLY • link 13.3 years ago by Istvan Albert 102k

2

Have you tried contacting the authors?

ADD REPLY • link 13.3 years ago by Sean Davis 27k

0

Well, it seems to be the only possibility left. So I'll try that.

ADD REPLY • link 13.3 years ago by Martyix ▴ 120

score 2 · Answer 1 · 2012-01-05

2

13.3 years ago

Marvin ▴ 900

That algorithm is nonsense. First off, they claim to compress arbitrary DNA into no more than 1.58 bits per base, which is of course impossible. And if you read their paper, you'll find that different DNA sequences give rise to the same code. You're much better off just using gzip.
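
The impossibility can be checked with a simple counting argument: there are 4^n DNA sequences of length n, but fewer bit strings of length at most 1.58n, so some sequences must share a code. The sketch below illustrates that argument only; the function names are hypothetical and nothing here is taken from the DNABIT Compress paper.

```python
import math


def count_sequences(n: int) -> int:
    """Number of distinct DNA sequences of length n over {A, C, G, T}."""
    return 4 ** n


def count_codes(n: int, bits_per_base: float = 1.58) -> int:
    """Number of distinct bit strings of length 0..floor(bits_per_base * n)."""
    max_bits = math.floor(bits_per_base * n)
    return 2 ** (max_bits + 1) - 1


def must_collide(n: int, bits_per_base: float = 1.58) -> bool:
    """True if there are fewer available codes than sequences (pigeonhole)."""
    return count_codes(n, bits_per_base) < count_sequences(n)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, must_collide(n))
```

With a 2-bit budget (`must_collide(n, 2.0)`) the check no longer forces a collision, which matches the usual 2 bits per base encoding.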

ADD COMMENT • link 13.3 years ago by Marvin ▴ 900

4

You need two bits to represent A, C, G, T. However, runs of the same nucleotide in that 2-bit representation can be compressed further with block compression or the Burrows-Wheeler transform (BWT).
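
As a rough illustration of that idea, here is a minimal sketch that packs four bases into each byte and then runs a general-purpose compressor over the result. The names `pack`, `unpack` and `deflated_size` are made up for this example, and zlib (DEFLATE) is only a stand-in assumption for "block compression"; it is not the scheme from the paper.

```python
import zlib

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].upper()
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk by padding with zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from 2-bit packed data."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n])


def deflated_size(seq: str) -> int:
    """Size in bytes after 2-bit packing followed by zlib compression."""
    return len(zlib.compress(pack(seq), 9))


if __name__ == "__main__":
    run = "A" * 4000
    print(len(pack(run)), deflated_size(run))
```

The sequence length has to be stored separately, because the padding in the last byte is indistinguishable from trailing `A` bases.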

ADD REPLY • link 13.3 years ago by Aaronquinlan 12k

2

Right, but they're claiming 1.58 bits in the "worst-case" scenario, which is rubbish. Perhaps that was its performance on the worst input they happened to test, but there have to be inputs that need at least 2 bits per base.
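
One way to see such an input is to generate uniformly random DNA, which has essentially no structure to exploit. The sketch below is an experiment under stated assumptions, not a proof: it packs a random sequence into 2 bits per base and measures what zlib makes of it. `pack_2bit` and `bits_per_base` are hypothetical helper names, and the sequence length is assumed to be a multiple of 4.

```python
import random
import zlib

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte; any trailing bases beyond a multiple of 4 are dropped."""
    out = bytearray()
    for i in range(0, len(seq) - len(seq) % 4, 4):
        a, b, c, d = (CODE[x] for x in seq[i:i + 4])
        out.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(out)


def bits_per_base(n: int, seed: int = 0) -> float:
    """Compressed size, in bits per base, of a random DNA sequence of length n."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=n))
    compressed = zlib.compress(pack_2bit(seq), 9)
    return 8 * len(compressed) / n


if __name__ == "__main__":
    print(f"{bits_per_base(100_000):.3f} bits per base")
```

On random input the compressor should not get meaningfully below 2 bits per base, whereas real genomes with repeats can do better.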

ADD REPLY • link 13.3 years ago by Fwip ▴ 500