Dan D · 7.4k · 9.0 years ago
Has anyone used the tool described in this paper for compressing FASTQ data? I'm going to evaluate it when I get time, but I wanted to see if anyone has intel on it ahead of that. I'll report back here after giving it my assessment.
This paper has been temporarily withdrawn by the authors.
Just for reference, the journal website says: "This manuscript has been temporarily withdrawn at the request of the authors. The authors report that they have identified an error in the software. This withdrawal is to provide the authors with an opportunity to determine to what extent the reported results are affected by this error."
It would have been nice of the journal to update the HTML version of the article with the same notice.
Frankly, this sounds like a more serious problem than just a bug in the method. It could be a methodological error in the evaluation: for example, the output files were actually larger and slower to produce than the competition, and the comparison got switched up. That can happen easily.
See the edit below. Beyond that, the fact that in 2015 the Bioinformatics journal publishes software that is "available" only from someone's personal webpage is saddening.
--- Edit ---
Actually, scratch that (kind of): there is a GitHub repo here:
https://github.com/mariusmni/lfqc
Still, at the time of publication there was no repository.
Very useful info, thanks!
If you haven't seen it (I hadn't until a few days ago, coincidentally), there was a compression challenge recently that evaluated a few related tools: http://www.pistoiaalliance.org/projects/sequence-squeeze/
An article describing the results is here, in case you wanted to investigate alternatives to lfqc.
Anyone familiar enough with Ruby out there who can comment on the code? It looks like it might be a wrapper around Mahoney's lpaq and zpaq tools, but I'm not certain. I can't get behind the paywall and I haven't found a preprint, so maybe it is described in the paper?
I am not familiar with Ruby, but the code is easy to understand. I may be wrong, but I think the core of the algorithm is in the calls on line 15 and line 99 of the script.
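For anyone who has not opened the repo, a wrapper of this kind typically just shells out to the external compressors. A purely hypothetical sketch of what such calls tend to look like (file names, levels and flags are my assumptions, not copied from the actual script):

    # Hypothetical sketch only -- not copied from the lfqc script.
    seq_file  = "reads.seqs"
    qual_file = "reads.quals"

    # sequence stream -> lpaq8 (usage: lpaq8 N input output, N = memory level)
    system("./lpaq8 9 #{seq_file} #{seq_file}.lq8")

    # quality stream -> zpaq ("a" adds files to an archive)
    system("zpaq a #{qual_file}.zpaq #{qual_file}")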
Edit: the GitHub code has an Apache license, but it is surprising that the manuscript states "the implementations are freely available for non-commercial purposes", which seems to me incompatible with both the zpaq and lpaq licenses.
The tar step is just packaging, not compression. It looks like a wrapper around Matt Mahoney's compression tools, applied to the different pieces of a FASTQ record.
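To make the "different pieces" point concrete, here is a minimal sketch of splitting each FASTQ record into separate header, sequence and quality streams before handing them to the external compressors (my own illustration, not the lfqc code; file names are made up):

    # Sketch (not the actual lfqc code): split a FASTQ file into three
    # streams so each can go to a separate backend compressor.
    headers = File.open("reads.headers", "w")
    seqs    = File.open("reads.seqs", "w")
    quals   = File.open("reads.quals", "w")

    File.foreach("reads.fastq").each_slice(4) do |head, seq, _plus, qual|
      headers.puts head.chomp
      seqs.puts    seq.chomp
      quals.puts   qual.chomp
    end

    [headers, seqs, quals].each(&:close)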
Yes, the tar only bundles together files, but it is important because it keeps things tidy. :-)
I glanced over the paper: for the sequence and quality streams it is practically just a wrapper for lpaq8 and zpaq, respectively. There is some preprocessing (encoding runs of '#' as a bit flag and removing newlines), but nothing original. The header line is "tokenized" (split into pieces), and the tokens are compressed with run-length encoding or incremental encoding, or just reversed, as they "observed that this tends to improve the compression ratio of the context mixing algorithm applied downstream". Then it is again compressed with zpaq.
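To illustrate what incremental (delta) encoding does to a numeric header token, a small stand-alone example (my own, not the paper's code; the field values are made up):

    # Delta-encode a numeric header field, e.g. x/y coordinates in Illumina
    # read names. Differences from the previous value are small and repetitive,
    # which suits the context-mixing compressor applied downstream.
    def delta_encode(values)
      prev = 0
      values.map { |v| d = v - prev; prev = v; d }
    end

    def delta_decode(deltas)
      prev = 0
      deltas.map { |d| prev += d }
    end

    xs = [10123, 10127, 10130, 10142]
    p delta_encode(xs)               # => [10123, 4, 3, 12]
    p delta_decode(delta_encode(xs)) # => [10123, 10127, 10130, 10142]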
Actually, that was my impression too. I did not spend much time on the details, but honestly it just seemed like running two existing methods, even invoking them as command-line applications. My first thought was: how is this a bioinformatics paper?