9.0 years ago
intipedroso
Hi,
I am using Leon v1.0.0. I compressed a FASTQ file, and when I decompress it, the output is truncated.
$ /home/shared/app/leon-1.0.0-Source/leon -file foo.fastq.gz.leon -d
Start decompression
Input filename: foo.fastq.gz.leon
Qual filename: ./foo.fastq.qual
Output format: Fastq
Kmer size: 31
Input File was compressed with leon version 1.0.0
Block count: 105
[Decompressing all streams] 99 % elapsed: 0 min 7 sec estimated remaining: 0 min 0 sec
Output filename: ./foo.fastq.d
Time: 11.90s
Speed: 0.00 mo/s
Here are the file sizes:
$ ls -lh foo.fastq.*
-rw-rw-r-- 1 ipedroso ipedroso 14K dic 1 14:43 foo.fastq.d
-rw-rw-r-- 1 ipedroso ipedroso 69M dic 1 14:43 foo.fastq.gz.leon
-rw-rw-r-- 1 ipedroso ipedroso 138M dic 1 14:43 foo.fastq.gz.qual
The original compressed file was about 400M.
I also have no idea why Leon writes its output with a ".d" extension.
Any help is much appreciated.
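One way to confirm the truncation independently of Leon's own progress report is to count records in the original and in the decompressed file. A minimal Python sketch, assuming both files are plain (unwrapped) 4-line FASTQ; the script name `check_roundtrip.py` and its two arguments are hypothetical, not part of Leon:

```python
import gzip
import sys

def open_text(path):
    """Open a plain or gzip-compressed file as text, chosen by the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def count_fastq_records(path):
    """Count records, assuming unwrapped 4-line FASTQ (header, sequence, '+', quality)."""
    with open_text(path) as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4 != 0:
        # A partial final record is itself a sign of truncation.
        print(f"warning: {path}: {n_lines} lines is not a multiple of 4", file=sys.stderr)
    return n_lines // 4

if __name__ == "__main__" and len(sys.argv) == 3:
    n_orig = count_fastq_records(sys.argv[1])
    n_dec = count_fastq_records(sys.argv[2])
    print(f"original: {n_orig} records, decompressed: {n_dec} records")
    if n_orig != n_dec:
        sys.exit("record counts differ: the decompressed output is incomplete")
```

Run it as `python check_roundtrip.py foo.fastq.gz foo.fastq.d`. Counting lines instead of `@` headers is deliberate, because a quality string can itself begin with `@`. If the output turns out to be FASTA rather than FASTQ, the line count divided by 4 is not a record count.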
Hi, could you also post the command used to compress the file?
I got exactly the same issues just now with https://www.encodeproject.org/files/ENCFF001LCY/@@download/ENCFF001LCY.fastq.gz
Also, the compressed file on disk is much smaller than the size stated in the program's final compression report; I think it failed to write some of the data to disk. The reported compression ratios are also not particularly good. I appreciate that the point of this publication/tool was to show that you can compress DNA using graphs, but it's a shame that it turns out to be slow and larger than simply storing the DNA in a table.
Finally, my .d file is in FASTA format, not FASTQ, which is probably relevant if the author decides to debug this.
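To check the final report against what actually landed on disk, the ratio can be recomputed from file sizes. A minimal Python sketch; which files count as Leon's output (here assumed to be the `.leon` and `.qual` files that appear in the directory listing earlier in the thread) is an assumption, and the script name is hypothetical:

```python
import os
import sys

def on_disk_ratio(original, outputs):
    """Return size(original) divided by the total size of the compressor's output files."""
    total = sum(os.path.getsize(p) for p in outputs)
    if total == 0:
        raise ValueError("compressed outputs are empty")
    return os.path.getsize(original) / total

if __name__ == "__main__" and len(sys.argv) >= 3:
    # First argument: the original file; remaining arguments: every file the compressor wrote.
    ratio = on_disk_ratio(sys.argv[1], sys.argv[2:])
    print(f"on-disk compression ratio: {ratio:.2f}")
```

For example, `python ratio.py foo.fastq.gz foo.fastq.gz.leon foo.fastq.gz.qual`. Mind the baseline: dividing a gzipped input by Leon's output measures the gain over gzip, not over raw FASTQ.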
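A quick way to confirm which format the .d file is in is to look at its first non-blank line: a FASTA header starts with `>`, a FASTQ header with `@`. A minimal Python sketch of that heuristic (the function name `sniff_format` is hypothetical; it is not a full validator):

```python
import gzip

def sniff_format(path):
    """Guess 'fasta' or 'fastq' from the first non-blank line of a plain or .gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            stripped = line.strip()
            if stripped:
                first = stripped[0]
                break
        else:
            return "empty"  # the file has no non-blank line at all
    if first == ">":
        return "fasta"
    if first == "@":
        return "fastq"
    return "unknown"
```

For example, `print(sniff_format("foo.fastq.d"))`.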