19 months ago
rls_08
I am looking for suggestions for a tool that will change the quality scores of a fastq file into a binary pass/fail score, similar to what SRA-lite is doing. Ideally I want to go from file1.fastq.gz to file2.fastq.gz with a reduction in file size.
Does it need to be fastq? CRAM is able to compress unaligned data quite well; usually a reduction to 66% of the original fastq file size can be achieved. The `samtools import` command can convert fastq to unaligned CRAM. Keep in mind that manipulating fastq quality scores is (strictly speaking) already a form of analysis, as it alters the original data. Do you really need this, rather than just using CRAM or gzip compression at maximum level (or `pigz` with `-11`)? There was recently an interesting discussion on Bioinformatics StackExchange that you can read for further inspiration: https://bioinformatics.stackexchange.com/questions/20858/good-recommended-way-to-archive-fastq-and-bam-files
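If you do decide to binarize the qualities anyway, here is a minimal Python sketch of the idea rather than an existing tool. Everything specific in it is an assumption on my part: the function names `binarize_quality` and `binarize_fastq` are made up for illustration, the input is assumed to be Phred+33 with strict 4-line records, the pass threshold of Q20 is arbitrary, and the two output characters (`?` for Q30, `$` for Q3) are just placeholders. It works per base; it does not try to reproduce exactly how SRA-lite assigns its pass/fail flag.

```python
import gzip


def binarize_quality(qual, threshold=20, offset=33, pass_char="?", fail_char="$"):
    """Map each Phred+33 quality character to one of two characters.

    Bases with quality >= threshold become pass_char, all others fail_char.
    The defaults ('?' = Q30, '$' = Q3, threshold Q20) are illustrative
    assumptions, not values taken from any specification.
    """
    return "".join(
        pass_char if ord(c) - offset >= threshold else fail_char for c in qual
    )


def binarize_fastq(in_path, out_path, threshold=20):
    """Rewrite a gzipped FASTQ file with two-level quality strings.

    Assumes strict 4-line records (no wrapped sequence or quality lines).
    """
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            header = fin.readline()
            if not header:  # end of file
                break
            seq = fin.readline()
            fin.readline()  # '+' separator line; any repeated read name is dropped
            qual = fin.readline().rstrip("\n")
            fout.write(header)
            fout.write(seq)
            fout.write("+\n")
            fout.write(binarize_quality(qual, threshold) + "\n")
```

Usage would be something like `binarize_fastq("file1.fastq.gz", "file2.fastq.gz")`. With only two distinct symbols on the quality line, gzip should have far less to encode there, but how much smaller the file actually gets depends on your data, so check it on a sample first.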