Hi,
A FASTQ record currently has four lines: a header, the sequence string, a "phantom" header (the "+" line), and the quality string. For storage purposes, why doesn't the format fold the nucleotide information into the quality string? Is keeping them separate just meant to make it more human readable?
Can you give us an example of how this would be implemented?
Also, FASTQ started out as a merge of FASTA and QUAL files, and the layout you see today is a legacy of that dated implementation. I'm sure people are working on optimizations, but the "why fix what isn't broken" question still needs a satisfactory answer.
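For anyone who hasn't seen the older pair of files, here is a rough sketch of what that merge amounts to: the FASTA side supplies the bases, the QUAL side supplies one integer Phred score per base, and each score becomes one character. The function name fastq_record is made up for illustration, and I'm assuming the common Sanger-style offset of 33.

```python
def fastq_record(name, seq, qual_scores, offset=33):
    """Build one FASTQ record from a FASTA sequence and QUAL-style integer scores.

    fastq_record is a hypothetical helper; offset=33 assumes Sanger (Phred+33) encoding.
    """
    if len(seq) != len(qual_scores):
        raise ValueError("sequence and quality lengths differ")
    # Each integer Phred score becomes a single ASCII character.
    quality = "".join(chr(q + offset) for q in qual_scores)
    # Header, sequence, "+" separator line, quality string.
    return f"@{name}\n{seq}\n+\n{quality}\n"


record = fastq_record("read1", "ACGT", [40, 40, 20, 2])
print(record, end="")
```

The second header line carries no new information here, which is why a bare "+" is enough.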
FASTQ is definitely not meant to be human readable.
Are you proposing five sets of quality characters, one each for A, C, G, T, and N, where each character within a set denotes a particular Phred value?
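If that is the idea, one way to picture it is the sketch below. It is only my reading of the proposal, not a confirmed design: each base gets its own block of symbols, and the position within the block is the Phred value. One catch is that printable ASCII (33 to 126) has only 94 characters, so five sets would leave 18 quality levels each. The sketch therefore uses raw byte values instead of printable characters. The names BASES, MAX_Q, pack and unpack are invented for illustration, and the cap of 50 on Phred values is an assumption.

```python
BASES = "ACGTN"      # assumed alphabet; one block of symbols per base
MAX_Q = 50           # assumed cap on Phred values, chosen so 5 blocks fit in one byte
SLOT = MAX_Q + 1     # number of symbols in each base's block


def pack(seq, qual, offset=33):
    """Combine a sequence and a Phred+33 quality string into one byte per base."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    out = bytearray()
    for base, qchar in zip(seq, qual):
        q = ord(qchar) - offset
        if not 0 <= q <= MAX_Q:
            raise ValueError(f"Phred value {q} outside 0..{MAX_Q}")
        # Block index selects the base, position in the block is the quality.
        out.append(BASES.index(base) * SLOT + q)
    return bytes(out)


def unpack(packed, offset=33):
    """Recover the sequence and the Phred+33 quality string from packed bytes."""
    seq, qual = [], []
    for value in packed:
        idx, q = divmod(value, SLOT)
        seq.append(BASES[idx])
        qual.append(chr(q + offset))
    return "".join(seq), "".join(qual)


packed = pack("ACGTN", "II5!#")
print(list(packed))
print(unpack(packed))
```

This halves the payload per base before any compression, but the result is binary rather than text, so it gives up the one thing plain FASTQ has going for it: it can be inspected and processed with ordinary text tools.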