Hi!
In the article "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants" I found this table:
Description            ASCII range  Offset  Score type  Score range
Sanger standard        33–126       33      PHRED       0 to 93
Solexa/early Illumina  59–126       64      Solexa      −5 to 62
Illumina 1.3+          64–126       64      PHRED       0 to 62
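As I understand the table, a quality character is turned into a score by subtracting the offset from its ASCII code. A minimal sketch in Python of that reading (the function name decode_quality is my own, not something from the article):

```python
def decode_quality(quality_string, offset):
    """Convert a FASTQ quality string to integer scores.

    offset is the ASCII offset from the table: 33 for Sanger,
    64 for Solexa/early Illumina and Illumina 1.3+.
    """
    return [ord(ch) - offset for ch in quality_string]


# The same scores written in the two offsets look completely different:
print(decode_quality("II5!", 33))  # [40, 40, 20, 0]
print(decode_quality("hhT@", 64))  # [40, 40, 20, 0]
```

So reading a file with the wrong offset shifts every score by 31, which is why the encoding matters.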
From the article (2010) it seemed that the consensus might be to use Illumina 1.3+ coding in the future. Then I came across this:
"Note that the latest Illumina CASAVA 1.8 pipeline (released June 2011), outputs in fastq-sanger rather than Illumina 1.3+. Thus Illumina 1.3+ and other Illumina scoring metrics are unlikely to be encountered if you are using Illumina sequencing data generated after July 2011" from this source
So, which is it? Can I rely on FastQC to display base qualities correctly? Based on what I have read, I would say no: a FASTQ file can contain only quality characters that fall in the overlap of the ranges above, in which case the different encodings cannot be told apart. And there is no option to switch between the formats in this program, is there?
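To show what I mean: as far as I can tell, all a program can do is look at the range of ASCII codes in the quality lines. A rough sketch of such a guess, based only on the ranges in the table above (the function name and the labels are hypothetical, and I do not claim this is how FastQC does it):

```python
def guess_fastq_encoding(quality_strings):
    """Guess the quality encoding from the lowest ASCII code seen.

    Returns "sanger", "solexa" or "ambiguous". This is only a heuristic:
    it needs at least one non-empty quality string, and a file whose
    scores are all high cannot be classified reliably.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    lowest, highest = min(codes), max(codes)
    if lowest < 33 or highest > 126:
        raise ValueError("characters outside the FASTQ quality range 33-126")
    if lowest < 59:
        # Only the Sanger encoding (offset 33) uses characters below 59.
        return "sanger"
    if lowest < 64:
        # Codes 59-63 are valid Solexa scores (-5 to -1) but not Illumina 1.3+.
        # A Sanger file containing no score below 26 would also land here.
        return "solexa"
    # Codes 64-126 are valid in all three encodings.
    return "ambiguous"


print(guess_fastq_encoding(["II5!"]))  # sanger
print(guess_fastq_encoding(["hhhT"]))  # ambiguous
```

The last case is exactly my worry: nothing in such a file tells you which offset was used.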
Second, I have old 454 data. How can I determine its encoding? I have tried to google what the common standard for 454 is, without much success.
Hope that you guys here are much more experienced. How do you deal with different fastq formats? Please, share your experience. Thanks a lot!
Just to add to the fastq bashing (not really helpful, I know): fastq is not a format in the strict sense, because it is lacking a proper definition allowing for deterministic parsing of the 'format'. I agree with Ido about the punishment.
Agreed. fastq is horribad.