I am quality-filtering a large batch of mixed FASTQ files produced by multiple versions of Illumina platforms. As a result, the quality scores are in sanger_fastq format for some files (quality ASCII offset 33), in illuminav1.3+_fastq format for others (quality ASCII offset 64), and so on.
Case 1: If you use sanger_quality format files without the parameter -Q33, you get an error message fastq_quality_filter: Invalid quality score value...
Case 2: But if you wrongly use -Q33 for reads in illumina_quality format, you get error messages like
segmentation fault (core dumped)
or
$ fastq_quality_filter -i file.fastq -o OUT -v -q 20 -p 50 -Q33
fastq_quality_filter: bug: got empty array at fastq_quality_filter.c:97
Is there any special trick in fastx-toolkit that automatically detects the quality format (ASCII offset 33 or 64) and then does the quality filtering accordingly, without first separating the mixed FASTQ files?
EDIT: http://en.wikipedia.org/wiki/FASTQ_format (Some reading)
S - Sanger Phred+33, raw reads typically (0, 40)
X - Solexa Solexa+64, raw reads typically (-5, 40)
I - Illumina 1.3+ Phred+64, raw reads typically (0, 40)
J - Illumina 1.5+ Phred+64, raw reads typically (3, 40), with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (see the discussion in the linked article)
L - Illumina 1.8+ Phred+33, raw reads typically (0, 41)
reformat.sh in BBTools will autodetect and convert qualities. If you convert to ASCII-33 this way, the files that were already ASCII-33 will be unchanged. It can also do quality filtering and trimming (with the trimq and maq flags) and, unlike fastx-toolkit, can handle paired reads. Overall, I'd recommend abandoning fastx-toolkit.
Note, by the way, that it is not possible to autodetect quality encoding with 100% confidence because ASCII-33 and ASCII-64/ASCII-66 can have values in the same range.
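To make that caveat concrete, here is a minimal Python sketch of the usual range-based guess. It is not how BBTools does it; the function name guess_phred_offset and the max_reads cap are made up for illustration, and the thresholds come from the ranges in the question (characters below ';' can only be offset 33, characters above 'J' can only be offset 64). It assumes plain, uncompressed four-line FASTQ records, and it returns None when the observed range fits both encodings.

```python
from itertools import islice


def guess_phred_offset(path, max_reads=10000):
    """Guess the quality offset of a 4-line-per-record FASTQ file.

    Returns 33, 64, or None when the observed characters fit both encodings.
    Hypothetical helper: a heuristic sketch, not a guaranteed detector.
    """
    lowest, highest = 255, 0
    with open(path, "rb") as handle:
        # Lines 4, 8, 12, ... of the file hold the quality strings.
        for qual in islice(handle, 3, 4 * max_reads, 4):
            qual = qual.rstrip(b"\r\n")
            if qual:
                lowest = min(lowest, min(qual))
                highest = max(highest, max(qual))
    if lowest < 59:   # below ';' (Solexa+64 at -5): only offset 33 gets this low
        return 33
    if highest > 74:  # above 'J' (Phred+33 at 41): only offset 64 gets this high
        return 64
    return None       # ambiguous range (or no reads seen)
```

Scanning more reads makes an ambiguous result less likely, but a file whose qualities all fall between ';' and 'J' cannot be resolved this way.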
Wow, every time I look for a solution to a problem, I end up with very efficient new tools and packages that I didn't know about before. BBTools is a nice package and has many features. Thanks, Brian.
Here are some links to discussions about standalone scripts: Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?, http://seqanswers.com/forums/showthread.php?t=16562
Ideally you could capture the output of a script and pass it to fastx-trimmer, or follow Brian's suggestion.
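As a rough sketch of the "capture the output and pass it on" idea, the Python snippet below builds the fastq_quality_filter command line from an offset that some detection script has already reported. The function name build_filter_command is hypothetical. The flags are the ones from the command in the question, and it assumes (based on Case 1 in the question) that offset 64 is what the tool expects when -Q33 is not given.

```python
import shlex


def build_filter_command(in_path, out_path, offset, min_qual=20, min_percent=50):
    """Return the argument list for fastq_quality_filter (hypothetical helper).

    offset is the ASCII quality offset (33 or 64) reported by a separate
    detection step; anything else is rejected rather than guessed.
    """
    cmd = [
        "fastq_quality_filter",
        "-i", in_path,
        "-o", out_path,
        "-v",
        "-q", str(min_qual),
        "-p", str(min_percent),
    ]
    if offset == 33:
        cmd.append("-Q33")  # Sanger / Illumina 1.8+ files need this flag
    elif offset != 64:
        # Assumption: offset 64 is the default, so no flag is added for it.
        raise ValueError(f"unsupported or undetected quality offset: {offset!r}")
    return cmd


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(build_filter_command("file.fastq", "OUT", 33)))
```

Raising on an undetected offset is deliberate: filtering with the wrong offset either crashes, as in the question, or silently filters on the wrong scores.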