Filtering sanger_fastq, illumina_fastq files using Fastx-Toolkit
9.7 years ago
bioinfo ▴ 840

I am quality-filtering a large batch of mixed FASTQ files produced by multiple versions of Illumina platforms. As a result, the quality scores are in sanger_fastq format (quality ASCII offset 33) for some files and in Illumina 1.3+ fastq format (quality ASCII offset 64) for others, and so on.

Case 1: If you run fastq_quality_filter on Sanger-quality files without the -Q33 parameter, you get the error message fastq_quality_filter: Invalid quality score value...

Case 2: If you wrongly use -Q33 for reads in Illumina-quality format, you get error messages such as segmentation fault (core dumped) or

$ fastq_quality_filter -i file.fastq -o OUT -v -q 20 -p 50 -Q33
fastq_quality_filter: bug: got empty array at fastq_quality_filter.c:97
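Both failure cases are consistent with plain offset arithmetic: decoding a quality string with the wrong offset gives scores that are either negative or far above the usual range. The sketch below only illustrates that arithmetic. The quality strings are made up, and the link between these out-of-range values and the exact fastx-toolkit messages is an assumption, not something checked against its source.

```python
def decode(quality, offset):
    """Return the numeric scores of a FASTQ quality string for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


sanger_qual = "II5+#"    # hypothetical Phred+33 quality string
illumina_qual = "hhTJB"  # hypothetical Phred+64 quality string

# Correct offsets: both strings decode to ordinary scores between 2 and 40.
print(decode(sanger_qual, 33))
print(decode(illumina_qual, 64))

# Case 1: Phred+33 data read with offset 64 produces negative scores.
print(decode(sanger_qual, 64))

# Case 2: Phred+64 data read with offset 33 produces scores far above 41.
print(decode(illumina_qual, 33))
```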

Is there any trick in fastx-toolkit that automatically detects the quality format (ASCII offset 33 or 64) and then does the quality filtering accordingly, without having to separate the mixed fastq files?

EDIT: Some reading: http://en.wikipedia.org/wiki/FASTQ_format

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
     with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)
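The ranges in this table suggest a simple heuristic for guessing the offset of a file from its quality characters. Here is a minimal sketch, not a fastx-toolkit feature. The function name `guess_offset` and the `max_reads` parameter are made up for illustration, and it assumes four-line FASTQ records and Phred+33 scores no higher than 41 (character `J`).

```python
from itertools import islice


def guess_offset(lines, max_reads=10000):
    """Guess the quality offset (33 or 64) from an iterable of FASTQ lines.

    Returns None when the observed characters fit both encodings.
    Assumes each record is exactly four lines (no wrapped sequences).
    """
    lowest, highest = 255, 0
    for i, line in enumerate(islice(lines, 4 * max_reads)):
        if i % 4 != 3:  # only the fourth line of each record holds qualities
            continue
        codes = [ord(ch) for ch in line.rstrip("\r\n")]
        if codes:
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))
    if lowest < 59:   # below ';' (Solexa -5 at offset 64): only offset 33 fits
        return 33
    if highest > 74:  # above 'J' (Phred 41 at offset 33): assume offset 64
        return 64
    return None       # characters between ';' and 'J' fit both encodings
```

For a real file you would pass an open file handle, for example `with open("file.fastq") as fh: print(guess_offset(fh))`. A result of 64 does not distinguish Solexa+64 from Phred+64.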
illumina fastx fastq filtering • 5.8k views
9.7 years ago
SES 8.6k

Is there any trick in fastx-toolkit that automatically detects the quality format (ASCII offset 33 or 64) and then does the quality filtering accordingly, without having to separate the mixed fastq files?

I don't think this exists in fastx-toolkit, but FastQC will report the encoding of the data, and you can run it from the command line to process many files. That may be the easiest route, though I know there are also some standalone scripts for detecting the encoding, so that would be another option for building a trimming pipeline.

9.7 years ago
bioinfo ▴ 840

I am actually a FastQC fan, but this time I am running an in-house software pipeline where only certain tools, such as Seqtk, Fastx-Toolkit, and usearch, are preinstalled. So the software will use its own fastq_quality_filter (from fastx-toolkit) at a certain step before doing any downstream analysis. The only option I have is to fix the encoding and convert all (over 500!!!) fastq files to ONE particular quality-score encoding before feeding them into the pipeline. You mentioned standalone scripts: do you have any links to ones that do that?

I saw one option on the wiki page for the FASTQ format to convert Illumina 1.3 (phred64) to 1.8 (phred33). But to do that, I first have to detect which files are in phred64 format, then separate them out and convert them to phred33 so that everything is in phred33!!!
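For reference, the phred64-to-phred33 conversion itself is just a shift of each quality character by 31 (64 minus 33). Here is a minimal sketch for a single quality string. The function name is made up for illustration, and it assumes true Phred+64 data: Solexa+64 scores use a different scale and would need a proper score conversion, not just a shift.

```python
def phred64_to_phred33(quality):
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # numeric Phred score under offset 64
        if score < 0:
            # A negative score means the input was not Phred+64 after all.
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        converted.append(chr(score + 33))
    return "".join(converted)


print(phred64_to_phred33("hhTJB"))  # hypothetical Phred+64 quality string
```

The explicit range check makes the function fail loudly if it is applied to a file that was already phred33, which is the kind of mix-up that matters when converting hundreds of files.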


reformat.sh in BBTools will autodetect and convert qualities:

reformat.sh in=file.fq out=fixed.fq qout=33

If you do that, the files that were already ASCII-33 will be unchanged. It can also do quality-filtering and trimming (with the trimq and maq flags), and unlike fastx-toolkit can handle paired reads. Overall I'd recommend abandoning fastx-toolkit.

Note, by the way, that it is not possible to autodetect quality encoding with 100% confidence because ASCII-33 and ASCII-64/ASCII-66 can have values in the same range.


Wow... every time I look for a solution to a problem, I end up finding very efficient new tools and packages that I didn't know about before. BBTools is a nice package and has many features. Thanks Brian.


Here are some links to discussions about standalone scripts: Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?, http://seqanswers.com/forums/showthread.php?t=16562

Ideally you could capture the output of a script and pass it to fastx-trimmer, or follow Brian's suggestion.
