Question

Fastx Invalid Quality Score Value

11.0 years ago

sckinta ▴ 730

I have multiple RNA-seq libraries to process, so I wrote a pipeline in bash and submitted the jobs in batch. Most of the libraries ran fine and gave me the results I wanted, but two of them failed in the fastx_clipper step. One reported "fastx_clipper: Invalid quality score value (char '#' ord 35 quality value -29) on line 4", and the other reported "Invalid quality score value (char ',' ord 44 quality value -20) on line 4".

In fact, I never specified a quality score encoding anywhere. Here is the relevant part of the code:

        set -o pipefail    # a failure in any stage of a pipe fails the whole pipe
        tar xjf StHe51G3_reads.tar.bz2
        PairFiles=(1 2)
        TrimmedFiles=()    # names of the trimmed output files (separate from the loop's scalar)
        cd StHe51G3_reads || exit 1
        for PairIndex in "${PairFiles[@]}"
        do
                RawFile="StHe51G3_read${PairIndex}.fastq"
                TrimmedFile="StHe51G3_read${PairIndex}_trimmed.fastq"
                # Clip each adapter/primer in turn, then trim low-quality 3' ends.
                # No quality offset (-Q) is given to any of the tools.
                if fastx_clipper -a 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' -n -v -i "$RawFile" |
                   fastx_clipper -a 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT' -n -v -i - |
                   fastx_clipper -a 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT' -n -v -i - |
                   fastx_clipper -a 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG' -n -v -i - |
                   fastx_clipper -a 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC' -n -v -i - |
                   fastx_clipper -a 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA' -n -v -i - |
                   fastq_quality_trimmer -t 20 -l 25 -v -i - -o "$TrimmedFile"
                then
                        TrimmedFiles+=("$TrimmedFile")
                        rm "$RawFile"    # delete the raw reads only after a successful run
                else
                        echo "Processing failed for $RawFile; keeping it" >&2
                fi
        done

I checked the FASTQ files to see which quality encoding they use. It should be "Illumina 1.3+ Phred+64", since the majority of the quality strings contain characters like "^_`abcdefg"; there is no way the encoding can be "Sanger Phred+33". According to https://en.wikipedia.org/wiki/FASTQ_format#Encoding, no scheme covers both "Illumina 1.3+ Phred+64" and "Sanger Phred+33" at the same time. So where do "#" and "," (Sanger Phred+33 characters) come from, given that all the libraries were sequenced on the same platform?
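The numbers in the error messages are simply the character code minus an offset: '#' is ASCII 35 and 35 - 64 = -29, ',' is ASCII 44 and 44 - 64 = -20. So fastx_clipper was decoding with an offset of 64, while the failing files contain characters below ';' (59), the lowest code either +64 scheme uses, which only Phred+33 data can produce. A quick check is to look at the lowest quality character in each file. The sketch below is a rough heuristic; `guess_phred_offset` is a hypothetical helper (not part of FASTX-Toolkit) and assumes plain FASTQ with exactly 4 lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FASTX-Toolkit): guess the Phred offset of a
# FASTQ file from the lowest quality character it contains.
# Assumes plain FASTQ with exactly 4 lines per record (line 4 = qualities).
guess_phred_offset() {
    awk '
    BEGIN {
        # awk has no ord(), so build a char -> ASCII code table for printable ASCII.
        for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        min = 127
    }
    NR % 4 == 0 {
        for (j = 1; j <= length($0); j++) {
            c = ord[substr($0, j, 1)]
            if (c < min) min = c
        }
    }
    END {
        if (min == 127) { print "unknown"; exit 1 }   # no quality lines seen
        # Phred+64 and Solexa+64 never use codes below 59 (";"),
        # so a lower character can only come from Phred+33 data.
        if (min < 59) print "phred33"
        else if (min >= 64) print "phred64"   # very likely, not guaranteed
        else print "ambiguous"
    }' "$1"
}
```

For example, `guess_phred_offset StHe51G3_read1.fastq` prints `phred33` if any quality character is below ';'. A file whose lowest character is '@' or higher is reported as `phred64`, which is only a strong hint: a Phred+33 file in which every base is Q31 or better would look the same.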

Can anyone help?

fastx bash

written 11.0 years ago by sckinta ▴ 730 • updated 9.6 years ago by Biostar 20

Add the -Q33 option.

11.0 years ago by Rm 8.3k

Duplicate of "FASTQ quality check".

11.0 years ago by Pierre Lindenbaum 164k

score 11 · Answer 1 · 2013-11-15

11.0 years ago

Istvan Albert 101k

Adding a comment as an answer: use the -Q33 option.
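As a sketch of applying this to the pipeline in the question, assuming the failing files really are Phred+33: each FASTX-Toolkit command in the pipe parses its own input, so -Q33 is added to every call, not just the first. The adapters are copied from the question; `clip_and_trim` is a hypothetical function name, and `set -o pipefail` is an addition so that a failure in any stage is reported.

```shell
#!/usr/bin/env bash
# Sketch: the clipping/trimming chain from the question with -Q33 passed to
# every FASTX-Toolkit command. clip_and_trim is a hypothetical name.
set -o pipefail   # fail if any stage of the pipe fails, not just the last one

# Usage: clip_and_trim RAW.fastq TRIMMED.fastq
clip_and_trim() {
    local raw="$1" trimmed="$2"
    fastx_clipper -Q33 -a 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' -n -v -i "$raw" |
    fastx_clipper -Q33 -a 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT' -n -v -i - |
    fastx_clipper -Q33 -a 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT' -n -v -i - |
    fastx_clipper -Q33 -a 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG' -n -v -i - |
    fastx_clipper -Q33 -a 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC' -n -v -i - |
    fastx_clipper -Q33 -a 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA' -n -v -i - |
    fastq_quality_trimmer -Q33 -t 20 -l 25 -v -i - -o "$trimmed"
}
```

For example, `clip_and_trim StHe51G3_read1.fastq StHe51G3_read1_trimmed.fastq && rm StHe51G3_read1.fastq` deletes the raw file only if the whole chain succeeded. Libraries that really are Phred+64 should still be run without -Q33.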