Fastx Invalid Quality Score Value
11.0 years ago
sckinta ▴ 730

I have multiple RNA-seq libraries to process, so I wrote a bash pipeline and submitted them as a batch. Most of the libraries ran fine and gave me the results I wanted, but two failed at the quality-filtering step (fastx_clipper). One reported "fastx_clipper: Invalid quality score value (char '#' ord 35 quality value -29) on line 4", and the other reported "Invalid quality score value (char ',' ord 44 quality value -20) on line 4".

I did not specify a quality score encoding anywhere. Here is the relevant part of the code:

        tar xjf StHe51G3_reads.tar.bz2
        PairFiles=(1 2)
        TrimmedFiles=()   # collects the names of the trimmed output files
        cd StHe51G3_reads
        for PairIndex in "${PairFiles[@]}"
        do
                RawFile="StHe51G3_read${PairIndex}.fastq"
                # Scalar for the current output; kept separate from the TrimmedFiles array
                TrimmedFile="StHe51G3_read${PairIndex}_trimmed.fastq"
                fastx_clipper -a 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' -n -v -i "$RawFile" |
                fastx_clipper -a 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT' -n -v -i - |
                fastx_clipper -a 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT' -n -v -i - |
                fastx_clipper -a 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG' -n -v -i - |
                fastx_clipper -a 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC' -n -v -i - |
                fastx_clipper -a 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA' -n -v -i - |
                fastq_quality_trimmer -t 20 -l 25 -v -i - -o "$TrimmedFile"
                TrimmedFiles+=("$TrimmedFile")
                rm "$RawFile"
        done

I checked the quality encoding of the FASTQ files to see which scoring system they use. It should be "Illumina 1.3+ Phred+64", since most quality strings contain characters like "^_`abcdefg", which cannot be "Sanger Phred+33". According to https://en.wikipedia.org/wiki/FASTQ_format#Encoding, no single encoding covers both the "Illumina 1.3+ Phred+64" and the "Sanger Phred+33" ranges. So where do "#" and "," (Sanger Phred+33 characters) come from, given that all the libraries were sequenced on the same platform?
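One way to check the encoding directly is to look for the lowest quality character in each file. This is a rough sketch, assuming standard 4-line FASTQ records. `min_quality_ord` and `guess_offset` are hypothetical helper names, and the threshold of 59 is taken from the encoding table on the Wikipedia page linked above, where the lowest character used by any +64 encoding is ';' (code 59).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the FASTX-Toolkit.
# Assumes every record is exactly 4 lines, so each 4th line is a quality string.

# min_quality_ord FILE: print the smallest ASCII code found in the quality lines.
min_quality_ord() {
    awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = ord[substr($0, j, 1)]
                if (c != "" && c < min) min = c
            }
        }
        END { print min }
    ' "$1"
}

# guess_offset FILE: codes below 59 can only occur in a Phred+33 file.
guess_offset() {
    min=$(min_quality_ord "$1")
    if [ "$min" -lt 59 ]; then
        echo "Phred+33"
    else
        echo "Phred+64 (or Phred+33 containing only high scores)"
    fi
}
```

Calling `guess_offset StHe51G3_read1.fastq` on each library would show whether the failing ones really differ from the rest. A file that contains "#" or "," anywhere in its quality lines will be reported as Phred+33.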

Anyone help ??????

fastx bash • 17k views

add -Q33 option
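The numbers in the error messages are consistent with the tool subtracting 64 from each character code (35 - 64 = -29 and 44 - 64 = -20), which suggests it assumed a Phred+64 offset. With an offset of 33 the same characters decode to non-negative scores. A minimal sketch of that arithmetic, where `quality_value` is a hypothetical helper and not part of the FASTX-Toolkit:

```shell
#!/usr/bin/env bash
# quality_value CHAR OFFSET: decode one FASTQ quality character.
# Hypothetical helper for illustration only.
quality_value() {
    # printf with a leading quote in the argument prints the character's ASCII code.
    ord=$(printf '%d' "'$1")
    echo $(( ord - $2 ))
}

quality_value '#' 64   # prints -29, the value in the first error message
quality_value '#' 33   # prints 2
quality_value ',' 64   # prints -20, the value in the second error message
quality_value ',' 33   # prints 11
```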


duplicate of FASTQ quality check

11.0 years ago

Adding a comment as an answer: use the -Q33 option.
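As a sketch of where the flag goes, assuming the reads really are Phred+33 encoded: each FASTX tool in the pipe parses its own input, so the flag is passed to every stage. The function name `clip_and_trim` and the `QUAL_FLAG` variable are made up for this example, and only one adapter from the question is shown to keep it short.

```shell
#!/usr/bin/env bash
# Assumption: the input is Phred+33, so every FASTX stage is given -Q33.
QUAL_FLAG='-Q33'

# clip_and_trim RAW TRIMMED: clip one adapter, then quality-trim.
# Hypothetical wrapper; the remaining fastx_clipper stages from the
# question would be chained in the same way, each with $QUAL_FLAG.
clip_and_trim() {
    fastx_clipper $QUAL_FLAG -a 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT' -n -v -i "$1" |
    fastq_quality_trimmer $QUAL_FLAG -t 20 -l 25 -v -i - -o "$2"
}
```

It would be called as `clip_and_trim StHe51G3_read1.fastq StHe51G3_read1_trimmed.fastq`. Keeping the flag in one variable makes it easy to switch it off for libraries that are Phred+64.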

see also: FASTQ quality check


It works. Thank you:)
