I have multiple RNA-seq libraries to process, so I wrote a bash pipeline and submitted the jobs in batch. Most of the libraries ran fine and gave me the results I wanted, but two of them failed at the quality filtering step (fastx_clipper). One reported "fastx_clipper: Invalid quality score value (char '#' ord 35 quality value -29) on line 4" and the other reported "Invalid quality score value (char ',' ord 44 quality value -20) on line 4".
I have not specified a quality score encoding anywhere. Here is the relevant part of the code:
tar xjf StHe51G3_reads.tar.bz2
PairFiles=(1 2)
# Collected output names; kept separate from the per-file variable TrimmedFile
TrimmedFiles=()
cd StHe51G3_reads
for PairIndex in "${PairFiles[@]}"
do
    RawFile="StHe51G3_read${PairIndex}.fastq"
    TrimmedFile="StHe51G3_read${PairIndex}_trimmed.fastq"
    # Clip each adapter in turn, then quality-trim the result
    fastx_clipper -a 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' -n -v -i "$RawFile" |
    fastx_clipper -a 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT' -n -v -i - |
    fastx_clipper -a 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT' -n -v -i - |
    fastx_clipper -a 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG' -n -v -i - |
    fastx_clipper -a 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC' -n -v -i - |
    fastx_clipper -a 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA' -n -v -i - |
    fastq_quality_trimmer -t 20 -l 25 -v -i - -o "$TrimmedFile"
    TrimmedFiles+=("$TrimmedFile")
    rm "$RawFile"
done
I checked the quality encoding of the FASTQ files to see which scoring system they use. It should be "Illumina 1.3+ Phred+64", since the majority of the quality strings contain characters like "^_`abcdefg", so there is no way the encoding can be "Sanger Phred+33". According to https://en.wikipedia.org/wiki/FASTQ_format#Encoding, no single system covers both "Illumina 1.3+ Phred+64" and "Sanger Phred+33" at the same time. So where do "#" and "," (Sanger Phred+33) come from, given that all the libraries were sequenced on the same platform?
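A check of this kind can be sketched in the shell as follows. This is only a rough heuristic, not part of FASTX-Toolkit: guess_phred_offset is a hypothetical helper name, the input is assumed to use plain 4-line FASTQ records, and the thresholds (codes below 59 cannot be a +64 encoding, codes above 74 usually indicate +64 in Illumina data) are assumptions taken from the ranges in the Wikipedia table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the lowest and highest quality-character codes
# in a FASTQ file and guess the Phred offset from them.
# Assumes 4-line records (sequence and quality lines are not wrapped).
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Lookup table: printable character -> ASCII code
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127; max = 0
        }
        NR % 4 == 0 {                      # every 4th line is a quality string
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = ord[substr($0, i, 1)]
                if (c == "") continue      # skip anything outside 33..126
                if (c < min) min = c
                if (c > max) max = c
            }
        }
        END {
            if (max == 0) { print "no quality lines found"; exit 1 }
            if (min < 59)      guess = "Phred+33"   # too low for any +64 encoding
            else if (max > 74) guess = "Phred+64"   # above J, typical of +64 data
            else               guess = "ambiguous"
            printf "min=%d max=%d guess=%s\n", min, max, guess
        }
    ' "$1"
}
```

Running it on each library, for example guess_phred_offset StHe51G3_read1.fastq, prints one line per file, which makes it easy to spot a library whose quality range differs from the others.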
Can anyone help?
Add the -Q33 option to the fastx_clipper and fastq_quality_trimmer commands.
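The numbers in the error messages are consistent with an offset of 64 being subtracted from the character code, which is presumably why telling the tools to use offset 33 helps. A small sketch of that arithmetic, where phred_score is a hypothetical helper written only for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: quality value = ASCII code of the character - offset.
phred_score() {
    local char=$1 offset=$2 code
    # A leading single quote makes printf output the character's numeric code
    code=$(LC_ALL=C printf '%d' "'$char")
    echo $(( code - offset ))
}

phred_score '#' 64   # -29, the value in the first error message
phred_score ',' 64   # -20, the value in the second error message
phred_score '#' 33   # 2
phred_score ',' 33   # 11
```

With offset 33 both characters decode to small non-negative scores, so the reads are valid under Phred+33 even though they are rejected under Phred+64.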
Duplicate of "FASTQ quality check".