I would expect this command to trim low-quality ends and give me sequences in which all bases have a quality of at least 5. Nevertheless, when I check the output file, I still find lots of low-quality bases, and even the bases corresponding to those long runs of Bs are not removed!
Am I missing something? Is the command right for what I want to do?
Is this Illumina 1.3 (phred+64) data? From your mention of B runs I suspect it is:
-si13
Quality data in FASTQ file is in Solexa/Illumina 1.3+ format and should be scaled to Phred quality scores ranging from 0 to 40. (Not required for Solexa/Illumina 1.5+, Sanger, Roche/454, Ion Torrent, PacBio data.)
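To see why the encoding matters here, below is a minimal Python sketch of how a quality character maps to a Phred score under the two common offsets. The helper name `decode_quality` and the example quality string are made up for illustration and are not part of the program being discussed. In phred+64 data a 'B' decodes to quality 2 (and in Illumina 1.5+ it is also used as a special read-segment quality indicator), whereas the same character read as phred+33 decodes to 33, which would never be trimmed at a threshold of 5.

```python
def decode_quality(qual_string, offset=64):
    """Convert a FASTQ quality string to a list of integer Phred scores.

    offset=64 corresponds to Illumina 1.3+/1.5+ encoding,
    offset=33 to Sanger / Illumina 1.8+ encoding.
    """
    return [ord(ch) - offset for ch in qual_string]


# Hypothetical quality string ending in a run of Bs
quals = "hhhgfBBBB"

print(decode_quality(quals, offset=64))  # [40, 40, 40, 39, 38, 2, 2, 2, 2]
print(decode_quality(quals, offset=33))  # [71, 71, 71, 70, 69, 33, 33, 33, 33]
```

If the B run is interpreted with the wrong offset, or the conversion step is off, those bases look like high-quality bases to any downstream quality filter.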
I have tried the '-si13' option but sequences with B runs are still there. One explanation would be that since B qualities have this special meaning, they might not be transformed to a number (as happens with all other quality values)...
I haven't used this program, but intuitively, I suspect the "trim_qual_left" and "trim_qual_right" work by clipping bases from each side until a base with quality 5 or higher is encountered. So you would end up with reads that might still contain low quality bases, but the two bases remaining at the edges of your reads would have a quality of 5 or higher?
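The end-clipping behaviour guessed at above can be sketched in a few lines of Python. This only illustrates that guess and is not the program's actual implementation; `trim_ends` and the example read are hypothetical, and the qualities are assumed to be already decoded to integer Phred scores.

```python
def trim_ends(seq, quals, threshold=5):
    """Clip bases from both ends until a base with quality >= threshold is hit.

    Low-quality bases in the interior of the read are left untouched.
    """
    start = 0
    end = len(seq)
    # Clip from the left (5') end
    while start < end and quals[start] < threshold:
        start += 1
    # Clip from the right (3') end
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end], quals[start:end]


seq = "ACGTACGTAC"
quals = [2, 2, 30, 30, 3, 30, 30, 30, 2, 2]

trimmed_seq, trimmed_quals = trim_ends(seq, quals)
print(trimmed_seq)    # GTACGT
print(trimmed_quals)  # [30, 30, 3, 30, 30, 30]
```

Note that the interior base with quality 3 survives, which matches the point that end trimming alone does not guarantee every remaining base is above the threshold.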
Are you trying to remove all reads where any base has a quality below 5, even if the base is in the middle of the read and is flanked by high quality bases? This seems like an unusual thing to want to do. Normally you would want to limit yourself to trimming the ends since that's where you would expect the low quality bases to be (e.g. in an Illumina experiment, you might see the 5' ends of your reads have high quality, and the quality degrades as you move closer to the 3' end). Read aligners tend to be tolerant of one or two low quality bases in the middle of an otherwise good sequence (how tolerant is usually configurable through the aligner's parameters).
If you want to remove any read with a base with a quality < 5, you might want to try experimenting with the other trimming parameters. E.g. I might start with setting the "trim_qual_window" to the length of your reads. From what I understand after a quick look at the manual, this might implement the trimming rule by considering the whole read at once instead of a sliding window of 1 base.
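For comparison, the stricter rule (discard a read if any base falls below the threshold) is easy to express directly. The sketch below is a stand-alone Python illustration with made-up read names and qualities. It does not reproduce how the program's window parameters work, and `passes_min_quality` is a hypothetical helper.

```python
def passes_min_quality(quals, threshold=5):
    """Return True only if every base in the read has quality >= threshold."""
    return all(q >= threshold for q in quals)


# Hypothetical reads: name -> decoded Phred scores
reads = {
    "read_1": [30, 30, 28, 25, 20],
    "read_2": [30, 30, 3, 30, 30],   # one low-quality base in the middle
    "read_3": [30, 28, 20, 2, 2],    # low-quality 3' end
}

kept = [name for name, quals in reads.items() if passes_min_quality(quals)]
print(kept)  # ['read_1']
```

This drops read_2 entirely because of a single interior base, which is why end trimming is usually preferred over whole-read filtering.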
I just want to trim the end of reads. The problem, though, is that "trim_qual_left" and "trim_qual_right" don't work as I expected; after running the above-mentioned command I still get reads with B runs in their 3' end, for example.
Perhaps the program can implement only one rule at a time? Maybe try running with "trim_qual_left", then run the output through the program again, the second time with "trim_qual_right"?
Thanks Jeremy! I replaced the old equation with the (simple) one you said and it works fine! Other quality-dependent parameters were also affected by this ('min_qual_mean' for example which now also gives me the expected output)! Thanks again for your time!
No problem. Seeing how this bug propagated was actually pretty interesting.