I want to do some quality control on raw fastq files by doing adapter trimming and quality trimming. What is a good read length cutoff for keeping trimmed reads? That is, after trimming, a read is discarded if its length is smaller than the cutoff.
Similarly, what quality score cutoff should I use to trim off low-quality bases? Is 20 a good start? And is quality trimming a good idea at all? Many aligners already take base quality into account, so I'm not sure which option is better.
Thanks!
I know these tools. Do you have a recommendation on the minimum length of the trimmed reads to keep?
If you are asking about minimum final read length, you can keep only uniquely aligned reads if you want to remove ambiguous ones. Alternatively, I think the first short-read sequences were ~35 bp, so I probably wouldn't go below that.
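For what it's worth, here is a minimal sketch of the "keep only unique alignments" idea, working directly on SAM text lines. It assumes that a mapping quality (MAPQ) threshold is an acceptable stand-in for "unique"; how MAPQ relates to uniqueness depends on the aligner, and the threshold of 30 and the function names are just illustrative choices, not a recommendation from any tool.

```python
def is_confident_alignment(sam_line, min_mapq=30):
    """Return True if a SAM record is mapped and has MAPQ >= min_mapq.

    min_mapq=30 is an assumed, illustrative threshold; what MAPQ means
    for uniqueness differs between aligners.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # SAM column 2: bitwise FLAG
    mapq = int(fields[4])   # SAM column 5: mapping quality
    if flag & 0x4:          # 0x4 set means the read is unmapped
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=30):
    """Yield header lines unchanged and only confidently mapped records."""
    for line in lines:
        if line.startswith("@"):
            yield line      # header lines are passed through
        elif is_confident_alignment(line, min_mapq):
            yield line
```

With this approach, short trimmed reads that map ambiguously get dropped after alignment, so the length cutoff at the trimming step matters less.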
If you are asking about how much to trim off the read, I think it will depend upon your own samples.
I don't typically trim any reads when working with reference-based alignments.
For de novo assembly, it can sometimes help to trim based upon quality scores (such as requiring a stretch of 20 or 30 nucleotides with >Q20), trim out adapter sequences, and remove mono-nucleotide reads. However, I don't believe I have ever trimmed based upon length alone. Perhaps trimming an extra 2-3 nt could help remove short adapter fragments, but I think there are cases where even the steps that I listed are not necessary to get good contigs (meaning that skipping trimming entirely might have been OK).
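To make the steps above concrete, here is a rough sketch of 3' quality trimming followed by a minimum-length check and a mono-nucleotide filter. It assumes Phred+33 encoded quality strings, and it uses a simple per-base trim from the 3' end rather than a window-based rule; the function names and the defaults (Q20, 35 bp) are only illustrative, taken from the numbers mentioned in this thread. Adapter removal is not shown.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end until one has quality >= min_q.

    Assumes Phred+33 encoding (offset=33).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def is_mononucleotide(seq):
    """True if the read consists of a single repeated base (e.g. AAAA...)."""
    return len(set(seq.upper())) <= 1


def keep_read(seq, qual, min_q=20, min_len=35):
    """Return the trimmed (seq, qual), or None if the read is discarded."""
    seq, qual = trim_3prime(seq, qual, min_q)
    if len(seq) < min_len or is_mononucleotide(seq):
        return None
    return seq, qual
```

For example, `keep_read("ACGTACGTAC", "IIIIIII###", min_len=5)` trims the three low-quality bases at the end and keeps the remaining 7 bases, while the default `min_len=35` would discard the same read.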
Yeah, I was thinking about minimum final read length. I guess keeping only unique alignments is the better option here. I think the decision to trim or not comes down to the size of the fragments: trimming matters more for libraries with small fragment sizes.