Question

Adapter Trimming Length Cutoff And Quality Trimming Quality Cutoff

3

10.6 years ago

epigene ▴ 590

I want to do some quality control on raw FASTQ files by adapter trimming and quality trimming. What is a good read length cutoff for keeping a trimmed read? That is, after trimming, a read is discarded if it is shorter than the cutoff.

Similarly, what quality score cutoff should I use to trim off low-quality sequence? Is 20 a good start? And is quality trimming a good idea at all? Many aligners already take base quality into account, so I'm not sure which option is better.

Thanks!

ngs • 6.6k views

updated 5.1 years ago by milad eidi ▴ 20 • written 10.6 years ago by epigene ▴ 590

score 2 · Answer 1 · 2014-03-28

2

10.6 years ago

Charles Warden 8.3k

You can also use FastQC to assess the quality of your reads (for example, to see at what position the quality scores start to drop off significantly):

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
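To make the idea of "where the quality drops off" concrete, here is a rough Python sketch that computes the mean Phred score at each read position and reports the first position that falls below a cutoff. It is not how FastQC works internally; the function names are made up for illustration, and it assumes Phred+33 encoding and plain four-line FASTQ records.

```python
from collections import defaultdict


def mean_quality_by_position(fastq_lines, offset=33):
    """Return the mean Phred score at each read position (0-based).

    Assumes four-line FASTQ records and Phred+33 encoding (an assumption;
    older Illumina data may use an offset of 64).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


def first_drop_below(means, cutoff=20):
    """Return the first position whose mean quality is below cutoff, or None."""
    for pos, q in enumerate(means):
        if q < cutoff:
            return pos
    return None


# Toy input: two 6 bp reads whose quality collapses at the last base.
example = [
    "@r1", "ACGTAC", "+", "IIII5#",
    "@r2", "ACGTAC", "+", "IIIH5#",
]
means = mean_quality_by_position(example)
print(means, first_drop_below(means))
```

On a real file you would pass an open file handle instead of the toy list.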

You can then carry out the length, quality, etc. trims with the FASTX-Toolkit:

http://hannonlab.cshl.edu/fastx_toolkit/

10.6 years ago by Charles Warden 8.3k

1

I know these tools. Do you have a recommendation for the minimum length of trimmed reads to keep?

10.6 years ago by epigene ▴ 590

0

If you are asking about minimum final read length, you can keep only uniquely aligned reads if you want to remove ambiguous ones. Alternatively, I think the earliest short reads were ~35 bp, so I probably wouldn't go below that.

If you are asking about how much to trim off the read, I think it will depend upon your own samples.

I don't typically trim any reads when working with reference-based alignments.

For de novo assembly, it can sometimes help to trim based on quality scores (for example, requiring a stretch of 20 or 30 nucleotides above Q20), trim out adapter sequences, and remove mono-nucleotide reads. However, I don't believe I have ever trimmed based on length alone. Perhaps an extra 2-3 nt could help detect small adapter sequences, but I think there are cases where even the steps I listed are not necessary to get good contigs (meaning that no trimming at all might have been OK).
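As a rough illustration of the quality trim, the mono-nucleotide filter, and the ~35 bp floor discussed in this thread, here is a minimal Python sketch. It only trims low-quality bases from the 3' end, which is a simplification and not the exact algorithm of any particular tool. The function names are hypothetical, and the Q20 cutoff, 35 bp minimum, and Phred+33 encoding are assumptions you should adjust for your own data.

```python
def trim_3prime(seq, qual, cutoff=20, offset=33):
    """Drop trailing bases whose Phred score is below `cutoff`.

    Assumes Phred+33 encoding; only the 3' end is trimmed.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < cutoff:
        end -= 1
    return seq[:end], qual[:end]


def is_mononucleotide(seq):
    """True if the read consists of a single repeated base (or is empty)."""
    return len(set(seq.upper())) <= 1


def keep_read(seq, min_len=35):
    """Keep a trimmed read only if it is long enough and not mono-nucleotide."""
    return len(seq) >= min_len and not is_mononucleotide(seq)


# Toy read: 40 bases, the last 4 with very low quality ('#' is Q2).
seq = "ACGT" * 10
qual = "I" * 36 + "#" * 4
tseq, tqual = trim_3prime(seq, qual)
print(len(tseq), keep_read(tseq))
```

With the toy read above, the four low-quality bases are removed and the remaining 36 bp read passes the 35 bp filter; a poly-A read of the same length would be dropped.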

10.6 years ago by Charles Warden 8.3k

1

Yeah, I was thinking about minimum final read length. I guess keeping only unique alignments is the better option here. I think the decision to trim or not comes down to the size of the fragments: trimming matters more for libraries with small fragments.
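The reason fragment size matters is that when a fragment is shorter than the read length, the read runs past the insert and into the 3' adapter. Below is a minimal Python sketch of that case. It uses exact matching only (real trimmers allow mismatches), the function name and `min_overlap` parameter are made up for illustration, and the adapter string is just an example, so substitute the one for your library prep.

```python
def trim_adapter_3prime(seq, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching only.

    First looks for the full adapter anywhere in the read, then for a
    partial adapter (at least `min_overlap` bases) hanging off the read end.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Longest read suffix that is also an adapter prefix.
    max_k = min(len(seq), len(adapter) - 1)
    for k in range(max_k, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return seq


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix; use your own kit's sequence

print(trim_adapter_3prime("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # full adapter
print(trim_adapter_3prime("ACGTACGTAGATC", ADAPTER))             # partial adapter
print(trim_adapter_3prime("ACGTACGTAG", ADAPTER))                # overlap too short
```

A small `min_overlap` catches short adapter remnants but also removes more bases by chance, so the resulting reads still need a minimum-length check afterwards.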

10.6 years ago by epigene ▴ 590