I am using cutadapt (v1.14) to trim adapters from a published ribosome profiling dataset (short single-end reads of 51 nt). When I run FastQC on the raw data, I see that the read quality is pretty good at the 3' end, with the entire box plot of quality > 30. However, when I trim the adapter and run FastQC on the processed data, I find that the quality drops at the 3' end. I am unable to understand why there would be a drop in quality after adapter trimming when the original reads were of high quality. I would appreciate it if someone could shed some light on this.
The adapter trimming command is as follows:
cutadapt -a CTGTAGGCACCATCAATATCTCGTATGC -q 20 -m 20 -M 45 -O 6 -o SRR1562913_trimmed.fastq SRR1562913_1.fastq
FastQC on the raw data: [image: Raw dataset]
FastQC on the processed data: [image: Processed data]
But shouldn't the lower quality seen at positions 33-43 in the processed data also be visible at these positions in the raw data? Of course their numbers would be small, so the mean quality stays higher, but even the lower bound of the box plots is > 30 in the raw data.
The lower bound of the box plot (the lower whisker) is not the minimum observed value. The exact definition varies; it may represent the tenth percentile, for example. If the raw adapter-contamination plot shows high values (say, above 50%), the processed box plot may well be showing what were outliers in the raw data.
Thanks, I think this explains it. The lower bound is the 10th percentile according to the FastQC documentation. The bad-quality reads must fall within the lowest 10% at those positions and hence do not show up in the raw data plot.
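A minimal sketch of that effect (the read counts and quality values below are made up for illustration, not taken from this dataset): before trimming, every read reaches position 40, so a small low-quality fraction hides below the 10th-percentile whisker; after trimming, only the longer reads still have a base at position 40, so that fraction can exceed 10% and drag the whisker down.

```python
from statistics import quantiles

# Hypothetical per-base qualities at read position 40.
# Raw data: all 100 reads are 51 nt, so all reach position 40;
# 8% of them are low quality there.
raw = [38] * 92 + [15] * 8

# After adapter trimming, suppose only 20 reads still extend to
# position 40, and the low-quality reads are over-represented there.
trimmed = [38] * 12 + [15] * 8

def tenth_percentile(vals):
    # statistics.quantiles(n=10) returns the 9 decile cut points;
    # the first one is the 10th percentile (FastQC's lower whisker).
    return quantiles(vals, n=10)[0]

print(tenth_percentile(raw))      # whisker stays high: 8% < 10%
print(tenth_percentile(trimmed))  # whisker drops: 40% of remaining reads are bad
```

The absolute number of low-quality bases at position 40 is the same in both cases; only the denominator changed.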
The adapters do not necessarily occur at the very end of your reads, so I think some of them might occur in the 33-43 range, boosting the quality score in that region (prior to trimming) as well.
So this implies that after adapter trimming you always need to do another round of quality trimming? And that the order is important: first adapter, then quality?
Every dataset is different. Even in this case, most of the data is still above Q20, so as long as there is a reference genome available to align against, no quality trimming should be needed.
True, especially if you assume that the aligner will do soft clipping/trimming of the data (which most do, I think).
I was, however, thinking of the case of assembly (which is obviously not the case in the question asked here).
For any de novo work it would be appropriate (perhaps required) to quality trim the data at Q20 (or stricter).
My thought exactly. However, I'm a little nervous about the order of trimming, which apparently can have a (severe) impact on the result. And OK, normally you would probably first get rid of the adapters and then do Q-trimming.
I'm a bit rusty on the cutadapt syntax :/ but isn't the command line given in this post also doing Q-trimming (-q 20)? If so, I'm concerned that other tools that do both adapter removal and Q-trimming combined might also not apply the "correct" order.
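A toy illustration of why the order can matter (this is not cutadapt's actual algorithm; the sequences, qualities, and helper functions are made up): if quality trimming runs first and eats into a low-quality adapter tail, the leftover adapter piece can end up shorter than the minimum overlap and go undetected.

```python
ADAPTER = "CTGTAGGCACCATCAATATCTCGTATGC"
MIN_OVERLAP = 6  # mirrors the -O 6 in the command above

def quality_trim_3prime(seq, quals, cutoff=20):
    # Drop 3'-end bases below the cutoff (simplified; cutadapt uses a
    # BWA-style partial-sum algorithm, not a plain scan).
    while quals and quals[-1] < cutoff:
        seq, quals = seq[:-1], quals[:-1]
    return seq, quals

def adapter_trim(seq, adapter=ADAPTER, min_overlap=MIN_OVERLAP):
    # Exact-match search for a 3' adapter prefix of length >= min_overlap
    # (cutadapt also allows errors and internal matches; omitted here).
    for k in range(len(adapter), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[: len(seq) - k]
    return seq

insert = "ACGTACGTACGTACGTACGT"                  # 20 nt of real sequence
read = insert + ADAPTER[:10]                     # read ends inside the adapter
quals = [38] * len(insert) + [35] * 5 + [10] * 5  # adapter tail is low quality

# Adapter first: the 10 nt adapter prefix is found and removed cleanly.
print(adapter_trim(read))          # insert only

# Quality first: the low-quality tail is cut, leaving only 5 adapter
# bases -- below MIN_OVERLAP, so the adapter trimmer misses them.
seq_q, _ = quality_trim_3prime(read, quals)
print(adapter_trim(seq_q))         # insert plus 5 leftover adapter bases
```

Whether a real tool shows this failure mode depends on its minimum-overlap setting and on how aggressively the quality trimmer cuts, which is why checking the documented order of operations for each tool is worthwhile.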