Why does read quality drop after adapter trimming with cutadapt?
1
0
6.7 years ago
Nabeel Ahmed ▴ 10

I am using cutadapt (v1.14) to trim the adapter from a published ribosome profiling dataset (short single-end reads of 51 nt). When I run FastQC on the raw data, the read quality is good at the 3' end, with the entire box plot above Q30. However, when I trim the adapter and run FastQC on the processed data, the quality drops at the 3' end. I don't understand why quality would drop after adapter trimming when the original reads were of high quality. I would appreciate it if someone could shed some light on this.

The adapter trimming command is as follows

cutadapt  -a CTGTAGGCACCATCAATATCTCGTATGC -q 20 -m 20 -M 45 -O 6 -o SRR1562913_trimmed.fastq SRR1562913_1.fastq

FastQC on the raw data: [image: raw dataset]

FastQC on the processed data: [image: processed data after adapter trimming]

cutadapt adapter trimming sequencing • 4.2k views
2
6.7 years ago
Buffo ★ 2.4k

Because the adapter is a consensus sequence with good quality scores. Once you remove it, the mean quality (and the read length) drops, because only your experimental reads remain.

0

But shouldn't the lower quality seen at positions 33-43 in the processed data also be visible at those positions in the raw data? Of course there would be few such reads, so the mean quality would stay high, but even the lower bound of the box plot is > 30 in the raw data.

0

The lower bound of the box plot (the lower whisker) is not the minimum observed value. The exact definition varies; it may represent the tenth percentile, for example. If the adapter content plot for the raw data shows high values (say above 50%), the processed box plot may be showing what were outliers in the raw data.

0

Thanks, I think this explains it. According to the FastQC documentation, the lower whisker is the 10th percentile. The bad-quality reads must fall below the 10th percentile and hence do not show up in the raw data plot.
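To make the percentile argument concrete, here is a small sketch in Python. All numbers are made up for illustration (1000 reads, 920 of them carrying a Q38 adapter base at the position in question), and `percentile` is a simple nearest-rank helper, not necessarily how FastQC computes its whiskers.

```python
def percentile(values, pct):
    """Nearest-rank percentile (a simplification; FastQC may compute it differently)."""
    ordered = sorted(values)
    # Integer ceiling of len * pct / 100, with a floor of 1.
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]

# Hypothetical quality values at one late cycle (say position 40).
# In the raw data most reads show a high-quality adapter base here;
# only the reads with long inserts show an insert base.
adapter_bases = [38] * 920            # adapter bases, removed by trimming
insert_bases = [15] * 50 + [36] * 30  # insert bases, kept by trimming

raw_at_position = adapter_bases + insert_bases
trimmed_at_position = insert_bases    # only these reads still reach position 40

raw_p10 = percentile(raw_at_position, 10)
trimmed_p10 = percentile(trimmed_at_position, 10)

print("raw 10th percentile:", raw_p10)
print("trimmed 10th percentile:", trimmed_p10)
```

With these toy numbers the low-quality bases are only 5% of the raw column, so the raw lower whisker sits at Q38, while after trimming they are the majority of what is left and the whisker falls to Q15. No base got worse; the population being summarised changed.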

0

The adapter does not necessarily have to occur at the very end of your reads, so I think some adapter bases might fall in the 33-43 range and boost the quality scores in that region (prior to trimming) as well.

0

So does this imply that after adapter trimming you always need to do another round of quality trimming? And that the order is important: first adapter, then quality?

1

Every dataset is different. Even in this case most of the data is still above Q20, so as long as there is a reference genome available to align against, no quality trimming should be needed.

0

True, especially if you assume that the aligner will do soft clipping/trimming of the data (which most do, I think).

I was, however, thinking of the case of assembly (which is obviously not the case in the question asked here).

0

For any de novo work it would be appropriate (perhaps required) to quality trim the data at Q20 (or stricter).

0

My thought exactly. However, I'm a little nervous about the order of trimming, which apparently has a (severe) impact on the result. OK, normally you would probably get rid of the adapters first and then do Q-trimming.

I'm a bit rusty on the cutadapt syntax :/ but isn't the command line given in this post also doing Q-trimming (-q 20)? If so, I'm concerned that other tools that combine adapter removal and Q-trimming might also not apply the "correct" order.
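To see why the order can matter, here is a toy Python sketch. It is not cutadapt's code: `quality_trim_3prime` is a hand-written BWA-style running-sum trimmer (the general approach the cutadapt documentation describes for -q), `remove_adapter` is a hypothetical exact-match helper, and the read, qualities and shortened adapter are invented. As far as I recall, the cutadapt documentation says -q trimming is applied before adapter removal, but please verify that for your version.

```python
def quality_trim_3prime(quals, cutoff):
    """Return how many bases to keep after BWA-style 3' quality trimming."""
    running = 0
    best = 0
    keep = len(quals)
    # Walk from the 3' end, summing (quality - cutoff); cut where the sum is lowest.
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:
            break
        if running < best:
            best = running
            keep = i
    return keep

def remove_adapter(seq, quals, adapter):
    """Hypothetical helper: cut at the first exact adapter match, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, quals
    return seq[:pos], quals[:pos]

def quality_trim(seq, quals, cutoff):
    keep = quality_trim_3prime(quals, cutoff)
    return seq[:keep], quals[:keep]

ADAPTER = "CTGTAGG"  # toy prefix of the adapter in the question
seq = "ACGTACGTAC" + ADAPTER
quals = [38] * 6 + [12, 10, 8, 6] + [38] * 7  # weak insert end, strong adapter

# Order 1: quality trim first, then remove the adapter.
quality_then_adapter = remove_adapter(*quality_trim(seq, quals, 20), ADAPTER)

# Order 2: remove the adapter first, then quality trim.
adapter_then_quality = quality_trim(*remove_adapter(seq, quals, ADAPTER), 20)

print(quality_then_adapter)
print(adapter_then_quality)
```

In this toy read the high-quality adapter shields the weak insert bases from the quality trimmer, so order 1 keeps all 10 insert bases including the four low-quality ones, while order 2 trims down to the 6 good bases. If that is what happens in the real data, a second quality-trimming pass over the adapter-trimmed file would be one way to get the second behaviour.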
