Cutadapt two-color chemistry correction
3.1 years ago
geneticatt ▴ 140

I ran Cutadapt adaptor and quality-score trimming on two Illumina NextSeq libraries, with and without the two-color sequencing correction, to get a sense of how this correction affects my data. Interestingly, with the two-color chemistry correction roughly 2% fewer reads are reported as containing adaptor, and about 1% more of the total bases in the file are removed. The changes were similar in the other library I tested (data not shown). I believe this happens because much of the adaptor contamination is at the 3' end, and the high-quality G calls caused by two-color chemistry were preventing some of it from being trimmed by the q-score threshold. Still, it is not intuitive that this correction would cause the number of reads with adaptor to decrease.

Is this difference a matter of the order in which the trimming steps are applied, or does the adaptor search take q-scores into account, so that it is affected by the different treatment of G when the two-color chemistry flag is set?

This would be useful to know because if other parameters influence the outcome of the adaptor search, one should be careful when making a claim about total adaptor contamination of a read set based on cutadapt results.
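To make the hypothesis above concrete, here is a minimal Python sketch of 3' quality trimming with and without special handling of G. It is a simplified illustration, not Cutadapt's actual code: the function names are hypothetical, the running-sum cutoff is the BWA-style approach as I understand it, and treating every G as if its quality were just below the cutoff is my reading of what `--nextseq-trim` does. The sketch only shows how the read handed to the adaptor search can differ between the two modes, assuming quality trimming runs before the adaptor search.

```python
def quality_trim_3prime(qualities, cutoff):
    """Return how many 5' bases to keep after 3' quality trimming.

    Walks from the 3' end, summing (cutoff - quality), and cuts where
    that running sum is largest. Stops once the sum turns negative.
    """
    running = 0
    best = 0
    keep = len(qualities)
    for i in reversed(range(len(qualities))):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def nextseq_trim_3prime(bases, qualities, cutoff):
    """Same trimming, but every G is treated as quality (cutoff - 1).

    Assumption: this mimics the idea behind --nextseq-trim, where the
    reported quality of G calls is not trusted.
    """
    adjusted = [cutoff - 1 if b == "G" else q for b, q in zip(bases, qualities)]
    return quality_trim_3prime(adjusted, cutoff)


# Hypothetical read: 10 bases of insert followed by a high-quality poly-G tail.
bases = "ACGTACGTAC" + "GGGGGG"
qualities = [35] * len(bases)

plain_keep = quality_trim_3prime(qualities, 20)
nextseq_keep = nextseq_trim_3prime(bases, qualities, 20)
print(bases[:plain_keep])    # poly-G tail survives plain -q style trimming
print(bases[:nextseq_keep])  # poly-G tail removed with G-aware trimming
```

In this toy read the plain threshold keeps all 16 bases, while the G-aware version keeps only the first 10. Whether that kind of difference explains the drop in reads reported with adaptor is exactly the open question.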

With two-color correction (--nextseq-trim=20)

Total read pairs processed:         30,525,253
  Read 1 with adapter:               1,080,504 (3.5%)
  Read 2 with adapter:               1,048,854 (3.4%)
Pairs written (passing filters):    29,755,335 (97.5%)

Total basepairs processed: 2,442,020,240 bp
  Read 1: 1,221,010,120 bp
  Read 2: 1,221,010,120 bp
Quality-trimmed:              34,164,589 bp (1.4%)
  Read 1:    12,384,083 bp
  Read 2:    21,780,506 bp
Total written (filtered):  2,351,896,351 bp (96.3%)
  Read 1: 1,175,829,918 bp
  Read 2: 1,176,066,433 bp

Without two-color correction (-q 20)

Total read pairs processed:         30,525,253
  Read 1 with adapter:               1,769,292 (5.8%)
  Read 2 with adapter:               1,731,713 (5.7%)
Pairs written (passing filters):    29,934,277 (98.1%)

Total basepairs processed: 2,442,020,240 bp
  Read 1: 1,221,010,120 bp
  Read 2: 1,221,010,120 bp
Quality-trimmed:              14,600,999 bp (0.6%)
  Read 1:     2,865,608 bp
  Read 2:    11,735,391 bp
Total written (filtered):  2,379,448,479 bp (97.4%)
  Read 1: 1,189,683,871 bp
  Read 2: 1,189,764,608 bp
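
For reference, the differences quoted at the top of the post can be recomputed from the two reports above. This is a small sketch using only the numbers shown in the reports; the dictionary keys and the helper name are my own.

```python
# Values copied from the two Cutadapt reports above.
with_nextseq = {
    "pairs": 30_525_253,
    "r1_adapter": 1_080_504,
    "r2_adapter": 1_048_854,
    "total_bp": 2_442_020_240,
    "quality_trimmed_bp": 34_164_589,
    "written_bp": 2_351_896_351,
}
without_nextseq = {
    "pairs": 30_525_253,
    "r1_adapter": 1_769_292,
    "r2_adapter": 1_731_713,
    "total_bp": 2_442_020_240,
    "quality_trimmed_bp": 14_600_999,
    "written_bp": 2_379_448_479,
}


def point_difference(key, denominator):
    """Difference (without minus with) in percentage points of the denominator."""
    diff = without_nextseq[key] - with_nextseq[key]
    return 100.0 * diff / with_nextseq[denominator]


r1_drop = point_difference("r1_adapter", "pairs")
r2_drop = point_difference("r2_adapter", "pairs")
extra_quality_trim = -point_difference("quality_trimmed_bp", "total_bp")
extra_bases_removed = point_difference("written_bp", "total_bp")

print(f"Read 1 with adapter: {r1_drop:.1f} points fewer with --nextseq-trim")
print(f"Read 2 with adapter: {r2_drop:.1f} points fewer with --nextseq-trim")
print(f"Quality-trimmed bases: {extra_quality_trim:.1f} points more")
print(f"Bases written: {extra_bases_removed:.1f} points fewer")
```

So the "roughly 2%" is about 2.2 to 2.3 percentage points of read pairs, and the "~1%" is about 1.1 percentage points of total bases.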

This also applies to NovaSeq data, which uses two-color chemistry, and to Trim Galore, which wraps Cutadapt.

trimgalore QC adaptor cutadapt illumina
The commands don't show the adapter trimming parameters, so it is harder to understand what is going on.

How does the data change when you run only quality trimming?

One way to separate the competing processes is to run adapter detection without quality trimming, and quality trimming alone without adapter trimming.
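
As a rough sketch of that comparison, the following Python snippet builds one Cutadapt command line per variant so the reports can be compared side by side. Everything specific here is an assumption, since the original commands were not posted: the input file names are hypothetical, and `AGATCGGAAGAGC` is only the commonly used Illumina adapter prefix, to be replaced by whatever adapter was actually given. The snippet only prints the commands and does not run them.

```python
import shlex


def build_runs(r1, r2, adapter_r1, adapter_r2, cutoff=20):
    """Return a dict mapping a variant name to a cutadapt argument list."""
    adapters = ["-a", adapter_r1, "-A", adapter_r2]
    quality = ["-q", str(cutoff)]
    nextseq = [f"--nextseq-trim={cutoff}"]
    variants = {
        "adapter_only": adapters,
        "quality_only": quality,
        "nextseq_only": nextseq,
        "adapter_plus_quality": adapters + quality,
        "adapter_plus_nextseq": adapters + nextseq,
    }
    runs = {}
    for name, options in variants.items():
        # Output names are derived from the variant name (hypothetical).
        outputs = ["-o", f"{name}.R1.fastq.gz", "-p", f"{name}.R2.fastq.gz"]
        runs[name] = ["cutadapt", *options, *outputs, r1, r2]
    return runs


# Hypothetical inputs; replace with the real files and adapter sequences.
runs = build_runs(
    "sample.R1.fastq.gz",
    "sample.R2.fastq.gz",
    "AGATCGGAAGAGC",
    "AGATCGGAAGAGC",
)
for name, args in runs.items():
    print(f"# {name}")
    print(shlex.join(args))
```

If "Read 1 with adapter" in the adapter-only run matches the `-q 20` run but not the `--nextseq-trim=20` run, that would point to the G-aware trimming removing sequence before the adapter search sees it.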
