TL;DR: I have high universal Illumina adapter content in my paired-end RNA-seq reads and trimming with both the original sequence and reverse complement of the universal adapter did not completely remove the adapter content and was only effective for the R2 reads.
I am trying to trim adapter sequences from my paired-end RNA-seq data using cutadapt and I am not having a lot of success. When I ran my raw .fastq files through FastQC, it revealed that my sequences had upwards of 35% adapter content in the latter portions of the 150 bp reads.
Since the data showed significant universal adapter presence, I decided to use cutadapt to trim these sequences with the universal adapter sequence that Illumina provides. I used the following lines of code in bash:
for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
SAMPLE=$(echo ${i} | sed "s/R1_001\.fastq//")
# echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
cutadapt -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -A AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-q 30 -m 60 -o ${SAMPLE}R1_trm.fastq -p ${SAMPLE}R2_trm.fastq ${SAMPLE}R1_001.fastq ${SAMPLE}R2_001.fastq
echo ${SAMPLE} trimmed
done
Looking at the summary of this data, it appeared to do something but not much.
Looking at the adapter content graph from FastQC I could not even see a visible difference:
Seeing how little this helped, I thought, maybe if I try to trim the reverse complement instead, that will help. So I did the same thing, but modifying the code to trim the reverse complement of the universal adapter sequence. I'm going to include that code as well:
for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
SAMPLE=$(echo ${i} | sed "s/R1_001\.fastq//")
# echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
cutadapt -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -q 30 -m 60 -o ${SAMPLE}R1_trm2.fastq -p ${SAMPLE}R2_trm2.fastq ${SAMPLE}R1_001.fastq ${SAMPLE}R2_001.fastq
echo ${SAMPLE} trimmed
done
This time, it appeared to do something noticeable, but the effect was significantly more pronounced in the R2 reads than the R1 reads.
It was nice to see some progress being made, but to be honest, I have no idea how I could proceed from here. I still have >30% adapter content in my R1 reads and >10% in my R2 reads. Definitely not what I would consider to be sufficiently trimmed from my past experience. I believe I saw elsewhere in the forum that it may be more effective to trim smaller portions of the adapter sequence but before I went and wasted the 14 hours that it takes for me to run this, I would ask the forum since y'all actually know what you're doing.
I would suggest you try bbduk or trimmomatic - in addition to removing adapters by matching their sequence, both can remove adapter by examining the overlap between R1 and R2, thus they are more sensitive for paired-end data.
Thanks for the advice! I'll give those tools a try.