I have single-end sequencing data prepared using the Illumina Nextera library prep kit. In my FastQC plots I can see adapter contamination at the 3' end of my reads and some N base calls at the 5' end. I run Trimmomatic in single-end mode using the Nextera adapters file provided, plus the Nextera transposase sequence FastQC uses (including its reverse complement):
TrimmomaticSE: Started with arguments:
-phred33 -threads 1 data/raw_reads/LT119/160418_D00248_0165_AC931NANXX_8_NX-P7-008_NX-P5-017.fastq.gz data/trim_reads/LT119/160418_D00248_0165_AC931NANXX_8_NX-P7-008_NX-P5-017_trim.fastq.gz ILLUMINACLIP:/home/jashmore/anaconda3/share/trimmomatic-0.36-3/adapters/Nextera.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Short Clipping Sequence: 'CTGTCTCTTATA'
Using Medium Clipping Sequence: 'AGATGTGTATAAGAGACAG'
Using Short Clipping Sequence: 'TATAAGAGACAG'
Using Short Clipping Sequence: 'TCCTCGGCCG'
Using Medium Clipping Sequence: 'GGTCGCGGCCGAGGATC'
Using Medium Clipping Sequence: 'CTGTCTCTTATACACATCT'
Using Short Clipping Sequence: 'CGGCCGAGGA'
Using Medium Clipping Sequence: 'GATCCTCGGCCGCGACC'
Using Long Clipping Sequence: 'TCCTCGGCCGCGACCACGCTGCCCTATAGTGAGTCGTATTAG'
Using Long Clipping Sequence: 'CTAATACGACTCACTATAGGGCAGCGTGGTCGCGGCCGAGGA'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 0 prefix pairs, 14 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 3093105 Surviving: 3073868 (99.38%) Dropped: 19237 (0.62%)
TrimmomaticSE: Completed successfully
After trimming I can see that the adapter contamination decreases (but isn't completely removed), and that the N base calls are still present at the 5' end. Could anyone explain why this is, or what I'm doing wrong? Granted, the contamination is only ~1% and shouldn't be too detrimental to my mapping, but I'd still like to work out why.
I've worked out why the N bases near the beginning of the reads aren't removed: they appear at base 2. LEADING:3 starts at base 1 and checks whether its quality is below the threshold; it isn't, so trimming stops there and base 2 is never examined. I'm still not sure why the Nextera adapters are not being removed.
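To illustrate the LEADING behaviour described here, this is a minimal Python sketch of 5' quality trimming. It is not Trimmomatic's code, and the function name and the Phred scores (an N call scored as Q2 at base 2) are made up for the example.

```python
def trim_leading(quals, threshold):
    """Return how many bases a LEADING-style step would remove.

    Bases are dropped from the 5' end only while their quality is
    below the threshold; the first base at or above it stops the scan.
    """
    removed = 0
    while removed < len(quals) and quals[removed] < threshold:
        removed += 1
    return removed


# Hypothetical read: base 1 is high quality, base 2 is an N call (Q2).
n_at_base_2 = [32, 2, 34, 36, 36]
# Hypothetical read: the low-quality calls start at base 1.
n_at_base_1 = [2, 2, 34, 36, 36]

print(trim_leading(n_at_base_2, threshold=3))  # nothing is trimmed
print(trim_leading(n_at_base_1, threshold=3))  # both leading bases go
```

In the first case the good base 1 shields the N at base 2, which matches what I see in the trimmed reads.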
Hi James,
I am having the same problem with Nextera adapters. After running Trimmomatic, I still see contamination at the 3' end. Did you figure out how to deal with it? Thanks.
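One possible explanation, offered as a hedged sketch and not a confirmed diagnosis: if I read the Trimmomatic manual correctly, in simple mode each matching base adds log10(4) (about 0.6) to the alignment score, and the adapter is clipped only when the score reaches the simple clip threshold (the final 10 in ILLUMINACLIP:...:2:30:10). The arithmetic below assumes that scoring and perfect matches; the helper names are hypothetical.

```python
import math

# Assumed per-base match score in simple mode (log10(4), about 0.602).
MATCH_SCORE = math.log10(4)
# Third ILLUMINACLIP value from the command in the log above.
SIMPLE_CLIP_THRESHOLD = 10


def max_simple_score(n_bases):
    """Best possible score for n_bases of perfectly matching adapter."""
    return n_bases * MATCH_SCORE


def min_bases_needed(threshold):
    """Fewest perfectly matching bases that can reach the threshold."""
    return math.ceil(threshold / MATCH_SCORE)


print("bases needed:", min_bases_needed(SIMPLE_CLIP_THRESHOLD))

# Two of the clipping sequences listed in the log above.
for seq in ("CTGTCTCTTATA", "CTGTCTCTTATACACATCT"):
    score = max_simple_score(len(seq))
    reachable = score >= SIMPLE_CLIP_THRESHOLD
    print(seq, len(seq), round(score, 2), reachable)
```

Under these assumptions, roughly 17 matching bases are needed, so an adapter fragment shorter than that at the 3' end of a read would be left in place, and the 12-base short clipping sequences could never reach a threshold of 10 on their own.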
I ended up using cutadapt instead. The following command seems to work fine for me:
I tried Trim Galore and it works for me. Thanks James!!