cutadapt not trimming all adapters in PE Illumina sequencing
8 months ago
noodle ▴ 590

Hi all,

I'm running the cutadapt command below on PE Illumina reads:

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -m 20 

But this returns PE reads of which roughly 5% still contain an adapter. For example, in read_1 below (from the R1 output file), the leading CGGAAGAGCACACGTCTGAACTCCAGTCAC is adapter sequence that doesn't get trimmed. Is there a flag or subsequent processing step to make sure this adapter gets removed? Many, but not all, of the reads that still contain the adapter also have the poly-G run; is this some bug/feature of cutadapt?

@read_1

CGGAAGAGCACACGTCTGAACTCCAGTCACCATAGCGAATCGCGGGTGGCGGGGGGGGGGT

@read_2

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
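One way to put a number on the leftover adapter is to scan the trimmed output for exact adapter k-mers and, separately, for G-rich reads. The following is only a rough sketch and not something cutadapt itself provides: the helper names and the k=12 window are made up for illustration, and the two reads are the ones quoted above.

```python
# Hypothetical diagnostic helpers; names and the k=12 window are assumptions.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # the -a adapter from the command above


def adapter_kmers(adapter, k=12):
    """Return every k-length substring of the adapter."""
    return {adapter[i:i + k] for i in range(len(adapter) - k + 1)}


def has_adapter(seq, kmers):
    """True if the read contains at least one exact adapter k-mer."""
    return any(kmer in seq for kmer in kmers)


def g_fraction(seq):
    """Fraction of bases that are G (0.0 for an empty read)."""
    return seq.count("G") / len(seq) if seq else 0.0


# The two example reads from the post.
reads = {
    "read_1": "CGGAAGAGCACACGTCTGAACTCCAGTCACCATAGCGAATCGCGGGTGGCGGGGGGGGGGT",
    "read_2": "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG",
}

kmers = adapter_kmers(ADAPTER_R1)
for name, seq in reads.items():
    print(name, "adapter k-mer:", has_adapter(seq, kmers),
          "G fraction:", round(g_fraction(seq), 2))
```

Exact k-mer matching ignores sequencing errors, so this will undercount somewhat; it is meant as a quick sanity check on the trimmed files rather than a replacement for the cutadapt report.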


Not answering your question, but if you are willing to try an alternative program then give bbduk.sh (guide: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/ ) or fastp a try. BBduk can simultaneously scan for any number of arbitrary sequences (including poly-G etc.) that you provide in a file.

8 months ago
noodle ▴ 590

So in the end the answer was obvious: this was a tRNA library, so many small fragments make their way through to sequencing. This means adapter dimers and the like were sequenced to some extent. The simple fix is to allow the adapters to be trimmed 'anywhere' in the read, as in the example below.

cutadapt -a 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;anywhere;o=24' -A 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT;anywhere;o=24' -m 20 
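To see why 'anywhere' matters for read_1 from the question, here is a small sketch, assuming my reading of the cutadapt docs is right that a regular 3' adapter (-a) is only found in full within the read or partially at the read's 3' end. The read starts with the adapter minus its first four bases, i.e. a partial match at the 5' end. The function name is hypothetical, and min_overlap=24 simply mirrors the o=24 in the command above.

```python
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
READ_1 = "CGGAAGAGCACACGTCTGAACTCCAGTCACCATAGCGAATCGCGGGTGGCGGGGGGGGGGT"


def adapter_suffix_at_read_start(read, adapter, min_overlap=24):
    """Length of the longest adapter suffix the read starts with.

    Only overlaps of at least min_overlap bases are considered; returns 0
    if there is none. Exact matching only, so no error tolerance.
    """
    for n in range(len(adapter), min_overlap - 1, -1):
        if read.startswith(adapter[-n:]):
            return n
    return 0


overlap = adapter_suffix_at_read_start(READ_1, ADAPTER_R1)
print(overlap)  # 30: the read begins with the last 30 of the adapter's 34 bases
```

If I understand the option correctly, with 'anywhere' this counts as a 3' adapter hit, so the adapter and everything after it is removed and the (now empty) read is then dropped by -m 20.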

Thanks for the follow up! And nice to know about the 'anywhere' option in cutadapt -- I personally wasn't aware of it (though I never worked with such libraries) :)

8 months ago
dsull ★ 6.9k

It's because that does not look like a 3' adapter -- it looks like a 5' adapter which you'd specify with -g

Edit: If it really is a 3' adapter, well, then I simply find it strange that your early sequencing cycles would originate from the middle of a 3' adapter. cutadapt wasn't really optimized for such situations... Maybe you'd need to specify 3' trimming followed by 5' trimming with removal of any sequence that contains the adapter in the latter trimming step.


It was not obvious to me how this situation developed, but as I alluded to, this was a tRNA library, so the final size-exclusion steps were permissive to small RNAs. I think there was 'garbage' such as fragmented primers, incomplete PCR products, adapter dimers, and the like. The developers of cutadapt walked me through the fix :)

8 months ago
BioinfGuru ★ 2.1k

Can you post the full command you used? The -p option is missing.

This is paired end data so I think you actually need -a, -A and -p

1) From $ cutadapt --help

  • Parameters -a, -g, -b specify adapters to be removed ... from R1 if data is paired-end

  • The -A/-G/-B/-U/-Q options work like their lowercase counterparts, but are applied to R2 (second read in pair)

2) From Cutadapt User Guide

  • $ cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
  • -p option (this is the short form of --paired-output)
  • ADAPTER_FWD will therefore be trimmed from the forward reads and ADAPTER_REV from the reverse reads.

3) Personally, I'd save the adapters in fasta format and pass the fasta file to both -a and -A. Then as you build up the fasta file with adapters you come across in the future, it is always the same fasta file and code snippet.

  • $ cutadapt -a file:adapter.fa -A file:adapter.fa -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
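For the FASTA approach in point 3, a minimal sketch of generating the file contents from a name-to-sequence mapping might look like the following. The record names (read1_adapter, read2_adapter) and the helper are placeholders of my own; the sequences are the two adapters from the original question.

```python
def adapters_to_fasta(adapters):
    """Format a name -> sequence mapping as FASTA text, one record per adapter."""
    return "".join(f">{name}\n{seq}\n" for name, seq in adapters.items())


# Placeholder record names; sequences taken from the question above.
adapters = {
    "read1_adapter": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "read2_adapter": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
}

fasta_text = adapters_to_fasta(adapters)
print(fasta_text, end="")

# To use it with the command above, save the text as adapter.fa, e.g.:
# with open("adapter.fa", "w") as handle:
#     handle.write(fasta_text)
```

Note that passing the same file to both -a and -A means every adapter in it is searched for in both reads, which may or may not be what you want.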

Below is the original command, but there wasn't anything wrong with that - adding 'anywhere' fixes the issue as described above.

cutadapt -a AGATCGGAAGAGCACACGTCTGAAC -A AGATCGGAAGAGCGTCGTGTAGGGA -m 20 -j 24 --report 'full' -o sample_R1.cutadapt.fq.gz -p sample_R2.cutadapt.fq.gz sample_R1.fastq.gz sample_R2.fastq.gz > sample.cutadapt.log