Question

bbduk trimmed out too many sequences

0

Entering edit mode

4.0 years ago

AfinaM ▴ 30

enter code hereHi all,

I'm trying to use bbduk.sh to trim out my 16S V4 primer sequences from my samples using command below.

bbduk.sh in1=${SAMPLE}_R1_001.fastq.gz in2=${SAMPLE}_R2_001.fastq.gz \
        out1=ra_${SAMPLE}_R1.fastq out2=ra_${SAMPLE}_R2.fastq \
        ktrim=l k=20 mink=19 copyundefined=t \
        literal="GTGCCAGCMGCCGCGGTAA,GGACTACHVGGGTWTCTAAT" hdist=1 stats=${SAMPLE}_stats.txt \
        tpe tbo

But from the result, it trimmed out most of the reads (around 90%). I then tried to use another tool cutadapt to check and it gives a totally different results.

At first I thought there was something wrong with the sequencing run but since cutadapt is able to retain most of the reads, I believe there is something wrong with my command above.

EDIT: Here is my cutadapt command

cutadapt -g GTGCCAGCMGCCGCGGTAA -G GGACTACHVGGGTWTCTAAT \
    -o c_${SAMPLE}_R1.fastq -p c_${SAMPLE}_R2.fastq \
    ${SAMPLE}_R1_001.fastq.gz ${SAMPLE}_R2_001.fastq.gz

Really appreciate if anyone can help me on this. I spent too much time checking this issue now lol. Thank you in advance.

bbduk bbmap amplicon • 3.4k views

ADD COMMENT • link updated 4.0 years ago by GenoMax 151k • written 4.0 years ago by AfinaM ▴ 30

1

Entering edit mode

This would be dependent on the structure of your amplicon. If you provide both sequences then the trimming is going to happen to the left (you are doing ktrim=l) of where ever those sequences are found. That means if one of sequences provided in literal= is at 3'-end of a read then that entire read may be dropped.

Alternative would be to do the trimming in two steps confining to the left end of the expected fragment once and then take care of the right end. You should show us what the expected amplicon looks like in your case.

ADD REPLY • link 4.0 years ago by GenoMax 151k

0

Entering edit mode

I thought literal will read the first string for first read and second read for second string, no? Then I guess that is why most of the reads were trimmed out.

Should I change it to ktrim=r?

ADD REPLY • link 4.0 years ago by AfinaM ▴ 30

0

Entering edit mode

literal= is simply a way to specify sequences to scan on the command line. There is no order (or read specificity) implied. You could put have them in fasta format in a file and then used that with ref=file.fa with same result. Like I said you may need to do a two-pass run based on where those oligos are present in your read. If you post a cartoon of expected read structure I can try creating bbduk command line, otherwise as @h.mon said below you can stick with cutadapt for this application (as long as you used it per recommendations linked below).

ADD REPLY • link 4.0 years ago by GenoMax 151k

0

Entering edit mode

Here is what I received from my colleague.

Really appreciate on your help on this GenoMax

ADD REPLY • link 4.0 years ago by AfinaM ▴ 30

0

Entering edit mode

Something like following may work

bbduk.sh -Xmx2g in1=R1.fq.gz in2=R2.fq.gz out=stdout.fq literal=515F_sequence restrictleft=length_of_515F+5bp ktrim=l k=15 rcomp=t | \
bbduk.sh -Xmx2g in=stdin.fq out1=final_R1.fq.gz out2=final_R2.fq.gz literal=806R_sequence restrictright=length_of_806R+5bp ktrim=r rcomp=t k=15

ADD REPLY • link 4.0 years ago by GenoMax 151k

score 0 · Answer 1 · 2021-05-19

You didn't show your Cutadapt command, I hope you followed the documentation:

Trimming (amplicon-) primers from both ends of paired-end reads

I think for removing primers, Cutadapt does a better job than BBDuk, as its behaviour is more intuitive: it will remove primers at both ends in one pass, and may retain only reads where the primers were found. As GenoMax pointed out, your BBDuk command removes the whole read if the primers is found at the right, 3' side of the read.

As I don't use BBDuk for primer removal, I don't know the best solution, but you may need to remove the left and right primers separately - you can use pipes to chain the BBDuk commands and avoid intermediary files. Another solution would be to fiddle with the rcomp, restrictleft and restrictright parameters to restrict the search:

rcomp=t             Look for reverse-complements of kmers in addition to 
                    forward kmers.
restrictleft=0      If positive, only look for kmer matches in the 
                    leftmost X bases.
restrictright=0     If positive, only look for kmer matches in the 
                    rightmost X bases.