Hi all,
I'm trying to use bbduk.sh to trim the 16S V4 primer sequences from my samples using the command below.
bbduk.sh in1=${SAMPLE}_R1_001.fastq.gz in2=${SAMPLE}_R2_001.fastq.gz \
out1=ra_${SAMPLE}_R1.fastq out2=ra_${SAMPLE}_R2.fastq \
ktrim=l k=20 mink=19 copyundefined=t \
literal="GTGCCAGCMGCCGCGGTAA,GGACTACHVGGGTWTCTAAT" hdist=1 stats=${SAMPLE}_stats.txt \
tpe tbo
But from the result, it trimmed out most of the reads (around 90%). I then tried another tool, cutadapt, to check, and it gives totally different results.
At first I thought there was something wrong with the sequencing run but since cutadapt is able to retain most of the reads, I believe there is something wrong with my command above.
EDIT: Here is my cutadapt command
cutadapt -g GTGCCAGCMGCCGCGGTAA -G GGACTACHVGGGTWTCTAAT \
-o c_${SAMPLE}_R1.fastq -p c_${SAMPLE}_R2.fastq \
${SAMPLE}_R1_001.fastq.gz ${SAMPLE}_R2_001.fastq.gz
I'd really appreciate it if anyone can help me with this. I've spent too much time checking this issue now lol. Thank you in advance.
This would depend on the structure of your amplicon. If you provide both sequences, then the trimming is going to happen to the left (you are using ktrim=l) of wherever those sequences are found. That means if one of the sequences provided in literal= is at the 3'-end of a read, then that entire read may be dropped.
An alternative would be to do the trimming in two steps: confine the first pass to the left end of the expected fragment, then take care of the right end. You should show us what the expected amplicon looks like in your case.
I thought literal would use the first string for the first read and the second string for the second read, no? Then I guess that is why most of the reads were trimmed out. Should I change it to ktrim=r?
literal= is simply a way to specify sequences to scan for on the command line. There is no order (or read specificity) implied. You could have put them in FASTA format in a file and then used that with ref=file.fa, with the same result. Like I said, you may need to do a two-pass run based on where those oligos are present in your read. If you post a cartoon of the expected read structure I can try creating a bbduk command line; otherwise, as @h.mon said below, you can stick with cutadapt for this application (as long as you used it per the recommendations linked below).
Here is what I received from my colleague.
Really appreciate your help on this, GenoMax.
Something like the following may work