Hi
I would like to use bbduk
to replicate the command line flags used by trim_galore
for EM-Seq (NEB)
.
The bismark user guide recommends this for trim_galore
--clip_R1 10 --clip_R2 10 --three_prime_clip_R1 10 --three_prime_clip_R2 10
Explanation of the trim_galore
parameters
--clip_R1 <int>
Instructs Trim Galore to remove <int> bp from the 5' end
of read 1 (or single-end reads). This may be useful if
the qualities were very poor, or if there is some
sort of unwanted bias at the 5' end.
Default: OFF.
--clip_R2 <int>
Instructs Trim Galore to remove <int> bp from the 5' end
of read 2 (paired-end reads only). This may be useful if
the qualities were very poor, or if there is some sort
of unwanted bias at the 5' end. For paired-end BS-Seq,
it is recommended to remove the first few bp because
the end-repair reaction may introduce a bias towards low
methylation. Please refer to the M-bias plot section in
the Bismark User Guide for some examples.
Default: OFF.
--three_prime_clip_R1 <int>
Instructs Trim Galore to remove <int> bp from
the 3' end of read 1 (or single-end reads) AFTER
adapter/quality trimming has been performed. This
may remove some unwanted bias from the 3' end
that is not directly related to adapter sequence or
basecall quality.
Default: OFF.
--three_prime_clip_R2 <int>
Instructs Trim Galore to remove <int> bp from
the 3' end of read 2 AFTER adapter/quality trimming
has been performed. This may remove some
unwanted bias from the 3' end that is not
directly related to adapter sequence or basecall quality.
Default: OFF.
I was wondering how I could get the same with bbduk
- the relevant bbduk
flags I could find were
forcetrimleft=0 (ftl) If positive, trim bases to the left of this position
forcetrimright=0 (ftr) If positive, trim bases to the right of this position
forcetrimright2=0 (ftr2) If positive, trim this many bases on the right end.
forcetrimmod=0 (ftm) If positive, right-trim length to be equal to zero,
Based on what trim_galore
provides, I am guessing the equivalent bbduk
command line option would be
bbduk.sh in1=read1.fq in2=read2.fq out1=clean1.fq out2=clean2.fq ktrim=l ktrim=r forcetrimleft=10
forcetrimright2=10
Is the above correct?
Thanks in advance.
Hi GenoMax
Thanks a lot. I have some follow up questions.
Your answer has 2 code snippets:
without pipes and without use k=NN, with ref=adapters.fa
Are you suggesting either of the two code snippets will work in my case?
You also mention this below between the 2 snippets and I am a bit confused here as both the snippets have ref=adapters.fa. Were you referring to add it to the second part of the pipe in Code snippet 1?
Also, I think code snippet 1 needs
int=t
- this based on Brian's note in #43 hereThanks once again.
Code snippet 1 will need
int=t
thanks for linking the thread (edited my example).I don't think Brian has explicitly put down the order of operations in
bbduk
in help docs but I think the two command lines above should produce similar output once you addadapters.fa
files to your example (since without thatbbduk
has no way to know about which adapter sequences to scan/trim). You should also addk=NN
in your command line and only usektrim=r
since with Illumina the adapter should only be present on 3'-end of the sequence.Hi GenoMax
Is this the order of operations you were referring - from the same link I shared before (Brian's note in #43 here)
Also, in your original answer (copy/pasted below), should it actually be
1/2 the length of adapters
- asking so as I happened to see this comment of yoursI think both ways still may produce very similar results but you can try and let us know. Going by the list above my way may be operationally logical since
force trim
takes precedence.You want to select a
k
to make sure that you get good initial matches. What is the length of your reads?I am trying this with some example data provided by NEBiolabs in their paper - reads are 99 bp
Try
k=25
. I generally stay somewhere in 20-25 range.Hi
When I try code snippet 1 (piped command), I am getting this weird error about
can't read file stdout.fastq
I think the 2nd part of the pipe should be
in=stdin.fq
instead of in=stdout.fq and when I change that it works.Correct. I will change that in my answer.