The ftm parameter of BBDuk trims reads to a certain modulo (usually 5). I understand that this is the desired behaviour for trimming reads that are 151 bp long (instead of the expected 150). However, I have now noticed that the files I get from the sequencing provider are not really raw anymore; I guess they already did some adapter removal (still waiting on confirmation of this), so the reads are not all 150 (151) bp anymore. In this case, if I run BBDuk with ftm=5 I will also trim reads that are 149 bp back to 145, right? (I do see a small peak appearing at 145 bp in my FastQC length graph.)
I'm wondering what the advice would be in this specific case: run BBDuk with ftm=5 and potentially lose OK bases, or not use ftm=5 at all? I assume there is no way to tell BBDuk to only apply ftm when the read is longer than 150 bp?
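To make the arithmetic in the question concrete, here is a minimal Python sketch of what modulo trimming does to read lengths. It assumes ftm right-trims a read until its length is a multiple of the given number; the function name ftm_length and the example lengths are made up for illustration and are not part of BBTools.

```python
def ftm_length(length: int, modulo: int = 5) -> int:
    """Length after right-trimming to a multiple of `modulo` (assumed ftm behaviour)."""
    return length - (length % modulo)


# Hypothetical read lengths, as might be seen after provider-side adapter trimming.
lengths = [151, 150, 149, 147, 145, 36]
trimmed = {n: ftm_length(n) for n in lengths}

for before, after in trimmed.items():
    print(f"{before} bp -> {after} bp (lost {before - after})")
```

Under that assumption a 151 bp read loses only its extra base, while a 149 bp read loses four bases, which would be consistent with a small peak showing up at 145 bp.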
If the length of your reads ranges from 35-151bp, most likely adapters were trimmed during bcl to fastq conversion.
The recommendation for the extra base is to trim just that 1 bp; there is no need to trim 5 bp. I don't think this extra base will have a big impact on downstream analyses anyway, but I have never tested this nor seen it tested. I've seen a post showing the extra base to be mostly erroneous when mapping to a reference. I think it was a SEQanswers post by Brian Bushnell, but I can't find it again.
What are the downstream analyses you want to perform?
Last base is not erroneous but the Q score may be off since there is no phasing information available. I second the advice that you should not worry about the last base or modulo.
In the 151st position of the FastQC plots I observe a strong bias in base composition, as well as in quality.
So yes, just trimming off one base was my next option, but then again I might be trimming off OK bases as well, no? I think it all comes down to (not) having an option to specifically target the 151 bp reads?
The downstream is genome assembly.
I read here and there that the adapter removal done by bcl2fastq is not super accurate, is that correct? (That is, should I invest in more rigorous adapter removal?) I don't see any indication of adapter presence in the FastQC result, though.
It would not hurt to scan/trim the data with bbduk again with the tbo tpe options to get any remaining adapter bases. You can also separate the reads that are longer than 150 bp, do a forcetrimright=0 and then put the resulting reads back in the original pool.
Excellent suggestion genomax, thx
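The split, trim and merge idea suggested above amounts to trimming only the reads that are longer than 150 bp and passing everything else through untouched. The following is a rough Python sketch of that logic on in-memory FASTQ records, not a BBDuk invocation; the function name trim_only_long_reads, the Record tuple layout and the demo reads are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
Record = Tuple[str, str, str, str]


def trim_only_long_reads(records: Iterable[Record], max_len: int = 150) -> Iterator[Record]:
    """Cut reads longer than `max_len` back to `max_len`; leave shorter reads alone."""
    for header, seq, plus, qual in records:
        if len(seq) > max_len:
            # Sequence and qualities must stay the same length.
            seq, qual = seq[:max_len], qual[:max_len]
        yield header, seq, plus, qual


# Made-up demo reads: one with the extra base, one already adapter-trimmed.
demo = [
    ("@read1", "A" * 151, "+", "I" * 151),
    ("@read2", "C" * 149, "+", "I" * 149),
]
result = list(trim_only_long_reads(demo))
print([len(seq) for _, seq, _, _ in result])
```

Doing it in a single pass keeps the original read order, which matters if paired files have to stay in sync.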
If I add/use the ftr=149 parameter I would be fine, no? Then I'm trimming everything back to 150 bp length (and will thus not touch any of the shorter reads)?
Would that not remove 149 bases (force trim bases on right) on the right, leaving the one/two leftmost?
I always test with a small set of sequences.
I agree there is a possibility for confusion, but I understand otherwise from the BBDuk manual:
And there I'm confused myself: it should then be ftr=150 in my case?
Position 0 is the beginning of the read, so both options count from the beginning of the read. For a programmer this sort of stuff is second nature but I need to think twice. You are correct.
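The 0-based counting discussed above can be sketched in a few lines of Python. This assumes ftr keeps positions 0 through the given value and discards everything to the right of it; force_trim_right is a hypothetical helper written for this illustration, not a BBTools function.

```python
def force_trim_right(seq: str, ftr: int) -> str:
    """Keep positions 0..ftr inclusive, counting from the start of the read (assumed ftr semantics)."""
    return seq[: ftr + 1]


long_read = "A" * 151   # read carrying the extra base
short_read = "C" * 120  # read already shortened by adapter trimming

print(len(force_trim_right(long_read, 149)))   # position 149 is the 150th base
print(len(force_trim_right(long_read, 150)))   # position 150 is the 151st base
print(len(force_trim_right(short_read, 149)))  # shorter than the cut-off, unchanged
```

Under that reading, ftr=149 leaves 150 bases, ftr=150 would leave the 151 bp reads as they are, and reads shorter than 150 bp are not touched in either case.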
After running some tests it turns out ftr=149 is the way to go.
Getting somewhat off-topic here, but I am starting to wonder about the applicability of the ftm parameter of BBDuk?
One knock against BBTools is an overabundance of parameters. Some may have been put in to address very specific use cases that are not widely applicable. I have never used ftm.
It works! Thanks!