Question

bbduk paired-end RNA-seq trimming command

2

2.8 years ago

predeus ★ 2.1k

Hi all,

I was wondering what the consensus is on the optimal trimming of paired-end RNA-seq reads with bbduk.sh. It has been a great tool for other applications, where it outperformed its competitors; for example, bacterial genome assembly from Nextera short reads clearly worked better with it. However, I now mostly work with various RNA-seq experiments and was wondering if someone has an opinion about the best approach.

So far, I've been using this command, which I put together from the readme at some point by taking the settings for genomic PE reads and adding the "trimpolya" option:

bbduk.sh in1=${TAG}_1.fastq.gz in2=${TAG}_2.fastq.gz out1=${TAG}_bbduk_1.fastq.gz out2=${TAG}_bbduk_2.fastq.gz ref=$ADAPTERS trimpolya=10 ktrim=r k=23 mink=11 hdist=1 tpe tbo &> $TAG.bbduk.log

Does this look reasonable? Is there anything else to consider here? Maybe Brian could comment?

Thank you in advance, as always.

rna-seq paired-end bbduk trimming adapter • 3.5k views

updated 2.8 years ago by GenoMax 151k • written 2.8 years ago by predeus ★ 2.1k

score 2 · Answer 1 · 2022-08-23

2

2.8 years ago

GenoMax 151k

Not @Brian, but yes, that looks reasonable.

You should explicitly add -Xmx4g (in case your server does not auto-detect available RAM; bbduk needs little RAM) and threads=N (replace N with the number of cores you want to use) if you want to speed things up significantly. The mink value can be a bit smaller than (length of the sequence you want to find)/2, e.g. half the adapter length. If you want to be strict, set hdist to 0.
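As a rough sketch of how those two additions slot into the command from the question: the values of TAG, ADAPTERS, THREADS and MEM below are placeholders chosen for illustration, not settings from this thread. The snippet only prints the assembled command unless bbduk.sh and both input files are actually present.

```shell
# Placeholder values for illustration; none of these come from the thread.
TAG="sample1"            # hypothetical sample prefix
ADAPTERS="adapters.fa"   # hypothetical adapter FASTA path
THREADS=8                # number of cores BBDuk may use
MEM="4g"                 # Java heap size, passed as -Xmx

# Store the arguments as positional parameters so each one stays a single word.
set -- bbduk.sh "-Xmx${MEM}" "threads=${THREADS}" \
  "in1=${TAG}_1.fastq.gz" "in2=${TAG}_2.fastq.gz" \
  "out1=${TAG}_bbduk_1.fastq.gz" "out2=${TAG}_bbduk_2.fastq.gz" \
  "ref=${ADAPTERS}" trimpolya=10 ktrim=r k=23 mink=11 hdist=1 tpe tbo

# Show the command; execute it only when bbduk.sh and the inputs exist.
BBDUK_CMD="$*"
echo "$BBDUK_CMD"
if command -v bbduk.sh >/dev/null 2>&1 \
   && [ -f "${TAG}_1.fastq.gz" ] && [ -f "${TAG}_2.fastq.gz" ]; then
  "$@" > "${TAG}.bbduk.log" 2>&1
fi
```

For the stricter variant mentioned above, hdist=1 would become hdist=0 in the argument list.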

BTW: bbmap is available via conda.

0

Thank you, this gives me confidence. The threads thing is very interesting - I thought Java just used all the cores you gave it, at least that was my rookie impression.

> That mink value can be a bit smaller than (length of your reads/2)

Does that mean that if my reads are 2x150 bp, I want to set it to a much higher value, like 70?

2.8 years ago by predeus ★ 2.1k

1

Yikes. I meant to say smaller than 1/2 the length of the adapter (or any other sequence you are looking to find). Will edit my answer above.
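To make the "smaller than half the adapter length" rule concrete, here is a small sketch that computes the largest integer strictly below half the length of a sequence. The helper name mink_upper_bound and the example sequence are illustrative assumptions, not part of BBTools; the result is only an upper bound for mink, not a recommended value.

```shell
# Largest integer strictly smaller than half the length of the sequence in $1.
# mink_upper_bound is a hypothetical helper, not a BBTools command.
mink_upper_bound() {
  seq_len=${#1}
  echo $(( (seq_len - 1) / 2 ))
}

# Example input only: a 33-nt adapter-like sequence used for illustration.
ADAPTER_SEQ="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
MINK_MAX=$(mink_upper_bound "$ADAPTER_SEQ")
echo "adapter length: ${#ADAPTER_SEQ}, mink should be at most: ${MINK_MAX}"
```

By this rule the mink=11 in the original command sits comfortably below the bound for a 33-nt sequence, and has nothing to do with the 150 bp read length.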

> I thought Java just used all the cores you gave it,

That is possible, but to make sure Java does not misbehave, I find it safer to explicitly add memory and core allocations. One needs to be careful when running via a job scheduler on a cluster.
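One way to keep the allocations in step with a scheduler, sketched here under the assumption that the cluster runs SLURM (the thread does not say which scheduler is used): SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task is requested, and that value can be passed on as threads=N. The helper name resource_args and the 4g heap are hypothetical.

```shell
# Build the memory/thread arguments for bbduk.sh.
# $1: CPU count granted by the scheduler (may be empty), $2: Java heap size.
# resource_args is a hypothetical helper, not part of BBTools.
resource_args() {
  echo "-Xmx${2} threads=${1:-1}"
}

# Fall back to a single thread when not running under SLURM.
RESOURCE_ARGS=$(resource_args "${SLURM_CPUS_PER_TASK:-}" "4g")
echo "$RESOURCE_ARGS"
```

The heap size given to -Xmx should stay below the memory requested from the scheduler, otherwise the job may be killed for exceeding its allocation.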

2.8 years ago by GenoMax 151k