bbduk paired-end RNA-seq trimming command
2.3 years ago
predeus ★ 2.1k

Hi all,

Is there a consensus on the optimal way to trim paired-end RNA-seq reads with bbduk.sh? It has been a great tool that outperformed its competitors in other applications; for example, bacterial genome assembly from Nextera short reads clearly worked better with it. However, I now mostly work with various RNA-seq experiments and would like to hear opinions on the best approach.

So far I've been using the command below, which I compiled from the readme at some point by taking the settings for genomic PE reads and adding the "trimpolya" option:

bbduk.sh in1=${TAG}_1.fastq.gz in2=${TAG}_2.fastq.gz out1=${TAG}_bbduk_1.fastq.gz out2=${TAG}_bbduk_2.fastq.gz ref=$ADAPTERS trimpolya=10 ktrim=r k=23 mink=11 hdist=1 tpe tbo &> $TAG.bbduk.log

Does this look reasonable? Is there anything else to consider here? Maybe Brian could comment?

Thank you in advance, as always.

rna-seq paired-end bbduk trimming adapter • 2.5k views
2.3 years ago
GenoMax 148k

Not @Brian but yes that looks reasonable.

You should explicitly add -Xmx4g (in case your server does not auto-detect available RAM; bbduk needs little RAM) and threads=N (replace N with the number of cores you want to use) if you want to speed things up significantly. The mink value can be up to (length of the sequence you want to find)/2. If you want to be strict, set hdist=0.
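As a rough sketch of that suggestion, here is the command from the question with the memory and thread settings made explicit. The default values for TAG, ADAPTERS and THREADS are placeholders I made up for illustration, and the guard around the call is only there so the snippet does nothing when bbduk.sh or the input files are missing.

```bash
# Placeholder values (assumptions for illustration); override them in your environment.
TAG="${TAG:-sample}"
ADAPTERS="${ADAPTERS:-adapters.fa}"
THREADS="${THREADS:-8}"     # number of cores bbduk may use
JAVA_MEM="-Xmx4g"           # explicit Java heap limit, as suggested above

run_bbduk() {
  # Same options as the command in the question, plus explicit memory and threads.
  bbduk.sh "$JAVA_MEM" threads="$THREADS" \
    in1="${TAG}_1.fastq.gz" in2="${TAG}_2.fastq.gz" \
    out1="${TAG}_bbduk_1.fastq.gz" out2="${TAG}_bbduk_2.fastq.gz" \
    ref="$ADAPTERS" trimpolya=10 ktrim=r k=23 mink=11 hdist=1 tpe tbo \
    > "${TAG}.bbduk.log" 2>&1
}

# Run only when the tool and both input files are actually present.
if command -v bbduk.sh >/dev/null 2>&1 \
   && [ -f "${TAG}_1.fastq.gz" ] && [ -f "${TAG}_2.fastq.gz" ]; then
  run_bbduk
else
  echo "bbduk.sh or input files not found; nothing was run" >&2
fi
```

For a stricter match, hdist=1 in the function could be changed to hdist=0, as mentioned above.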

BTW: bbmap is available via conda.


Thank you, this gives me confidence. The threads thing is very interesting - I thought Java just used all the cores you gave it, at least that was my rookie impression.

"That mink value can be a bit smaller than (length of your reads/2)"

Does that mean that if my reads are 2x150 bp, I want to set it to a much higher value, like 70?


Yikes. I meant to say smaller than 1/2 the length of the adapter (or any other sequence you are looking to find). Will edit my answer above.
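To make the corrected rule of thumb concrete, here is a small sketch of the arithmetic. The adapter sequence is only an example I picked for illustration (substitute whatever sequence you are actually trimming), and MINK=11 is the value from the command in the question.

```bash
# Example adapter sequence (an assumption for illustration; use your own).
ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
MINK=11                      # mink value from the original command

adapter_len=${#ADAPTER}      # length of the sequence being searched for
half=$((adapter_len / 2))    # integer half of that length

if [ "$MINK" -lt "$half" ]; then
  echo "mink=$MINK is below half the adapter length ($half of $adapter_len nt)"
else
  echo "mink=$MINK is at or above half the adapter length ($half of $adapter_len nt)"
fi
```

So the read length (2x150 bp in the question above) does not enter into it; only the length of the sequence you are searching for does.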

"I thought Java just used all the cores you gave it,"

That is possible, but to make sure Java does not misbehave I find it safer to explicitly set memory and core allocations. One needs to be careful when running via a job scheduler on a cluster.
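One way to keep the two in step on a cluster is to derive the thread count from the scheduler's allocation instead of hard-coding it. The sketch below assumes SLURM, which exports SLURM_CPUS_PER_TASK when --cpus-per-task is requested; the fallback of 4 threads and the BBDUK_RESOURCES variable name are my own placeholders, and other schedulers use different variables.

```bash
# Use the scheduler's CPU allocation if present (SLURM assumed), else fall back to 4.
THREADS="${SLURM_CPUS_PER_TASK:-4}"

# Keep the Java heap explicit as well, so the JVM stays within the job's memory request.
BBDUK_RESOURCES="-Xmx4g threads=$THREADS"

echo "bbduk resource arguments: $BBDUK_RESOURCES"
```

These arguments would then be placed right after bbduk.sh in the trimming command.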
