BBSplit Slows Down Variably
22 months ago

Hi folks,

I'm using bbmap to filter out contaminant reads by mapping to the human, mouse, and phiX genomes. The run can be very slow or very fast for roughly the same number of reads.

Here's the command I used:

bbsplit.sh in1=reads.trim.1.fq in2=reads.trim.2.fq \
    ref=ref-genomes/phiX174.fa,ref-genomes/GRCm39.fa,ref-genomes/GRCh38.fa \
    outu1=dedupe_reads/dedupe_reads.1.fq.gz outu2=dedupe_reads/dedupe_reads.2.fq.gz threads=40 \
    overwrite=true

Here are the speed stats for a run with about 28 million reads:

Mapping Mode:           normal
Reads Used:             28695864    (2590370926 bases)

Mapping:            3695.342 seconds.
Reads/sec:          7765.42

Here are the speed stats for a run with about 36 million reads:

Reads Used:             35647394    (4046619973 bases)

Mapping:            186.885 seconds.
Reads/sec:          190745.37
kBases/sec:         21653.03

The same settings were used for both runs. Any idea what I can do to make everything faster?

GenoMax @Brian Bushnell

22 months ago
GenoMax 145k

bbsplit.sh can require significant memory, depending on the number of genomes used and the size of the data set. I recommend allocating memory explicitly with the -Xmx70g option (at least 70G, since you are using two large model genomes). I would also cut the number of threads to, say, 16. You could stay with 40 if you can assign 150G+ of RAM, so that the threads have plenty of memory available.
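As a rough sketch of that suggestion, the original command with an explicit heap size and fewer threads might look like the following. The file paths are copied from the question; the MEM, THREADS, and CMD shell variables are only illustrative names, and 70g/16 are the starting values suggested above, not tuned settings.

```shell
# Sketch only: values follow the suggestion above and may need adjusting for your machine.
MEM=70g       # JVM heap size, passed to bbsplit.sh as -Xmx70g
THREADS=16    # reduced from 40 so each thread has more memory to work with

CMD="bbsplit.sh -Xmx${MEM} in1=reads.trim.1.fq in2=reads.trim.2.fq \
ref=ref-genomes/phiX174.fa,ref-genomes/GRCm39.fa,ref-genomes/GRCh38.fa \
outu1=dedupe_reads/dedupe_reads.1.fq.gz outu2=dedupe_reads/dedupe_reads.2.fq.gz \
threads=${THREADS} overwrite=true"

# Print the assembled command for review before running it.
echo "$CMD"

# Uncomment to actually run it (requires BBTools on your PATH):
# sh -c "$CMD"
```

If you do have 150G+ available, setting MEM=150g and THREADS=40 gives the other configuration mentioned above.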

dedupe_reads/dedupe_reads.1.fq.gz

Since this was in your command line, I also want to point out that bbsplit.sh does not deduplicate reads. It bins reads to the reference genomes according to the criteria you choose. If you want to dedupe the reads, look at the other tools in the BBTools package (dedupe.sh or clumpify.sh).
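If deduplication is what you are after, a minimal sketch with clumpify.sh could look like this. It assumes you want to dedupe the unmapped pairs written by the bbsplit.sh command in the question; the output file names and the DEDUPE_CMD variable are hypothetical, and it is worth checking the clumpify.sh help text for the options that suit your data.

```shell
# Sketch only: input paths come from the bbsplit.sh outu1/outu2 settings in the question;
# output paths are made-up names for illustration.
IN1=dedupe_reads/dedupe_reads.1.fq.gz
IN2=dedupe_reads/dedupe_reads.2.fq.gz
OUT1=dedupe_reads/deduped.1.fq.gz
OUT2=dedupe_reads/deduped.2.fq.gz

# The "dedupe" flag asks clumpify.sh to remove duplicate reads while clumping.
DEDUPE_CMD="clumpify.sh in1=${IN1} in2=${IN2} out1=${OUT1} out2=${OUT2} dedupe"

# Print the assembled command for review before running it.
echo "$DEDUPE_CMD"

# Uncomment to actually run it (requires BBTools on your PATH):
# sh -c "$DEDUPE_CMD"
```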
