Hi all,
I'm running a pipeline that uses bbduk.sh for read trimming, then aligns the trimmed reads to a reference genome with bowtie2, pipes the alignment through samblaster to remove duplicates, and finally sorts, adds read groups and saves the result as BAM with Picard.
Each job processes one pair of read files on an HPC node.
The problem is that, despite the thread count being set explicitly, bbduk tries to use more threads than requested, causing the job to be terminated by the queue manager.
This is my command:
bbduk.sh -Xmx1g ref=/home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int minlen=30 ziplevel=9 ow threads=12 in=./D10_#.fastq.gz out=trimmed_D10_#.fastq.gz stats=D10.stats
And this is the output:
java -ea -Xmx1g -Xms1g -cp /home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/current/ jgi.BBDuk -Xmx1g ref=/home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int minlen=30 ziplevel=9 ow threads=12 in=./D10_#.fastq.gz out=trimmed_D10_#.fastq.gz stats=D10.stats ow
Executing jgi.BBDuk [-Xmx1g, ref=/home/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/opt/bbmap-38.86-0/resources/adapters.fa, ktrim=r, k=23, mink=11, hdist=1, qtrim=rl, trimq=10, tpe, tbo, int, minlen=30, ziplevel=9, ow, threads=12, in=./D10_#.fastq.gz, out=trimmed_D10_#.fastq.gz, stats=D10.stats, ow]
Version 38.86
Set INTERLEAVED to true
Set threads to 12
maskMiddle was disabled because useShortKmers=true
Reset INTERLEAVED to false because paired input files were specified.
0.235 seconds.
Initial:
Memory: max=1029m, total=1029m, free=995m, used=34m
Added 217135 kmers; time: 0.457 seconds.
Memory: max=1029m, total=1029m, free=957m, used=72m
Input is being processed as paired
Started output streams: 0.229 seconds.
=>> PBS: job killed: ncpus 14.83 exceeded limit 12 (sum)
Exception in thread "Thread-16" Exception in thread "Thread-18"
Thanks, Ido
This should not be happening. It looks like you are using PBS. Are you asking for a corresponding number of threads on the queue manager side? I would ask for 4 more cores (or reduce the number given to bbduk.sh), since it looks like there is some overhead on top of the command itself.
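For instance, on the PBS side a request along these lines (the resource strings below are only placeholders to illustrate the idea, not tested values) leaves a few spare cores for that overhead:

#PBS -l select=1:ncpus=16:mem=4gb
#PBS -l walltime=12:00:00
# 16 cores reserved from PBS, but bbduk.sh is still launched with threads=12,
# leaving ~4 cores of headroom for pigz, JVM garbage collection, etc.
# ...followed by the bbduk.sh command from the question, unchanged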
Looks like it, but as you've said, this shouldn't happen, and I've never had issues with it before (I've used this approach many times). This is a new conda environment, so it might be related to a newer bbmap version (this one is 38.86). I hope Brian Bushnell might have more insights about this. Thanks for your quick reply, I'll try adding a few extra cores to my request to the queue manager.
This is the explanation from the online user guide:
Not very helpful...
This may also be related to how your job manager is set up. I use threads with BBMap under SLURM and have never had this specific issue. If you are using pigz, then turn it off with pigz=f on your command line and see if that fixes this. It will add some time to individual jobs, but that may be a safe compromise.

In my case (using PBSPro), the only solution was to use fastq instead of fastq.gz as input and output (I manually decompress the fastq.gz files, run the tool, and manually compress the outputs) and to give the tool slightly fewer threads than I request from the scheduler (requested threads - 2). Otherwise the job always failed because too many threads were used.
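As a rough sketch of that workaround (file names follow the question, the adapter path is shortened here, and the thread counts assume a 12-core reservation):

# decompress up front so bbduk never needs (un)compression helper processes
pigz -d -k -p 10 D10_1.fastq.gz D10_2.fastq.gz
# run on plain fastq, with pigz off and two threads of headroom
bbduk.sh -Xmx1g ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int \
    minlen=30 ow pigz=f threads=10 \
    in=./D10_#.fastq out=trimmed_D10_#.fastq stats=D10.stats
# recompress the outputs afterwards with an explicit thread cap
pigz -p 10 trimmed_D10_1.fastq trimmed_D10_2.fastq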
I am guessing that you are exceeding the limit because you are reading from and/or creating gzipped files. If I remember correctly (someone here will know this better), all BBTools programs use pigz for (un)compressing, and that program tends to be CPU-greedy. I suggest you try removing ziplevel=9 from your command and saving the files without .gz.

Also, it is generally a good idea to ask for fewer threads within the program than what you reserve in your job manager, as programs sometimes spill over their allotted number by a percent or so. In other words, follow @genomax's advice.
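For example (just a sketch; the ref path is abbreviated and threads=10 assumes a 12-core reservation), the command from the question without ziplevel=9, with uncompressed output and a little thread headroom would look like:

bbduk.sh -Xmx1g ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int \
    minlen=30 ow threads=10 \
    in=./D10_#.fastq.gz out=trimmed_D10_#.fastq stats=D10.stats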
Thanks for your reply. I've tried it again with pigz=f and had the same issue. I prefer to save the files compressed, since our workspace capacity is limited and monitored. I tried it on another HPC cluster (also using PBS) and didn't encounter that problem with exactly the same command. It may be the older BBMap version installed there (v38.79) or some difference in the way the scheduler works. I'll try to downgrade BBMap and see if it solves the problem.

While anything is possible, I think this problem is not related to the BBMap version. It is possible that the cluster you are having issues with is set up with stricter limits than the one where you did not. Did you try reducing the number of threads on the bbduk command line?
Hi GenoMax, coming back to this as I'm still having the same issues: is it possible to provide bbduk.sh with custom Java options (such as -XX:ParallelGCThreads=4 to limit the number of garbage-collection threads), or do I need to manually edit the wrapper (or construct the Java command myself)? I've used this approach successfully with other Java tools that had the same issue.
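One generic way to pass such options without editing the wrapper (a sketch only; I haven't verified whether the bbduk.sh wrapper clears or overrides these) would be the JVM's environment hooks:

# _JAVA_OPTIONS (or JAVA_TOOL_OPTIONS) is read by the JVM itself, so it should reach
# the java process that the wrapper launches without any changes to the wrapper script
export _JAVA_OPTIONS="-XX:ParallelGCThreads=4"
# then run bbduk.sh as usual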