Hi all,
I ran a batch job on a high-performance computing system to sort aligned reads and used GNU parallel to speed up the work, but the job failed with the following error:
parallel: Error: Output is incomplete. Cannot append to buffer file in $TMPDIR. Is the disk full?
parallel: Error: Change $TMPDIR with --tmpdir or use --compress.
My job script looks like this:
#!/bin/bash
module load samtools/1.2
export TMPDIR=/scratch/$SLURM_JOBID
cd /data
ls *sam* | parallel "samtools sort -T /scratch/$SLURM_JOBID/{.} -O bam -o {}.bam {}"
Does anyone know how to solve this problem? Thank you in advance.
Best regards
Have you checked that you're not using up all of the available scratch space?
Yes, I checked. I allocated 500 GB scratch space.
The main problem is that the parallel process "Cannot append to buffer file in $TMPDIR", as described in my post. I need to change the default tmpdir to a new one, but I am not clear on how to set it.

I meant: did you actually check that it wasn't full while you were trying to run these jobs? Typically, scratch space on HPC cluster nodes isn't dedicated to a single user, and anyone using that node can write to it. So you might not have all of the space you need. I don't know that SLURM will check that there is 500 GB of scratch available; I think it only checks that you haven't exceeded that amount.
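One quick way to watch this while the jobs run (a sketch; $TMPDIR stands in for /scratch/$SLURM_JOBID from your job script, falling back to /tmp so the commands work anywhere):

```shell
# Check scratch usage while the job is running.
SCRATCH="${TMPDIR:-/tmp}"

# Free space left on the filesystem backing the scratch area
df -h "$SCRATCH"

# Space the directory itself is currently consuming
# (unreadable subdirectories are skipped)
du -sh "$SCRATCH" 2>/dev/null || true
```

Running this a few times during the job will show whether the scratch area actually fills up before parallel dies.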
$TMPDIR is being set with the line export TMPDIR=/... . You can change TMPDIR by changing /scratch/$SLURM_JOBID to whatever path you want to use. If they're temporary files, however, the proper usage would be to use scratch as you are doing.

GNU parallel can also run on multiple nodes; have you considered trying this?
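In case it helps, a rough sketch of what that could look like (untested here; it assumes passwordless SSH between the allocated nodes, samtools on each node's PATH, and /data and /scratch shared across nodes; guarded so it is a no-op outside a SLURM job):

```shell
# Spread the samtools jobs over every node in the allocation.
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_NODELIST:-}" ]; then
    # Expand the SLURM allocation into one hostname per line
    scontrol show hostnames "$SLURM_NODELIST" > nodefile.txt
    cd /data
    # --sshloginfile: run jobs on the listed hosts
    # --workdir: start remote jobs in /data so {} resolves there
    ls *sam* | parallel --sshloginfile nodefile.txt --workdir /data \
        "samtools sort -T /scratch/$SLURM_JOBID/{.} -O bam -o {}.bam {}"
fi
```

Note this only spreads the CPU load; if every node writes to the same shared scratch, it does nothing for the disk-space problem by itself.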
Thank you for your comments and advice.
I haven't checked how much space my job used, and I set only one node, but I can try multiple nodes.
I have already tried export TMPDIR=/scratch/$SLURM_JOBID, but it doesn't work. Based on the error information parallel gave, I guess I haven't set TMPDIR correctly.
It is always a good idea to check the amount of resources your job used; that way you can make sure you're not hogging resources by over-allocating, and that any crashes are not due to running out of resources. You should also check that files (either temporary or final results) are being written to the correct places. You should never keep retrying jobs and adding resources until something works; you'll waste a ton of time.
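For checking a finished job, SLURM's accounting tool can report what it actually consumed. A sketch (the job ID is a placeholder, and the field list is an example; guarded so it is a no-op on machines without SLURM):

```shell
# Ask SLURM's accounting database what a finished job used.
# 1234567 is a placeholder job ID; MaxDiskWrite is only populated
# if your cluster's accounting is configured to record it.
if command -v sacct >/dev/null 2>&1; then
    sacct -j 1234567 --format=JobID,Elapsed,MaxRSS,MaxDiskWrite,State
fi
```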
The error is probably because parallel can't write to TMPDIR. The question is whether that's because the disk is full, because parallel can't find or access TMPDIR, or because TMPDIR doesn't exist.

The reason to check the usage of your scratch space is that parallel caches result files in a temporary directory. By default that is /var/tmp/, but if $TMPDIR is set, it caches the result files there. So it seems like both samtools and parallel are writing to /scratch/.../, which might mean you're using more space than you think.
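Putting the error message's own two suggestions into the original script, one possible fix (a sketch, not verified on your cluster) is to make sure the temp directory actually exists and to tell parallel explicitly where to buffer, with compression to cut scratch usage:

```shell
#!/bin/bash
module load samtools/1.2
export TMPDIR=/scratch/$SLURM_JOBID
mkdir -p "$TMPDIR"   # parallel cannot buffer into a directory that doesn't exist yet
cd /data
# --tmpdir: where parallel buffers each job's output
# --compress: compress those buffer files to save scratch space
ls *sam* | parallel --tmpdir "$TMPDIR" --compress \
    "samtools sort -T $TMPDIR/{.} -O bam -o {}.bam {}"
```

If the error persists after this, it really is likely that the 500 GB is filling up, since samtools sort writes its temporary chunks to that same scratch directory.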