8.0 years ago
antoinefelden
I'm trying to run TopHat2 on my RNA-seq samples. A trial job on a small subset of the samples worked fine, but I can't run it on the full dataset. The trial run took around 30 GB of input and everything went smoothly, but when I tried the full dataset of ~250 GB, I got the following error:
Traceback (most recent call last):
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4107, in <module>
    sys.exit(main())
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4081, in main
    params.gff_annotation)
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 2757, in compile_reports
    print >> run_log, " ".join(bamsort_cmd)
IOError: [Errno 28] No space left on device
Do you have an estimate of how much space TopHat2's temporary files would take with that input? I'm trying to figure out what could take so much space, as our local work area has a capacity of 645 GB.
Thanks, Antoine
What operating system are you running on? And how is it set up - are there different partitions? What command are you using to run TopHat - specifically where are you directing the output to? How quickly does it give the error - straight away or after a while?
645 GB of capacity with 250 GB of input FASTQ sounds like it's going to be tight, given that SAM/BAM files will be created along the way. Is the 645 GB completely free, or is there other data in there? That is, how much of the 645 GB is actually free before TopHat starts?
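One way to answer the free-space question is to check the partition right before the run starts. This is only a minimal sketch using Python's standard library; the `"."` path is a placeholder and should be replaced with whatever directory TopHat actually writes to on your system.

```python
import shutil

def free_gb(path):
    """Return (total, used, free) in GB (10^9 bytes) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return tuple(round(x / 1e9, 1) for x in (usage.total, usage.used, usage.free))

if __name__ == "__main__":
    # "." is a placeholder; point this at the local work area used for the run.
    total, used, free = free_gb(".")
    print(f"total={total} GB  used={used} GB  free={free} GB")
```

Running it once after the raw files have been copied in, and again after a failed run, should show whether the work area itself is what fills up.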
I work on my university HPC, on Linux. I'm running TopHat to align 15*2 read files (~8 GB each) with the --read-realign-edit-dist option, which is known to increase computing time and maybe memory requirements as well. All the raw files are copied into the working space before the TopHat run starts, so the free space in the local work area would effectively be 645 - 250 = ~400 GB. The global scratch partition (shared by everyone, I think) is 2 TB, but the error occurred within the TopHat run (the last checkpoints were "Joining segment hits" and "Reporting output tracks"), not when copying the output, so I'm guessing the problem is in the local work area.
During a normal run TopHat will create some rather large temp files, which are written during steps like "Reporting output tracks". Hopefully those aren't uncompressed, though TopHat is old enough that I wouldn't be surprised. Can you use a newer/faster/better aligner instead? HISAT2 and STAR are typically the go-to RNA-seq aligners these days.
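To get a rough number for how large those temp files are, one option is to measure the output directory while the small trial job is running and scale up from there. The sketch below is an assumption-laden illustration, not something TopHat provides: `dir_size_bytes` and `extrapolate_gb` are hypothetical helper names, and the linear scaling with input size is a guess rather than documented behaviour.

```python
import os

def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root` (recursively)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished between listing and stat (temp files do that).
                pass
    return total

def extrapolate_gb(trial_peak_gb, trial_input_gb, full_input_gb):
    """Scale the trial's peak usage linearly to the full input (an assumption)."""
    return trial_peak_gb * full_input_gb / trial_input_gb
```

For example, if the 30 GB trial peaked at some measured value, `extrapolate_gb(peak, 30, 250)` would give a first-order estimate to compare against the ~400 GB that is free.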
I will use others, but I wanted to compare outputs (namely with BBTools; I haven't looked at HISAT2 or STAR). Since temporary files seem to be the issue here, is there any way to tell TopHat to put them somewhere other than the local work area?
Not to my knowledge.