TopHat2: "IOError: [Errno 28] No space left on device" with large input

8.5 years ago

antoinefelden ▴ 60

I'm trying to run TopHat2 on my RNA-seq samples. A trial job with a small subset of the samples worked fine, but I can't run it on the full dataset. The trial run used around 30 GB of input and everything went smoothly, but when I tried the full dataset of ~250 GB, I got the following error:

Traceback (most recent call last):
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4107, in <module>
    sys.exit(main())
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4081, in main
    params.gff_annotation)
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 2757, in compile_reports
    print >> run_log, " ".join(bamsort_cmd)
IOError: [Errno 28] No space left on device

Do you have an estimate of how much space TopHat2's temporary files would take with that input? I'm trying to figure out what could be using so much space, as our local work area has a capacity of 645 GB.
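TopHat2 doesn't report a disk budget up front, so one rough approach is to measure the peak disk usage of the 30 GB trial run and scale it by input size. This is a minimal sketch assuming usage grows roughly linearly with input size, which is an assumption, not a documented property of TopHat2; the trial peak value has to be measured on your own system (the 100 GB below is a placeholder).

```python
def extrapolate_peak_gb(trial_input_gb: float, trial_peak_gb: float,
                        full_input_gb: float) -> float:
    """Scale a measured trial-run peak disk usage linearly to the full input.

    Assumes (hypothetically) that temporary/output size is proportional to
    input size; a real aligner may deviate from this.
    """
    if trial_input_gb <= 0:
        raise ValueError("trial_input_gb must be positive")
    scale = full_input_gb / trial_input_gb
    return trial_peak_gb * scale


if __name__ == "__main__":
    # 30 GB trial and ~250 GB full input come from the post;
    # the 100 GB trial peak is a placeholder, not a measured value.
    estimate = extrapolate_peak_gb(30, 100, 250)
    print(f"Estimated peak usage: {estimate:.0f} GB")  # prints: Estimated peak usage: 833 GB
```

If the extrapolated figure exceeds the free space in the work area, the run is unlikely to fit there regardless of the exact scaling.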

Thanks, Antoine

Tags: RNA-Seq, TopHat2, memory usage

What operating system are you running on? And how is it set up: are there different partitions? What command are you using to run TopHat, specifically, where are you directing the output? How quickly does it give the error: straight away or after a while?

A 645 GB capacity with a 250 GB input FASTQ sounds like it's going to be tight, given that SAM/BAM files will be created, etc. Is the 645 GB completely free, or is there other data in there? In other words, how much of the 645 GB is actually free before TopHat starts?
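One way to answer the "how much is actually free" question from inside a job script (the shell equivalent is `df -h <dir>`) is Python's standard `shutil.disk_usage`. This is a small sketch; the directory you pass is whatever your work area is, and sizes are reported in GiB (1024³ bytes), which is slightly larger than the GB used elsewhere in this thread.

```python
import shutil
import sys

GIB = 1024 ** 3


def free_gib(path: str) -> float:
    """Return the free space (in GiB) on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / GIB


def has_room(path: str, needed_gib: float) -> bool:
    """True if the filesystem holding `path` has at least `needed_gib` free."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: {free_gib(target):.1f} GiB free")
```

Running this right after the raw files are copied, and again just before TopHat starts, shows how much headroom the run really has.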

Reply by Tonor ▴ 480, 8.5 years ago

I work on my university's HPC cluster, on Linux. I'm running TopHat to align 30 read files (15 pairs, ~8 GB each) with the --read-realign-edit-dist option, which is known to increase computing time, and maybe memory requirements as well? All the raw files are copied into the work area before the TopHat run starts, so the effective free space in the local work area is 645 - 250 = ~400 GB. The global scratch partition (shared by everyone, I think) is 2 TB, but the error occurred during the TopHat run (the last checkpoints were "Joining segment hits" and "Reporting output tracks"), not while copying the output, so I'm guessing the problem is in the local work area.
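The staging arithmetic in this reply is easy to sanity-check. Using the figures given (30 files at ~8 GB each, a 645 GB work area), the staged input is about 240 GB, in line with the ~250 GB quoted, which leaves roughly 400 GB for everything TopHat writes. A minimal sketch:

```python
def remaining_space_gb(capacity_gb: float, n_files: int, file_size_gb: float) -> float:
    """Space left in the work area after staging all raw read files."""
    staged = n_files * file_size_gb
    return capacity_gb - staged


if __name__ == "__main__":
    # 15 samples x 2 mates, ~8 GB per file, 645 GB work area (figures from the post)
    n_files = 15 * 2
    print(n_files * 8)                           # prints: 240
    print(remaining_space_gb(645, n_files, 8))   # prints: 405
```

Everything TopHat produces (temporary files, intermediate SAM/BAM, final BAM) has to fit in that remainder if it is all written to the same filesystem.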

Reply by antoinefelden ▴ 60, 8.5 years ago

During a normal run, TopHat will create some rather large temporary files, which are written during steps like "Reporting output tracks". Hopefully those aren't uncompressed, though TopHat is old enough that I wouldn't be surprised. Can you use a newer/faster/better aligner instead? HISAT2 and STAR are typically the go-to RNA-seq aligners these days.
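To see which files actually fill the disk, you can list the largest files under the TopHat output directory while the job runs or after it fails. This sketch assumes the output went to `tophat_out` (TopHat's default output directory name); pass a different path if you used `-o`. It only uses the standard library's `os.walk` and `os.path.getsize`.

```python
import os
import sys


def largest_files(root: str, top_n: int = 10) -> list[tuple[int, str]]:
    """Return the `top_n` largest files under `root` as (size_bytes, path), largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The aligner may delete a temp file between listing and stat.
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "tophat_out"
    for size, path in largest_files(root):
        print(f"{size / 1024 ** 3:8.2f} GiB  {path}")
```

If the biggest entries sit in a temporary subdirectory, that supports the idea that temp files, not final outputs, are exhausting the work area.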

Reply by Devon Ryan 105k, 8.5 years ago

I will use others, but I wanted to compare outputs (namely with BBTools; I haven't looked at HISAT2 or STAR). Since temporary files seem to be the issue here, is there any way to tell TopHat to put them somewhere other than the local work area?
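One possible approach is to point both the output directory (`-o`) and the temporary directory at the larger scratch partition. This sketch assumes your TopHat build supports the `--tmp-dir` option described in the TopHat manual (confirm with `tophat --help` for 2.1.1); the index name, read file names and scratch paths are hypothetical placeholders. It only assembles and prints the command so you can inspect it before running it.

```python
import shlex


def build_tophat_cmd(index: str, reads1: list[str], reads2: list[str],
                     output_dir: str, tmp_dir: str) -> list[str]:
    """Assemble a TopHat2 command that keeps output and temp files off the local work area.

    All paths are hypothetical; --tmp-dir is assumed to be available in
    the installed TopHat version.
    """
    return [
        "tophat",
        "-o", output_dir,      # final results go here
        "--tmp-dir", tmp_dir,  # temporary files go here instead of <output_dir>/tmp
        index,
        ",".join(reads1),      # mate-1 files, comma-separated
        ",".join(reads2),      # mate-2 files, comma-separated
    ]


if __name__ == "__main__":
    cmd = build_tophat_cmd(
        "genome_index",
        ["s1_R1.fq.gz", "s2_R1.fq.gz"],
        ["s1_R2.fq.gz", "s2_R2.fq.gz"],
        "/scratch/me/tophat_out",
        "/scratch/me/tophat_tmp",
    )
    # Inspect the command; run it with subprocess.run(cmd, check=True) once it looks right.
    print(shlex.join(cmd))
```

Keeping the list form (rather than one shell string) avoids quoting problems if any path contains spaces.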