TopHat2: "IOError: [Errno 28] No space left on device" with large input
8.0 years ago

I'm trying to run TopHat2 on my RNA-seq samples. A trial job with a small subset of the samples (around 30 GB of input) went smoothly, but when I ran it on the full dataset (~250 GB), I got the following error:

Traceback (most recent call last):
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4107, in <module>
    sys.exit(main())
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 4081, in main
    params.gff_annotation)
  File "/srv/global/scratch/groups/sbs/TopHat/2.1.1/tophat", line 2757, in compile_reports
    print >> run_log, " ".join(bamsort_cmd)
IOError: [Errno 28] No space left on device

Do you have any estimate of how much space TopHat2's temporary files would take with that input? I'm trying to figure out what could take so much space, as our local work area has a capacity of 645 GB.

Thanks, Antoine

Tags: RNA-Seq, TopHat2, memory usage

What operating system are you running on? And how is it set up - are there different partitions? What command are you using to run TopHat - specifically where are you directing the output to? How quickly does it give the error - straight away or after a while?

A 645 GB capacity with 250 GB of input FASTQ sounds like it's going to be tight, given that SAM/BAM files will be created etc. Is the 645 GB completely free, or is there other data in there? In other words, how much of the 645 GB is actually free before TopHat starts?
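To answer that last question directly, the free space on the filesystem holding the work area can be checked right before launching TopHat (`df -h` does the same from the shell). A minimal sketch using Python's standard-library `shutil.disk_usage`; the path `"."` is a placeholder for wherever the local work area actually lives:

```python
import shutil

def free_gb(path: str) -> float:
    """Free space, in GB (10**9 bytes), on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / 1e9

def report(path: str) -> str:
    """One-line summary of total and free space for `path`."""
    usage = shutil.disk_usage(path)
    return f"{path}: total {usage.total / 1e9:.1f} GB, free {free_gb(path):.1f} GB"

# Placeholder path: replace with the real local work area before the run.
print(report("."))
```

Running this after the FASTQ files have been copied in gives the space TopHat really has to work with, rather than the nominal 645 GB.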


I work on my university's HPC cluster, on Linux. I'm running TopHat to align 15 pairs of read files (30 files, ~8 GB each) with the --read-realign-edit-dist option, which is known to increase computing time (and maybe memory requirements as well?). All the raw files are copied into the working space before the TopHat run starts, so the effective free space in the local work area is 645 - 250 = ~400 GB. The global scratch partition (shared by everyone, I think) is 2 TB, but the error occurred during the TopHat run itself (the last checkpoints were "Joining segment hits" and "Reporting output tracks"), not while copying the output, so I'm guessing the problem is in the local work area.
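The space arithmetic in that reply can be made explicit. Assuming nothing else writes to the work area, TopHat's temporary files plus its final output must fit in whatever the copied input leaves free, which caps how large they can grow relative to the input. This is only a back-of-the-envelope sketch; how much TopHat actually writes per GB of input is not stated anywhere in this thread:

```python
def headroom_gb(capacity_gb: float, input_gb: float) -> float:
    """Space left in the work area once the raw input has been copied in."""
    return capacity_gb - input_gb

def max_ratio(capacity_gb: float, input_gb: float) -> float:
    """Largest (temporary + output) / input size ratio that still fits."""
    return headroom_gb(capacity_gb, input_gb) / input_gb

# Figures from the post: 15 pairs of FASTQ files, ~8 GB each.
n_files = 15 * 2
per_file_gb = 8.0
estimated_input_gb = n_files * per_file_gb  # 240 GB, close to the ~250 GB quoted

capacity_gb = 645.0
input_gb = 250.0  # the rounded figure used in the post
print(f"headroom: {headroom_gb(capacity_gb, input_gb):.0f} GB")    # 395 GB
print(f"max ratio: {max_ratio(capacity_gb, input_gb):.2f}x input")  # 1.58x
```

So the full run fails if TopHat's intermediate and output files ever exceed roughly 1.6 times the input size at once, whereas the 30 GB trial, presumably run in the same area, had about 20 times its input size available.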


During a normal run, TopHat creates some rather large temporary files, which are written during steps like "Reporting output tracks". Hopefully those aren't uncompressed, though TopHat is old enough that I wouldn't be surprised if they were. Can you use a newer/faster/better aligner instead? HISAT2 and STAR are typically the go-to RNA-seq aligners these days.
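To see which step actually eats the space, one option is to watch the size of TopHat's output directory from a second process while the job runs. A minimal sketch, assuming the output directory is the one given to TopHat with `-o` (default `./tophat_out/`) and that the large intermediate files land somewhere under it; `dir_size_bytes` and `watch` are hypothetical helpers, not part of TopHat:

```python
import os
import time

def dir_size_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Temporary files can disappear between listing and stat; skip them.
                pass
    return total

def watch(root: str, interval_s: float = 60.0, rounds: int = 10) -> None:
    """Print a timestamped size of `root` every `interval_s` seconds, `rounds` times."""
    for i in range(rounds):
        gb = dir_size_bytes(root) / 1e9
        print(f"{time.strftime('%H:%M:%S')}  {root}: {gb:.1f} GB", flush=True)
        if i < rounds - 1:
            time.sleep(interval_s)
```

For example, `watch("tophat_out", interval_s=300, rounds=100)` started alongside the job logs the directory size every five minutes; lining the timestamps up with TopHat's own log shows which stage produces the jump.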


I will use others, but I wanted to compare outputs (namely with BBTools; I haven't looked at HISAT2 or STAR). Since temporary files seem to be the issue here, is there any way to tell TopHat to write them somewhere other than the local work area?


Not to my knowledge.
