I ran tophat with the following options:
tophat --bowtie1 -p 6 -o CT16 -r 150 --mate-std-dev 75 --no-discordant --no-mixed --transcriptome-index Transcriptome
and tophat_reports fails with this error:
[2014-05-27 15:47:24] Reporting output tracks
[FAILED]
Error running /usr/local/bin/tophat_reports
Another sample from the same study runs without any problem. The difference between the two filesets is size: the one that worked is relatively small, ~6GB per pair, compared to ~18GB per pair for the one that failed. Assuming it could have been a runtime error, I ran tophat in resume mode (tophat -R CT16), but it gave the same error. I have seen other people reporting the same error, but the precise reason is not mentioned anywhere. I checked whether it was a lack of disk space, but I have ~500GB free!! There are no RAM issues either. I am using tophat v2.0.11. Is this a bug? How do I fix it?
Did you try with -p 1?
No, I didn't; it would be too slow. I also tried it on an HPC and it still fails, so I can confidently say it is not a resource-availability issue.
I tried with bowtie2 instead of bowtie1 and it still fails. Now it is seriously pissing me off. I didn't realize that tophat would be so damn irritating.
What is your RAM size and OS?
24 GB RAM, 64-bit Fedora 19, kernel 3.13.9-100
The normal cause of this is not enough RAM (this step can be a memory hog), but with 24 GB that's somewhat unlikely. If you look in tophat's run log, you can see the exact command it issued before crashing. As long as tophat kept all of the temp files, you should be able to run that command directly yourself and see the actual underlying error (yes, this is annoying). Depending on the organism you're working with, you might consider switching to STAR, which is MUCH faster (24 GB of memory might be enough; it depends on the genome size).
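A minimal sketch of that log-digging step, assuming the `CT16` output directory from the question and tophat's usual `logs/run.log` layout (adjust the paths if your version puts the log elsewhere):

```shell
# Print the exact tophat_reports invocation from a tophat output directory,
# so it can be re-run by hand to see the real error message.
show_failing_cmd() {
  local outdir="$1"
  local log="$outdir/logs/run.log"
  if [ -f "$log" ]; then
    # tophat_reports is the step that crashed; print its command line
    grep tophat_reports "$log"
  else
    echo "no run.log under $outdir/logs"
  fi
}

show_failing_cmd CT16   # copy/paste the printed command and run it directly
```

Running the printed command in a terminal usually surfaces the underlying error (segfault, malformed temp file, etc.) that tophat's wrapper swallows.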
Could this be a solution: split the input files, run each chunk through tophat, and then stitch the BAMs together?
If you're not interested in finding novel junctions, then yes, that'll work. If you do want novel junctions, splitting the input decreases per-chunk coverage, which makes junction discovery harder and the results worse.
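A hedged sketch of that split-and-merge workaround. The only real trick is splitting on a multiple of 4 lines so FASTQ records stay intact, and using the same chunk size on both mate files so pairs stay in sync; the tophat/samtools steps are shown as comments since they depend on tools installed locally:

```shell
# Split a FASTQ file into chunks of N reads (1 FASTQ record = 4 lines).
split_fastq() {
  local fq="$1" reads_per_chunk="$2" prefix="$3"
  split -l $((reads_per_chunk * 4)) "$fq" "$prefix"
}

# Example (not run here): split both mates identically, align each chunk,
# then merge the per-chunk BAMs.
#   split_fastq reads_1.fq 10000000 chunk1_
#   split_fastq reads_2.fq 10000000 chunk2_
#   tophat --bowtie1 -p 6 -o out_aa ... Transcriptome chunk1_aa chunk2_aa
#   samtools merge merged.bam out_*/accepted_hits.bam
```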
If tophat had an option to collapse identical reads (while recording their counts) before aligning, this problem could have been partially avoided, and the file size would be reduced too.
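The collapsing idea itself is a one-liner. A sketch using only standard tools (note this keeps sequence lines only, discarding read names and qualities, so it would only suit an aligner that accepts FASTA-style input):

```shell
# Count-collapse identical read sequences in a FASTQ file:
# print "count sequence" pairs, most frequent first.
collapse_reads() {
  # In FASTQ, the sequence is every 4th line starting at line 2
  awk 'NR % 4 == 2' "$1" | sort | uniq -c | sort -rn
}
```

On a deeply sequenced library with many duplicate reads, this can shrink the aligner's workload substantially; tools like fastx_collapser implement the same idea.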
Yeah, there are a number of annoyances with tophat's design. Give STAR a try; it'll make your life easier if you have enough RAM.
I was checking it out, but it doesn't predict novel junctions; I'd have to use existing ones. I am hopeful that the 18 GB files will let me find novel transcripts. Nonetheless, I'll run STAR in parallel.