Hi all,
I'm having what seems to be a common problem with TopHat-Fusion: tophat-fusion-post reports no fusions. There are a few threads discussing this, but none seem relevant to my problem.
Looking For Reasons Of Why The Results Of A Tophat Fusion Post Is Empty
http://seqanswers.com/forums/showthread.php?t=17091&highlight=TopHat-Fusion
From the other threads, it's clear the structure of the directory is very important, specifically the samples must be labelled "tophat_[sample_name#1]" and the blast directory should be called blast not blast_human as indicated in the manual.
Looking through the code, I see these lines under the read_fusion_genes function:
for sample_name in sample_names:
    sample_isoform_filename = "tophat_" + sample_name + "/transfuse.txt"
    if not os.path.exists(sample_isoform_filename):
        continue
So tophat-fusion-post seems to be expecting a "transfuse.txt" file in the sample directory. Here's my sample directory which I copied over from running Tophat2 --fusion-search ... :
-rw-rw-r-- 1 toms projects 332K May 14 19:06 insertions.bed
-rw-rw-r-- 1 toms projects 13M May 14 19:06 junctions.bed
-rw-rw-r-- 1 toms projects 142M May 14 19:06 unmapped.bam
-rw-rw-r-- 1 toms projects 569 May 14 19:06 align_summary.txt
-rw-rw-r-- 1 toms projects 6.0G May 14 19:07 accepted_hits.bam
-rw-rw-r-- 1 toms projects 184 May 14 19:07 prep_reads.info
-rw-rw-r-- 1 toms projects 353K May 14 19:07 deletions.bed
drwxrwsr-x 2 toms projects 894 May 14 19:07 logs/
-rw-rw-r-- 1 toms projects 48M May 14 19:07 fusions.out
No transfuse.txt file!
Has anyone run tophat-fusion-post successfully after Tophat2 (v. 2.0.13)? If so, could you show me the contents of your sample directory?
Thanks in advance.
Command:
tophat-fusion-post -p 2 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 bowtie_indexes/hg38
Output:
[Fri May 15 11:27:04 2015] Beginning TopHat-Fusion post-processing run (v2.0.13)
-----------------------------------------------
[Fri May 15 11:27:04 2015] Extracting 23-mer around fusions and mapping them using Bowtie
[Fri May 15 11:27:51 2015] Filtering fusions
Processing: tophat_sample/fusions.out
0 fusions are output in ./tophatfusion_out/potential_fusion.txt
[Fri May 15 11:27:57 2015] Blasting 50-mers around fusions
[Fri May 15 11:27:57 2015] Generating read distributions around fusions
[Fri May 15 11:27:57 2015] Reporting final fusion candidates in html format
num of fusions: 0
-----------------------------------------------
[Fri May 15 11:27:57 2015] Run complete [00:00:53 elapsed]
Directory:
drwxrwsr-x 3 toms projects 278 May 14 19:07 tophat_sample/
-rwxrwxr-x 1 toms projects 38M May 14 19:11 ensGene.txt*
-rwxrwxr-x 1 toms projects 7.2M May 14 19:12 ensGtp.txt*
-rwxrwxr-x 1 toms projects 398K May 14 19:12 mcl*
-rwxrwxr-x 1 toms projects 11M May 14 19:22 refGene_sorted.txt*
drwxrwsr-x 2 toms projects 1.2K May 14 20:35 blast/
drwxrwsr-x 2 toms projects 262 May 15 11:08 bowtie_indexes/
drwxrwsr-x 7 toms projects 359 May 15 11:16 tophatfusion_out/
For anyone who's interested, it looks like the problem stemmed from not having all the necessary blast databases in the blast directory. It had nothing to do with the "transfuse.txt" file. I'd failed to follow the instructions in the manual, which say to include three blast databases:
http://ccb.jhu.edu/software/tophat/fusion_tutorial.html
As an aside, the tophat-fusion-post script could really do with a rewrite so that it fails gracefully. There are a lot of "if ...: continue" statements that can cause the script to output no fusions without the user receiving any error message to say that a file is missing, etc.
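In the meantime, a small pre-flight check run from the working directory can catch this kind of silent failure before starting tophat-fusion-post. The following is only a rough sketch and is not part of TopHat: the function name find_missing_inputs and the blast_db_names parameter are made up for illustration, and it assumes that each blast database shows up as one or more files named "<database name>.<something>" inside the blast directory. Take the actual database names from the tutorial linked above.

```python
import os


def find_missing_inputs(work_dir, blast_db_names=()):
    """Return a list of problems found in a tophat-fusion-post working directory.

    Hypothetical helper, not part of TopHat. blast_db_names holds the database
    names the manual asks for; they are not hard-coded here.
    """
    problems = []

    # Sample directories must be named "tophat_<sample_name>".
    sample_dirs = sorted(
        d for d in os.listdir(work_dir)
        if d.startswith("tophat_") and os.path.isdir(os.path.join(work_dir, d))
    )
    if not sample_dirs:
        problems.append("no tophat_<sample_name> directories in " + work_dir)

    # Each sample directory should contain the fusions.out written by
    # TopHat2 --fusion-search.
    for d in sample_dirs:
        fusions = os.path.join(work_dir, d, "fusions.out")
        if not os.path.isfile(fusions):
            problems.append("missing " + fusions)

    # The blast directory must be called "blast".
    blast_dir = os.path.join(work_dir, "blast")
    if not os.path.isdir(blast_dir):
        problems.append("missing directory " + blast_dir)
    else:
        present = os.listdir(blast_dir)
        for name in blast_db_names:
            # Assumption: a database is a set of files sharing the prefix "<name>."
            if not any(f.startswith(name + ".") for f in present):
                problems.append(
                    "no files for blast database '%s' in %s" % (name, blast_dir)
                )

    return problems
```

Calling find_missing_inputs(".", [...]) with the three database names from the manual returns an empty list when everything is in place, and otherwise one message per missing item, which is the kind of feedback the script itself does not give.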