Hello, I'm currently analyzing RNA-seq data to detect fusion transcripts. To do this, I'm trying out all the available tools to compare their performance and results. The problem is that my data aren't ideal for fusion detection: single-end reads of ~70 bp. I didn't sequence the samples myself, but I have to work with them.
So far I've tested 3 tools:
- Arriba (https://github.com/suhrig/arriba)
- Fusioncatcher (https://github.com/ndaniel/fusioncatcher)
- Star-Fusion (https://github.com/STAR-Fusion/STAR-Fusion/wiki)
I'm planning to test other tools (TopHat-Fusion, gfusion, FusionMap), but the results are already very different. Here is a Venn diagram that gives an overview of my results (entire dataset, ~50 samples): http://zupimages.net/viewer.php?id=19/23/vecs.png
The results for each sample differ widely between the tools: only a few transcripts are shared between them, and not across the whole dataset. For me, that's not sufficient to interpret the results correctly.
I've also tested the 3 tools on a control sample that contains 17 known fusion transcripts (I got it from the Fusioncatcher GitHub repository; it was paired-end, and I concatenated the two reads into one to simulate single-end data). The results from the 3 tools are quite similar: http://zupimages.net/viewer.php?id=19/23/m0vg.png
The results may be biased for this control sample, though: it was created by hand and contains known fusion transcripts, so I think the tools are more accurate on it.
I would like to get your point of view and advice on my situation. If you have experience with fusion detection, what would you do? At the moment I want to get as many results as possible and keep only the fusion transcripts that are found by two or more tools.
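A minimal sketch of that "two or more tools" filter, assuming each tool's output has already been reduced to (5' gene, 3' gene) pairs per sample. The function name `consensus_fusions`, the `min_tools` parameter, and the example calls are hypothetical and only for illustration; gene order is kept as reported, so A--B and B--A count as different fusions here.

```python
from collections import Counter


def consensus_fusions(calls_by_tool, min_tools=2):
    """Return {gene_pair: n_tools} for pairs reported by at least min_tools tools."""
    counts = Counter()
    for pairs in calls_by_tool.values():
        # Upper-case gene symbols and use a set so each tool votes once per pair.
        counts.update({(five.upper(), three.upper()) for five, three in pairs})
    return {pair: n for pair, n in counts.items() if n >= min_tools}


# Made-up calls for one sample (illustration only, not real results).
calls = {
    "arriba": [("BCR", "ABL1"), ("EML4", "ALK")],
    "fusioncatcher": [("BCR", "ABL1"), ("TMPRSS2", "ERG")],
    "star_fusion": [("bcr", "abl1"), ("EML4", "ALK")],
}

kept = consensus_fusions(calls)
for (five, three), n in sorted(kept.items()):
    print(f"{five}--{three} found by {n} tools")
```

Matching on gene symbols alone is the simplest option; it depends on all tools using the same annotation for gene names.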
Thank you in advance.
Thank you for your answer. I'm relieved to know that it's not that simple to get good results the first time. I took a look at your publication and it's very interesting; relying on all these common fusion transcripts was the best way to ensure that the results are relevant. In my case, I will try to adjust the parameters to detect more fusions, and if I get a sufficient number I will try to build something similar to your pipeline to interpret my data correctly. Your methodology will help me for sure :) !
You're welcome! When you compare fusions detected by different tools, be careful to allow a window when matching fusion positions on the genome. (For the same fusion, the predicted positions can vary by several bases across tools.)
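A minimal sketch of such window-based matching, assuming each call has been reduced to two breakpoints (chromosome and position for the 5' and 3' partner). The `Breakpoint` type, the function names, the default `window=10`, and the coordinates are all assumptions made for illustration; none of them come from the tools themselves.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom5: str  # chromosome of the 5' partner
    pos5: int    # breakpoint position on the 5' partner
    chrom3: str  # chromosome of the 3' partner
    pos3: int    # breakpoint position on the 3' partner


def same_fusion(a, b, window=10):
    """True if both breakpoints lie on the same chromosomes within `window` bases."""
    return (
        a.chrom5 == b.chrom5
        and a.chrom3 == b.chrom3
        and abs(a.pos5 - b.pos5) <= window
        and abs(a.pos3 - b.pos3) <= window
    )


def match_calls(calls_a, calls_b, window=10):
    """Return every (call_a, call_b) pair that describes the same fusion."""
    return [(a, b) for a in calls_a for b in calls_b if same_fusion(a, b, window)]


# Made-up coordinates (illustration only): the first call of tool_b is
# shifted by 3 bases on each side relative to the call of tool_a.
tool_a = [Breakpoint("chr22", 23290413, "chr9", 130854064)]
tool_b = [
    Breakpoint("chr22", 23290410, "chr9", 130854067),
    Breakpoint("chr2", 42295516, "chr2", 29223528),
]

matches = match_calls(tool_a, tool_b)
print(len(matches), "shared fusion(s) within the window")
```

The all-against-all comparison is fine for the few calls a sample usually yields; the window size is a judgment call and should be tuned to how much the tools disagree on your data.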
Well, I will keep this in mind, thank you very much!