Suggestion needed for finding fusion transcripts
4.9 years ago
c_u ▴ 520

Hi,

I have RNA-Seq data from human tissue samples, and I am interested in finding fusion transcripts specific to the disease case. There are many tools that report fusion transcripts, and I have come across the following two reviews (by the same group) that compare them:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5065767/

https://www.nature.com/articles/srep21597

The problem is that there seems to be no indication as to which tool is best: many of them perform very well on one dataset and then quite poorly on another. Also there is VERY low consensus/overlap in the output of the different tools.

So, I wanted to know whether anyone who has worked on this can recommend a good tool for this purpose or share any general insight about fusion transcript detection.

EDIT - There is a newer review (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1842-9) that claims STAR-Fusion, Arriba and STAR-SEQR to be the top 3 performers.

RNA-Seq fusion-transcripts • 1.7k views

"Also there is VERY low consensus/overlap in the output of the different tools."

<sarcasm> Welcome to the beautiful world of bioinformatics. </sarcasm> But seriously, this is a common problem in bioinformatics. The main issue is that datasets are complex and there is no gold-standard benchmark that captures all the edge cases one might encounter in different datasets. Different software may produce very different results depending on the dataset and its quality. If you can, try several tools, collect the promising candidates, and validate them in the lab. From what I know, STAR-Fusion is often used (https://github.com/STAR-Fusion/STAR-Fusion/wiki), but I have no hands-on experience with it.
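To make the "try several tools and collect candidates" idea concrete, here is a minimal Python sketch that tallies how many tools report each fusion. Everything in it is an assumption for illustration: the tool names, the gene names, and the idea that each tool's output has already been reduced to (5' gene, 3' gene) pairs. Real tools use different output formats, so the parsing step is deliberately left out.

```python
from collections import defaultdict

# Hypothetical calls: each tool's output reduced to (5' gene, 3' gene) pairs.
# Tool and gene names are placeholders, not real results.
calls = {
    "tool_a": {("GENEA", "GENEB"), ("GENEC", "GENED"), ("GENEE", "GENEF")},
    "tool_b": {("GENEA", "GENEB"), ("GENEG", "GENEH")},
    "tool_c": {("GENEA", "GENEB"), ("GENEC", "GENED")},
}


def tally_support(calls_by_tool):
    """Map each fusion pair to the sorted list of tools that reported it."""
    support = defaultdict(set)
    for tool, pairs in calls_by_tool.items():
        for pair in pairs:
            support[pair].add(tool)
    return {pair: sorted(tools) for pair, tools in support.items()}


def candidates(calls_by_tool, min_tools=2):
    """Return pairs reported by at least min_tools tools, best supported first."""
    support = tally_support(calls_by_tool)
    kept = [(pair, tools) for pair, tools in support.items() if len(tools) >= min_tools]
    # Sort by number of supporting tools (descending), then by gene names.
    return sorted(kept, key=lambda item: (-len(item[1]), item[0]))


for (five_prime, three_prime), tools in candidates(calls):
    print(f"{five_prime}--{three_prime}\t{len(tools)}\t{','.join(tools)}")
```

The `min_tools=2` cutoff is arbitrary. Given the low overlap between tools, candidates reported by a single tool may still be worth a look if their read support is strong.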

Thanks for the comment and for the STAR-Fusion suggestion.

4.8 years ago
Amitm ★ 2.3k

Hi, I agree with the previous comment from ATpoint. Nonetheless, I have used STAR-Fusion, and here are some points to keep in mind:

  1. You need good sequencing depth in your RNA-seq library to detect fusions confidently. As a rough estimate, 60M reads or more of 2x100 bp paired-end data is a good starting point. If your data is (ribo-depleted) total RNA rather than poly-A selected, a much larger library size will be needed.
  2. Complement the tool you have selected, such as STAR-Fusion, with another tool that uses a different strategy: for example JAFFA, which can use a hybrid approach of mapping plus assembly, or Pizzly, which is based on pseudo-alignment.
  3. If a fusion candidate has good read-depth support, there is a fair chance that other tools will pick it up as well, but this is not guaranteed.
  4. That brings me to the last point: have a hypothesis when looking at the results. If a fusion is detected in gene X and you suspect that the fusion is 'activating' the gene, then you would expect the main protein domain of gene X to be retained in the fusion. Also, if the fusion junction uses known splice site(s), there is a better chance that it is biologically relevant.
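The prioritisation described in points 3 and 4 could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Candidate` fields, the `min_reads=5` cutoff, and the example values are all hypothetical, and the two boolean flags would have to be filled in from the tool output or by manual inspection of the fusion junction.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    supporting_reads: int          # read support, however the chosen tool counts it
    uses_known_splice_sites: bool  # junction falls on annotated splice site(s)
    retains_main_domain: bool      # main protein domain of gene X kept in the fusion


def priority(c):
    """Sort key: domain retained first, then known splice sites, then read support."""
    return (c.retains_main_domain, c.uses_known_splice_sites, c.supporting_reads)


def shortlist(cands, min_reads=5):
    """Drop weakly supported candidates, then order the rest by priority."""
    kept = [c for c in cands if c.supporting_reads >= min_reads]
    return sorted(kept, key=priority, reverse=True)


# Placeholder candidates, not real results.
cands = [
    Candidate("GENEA--GENEB", 40, True, True),
    Candidate("GENEC--GENED", 120, False, False),
    Candidate("GENEE--GENEF", 3, True, True),
    Candidate("GENEG--GENEH", 12, True, False),
]

for c in shortlist(cands):
    print(c.name, c.supporting_reads, c.uses_known_splice_sites, c.retains_main_domain)
```

Ranking the hypothesis-related flags above raw read count is a design choice that follows point 4. If read support matters more for your data, reorder the tuple in `priority`.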

Thanks a lot for that reply! Could you add it as an answer so that I can accept it? I have a couple of questions about it too.

Hey Amit, thanks for the answer. I have single-end data: some samples have 50 bp reads (~65M reads per file) and others have 75 bp reads (~75M reads per file). Both are total RNA-seq. I will certainly complement STAR-based fusion detection with methods that take a different approach. But for this dataset, do you think it is meaningful to try to find fusion transcripts?

Hi, those are not bad library sizes. You could surely give it a go, though having single-end instead of paired-end reads will make it a bit more difficult for any algorithm to detect candidates.

Thanks a lot, will give it a try!
