Question

Gene Fusion Detection: RNA-Seq Data

13

12.4 years ago

KS ▴ 380

Hello everyone

I am trying to analyze RNA-Seq data. I am a beginner and am trying to learn the software used for RNA-Seq analysis. I have 10 tumor samples with matching normal samples, and I need to find gene fusions in these samples as part of a school exercise. Could anyone please suggest how to proceed?

Thanks

rna-seq next-gen sequencing fusion • 46k views

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 12.4 years ago by KS ▴ 380

0

A third vote for TopHat-Fusion. I recently used it to look for fusions in brain cancer samples. It predicts a lot of possible fusions (you can filter the results by coverage and run a secondary analysis for mapping specificity with BLAT).

ADD REPLY • link 12.4 years ago by JC 13k

0

Is TopHat-Fusion included in the Galaxy server?

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 12.4 years ago by KS ▴ 380

2

This should probably be asked as a comment on one of the existing answers suggesting tophat-fusion or, since multiple answers suggested tophat-fusion, as a comment to your original question. But, definitely not as an answer to your question.

ADD REPLY • link 12.4 years ago by Obi Griffith 20k

0

Have you ever used SOAPfuse? Can you tell me how to use it, including the details of the command?

ADD REPLY • link 9.4 years ago by syxbestmayer ▴ 20

0

Hello everyone,

I have a concern about the outputs of the 2 gene fusion calling tools that I was able to get to work. I ran TopHat-Fusion and STAR-Fusion on one of my RNA-Seq samples, and the output gene fusion lists don't match at all.

Has anyone else faced a similar issue before? Any thoughts on why this could be the case would be really appreciated.

Thanks a lot.

ADD REPLY • link 8.0 years ago by aditisk • 0

0

Please provide more info:

How was the output compared? Have you checked whether the output matches when allowing a +/- shift in breakpoint coordinates? Are there any matches if the fusion list is collapsed to gene pairs?
What is the average number of fusions you got with each tool? What about the coverage: the number of reads spanning the junction itself, etc.?
Are you sure that there are any fusions in your sample? :)
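
The checks asked about above (collapsing to gene pairs, and allowing a +/- shift in breakpoint coordinates) could be sketched roughly as follows. This is only an illustration: the `FusionCall` fields, the function names, and the 10 bp default tolerance are assumptions, not the output format of TopHat-Fusion or STAR-Fusion, so each tool's output would first have to be parsed into this shape.

```python
from typing import NamedTuple


class FusionCall(NamedTuple):
    """One predicted fusion. Field names are illustrative, not a tool's format."""
    gene5: str
    gene3: str
    chrom5: str
    pos5: int
    chrom3: str
    pos3: int


def gene_pairs(calls):
    """Collapse calls to unordered gene pairs, ignoring breakpoint coordinates."""
    return {frozenset((c.gene5, c.gene3)) for c in calls}


def same_breakpoints(a, b, tolerance=10):
    """True if both breakpoints agree within +/- tolerance bp, in either partner order."""
    def close(chrom_x, pos_x, chrom_y, pos_y):
        return chrom_x == chrom_y and abs(pos_x - pos_y) <= tolerance

    direct = (close(a.chrom5, a.pos5, b.chrom5, b.pos5)
              and close(a.chrom3, a.pos3, b.chrom3, b.pos3))
    swapped = (close(a.chrom5, a.pos5, b.chrom3, b.pos3)
               and close(a.chrom3, a.pos3, b.chrom5, b.pos5))
    return direct or swapped


def compare(calls_a, calls_b, tolerance=10):
    """Return (gene pairs found by both tools, call pairs with matching breakpoints)."""
    shared_pairs = gene_pairs(calls_a) & gene_pairs(calls_b)
    shared_breakpoints = [
        (a, b) for a in calls_a for b in calls_b
        if same_breakpoints(a, b, tolerance)
    ]
    return shared_pairs, shared_breakpoints


# Made-up coordinates, for illustration only.
tool_a = [FusionCall("BCR", "ABL1", "chr22", 1000, "chr9", 5000)]
tool_b = [FusionCall("ABL1", "BCR", "chr9", 5004, "chr22", 997)]
pairs, matches = compare(tool_a, tool_b)
```

If the two lists share gene pairs but no breakpoints even with a generous tolerance, the tools are probably reporting different junctions for the same partners, which is worth knowing before concluding that the results disagree completely.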

ADD REPLY • link 8.0 years ago by mikhail.shugay 3.5k

0

Actually, TopHat-Fusion and STAR-Fusion need a lot of improvement. TopHat-Fusion is consistently at the bottom of all comparisons of fusion finders. TopHat-Fusion calls hundreds of thousands of fusions per sample, when it is well known that fusions are very rare and one finds about one fusion for every 10 samples analyzed. Therefore, it is expected that the lists of fusions from TopHat-Fusion and STAR-Fusion do not match at all.

ADD REPLY • link 7.5 years ago by enxxx23 ▴ 280

Malachi Griffith · Answer 1 · 2015-09-02

Tools capable of detecting fusion genes:

Most of these use RNA-seq data, some use WGS data, and some use both. They are listed alphabetically. I will add to the list when I discover more.

Other useful programs:

Chimeraviz: https://bioconductor.org/packages/devel/bioc/html/chimeraviz.html (disclaimer: I created this)

Chimera: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4253834/
OncoFuse: http://bioinformatics.oxfordjournals.org/content/29/20/2539.long
FuMa: http://bioinformatics.oxfordjournals.org/content/early/2015/12/09/bioinformatics.btv721.abstract

Articles comparing gene fusion finders:

The structure of state of art gene fusion-finder algorithms
Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives
Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data

score 10 · Answer 2 · 2012-05-30

10

12.4 years ago

Obi Griffith 20k

You could try TopHat-Fusion. But if you are really just getting started with RNA-Seq analysis, I would start with simple expression-level and differential expression analysis. To do that, you could start by installing and learning how to use the Tuxedo suite of software (Bowtie, TopHat, Cufflinks, Cuffdiff, CummeRbund). Once you have mastered those, you can proceed to the slightly more advanced TopHat-Fusion (which now comes together with TopHat2). They provide a tutorial on their website.

I strongly recommend a workshop/course on the subject. However, to get you started, why not work through last year's Canadian Bioinformatics Workshop (CBW) tutorial on RNA Sequence Analysis? You can find it on the 2011 course page under the "Informatics on High Throughput Sequencing Data" course. There are probably several other lectures there that will be helpful as well.

ADD COMMENT • link 12.4 years ago by Obi Griffith 20k

1

@Griffith: link to "tutorial on their website" does not work.

ADD REPLY • link 12.4 years ago by Dataminer ★ 2.8k

0

Thanks for noticing that. Fixed.

ADD REPLY • link 12.4 years ago by Obi Griffith 20k

1

Let me add my 5 cents. I've actually been using TopHat-Fusion (then TopHat2) for most of my RNA-seq analysis. I think this is a gold standard in the field. Unfortunately for me (and others, see e.g. http://seqanswers.com/forums/showthread.php?t=13096), tophat-fusion-post is really hard to get working. TopHat-Fusion produces numerous false positives (most of the fusions come from the same transcript), as well as a lot of read-through events. I would like to recommend our post-filtering pipeline, http://bioinformatics.oxfordjournals.org/content/early/2013/08/24/bioinformatics.btt445 and I hope you'll find it useful.
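
To make the two false-positive classes mentioned above concrete, here is a minimal sketch of a same-gene filter and a read-through filter. It is not the linked pipeline: the `Candidate` fields, the function names, and the 100 kb distance cutoff are all assumptions chosen for illustration, and a real read-through filter would also check the transcription order of the two genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A fusion candidate. Field names are hypothetical, not a tool's format."""
    gene5: str
    gene3: str
    chrom5: str
    chrom3: str
    strand5: str
    strand3: str
    pos5: int
    pos3: int


def is_same_gene(c):
    """Both 'partners' are the same gene, so this is not a fusion."""
    return c.gene5 == c.gene3


def is_probable_readthrough(c, max_distance=100_000):
    """Same chromosome, same strand, and breakpoints within max_distance bp.

    The cutoff is an assumed value and should be tuned for the data at hand.
    """
    return (
        c.chrom5 == c.chrom3
        and c.strand5 == c.strand3
        and abs(c.pos5 - c.pos3) <= max_distance
    )


def post_filter(candidates, max_distance=100_000):
    """Keep candidates that are neither same-gene nor probable read-throughs."""
    return [
        c for c in candidates
        if not is_same_gene(c) and not is_probable_readthrough(c, max_distance)
    ]


# Made-up candidates, for illustration only.
candidates = [
    Candidate("GENEA", "GENEA", "chr1", "chr1", "+", "+", 1_000, 9_000),
    Candidate("GENEB", "GENEC", "chr2", "chr2", "+", "+", 50_000, 62_000),
    Candidate("GENED", "GENEE", "chr3", "chr7", "+", "-", 10_000, 20_000),
]
kept = post_filter(candidates)
```

Only the third candidate survives here: the first has identical partners and the second looks like a read-through between neighbouring genes.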

ADD REPLY • link 11.2 years ago by mikhail.shugay 3.5k

1

Not gold-standard, but very popular.

ADD REPLY • link 11.2 years ago by Sean Davis 27k

1

Agreed. I don't think anything in the current crop can be considered a gold standard for fusion detection.

ADD REPLY • link 11.2 years ago by Obi Griffith 20k

0

Thanks! These filters should really be a part of tophat-fusion. Especially the gene-to-gene filter, but also the in-frame filter.

ADD REPLY • link 11.1 years ago by Danielk ▴ 640

0

It is important to note that the in-frame filter remains a complex question. Based on an analysis of manually mapped fusion junctions available in the literature [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0004805], there can be indels that repair or break the fusion reading frame, which has a very high impact on the presence and function of the fusion protein. I'm not sure whether currently available fusion detection software can handle this issue effectively (correct me if I'm wrong).
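
The effect of an indel on the frame can be illustrated with simple modular arithmetic. This is a simplified sketch under stated assumptions, not how any particular fusion caller works: the function name and its three parameters are hypothetical, and it ignores stop codons introduced at the junction.

```python
def is_in_frame(coding_bases_5p, cds_offset_3p, junction_indel=0):
    """Return True if the 3' partner is still read in its original frame.

    coding_bases_5p: number of coding bases contributed by the 5' partner
                     before the breakpoint.
    cds_offset_3p:   0-based offset, within the 3' partner's CDS, of the
                     first base retained after the breakpoint.
    junction_indel:  net number of bases inserted (+) or deleted (-) at
                     the junction.
    """
    return (coding_bases_5p + junction_indel - cds_offset_3p) % 3 == 0


# A 1 bp deletion at the junction repairs an otherwise out-of-frame fusion.
out_of_frame = is_in_frame(301, 150)
repaired = is_in_frame(301, 150, junction_indel=-1)
# A 2 bp insertion breaks an otherwise in-frame fusion.
broken = is_in_frame(300, 150, junction_indel=2)
```

A filter that looks only at annotated exon boundaries would misclassify both of these cases, which is the concern raised above.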

ADD REPLY • link 11.1 years ago by mikhail.shugay 3.5k

0

I have a problem with RNA-seq. I ran TopHat-Fusion (TopHat2) and got results.html, and that file lists 11 candidates. Now I want to validate them. How can I choose fusion candidates for validation?

ADD REPLY • link 11.6 years ago by charitrakumarmishra • 0

2

@charitrakumarmishra: Ask it as a separate question!

ADD REPLY • link 11.6 years ago by Rm 8.3k

Malachi Griffith · Answer 3 · 2012-06-01

5

12.4 years ago

Nicolas Rosewick 11k

Here's a list of fusion gene detection tools working with RNA-seq data

ADD COMMENT • link updated 12.4 years ago by Malachi Griffith 20k • written 12.4 years ago by Nicolas Rosewick 11k

Ram · Answer 4 · 2012-05-30

3

12.4 years ago

David Langenberger 11k

You could take the RNA-seq datasets and map them against a reference genome using mapping tools that can handle split reads. Two tools I can recommend are segemehl and TopHat-Fusion.

They use reads that overlap splice sites and therefore appear split when mapped back to the genome (one part maps to one exon, the other part to the next exon, with the intron in between) to predict splice sites and/or fusion transcripts. The basic difference is that at a splice site both parts map relatively 'close' to each other, whereas in a fusion transcript the parts map far apart, on different strands, or even on different chromosomes.

How to use segemehl for your problem (on the segemehl link, every step is explained in more detail):

create an index for your genome (you only have to do this once)
run segemehl
run haarz, which will call the splice junctions
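
The distinction described above (close together versus far apart, on different strands, or on different chromosomes) can be sketched as a small classifier over the two parts of a split read. This is not what segemehl or haarz do internally: the `Segment` type, the labels, and the 200 kb maximum intron length are assumptions for illustration, and the sketch ignores the order of the parts, so it would not recognise back-splice (circular) junctions.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One aligned part of a split read (hypothetical representation)."""
    chrom: str
    strand: str
    start: int
    end: int


def classify_split(left, right, max_intron=200_000):
    """Label a split read as splice-like or as a possible fusion junction."""
    if left.chrom != right.chrom:
        return "inter-chromosomal"
    if left.strand != right.strand:
        return "strand-switch"
    # Distance between the two parts on the same chromosome and strand.
    gap = max(left.start, right.start) - min(left.end, right.end)
    if gap > max_intron:
        return "distant"
    return "splice-like"


# Made-up coordinates, for illustration only.
normal = classify_split(Segment("chr1", "+", 100, 150),
                        Segment("chr1", "+", 5_000, 5_050))
far = classify_split(Segment("chr1", "+", 100, 150),
                     Segment("chr1", "+", 3_000_000, 3_000_050))
trans = classify_split(Segment("chr1", "+", 100, 150),
                       Segment("chr9", "-", 700, 750))
```

Everything not labelled "splice-like" is only a candidate junction. As the replies below stress, many such junctions turn out to be read-throughs or artifacts rather than fusion genes.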

ADD COMMENT • link 12.4 years ago by David Langenberger 11k

2

segemehl is NOT a fusion gene finder! segemehl is a simple aligner, exactly like Bowtie2, BWA, STAR, etc. This is stated by the segemehl people here: http://seqanswers.com/forums/showthread.php?t=40765&highlight=segemehl

Finding splice junctions is different than finding fusion genes!

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by enxxx23 ▴ 280

1

I did not claim that segemehl is a 'fusion-gene-finder' per se, did I? I just explained how you can use segemehl together with haarz to predict 'splice junctions', which can be fusion junctions. :)

ADD REPLY • link updated 4.8 years ago by Ram 44k • written 10.5 years ago by David Langenberger 11k

0

The question is asking for a fusion finder! TopHat-Fusion is in the same sentence as segemehl in your answer, when TopHat-Fusion is a fusion gene finder and segemehl is not!

Also, the three steps above are a method for finding splice junctions. Please note that finding splice junctions does not mean that one is finding fusion genes! Most of what will be found with those 3 steps (that is, 98%) will be readthroughs!

ADD REPLY • link 10.5 years ago by enxxx23 ▴ 280

1

I see your point. Sorry for being somewhat off-topic then. I did not want to send anyone down the wrong track, nor upset you. Thanks for clarifying the difference of "fusion-junction" and "fusion-gene", of course it is something different and I should have been much more detailed in the first place. Just wanted to help though. :) Next time more precise! :)

ADD REPLY • link 10.5 years ago by David Langenberger 11k

0

I see your point too, and I think that this confusion comes from the authors of segemehl, who state in their paper that segemehl readily identifies fusion transcripts without the need for separate post-processing.

Here is the paper:

Hoffmann et al. A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and FUSION DETECTION, Genome Biol. 2014. http://www.ncbi.nlm.nih.gov/pubmed/24512684

and here is the quote from the above article:

Implemented in the segemehl mapping tool, it readily identifies conventional splice junctions, collinear and non-collinear fusion transcripts, and trans-spliced RNAs, without the need for separate post-processing...

ADD REPLY • link updated 4.8 years ago by Ram 44k • written 10.5 years ago by enxxx23 ▴ 280

1

I believe that, overall, gene fusions with both breakpoints in introns will simply be the most frequent class of fusions. So if software is sensitive enough to spot a junction between exons of different genes and report it, it is capable of fusion detection.

ADD REPLY • link 10.5 years ago by mikhail.shugay 3.5k

0

According to this definition, BLAST/BLAT/Bowtie/Bowtie2/BWA are fusion finders too!

Only a few hundred fusion genes are known altogether for all cancers! See the Mitelman database of gene fusions and the COSMIC Catalogue of Somatic Mutations in Cancer!

If one simply runs any aligner on one sample, one will "discover" thousands of candidate fusion genes, which makes one wonder how it is possible that one sample has more "gene fusions" than all known validated fusion genes in all cancers (which come from thousands of samples).

ADD REPLY • link updated 4.8 years ago by Ram 44k • written 10.5 years ago by enxxx23 ▴ 280

0

So, you wanted to use it for fusion-gene detection, but it didn't work and now you are upset? I'm really sorry, if our statements confused you and you wasted your time. Perhaps we can help you running segemehl and haarz to get some fusion-junctions and downstream some fusion-genes out of it?

ADD REPLY • link updated 4.8 years ago by Ram 44k • written 10.5 years ago by David Langenberger 11k

0

Also a note for the rest of interested people:

If any of you is interested in learning how to use segemehl to detect fusion transcripts and/or circularized RNAs, I can recommend you the following hands-on course:

Discovering standard and non-standard RNA transcripts - How to detect canonical splicing, circular RNAs, trans-splicing, and fusion transcripts

Developers of the algorithm will explain to you, step by step, how you can use segemehl to detect standard and non-standard transcripts. They will make sure that all of you understand the difference between 'fusion-junctions' and 'fusion-genes' and what exactly you can do with segemehl and its downstream analysis tools (like lack or haarz).

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by David Langenberger 11k

0

If you wish I may give a lecture/presentation at your course about finding fusion genes and fusion genes finders where a clear and easy to understand explanation will be given about:

what fusion genes really are,
what somatic fusion genes really are,
how many validated fusion genes are known today in the scientific literature,
what is the difference between conjoined genes and somatic fusion genes,
what is the difference between germline fusion gene and somatic fusion gene,
what is the difference between alternative splicing and fusion gene,
why fusion genes are more interesting than readthroughs,
how the validation in the wet-lab is done for fusion genes, and
why it is important to do wet-lab validation of bioinformatic predictions (for example, in 2012 ENCODE claimed on bioinformatic grounds that over 80% of the human genome is functional, and biologists proved that claim very wrong; see: D. Graur, "On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE", 2013, http://gbe.oxfordjournals.org/content/early/2013/02/20/gbe.evt028.short?rss=1 )

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by enxxx23 ▴ 280

0

Giving a lecture is actually a great idea! Unfortunately, the workshop is already prepared, announced and there are no free slots left. Nevertheless, I will ask around at the bioinformatics group of the University of Leipzig and I'm pretty sure there are more than enough interested people for a talk. Thanks for your offer! I will come back to you and then we can discuss about a date. Can you send me your contact information? (Is there a possibility for private messages here? If not, send your contact info to david.langenberger@ecseq.com)

ADD REPLY • link 10.5 years ago by David Langenberger 11k

0

Hi enxxx23,

I have a great interest in this lecture, can you share the ppt with me?

ADD REPLY • link 8.8 years ago by Lilian • 0

1

Most of the things identified by TopHat-Fusion will also be readthroughs. That's why some explicit readthrough filtering is usually added to the software. And by the way, those readthroughs are as real as gene fusions, since the mRNA molecules are present in cells.

ADD REPLY • link 10.5 years ago by mikhail.shugay 3.5k

0

Again, only a few hundred gene fusions are known in all cancers! TopHat-Fusion finds ~130,000 candidate fusion genes in 4 samples (from: http://www.hindawi.com/journals/bmri/2013/340620/ )! That is over 1000 times more than all gene fusions known in all cancers! For sure, 99.99% of those 130,000 fusions do not exist and are just false positives! There are fusion finders that offer the full package and require no additional filtering!

ADD REPLY • link 10.5 years ago by enxxx23 ▴ 280

score 3 · Answer 5 · 2012-06-01

In addition to splice junction discovery, MapSplice is capable of detecting gene fusions in RNA-seq data. From their website:

MapSplice is an algorithm for mapping RNA-seq data to reference genome for splice junction discovery. Features of MapSplice include:

alignment of both short reads < 75bp and long reads >= 75bp.
both CPU and memory efficiency.
detection of small exons.
discovery of canonical, semi-canonical and non-canonical junctions.
splice inference based on the alignment quality and diversity of reads mapped to a junction.
identification of chimeric events (intra-chromosomes and inter-chromosomes, inter-strands) with long reads.
identification of chimeric events (intra-chromosomes and inter-chromosomes, inter-strands) with short paired-end reads.
support paired-end reads and single-end reads

Ram · Answer 6 · 2013-08-26

FusionCatcher is another option.

FusionCatcher searches for novel/known fusion genes, translocations, and chimeras in RNA-seq data (paired-end reads from Illumina NGS platforms like Solexa and HiSeq) from diseased samples. The aims of FusionCatcher are: very good detection rate for finding candidate fusion genes, very easy to use (i.e. no a priori knowledge of databases and bioinformatics is needed in order to run FusionCatcher), to be as automatic as possible (i.e. the FusionCatcher will choose automatically the best parameters in order to find candidate fusion genes, e.g. finding automatically the adapters, building the exon-exon junctions automatically based on the length of the input reads, etc.) while providing the best possible detection rate for finding fusion genes

Current citation for FusionCatcher:

D. Nicorici, M. Satalan, H. Edgren, S. Kangaspeska, A. Murumagi, O. Kallioniemi, S. Virtanen, O. Kilkku, FusionCatcher - a tool for finding somatic fusion genes in paired-end RNA-sequencing data, bioRxiv, Nov. 2014, DOI:10.1101/011650

Malachi Griffith · Answer 7 · 2012-05-30

2

12.4 years ago

Sean Davis 27k

You can take a look at using GSNAP for this purpose, also.

ADD COMMENT • link updated 12.4 years ago by Malachi Griffith 20k • written 12.4 years ago by Sean Davis 27k

score 2 · Answer 8 · 2012-06-01

2

12.4 years ago

Malachi Griffith 20k

Trans-ABySS has been used to successfully identify gene-fusions from RNA-seq data by the group that developed it (full disclosure, I used to belong to that group).

ADD COMMENT • link 12.4 years ago by Malachi Griffith 20k

score 2 · Answer 9 · 2016-10-12

2

8.1 years ago

Ron ★ 1.2k

Here is a fusion detection tool for supervised analysis: https://github.com/FusionInspector/FusionInspector/wiki

You have to provide your list of fusion genes, and it checks whether they are present. It also outputs the BAM files that show the fusion support.

ADD COMMENT • link 8.1 years ago by Ron ★ 1.2k

score 1 · Answer 10 · 2012-06-01

1

12.4 years ago

Malachi Griffith 20k

BreakFusion is another option that was recently reported and released.

ADD COMMENT • link 12.4 years ago by Malachi Griffith 20k

Ram · Answer 11 · 2012-08-01

1

12.3 years ago

Rm 8.3k

For Gene-fusion detection you can also try SnowShoes-FTD:

Article describing it: http://nar.oxfordjournals.org/content/early/2011/05/27/nar.gkr362.full

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 12.3 years ago by Rm 8.3k

score 1 · Answer 12 · 2014-01-25

1

10.8 years ago

Malachi Griffith 20k

Here are some reviews of RNA fusion detection tools:

http://www.hindawi.com/journals/bmri/2013/340620/
http://www.oapublishinglondon.com/article/617
http://www.nature.com/articles/srep21597

ADD COMMENT • link 7.7 years ago by Malachi Griffith 20k

Ram · Answer 13 · 2014-05-18

1

10.5 years ago

enxxx23 ▴ 280

Here is a comparison of several Fusion Genes Finders:

https://code.google.com/p/fusioncatcher/wiki/comparison

ADD COMMENT • link 10.5 years ago by enxxx23 ▴ 280

0

Still, this only provides some clues about the good recall of FusionCatcher. It's not very clear what the precision is (AFAIK the FusionCatcher paper is not yet available as an e-pub). So I believe the choice of fusion detection software has to be made via trial and error :)

ADD REPLY • link 10.5 years ago by mikhail.shugay 3.5k

0

Precision and FDR are also presented there!

SOAPfuse has that kind of statistics. http://genomebiology.com/2013/14/2/R12

If this helps you we are using deFuse. ;-)

ADD REPLY • link updated 4.8 years ago by Ram 44k • written 10.5 years ago by enxxx23 ▴ 280

score 1 · Answer 14 · 2018-12-24

1

5.9 years ago

Shixiang ▴ 100

You can take a look at https://www.nature.com/articles/srep21597

ADD COMMENT • link 5.9 years ago by Shixiang ▴ 100

score 0 · Answer 15 · 2013-08-26

0

11.2 years ago

Rm 8.3k

The chimera Bioconductor package can use STAR (rna-STAR) outputs too.

ADD COMMENT • link 11.2 years ago by Rm 8.3k

score 0 · Answer 16 · 2014-01-25

0

10.8 years ago

Malachi Griffith 20k

Genomon Fusion is yet another option.

ADD COMMENT • link 10.8 years ago by Malachi Griffith 20k

score 0 · Answer 17 · 2014-01-25

SOAPfusion

"A novel tool for fusion discovery with paired-end RNA-Seq reads. The tool follows a different strategy by “finding fusions directly and verifying them”, differentiating it from all other existing tools by “finding the candidate regions and searching for the fusions afterwards”. This enables the fusion discovery process to be more effective and sensitive, also with a specular performance under low coverage of sequencing far more better than other tools."

Not to be confused with:

SOAPfuse

"An open source tool developed for genome-wide detection of fusion transcripts from paired-end RNA-Seq data. By comparing with previously released tools, SOAPfuse has a good performance. It is developed in perl. So far, it is developed only for analysis on human being RNA-Seq data."