Fusion gene detection software
8.9 years ago
ninninahm ▴ 70

Hi all,

I want to test several fusion gene detection tools, but I am running into a lot of problems just installing them.

I wanted to start this post to see if other people have had similar problems and if we can help each other out. Maybe some of you are trying to do the same. Any suggestions, ideas, help or critiques are highly welcome!

The tools I'm testing so far are:

  • FusionCatcher - easy to install and easy to use, offers a lot of information. It even tests against virus genomes. Nice tool! Uses different aligners and combines the output. Well documented.
  • JAFFA - I got it to run on my data; once the Java path was set, it was very easy and straightforward to use
  • SoapFuse - Once the folder structure is set up (I wrote a shell script for this), it runs. It has been running for one week now O_o
  • Mapsplice2 - Problems with the bowtie index
  • FusionMap - I'm having problems preparing the database, because our proxy won't allow the direct download and the manually downloaded files are not found by the tool...
  • EricScript - easy to obtain and to install. It downloads everything automatically; however, if your proxy blocks the download you basically cannot use it.
  • Trinity - works and is relatively easy to install. With some hiccups, I got it to run.
  • snowshoe FTD - asked via mail for the tool, no answer so far
  • deFuse - a bit complicated to set up
  • fusionQ -
  • star-fusion - easy to use and to install.
  • tophat-fusion -
  • ChimeraScan -
  • Bellerophontes -

I'm going to update the list as I go along.

Best
Ninni


+1 for Fusioncatcher, it's worked well in my hands previously. No tophat-fusion?


I have not tried tophat-fusion yet, because I heard that it is not good and very slow. Do you have any experience with it? I will also add it to the list!

Thanks

8.9 years ago
Amitm ★ 2.3k

hi ninninahm,

I have wrestled with fusion detection software for quite some time now. For a long time I have used SoapFuse. As for installation, most of these tools, including SoapFuse, are a mixed bag of different tools, and hence installation is a pain. For example, SoapFuse uses BLAT internally, which I guess is why it takes ages to process a good-sized RNA-seq dataset. In my hands, with 8 processors, a 60-70 million read human polyA+ library runs for at least 12 hours.

But I have kept using SoapFuse because publications have reported its reliable sensitivity time and again. A recent one is this: https://nar.oxfordjournals.org/content/early/2015/11/17/nar.gkv1234.abstract

Recently, I tried STAR-fusion. What I liked about this tool is that, apart from downloading the custom annotation dataset, the tool itself is not a mixed bag of different tools. STAR itself outputs the chimeras it detects, and this wrapper, based on the CTAT resource, filters them for candidates.

Since STAR itself is very fast, the advantage of using STAR-fusion is that one can get to fusion candidates in a short time, unlike with SoapFuse. What was more promising was that, since I had been running SoapFuse for a long time, I could compare STAR-fusion's results against it, and STAR-fusion was able to detect the recurrent fusion candidate we had seen in earlier samples in a longitudinal follow-up of a patient.
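For anyone who wants to try it, a run on paired-end reads looks roughly like the sketch below. The paths, the `run_star_fusion` helper and the `DRY_RUN` and `THREADS` variables are made up for illustration. The option names follow the STAR-Fusion documentation and are worth checking against `STAR-Fusion --help` for the installed version.

```shell
#!/usr/bin/env bash
# Sketch of a STAR-Fusion run on paired-end reads.
# Hypothetical helper; every path passed in is a placeholder.
run_star_fusion() {
    local ctat_lib=$1 left=$2 right=$3 outdir=$4

    # ctat_lib is the CTAT genome lib (the annotation dataset downloaded separately).
    local cmd=(STAR-Fusion
        --genome_lib_dir "$ctat_lib"
        --left_fq "$left"
        --right_fq "$right"
        --CPU "${THREADS:-8}"
        --output_dir "$outdir")

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command that would be run.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

With `DRY_RUN=1` the function only prints the command, which is handy for checking paths before submitting a job. After a real run, the fusion candidates should end up in `star-fusion.fusion_predictions.tsv` inside the output directory.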


Thank you! I wanted to use SOAPFuse, BUT the input structure seems very complicated if you have more than 200 samples :/ Do you maybe have a hint on how best to do this? I'm using this documentation: http://soap.genomics.org.cn/soapfuse.html#prepare2

If I understand correctly, I have to make a folder for each sample/library/fastq file (pair). That seems a bit cumbersome for more than 200 samples.

So far I can recommend fusioncatcher: it is easy to use, offers different aligners (such as STAR), and tests against virus genomes and also against bodymap as a control, hence giving you only the gene fusions specific to your data (if I understand correctly). Maybe you want to try it as well and see if it also finds your fusions? https://github.com/ndaniel/fusioncatcher

It takes a long time, because it tests so many different things and uses several aligners (you can tell it to change that). If you have case and control data, you can even specify that.

I am adding star-fusion to the list!

Best,
Ninni


Hi,

Yes, you are spot on about the trouble with SoapFuse. So far I have processed only ~20 samples with SoapFuse, and I created the arcane directory structure by hand for each of them, as the documentation suggests. Sorry, I don't have a way around it to suggest.

Fusioncatcher seems interesting. Will give it a try. Thanks for the suggestion.


Hello!

I am facing the same troubles as you are with fusion software: installation is really painful for beginners!

I ran SOAPfuse on 80 samples without any difficulty using this kind of bash array job script:

  1. Create a file with one line per sample that contains the sample name (in the snippet below the name is read from the third column).
  2. Use this information within a Slurm array job (#SBATCH --array=1-xxx) or a loop.

    # $files is the sample file from step 1; its third column contains a name for each sample.
    names=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$files" | awk '{print $3}')

    to create what SOAPfuse requires to work:

    # Output dir (-p: create parent dirs, no error if it already exists)
    mkdir -p Sample/SOAP/"$names"

    # Input dir
    mkdir -p /path/to/WHOLE_SEQ_DATA_DIR/"$names"/A/

    # Txt file with the information about the sample, created in the output dir
    # (tab-separated: sample, library, run, read length)
    printf '%s\tA\t%s\t101\n' "$names" "$names" > Sample/SOAP/"$names"/"$names".txt
    
  3. After that you can just move your fastq files to the input directory (for example by storing their location as another column in the same file, e.g. "$file" for /path/to/file) and run SOAPfuse:

    perl SOAPfuse-RUN.pl -c config/config.txt -fd /path/to/WHOLE_SEQ_DATA_DIR/ -l /path/to/Sample/SOAP/"$names"/"$names".txt -o /path/to/SOAP/"$names"/
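If you prefer a plain loop over an array job, the same preparation can be sketched as below. This is only a sketch under assumptions that are not in the steps above: a hypothetical tab-separated sheet with sample name, read 1 path and read 2 path (absolute paths), symlinks instead of moving the fastq files, and the `_1.fastq.gz`/`_2.fastq.gz` naming, which has to match the postfix set in your SOAPfuse config.txt. The function names are made up.

```shell
#!/usr/bin/env bash
# Sketch: build the SOAPfuse input layout for every sample in a sheet.
# Assumed (hypothetical) sheet format: sample<TAB>/abs/read1<TAB>/abs/read2
set -euo pipefail

prepare_soapfuse_sample() {
    local seq_root=$1 out_root=$2 sample=$3 r1=$4 r2=$5
    local lib=A read_len=101   # library label and read length as used above

    # Input dir <seq_root>/<sample>/<lib>/ and one output dir per sample
    mkdir -p "$seq_root/$sample/$lib" "$out_root/$sample"

    # Link the reads instead of moving them (file suffix is an assumption)
    ln -sf "$r1" "$seq_root/$sample/$lib/${sample}_1.fastq.gz"
    ln -sf "$r2" "$seq_root/$sample/$lib/${sample}_2.fastq.gz"

    # Sample list: sample, library, run, read length (tab-separated)
    printf '%s\t%s\t%s\t%s\n' "$sample" "$lib" "$sample" "$read_len" \
        > "$out_root/$sample/$sample.txt"
}

prepare_all() {
    local sheet=$1 seq_root=$2 out_root=$3
    local sample r1 r2
    # The "|| [ -n ... ]" part also handles a last line without a newline
    while IFS=$'\t' read -r sample r1 r2 || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue   # skip empty lines
        prepare_soapfuse_sample "$seq_root" "$out_root" "$sample" "$r1" "$r2"
    done < "$sheet"
}
```

Calling `prepare_all samples.tsv /path/to/WHOLE_SEQ_DATA_DIR /path/to/Sample/SOAP` sets up all samples in one go; the `perl SOAPfuse-RUN.pl` command above can then be run once per sample.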
    

For me this did the trick and saved me from fiddling with their weird directory layout. You have also left ChimeraScan off your list; it is quite easy to use. It is old, but according to some recent papers still pretty robust.


Hey Radek!

This is great! Thank you so much! Do you also know whether I can use SoapFuse with hg38? As far as I understand, I need to prepare something for this.

I will add ChimeraScan, thanks for the pointer!

Ninni


Sorry, I'm currently using hg19, so I have not looked at the code of SOAPfuse-S00-Generate_SOAPfuse_database.pl.

If I had to use another dataset for SOAPfuse, I would first look through this Perl script for anything that needs adapting between hg19 and hg38, such as hardcoded names, the way duplicated names in the repeated regions of the gtf file are handled (see comment from the author here), or any structural differences between the hg19 and hg38 gtf files. If you don't find anything weird in it, it should be feasible to use hg38.


Thank you so much! I will try this!

8.9 years ago

I've had really good success with INTEGRATE in cases where you have both RNAseq and WGS:

http://genome.cshlp.org/content/early/2015/11/10/gr.186114.114?top=1

http://sourceforge.net/projects/integrate-fusion/


Thank you for the hint, but I only have paired end RNAseq data.

Best Ninni
