I understand that Cufflinks can report FPKM for novel transcripts (different isoforms) when run with the proper parameters, but can it also report FPKM for chimeric transcripts, i.e. gene fusions between different genes? For example, if a fusion joins exon 1 of gene1 to exons 5-6 of gene2, would Cufflinks report the fusion as an isoform in transcripts.gtf, in a format something like this:
chr_gene1 gene1_exon1_start gene1_exon1_end gene1_exon1_fpkm
chr_gene2 gene2_exon5_start gene2_exon5_end gene2_exon5_fpkm
chr_gene2 gene2_exon6_start gene2_exon6_end gene2_exon6_fpkm
Based on a test case I ran, Cufflinks does not report the fusion transcripts, but I don't know whether that is because I was not using the right parameters or because Cufflinks never does this. Please advise.
Thanks!
Can anyone please help with this question?
Thanks a lot!
You can use STAR-Fusion or other fusion-detection tools to find fusion genes.
Hi Ron, thanks for the input. We do use STAR for calling fusions, but we also want to find the coverage/FPKM at the fusion breakends. Since Cufflinks is the tool our pipeline uses to calculate FPKMs, we wondered whether it can report fusion transcripts (we thought a fusion could be treated as a kind of isoform). If it can, that would give us a direct and easy way to get coverage information at the breakends.
According to this link, yes. But an aligner such as TopHat-Fusion should give you the results directly for validation. This is an example in the TopHat-Fusion manual. I don't know about other aligners.
As per Satyajeet, TopHat-Fusion outputs the number of reads supporting a fusion breakpoint. I cannot see how Cufflinks would do this as part of its pipeline. Be aware too that TopHat / Cufflinks have been replaced by HISAT / StringTie.
If you have raw RNA-seq data, then a useful thing to do would be to obtain the FASTA sequence of your fusion gene and then determine read count abundance over this and all other transcripts' FASTA sequences using something like Salmon or Kallisto, and then do a differential expression analysis.
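As a rough illustration of the first step in that suggestion, here is a minimal Python sketch that joins the exon sequences of the two partner genes into one fusion transcript, formats it as a FASTA record, and counts reads that span the junction. Everything in it is an assumption made for illustration: the function names, the toy sequences, and the idea that you already have the exon sequences and read alignments as transcript-coordinate intervals. It is not part of Salmon, Kallisto, or any other tool.

```python
def build_fusion(exons_5p, exons_3p):
    """Concatenate the 5' partner exons and the 3' partner exons.

    Returns (fusion_sequence, junction_offset), where junction_offset is the
    0-based position in the fusion sequence at which the 3' partner begins.
    """
    left = "".join(exons_5p)
    right = "".join(exons_3p)
    return left + right, len(left)


def fasta_record(name, seq, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def count_junction_reads(alignments, junction, min_overhang=5):
    """Count alignments that cross the junction.

    alignments: iterable of (start, end) intervals on the fusion transcript,
    0-based and half-open. A read counts only if it covers at least
    min_overhang bases on each side of the junction.
    """
    return sum(
        1
        for start, end in alignments
        if start <= junction - min_overhang and end >= junction + min_overhang
    )


# Toy example (hypothetical sequences): exon 1 of gene1 fused to exons 5-6 of gene2.
fusion_seq, junction = build_fusion(["ACGT"], ["GG", "TTA"])
record = fasta_record("gene1_e1--gene2_e5_e6", fusion_seq, width=4)
spanning = count_junction_reads(
    [(0, 9), (0, 4), (2, 7), (4, 9)], junction, min_overhang=2
)
```

The resulting record can be appended to the reference transcriptome FASTA before building the Salmon or Kallisto index, so the fusion is quantified alongside every other transcript. The junction-spanning count is a separate, cruder measure of support at the breakpoint itself.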