Does anybody know how to tell from TopHat/Cufflinks output whether a transcript is fused or not, compared to the GFF reference annotation?
After looking at the TopHat BAM file and transcripts.gtf along with the reference GFF file in IGV, I found that some of the annotated genes are fused and some are not. Sometimes a single gene in transcripts.gtf is reported as two genes in the reference GFF, and sometimes a gene that appears as two genes in transcripts.gtf is reported as a single gene in the reference GFF. All I want to know is how many of these discrepancies exist between the reference annotation (GFF) and the Cufflinks transcripts.
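One rough way to count these discrepancies without a dedicated fusion caller is to collapse each gene into a genomic span and compare overlaps. This Python sketch is an illustration under assumptions, not a Cufflinks or Cuffcompare feature: it assumes the Cufflinks GTF stores gene IDs in `gene_id` and that the reference GFF3 has `gene` rows with an `ID` attribute (adjust both for your files). The helpers `gene_spans` and `merge_split_report` and the toy records are hypothetical. A Cufflinks gene overlapping two or more reference genes is counted as "merged", and a reference gene overlapped by two or more Cufflinks genes as "split".

```python
import re
from collections import defaultdict


def gene_spans(lines, key, feature=None):
    """Collapse GTF/GFF rows into one (chrom, strand, start, end) span per gene ID.

    key: attribute holding the gene ID (assumed "gene_id" for Cufflinks GTF,
         "ID" for GFF3 gene rows).
    feature: if given, only rows whose 3rd column equals it are used.
    """
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(key) + r'[ =]"?([^";]+)')
    spans = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or (feature and cols[2] != feature):
            continue
        m = pattern.search(cols[8])
        if not m:
            continue
        gid = m.group(1)
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if gid in spans:
            # Extend the existing span to cover this exon/transcript row.
            c, s, lo, hi = spans[gid]
            spans[gid] = (c, s, min(lo, start), max(hi, end))
        else:
            spans[gid] = (chrom, strand, start, end)
    return spans


def overlaps(a, b):
    """True if two spans share a chromosome, a compatible strand and >= 1 base."""
    same_strand = a[1] == b[1] or "." in (a[1], b[1])  # "." = unstranded
    return a[0] == b[0] and same_strand and a[2] <= b[3] and b[2] <= a[3]


def merge_split_report(cuff, ref):
    """Return (merged, split): Cufflinks genes spanning >1 reference gene,
    and reference genes covered by >1 Cufflinks gene."""
    hits = defaultdict(set)  # Cufflinks gene -> reference genes it overlaps
    rev = defaultdict(set)   # reference gene -> Cufflinks genes overlapping it
    for cid, cspan in cuff.items():  # all-pairs loop: fine for a toy example
        for rid, rspan in ref.items():
            if overlaps(cspan, rspan):
                hits[cid].add(rid)
                rev[rid].add(cid)
    merged = {c: sorted(r) for c, r in hits.items() if len(r) > 1}
    split = {r: sorted(c) for r, c in rev.items() if len(c) > 1}
    return merged, split


# Hypothetical toy records illustrating both kinds of discrepancy.
cuff_gtf = [
    'chr1\tCufflinks\texon\t100\t500\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t900\t1500\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t3000\t3400\t.\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
    'chr1\tCufflinks\texon\t3600\t4000\t.\t-\t.\tgene_id "CUFF.3"; transcript_id "CUFF.3.1";',
]
ref_gff = [
    "chr1\tref\tgene\t100\t600\t.\t+\t.\tID=geneA;Name=A",
    "chr1\tref\tgene\t800\t1500\t.\t+\t.\tID=geneB;Name=B",
    "chr1\tref\tgene\t2900\t4100\t.\t-\t.\tID=geneC;Name=C",
]
cuff = gene_spans(cuff_gtf, "gene_id")
ref = gene_spans(ref_gff, "ID", feature="gene")
merged, split = merge_split_report(cuff, ref)
print(merged)  # {'CUFF.1': ['geneA', 'geneB']}
print(split)   # {'geneC': ['CUFF.2', 'CUFF.3']}
```

With real files, `gene_spans` accepts an open file handle, e.g. `with open("transcripts.gtf") as fh: cuff = gene_spans(fh, "gene_id")`, and `len(merged)` and `len(split)` give the two counts. Comparing whole gene spans is crude: nested or intronic genes and unstranded single-exon transcripts can produce false hits, so checking exon-level overlap and replacing the all-pairs loop with an interval index would be the natural refinements for genome-scale data.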
Any ideas?
Have you tried running TopHat-Fusion to answer the question more directly?
No, I haven't. That will be my next step if I can't make use of my current TopHat/Cufflinks output.
Are you using the -g or -G option? My guess is that -G will avoid that kind of problem.
I haven't used either -g or -G with Cufflinks, because all I wanted was to find novel genes and transcripts.
You can use -g; Cufflinks will then report the known models as well as novel isoforms/transcripts.
Yes, I just checked that. I always thought -g would not produce novel isoforms/transcripts. But how does it solve my problem of detecting fusion genes in my RNA-seq data relative to the reference annotation?