I am running RNA-seq analysis on a paired-end deep sequencing data set with no replicates. We are interested in finding novel genes and transcript isoforms in addition to variant info. Grooming and TopHat alignment went well, and I've processed the .bam output through Cufflinks in RABT mode with --GTF-guide. I then take the .gtf output from this and run Cuffcompare with the reference .gtf and .fasta.
I am confused about this last step and was hoping that somebody with more experience could help clarify a few things.
Firstly, most of the references I have read regarding cuffcompare indicate that it is used for multiple replicates or experiments: "Used to Track Cufflinks transcripts across multiple experiments (e.g. across a time course)". Is it common to use cuffcompare on a single experiment in order to find novel isoforms?
Secondly, some entries in the cuffcompare output don't make sense to me. What does it mean when I see an "=" class code with an FMI of zero? How about a "j" class code with an FMI of 100? Based on the definition of FMI (fraction of major isoform), these scenarios don't seem possible.
Thirdly, if I want an FPKM value for a known gene, is it common to sum the FPKMs of all transcripts belonging to that gene that have an "=" class code?
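To make the summing question concrete, here is a rough Python sketch of what I have in mind. It assumes the cuffcompare .tracking layout as I understand it: tab-separated columns, with the reference as gene|transcript in column 3, the class code in column 4, and the single sample in column 5 as q1:gene_id|transcript_id|FMI|FPKM|... The function name and the example rows are made up for illustration, and I may well be misreading the format.

```python
from collections import defaultdict


def sum_fpkm_by_gene(tracking_lines, wanted_code="="):
    """Sum FPKM per reference gene over transfrags with the wanted class code.

    Assumes one sample, stored in the fifth tab-separated column as
    q1:gene_id|transcript_id|FMI|FPKM|... (my reading of the format).
    """
    totals = defaultdict(float)
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        ref, code, sample = fields[2], fields[3], fields[4]
        # "-" marks a missing reference or a missing sample entry
        if code != wanted_code or ref == "-" or sample == "-":
            continue
        ref_gene = ref.split("|")[0]
        # Drop the "q1:" prefix, then split the per-sample fields
        parts = sample.split(":", 1)[1].split("|")
        totals[ref_gene] += float(parts[3])  # FPKM is the fourth field
    return dict(totals)


# Hypothetical rows, not real data
example_rows = [
    "TCONS_1\tXLOC_1\tGeneA|TxA1\t=\tq1:CUFF.1|CUFF.1.1|100|10.5|9.0|12.0|20.0|1500",
    "TCONS_2\tXLOC_1\tGeneA|TxA2\t=\tq1:CUFF.1|CUFF.1.2|43|4.5|3.0|6.0|8.0|1200",
    "TCONS_3\tXLOC_1\tGeneA|TxA1\tj\tq1:CUFF.1|CUFF.1.3|20|2.0|1.0|3.0|4.0|900",
    "TCONS_4\tXLOC_2\t-\tu\tq1:CUFF.2|CUFF.2.1|100|7.0|6.0|8.0|9.0|800",
]

gene_fpkm = sum_fpkm_by_gene(example_rows)
print(gene_fpkm)
```

Here the "j" and "u" rows are ignored, so only the two "=" transcripts of the made-up GeneA contribute to its total. Whether excluding the "j" isoforms like this is the right thing to do is exactly what I am unsure about.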
Thanks so much for any help, and let me know if I can/should provide more information!
-Jeremy