Experimental background: Mice underwent three drug treatment conditions, with three biological replicates per condition. 100 bp paired-end RNA-seq was carried out on a specific brain region and cell type.
I mapped the reads with STAR, using the settings recommended for passing the output to cufflinks. I then ran cuffdiff on the .sam files, masking non-coding RNAs with an Ensembl .gtf, which returned results for differential gene and isoform testing (gene_exp.diff, isoform_exp.diff). Using the isoforms.fpkm_tracking file, I counted the number of annotated transcripts per gene and checked how many of the genes with differentially expressed isoforms had only one annotated transcript that could have been tested. Surprisingly, a substantial number of the significantly (q_value < 0.05) differentially expressed isoforms came from genes with only one annotated transcript, and 100% of these were also identified as differentially expressed at the gene level.
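The per-gene transcript count described above can be sketched in a few lines of Python. This is a minimal illustration, not the exact script used: it assumes the tracking file is tab-delimited with a header row containing `tracking_id` and `gene_id` columns, and the function name `transcripts_per_gene` and the toy data are made up for the example.

```python
import csv
import io
from collections import Counter


def transcripts_per_gene(handle):
    """Count rows (one per transcript) for each gene_id in a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["gene_id"] for row in reader)


# Toy stand-in for isoforms.fpkm_tracking (only the columns used here).
toy_tracking = io.StringIO(
    "tracking_id\tgene_id\n"
    "T1\tG1\n"
    "T2\tG1\n"
    "T3\tG2\n"
)

counts = transcripts_per_gene(toy_tracking)

# Genes for which an isoform-level test is equivalent to a gene-level test.
single_isoform_genes = {gene for gene, n in counts.items() if n == 1}
```

On a real run, the `io.StringIO` object would be replaced by an open file handle to the tracking file.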
My question is therefore: why does cuffdiff test transcripts from genes that have only one annotated transcript, when such a transcript will almost surely also come back as differentially expressed at the gene level, as my results indicate? Has anyone else noticed this? My understanding of "differential isoform expression" or "alternative isoform usage" is that a gene must have at least two potential isoforms; otherwise there is no "alternative" and the test simply becomes differential gene expression. I could not find anything about this behavior in the cuffdiff documentation (http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html#differential-expression-tests).
My inclination is to exclude every isoform that comes from a gene with only one annotated isoform, then recompute the corrected p-values (i.e. q-values) on the remaining tests. I would appreciate any input on this. Thanks!
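A minimal sketch of that exclusion-and-recorrection step, under some assumptions: the isoform results have `gene_id`, `status` and `p_value` columns, only rows with status `OK` count as tested, and a Benjamini-Hochberg correction is an acceptable way to regenerate the q-values. The helper names `bh_adjust` and `requantify` and the toy rows are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def requantify(rows, transcripts_per_gene):
    """Keep tested isoforms from multi-isoform genes and attach new q-values."""
    kept = [
        dict(row) for row in rows
        if row["status"] == "OK" and transcripts_per_gene.get(row["gene_id"], 0) > 1
    ]
    new_q = bh_adjust([float(row["p_value"]) for row in kept])
    for row, q in zip(kept, new_q):
        row["q_value_refiltered"] = q
    return kept


# Toy rows standing in for isoform_exp.diff; G2 has a single isoform.
toy_rows = [
    {"test_id": "T1", "gene_id": "G1", "status": "OK", "p_value": "0.01"},
    {"test_id": "T2", "gene_id": "G1", "status": "OK", "p_value": "0.04"},
    {"test_id": "T3", "gene_id": "G2", "status": "OK", "p_value": "0.001"},
    {"test_id": "T4", "gene_id": "G3", "status": "OK", "p_value": "0.03"},
    {"test_id": "T5", "gene_id": "G3", "status": "NOTEST", "p_value": "1"},
    {"test_id": "T6", "gene_id": "G3", "status": "OK", "p_value": "0.5"},
]
toy_counts = {"G1": 2, "G2": 1, "G3": 3}

filtered = requantify(toy_rows, toy_counts)
```

Because fewer hypotheses enter the correction, the recomputed q-values can differ from the ones cuffdiff reported, so the significance calls should be re-derived from the new column rather than carried over.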
Thank you for your reply. In this scenario I am interested in identifying isoform switching as a drug response, which requires a gene to have more than one known isoform. I did in fact use Python to categorize genes by their number of annotated transcripts, and I am continuing this analysis knowing that this is a feature of cuffdiff. Thanks again.