tl;dr How can I get summary information about alternative splicing from a Cufflinks analysis?
A typical RNA-seq workflow using the Tuxedo suite involves mapping reads with TopHat (sample by sample), assembling the aligned reads with Cufflinks (sample by sample), and merging the sample-specific assemblies into a single consensus assembly with Cuffmerge. If the samples come from two or more conditions, you can then use Cuffdiff to detect genes that are differentially expressed or differentially spliced between the conditions.
Regarding splicing: some genes will have alternative splicing that is not related to the contrast you are analyzing. For example, we might be comparing lung tissue to brain tissue, and both of gene A's isoforms (A1 and A2) are expressed in each tissue. If the levels of A1 are higher in lung and the levels of A2 are higher in brain, this is designated differential splicing and reported in the splicing.diff file produced by Cuffdiff. However, even if the levels of the two isoforms are the same across the two tissues, this is still alternative splicing since the gene is expressed in multiple isoforms.
So my question is this: how can I summarize alternative splicing from a Cufflinks analysis? For example, I want to report that X genes are alternatively spliced, there are Y alternative isoforms, there are Z cases of exon skipping, W cases of retained introns, etc. I'm not interested in differential splicing; I'm interested in all alternative splicing, whether or not there is differential usage across my contrast of interest.
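To make the "X genes, Y isoforms" part of the question concrete, here is a minimal sketch in Python that counts genes with more than one assembled transcript in a GTF file. It assumes the GTF carries `gene_id` and `transcript_id` attributes in column 9, as Cufflinks and Cuffmerge output does; the function names and the example records are made up for illustration. It is only a first approximation: several transcripts per gene can also reflect alternative start or end sites, and classifying events such as exon skipping or retained introns would need a comparison of exon structures that this sketch does not attempt.

```python
import re
from collections import defaultdict

def isoforms_per_gene(gtf_lines):
    """Map each gene_id to the set of transcript_ids seen in the GTF lines."""
    gene_to_tx = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        # Column 9 holds attributes of the form: key "value";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        gene, tx = attrs.get("gene_id"), attrs.get("transcript_id")
        if gene and tx:
            gene_to_tx[gene].add(tx)
    return gene_to_tx

def summarize(gene_to_tx):
    """Count all genes, multi-isoform genes, and the isoforms of the latter."""
    multi = {g: t for g, t in gene_to_tx.items() if len(t) > 1}
    return {
        "genes": len(gene_to_tx),
        "multi_isoform_genes": len(multi),
        "isoforms_in_multi_isoform_genes": sum(len(t) for t in multi.values()),
    }

def _record(start, end, gene, tx):
    """Build one hypothetical exon record (illustration only)."""
    attrs = f'gene_id "{gene}"; transcript_id "{tx}";'
    return "\t".join(["chr1", "Cufflinks", "exon", str(start), str(end),
                      ".", "+", ".", attrs])

# Hypothetical records: gene A has two transcripts, gene B has one.
EXAMPLE_GTF = [
    _record(100, 200, "A", "A1"),
    _record(300, 400, "A", "A1"),
    _record(100, 200, "A", "A2"),
    _record(900, 950, "B", "B1"),
]

print(summarize(isoforms_per_gene(EXAMPLE_GTF)))
```

For a real file, the same functions could be fed an open file handle, for example `summarize(isoforms_per_gene(open("merged.gtf")))`, where `merged.gtf` is an assumed file name.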
PS The reason I've gone through so much trouble describing this is because it seems the terms alternative splicing and differential splicing have become conflated, both in the literature and online. I've been searching for quite a while for this information, but it seems whenever I search for info on alternative splicing I invariably find tools that want to report alternative splicing across a contrast--that is, differential splicing. That's fine, but not what I'm looking for here.
I too am looking for tools to assess differences in isoform prevalence/ratios between two groups (e.g. samples from affected and unaffected human subjects). We've used a slightly different pipeline, but it might give you some ideas and identify some common goals.
Read Mapping in Tophat2 => BAM files imported into Seqmonk => Quantitate Raw Reads => Normalization (e.g. Limma VOOM) for linear analyses.
Seqmonk will estimate read counts for the individual mRNA isoforms in the supplied annotation file, and I'm fairly comfortable with this. What I'd like to do now is test the following hypothesis:
For a given gene's k isoforms, do we see an effect of [diagnostic status] on the relative abundances of the respective isoforms? Furthermore, I'd like to do this while controlling for one or more covariates that might be confounded with [diagnostic status], such as continuously measured [age] or categorical [sex or ethnicity]. To me, this sounds a bit like running a MAN(C)OVA per gene/isoform set. I'm wondering whether anyone has come across or developed tools for this sort of task.
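As a rough illustration of what "relative abundances" would mean here, the sketch below converts per-sample isoform counts for one gene into proportions. The function name, sample names and counts are hypothetical, and this is only the data-preparation step, not the test itself. Because the k proportions in each sample sum to one, a multivariate model would probably need either one isoform dropped or a log-ratio transform applied before fitting.

```python
def isoform_proportions(counts):
    """Convert {sample: {isoform: count}} for one gene into per-sample fractions.

    Samples with zero total counts for the gene get None, because the
    relative abundances are undefined there.
    """
    props = {}
    for sample, iso_counts in counts.items():
        total = sum(iso_counts.values())
        if total == 0:
            props[sample] = None
        else:
            props[sample] = {iso: n / total for iso, n in iso_counts.items()}
    return props

# Hypothetical counts for one gene with two isoforms in three samples.
EXAMPLE_COUNTS = {
    "affected_1": {"A1": 30, "A2": 10},
    "unaffected_1": {"A1": 10, "A2": 30},
    "unaffected_2": {"A1": 0, "A2": 0},
}

for sample, fractions in isoform_proportions(EXAMPLE_COUNTS).items():
    print(sample, fractions)
```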
I am also looking for a tool that can identify alternative splicing from RNA-seq output such as a BAM file or a transcript GTF file. There is a tool that can do this, AS_profile, but it has very little documentation, so it is quite hard to use.