I've got two different cellular fractions and I'm looking for genes that are alternatively spliced, alternatively polyadenylated, differentially expressed, etc. I'm running Cufflinks/Cuffdiff in Galaxy and I'm trying to grok what the different tests are doing.
Cuffdiff outputs 11 files (four FPKM tracking files and seven results files). Omitting the four FPKM tracking files, here are the seven results files, each with a snippet from the Cuffdiff documentation:
1. Differential expression testing for transcripts: FPKM of one group vs FPKM of the other.
2. Differential expression testing for genes: This sums the FPKM for transcripts sharing the same gene_id.
3. Differential expression testing for coding sequence (CDS): This sums the FPKM of transcripts sharing a common p_id, which is the id of the coding sequence that this transcript contains.
4. Differential expression testing for primary transcripts: This sums FPKM of transcripts sharing a common tss_id (transcription start site).
5. Differential splicing tests: For each primary transcript, this tests the amount of overloading detected among isoforms, i.e. how much differential splicing exists between isoforms processed from a single primary transcript.
6. Differential coding output: For each gene, this tests the amount of overloading detected among its coding sequences, i.e. how much differential CDS output exists between samples.
7. Differential promoter use: For each gene, the amount of overloading detected among its primary transcripts, i.e. how much differential promoter use exists between samples.
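To make the grouping in the first four tests concrete, here is a minimal sketch in Python of how I understand the summation. The transcript records, their IDs, and the FPKM values are all invented for illustration, and this only shows the grouping step, not Cuffdiff's actual statistics.

```python
from collections import defaultdict

# Hypothetical transcripts of one gene in one sample.
# All IDs and FPKM values are made up for illustration.
transcripts = [
    {"id": "T1", "gene_id": "G1", "tss_id": "TSS1", "p_id": "P1", "fpkm": 10.0},
    {"id": "T2", "gene_id": "G1", "tss_id": "TSS1", "p_id": "P2", "fpkm": 30.0},
    {"id": "T3", "gene_id": "G1", "tss_id": "TSS2", "p_id": "P1", "fpkm": 60.0},
]


def sum_fpkm(records, key):
    """Sum FPKM over all transcripts that share the same value of `key`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec["fpkm"]
    return dict(totals)


gene_fpkm = sum_fpkm(transcripts, "gene_id")  # grouping used by the gene-level test
cds_fpkm = sum_fpkm(transcripts, "p_id")      # grouping used by the CDS-level test
tss_fpkm = sum_fpkm(transcripts, "tss_id")    # grouping used by the primary-transcript test

print(gene_fpkm, cds_fpkm, tss_fpkm)
```

With this toy data the same three transcripts collapse into one gene-level value, two CDS-level values, and two TSS-level values, so the same reads can give different answers depending on which ID they are summed over.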
My questions are:
- How are tests for differential splicing (#5) different from tests for differential coding output (#6)?
- How are the tests for differential gene expression summing over gene IDs (#2) different from the tests for expression summing over CDS IDs (#3)?
- Tests #5-7 above are testing something fundamentally different from the tests for differential expression (tests #1-4). I'd like a good explanation of how these two groups of tests differ. For example, how does #3 (differential expression over CDS) differ from #6 (differential coding output)?
Thanks very much in advance.
Thanks, Daniele. Great answer.
I'm wondering if Cufflinks supplies the percent representation of transcripts or CDSs within a gene (equivalent to the PSI field from MISO, or the IsoPct field from RSEM)? I understand that splicing.diff and cds.diff supply a differential test based on the differences in relative abundance within the gene, but what about the actual values?
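To be clear about the value I'm after: it can be derived by hand from the per-isoform FPKMs. Here is a minimal sketch, assuming the FPKMs for one gene's isoforms have already been pulled out of the tracking files into a dict per sample. The transcript names and numbers are invented, and `isoform_percentages` is my own helper, not a Cufflinks field or function.

```python
def isoform_percentages(fpkm_by_transcript):
    """Return each transcript's share of the gene's total FPKM, in percent."""
    total = sum(fpkm_by_transcript.values())
    if total == 0:
        # Gene not expressed: every share is reported as zero.
        return {t: 0.0 for t in fpkm_by_transcript}
    return {t: 100.0 * v / total for t, v in fpkm_by_transcript.items()}


# Hypothetical FPKMs for the isoforms of one gene in two samples.
sample_a = {"T1": 10.0, "T2": 30.0, "T3": 60.0}
sample_b = {"T1": 50.0, "T2": 25.0, "T3": 25.0}

pct_a = isoform_percentages(sample_a)
pct_b = isoform_percentages(sample_b)

for t in sample_a:
    print(t, round(pct_a[t], 1), round(pct_b[t], 1))
```

This gives something comparable in spirit to IsoPct, but it carries no uncertainty estimate, which is why I'd prefer a value reported by the tool itself if one exists.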
In the same vein, what does √JS(x,y) actually mean in terms of the change in that transcript's role in the mix? The manual gives 0.22115 as an example of a significant value for the test statistic, but doesn't explain whether, in this case, the tested transcript has increased or decreased its share of total gene expression, and by how much.
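For reference, this is how I understand the statistic itself: the square root of the Jensen-Shannon divergence between the two within-gene abundance distributions. The sketch below is a generic implementation under my own assumptions (base-2 logarithm, invented isoform fractions). I don't know which log base or estimation details Cuffdiff uses internally, so the numbers are not expected to match its output.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def sqrt_js(p, q):
    """Square root of the Jensen-Shannon divergence between distributions p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    js = entropy(m) - (entropy(p) + entropy(q)) / 2
    # Clip tiny negative values caused by floating-point rounding.
    return math.sqrt(max(js, 0.0))


# Hypothetical isoform fractions of one gene in two samples (each sums to 1).
p = [0.10, 0.30, 0.60]
q = [0.50, 0.25, 0.25]

d = sqrt_js(p, q)
print(d)
```

Written this way, the value is symmetric in the two samples and summarizes the whole distribution at once, so on its own it cannot say which isoform gained or lost share, which is exactly the part I'd like to recover.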