Cufflinks / Cuffdiff Output - How Are Tests Different?

Stephen, 13.1 years ago

I've got two different cellular fractions and I'm looking for genes that are alternatively spliced, alternatively polyadenylated, differentially expressed, etc. I'm running Cufflinks/Cuffdiff in Galaxy and I'm trying to grok what the different tests are doing.

Cuffdiff outputs 11 files (four FPKM tracking files and seven files of results). Omitting the four FPKM tracking files, here are the seven results files, each with a snippet from the Cuffdiff documentation:

1. Differential expression testing for transcripts: FPKM of one group vs FPKM of the other.
2. Differential expression testing for genes: This sums the FPKM of transcripts sharing the same gene_id.
3. Differential expression testing for coding sequence (CDS): This sums the FPKM of transcripts sharing a common p_id, which is the id of the coding sequence that this transcript contains.
4. Differential expression testing for primary transcripts: This sums the FPKM of transcripts sharing a common tss_id (transcription start site).
5. Differential splicing tests: For each primary transcript, this tests the amount of overloading detected among isoforms, i.e. how much differential splicing exists between isoforms processed from a single primary transcript.
6. Differential coding output: For each gene, this tests the amount of overloading detected among its coding sequences, i.e. how much differential CDS output exists between samples.
7. Differential promoter use: For each gene, the amount of overloading detected among its primary transcripts, i.e. how much differential promoter use exists between samples.
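As a rough illustration of what tests #1-4 aggregate, here is a minimal Python sketch that only sums FPKM at each grouping level. The transcript names, the gene_id/tss_id/p_id values and the FPKMs are all made up, and it assumes every transcript carries all three IDs; it shows which FPKMs get pooled, not how Cuffdiff tests them statistically.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    """Hypothetical per-transcript record; field names follow the IDs quoted above."""
    transcript_id: str
    gene_id: str
    tss_id: str  # primary transcript (transcription start site) group
    p_id: str    # coding sequence group
    fpkm: float


def sum_fpkm(transcripts, attribute):
    """Sum FPKM over transcripts that share the same value of `attribute`."""
    totals = defaultdict(float)
    for t in transcripts:
        totals[getattr(t, attribute)] += t.fpkm
    return dict(totals)


# Made-up gene with three transcripts: A and C share a TSS, B and C share a CDS.
transcripts = [
    Transcript("A", "G1", "TSS1", "P1", 10.0),
    Transcript("B", "G1", "TSS2", "P2", 5.0),
    Transcript("C", "G1", "TSS1", "P2", 3.0),
]

print(sum_fpkm(transcripts, "transcript_id"))  # test 1: {'A': 10.0, 'B': 5.0, 'C': 3.0}
print(sum_fpkm(transcripts, "gene_id"))        # test 2: {'G1': 18.0}
print(sum_fpkm(transcripts, "p_id"))           # test 3: {'P1': 10.0, 'P2': 8.0}
print(sum_fpkm(transcripts, "tss_id"))         # test 4: {'TSS1': 13.0, 'TSS2': 5.0}
```

Each test then compares these pooled values between the two conditions; only the grouping key changes from one test to the next.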

My questions are:

How are the tests for differential splicing (#5) different from the tests for differential coding output (#6)?
How are the tests for differential expression summing over gene IDs (#2) different from the tests for expression summing over CDS IDs (#3)?
Tests #5-7 above are testing something fundamentally different from the tests for differential expression (#1-4). I'd like a good explanation of how these two groups of tests differ, e.g. how does #3 (differential expression over CDS) differ from #6 (differential coding output)?

Thanks very much in advance.

score 7 · Answer 1 · 2011-12-20

To answer a part of my own question, I drew out a schematic of what tests 1-4 are doing. Each is grouping transcripts at a different level.

1. Transcripts: no grouping; each transcript is tested independently.
2. Genes: all transcripts are grouped at the gene level.
3. CDS: transcripts B and C are grouped because they share a common protein coding sequence.
4. Primary transcripts: transcripts A and C are grouped because they share a common primary transcript.

Image: http://i43.tinypic.com/35am6j7.jpg

score 5 · Answer 2 · 2011-12-19

Hello, I think I got most of this figured out:

How are tests for differential splicing (#5) different from tests for differential coding output (#6)?

Differential splicing is tested at the primary-transcript level: you look at each group of transcripts that share the same TSS (more precisely, that come from the same pre-mRNA, so you are clustering the splicing isoforms of one primary transcript) and test whether the mix of splicing isoforms differs between samples. The statistical test is based on the Jensen-Shannon divergence, which measures the difference between two distributions, so it is sensitive when one (or more) splicing isoform makes up a larger share of that primary transcript's output in one sample than in the other. However, the test is not sensitive to differences in the total amount of the primary transcript (you have to use the differential expression tests for that).
Differential CDS output looks at the different coding sequences you produce after splicing, i.e. the different combinations of coding exons; it's a proxy for protein output, but of course it does not take into account anything that happens after mRNA processing. The test is at the gene level, not the primary-transcript level, so it also factors in alternative TSS and promoter usage. Also, if you have differential splicing within one primary transcript, but that primary transcript does not account for the lion's share of the gene's transcriptional output, it will barely affect the difference in CDS output. However, if you have transcripts that share the same coding sequence and differ only in their UTRs, that difference will not be factored in (as there is no difference in coding sequence). The statistical test is again based on the Jensen-Shannon divergence, so it won't be sensitive to differences in total gene transcription (you have to use the differential expression tests for that).
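A minimal sketch of the divergence at the heart of the splicing test, assuming the isoform FPKMs of a single primary transcript are already known for both samples (the numbers are hypothetical). It uses base-2 logarithms so the value lies between 0 and 1. Cuffdiff also has to estimate the uncertainty of this divergence to produce a p-value; that part is not attempted here.

```python
import math


def entropy(dist):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in dist if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def isoform_mix(fpkms):
    """Turn the isoform FPKMs of one primary transcript into proportions."""
    total = sum(fpkms)
    return [v / total for v in fpkms]


# Hypothetical FPKMs for two splice isoforms of one primary transcript.
fraction_1 = isoform_mix([8.0, 2.0])
fraction_2 = isoform_mix([2.0, 8.0])

print(round(js_divergence(fraction_1, fraction_2), 3))  # 0.278
```

Only the proportions enter the calculation, which is why the splicing test says nothing about how much of the primary transcript is made overall.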
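To illustrate the gene-level CDS grouping and the UTR point, here is a hypothetical sketch: transcripts are first pooled by coding sequence (p_id), then the CDS proportions are compared with the same Jensen-Shannon divergence. The gene, transcript names, p_id mapping and FPKMs are all invented, and as before this computes only the divergence, not Cuffdiff's significance test.

```python
import math
from collections import defaultdict


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    def entropy(dist):
        return -sum(x * math.log2(x) for x in dist if x > 0)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def cds_proportions(fpkm_by_transcript, p_id_of, cds_order):
    """Sum transcript FPKMs per CDS (p_id) and return proportions in cds_order."""
    totals = defaultdict(float)
    for transcript, fpkm in fpkm_by_transcript.items():
        totals[p_id_of[transcript]] += fpkm
    grand_total = sum(totals.values())  # assumes the gene is expressed (non-zero)
    return [totals[p] / grand_total for p in cds_order]


# Hypothetical gene: T1 and T2 share coding sequence P1 and differ only in UTRs;
# T3 encodes a different coding sequence, P2.
p_id_of = {"T1": "P1", "T2": "P1", "T3": "P2"}
cds_order = ["P1", "P2"]

cds_a = cds_proportions({"T1": 9.0, "T2": 1.0, "T3": 10.0}, p_id_of, cds_order)
cds_b = cds_proportions({"T1": 1.0, "T2": 9.0, "T3": 10.0}, p_id_of, cds_order)  # UTR isoforms swapped
cds_c = cds_proportions({"T1": 1.0, "T2": 1.0, "T3": 18.0}, p_id_of, cds_order)  # CDS mix shifted

print(js_divergence(cds_a, cds_b))            # 0.0: UTR-only shift is invisible at the CDS level
print(round(js_divergence(cds_a, cds_c), 3))  # 0.147
```

The swap between T1 and T2 would look like a large change at the transcript level, but because both carry P1 it cancels out once transcripts are pooled by coding sequence.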

I think this also sheds light on the other questions.

In summary: the differential CDS output and splicing tests look at differences in the distribution over the possible isoforms (of spliced transcripts or coding sequences), whereas the differential expression tests look at differences in total level.
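The summary can be checked with a small sketch that computes both kinds of quantity for one hypothetical gene: a log2 fold change of the summed FPKM (standing in for "total level") and the Jensen-Shannon divergence of the isoform mix (standing in for "distribution"). The FPKMs are made up, and neither number is Cuffdiff's actual test statistic; the point is only which change each quantity can see.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    def entropy(dist):
        return -sum(x * math.log2(x) for x in dist if x > 0)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def compare(fpkm_1, fpkm_2):
    """Return (log2 fold change of summed FPKM, JS divergence of the isoform mix)."""
    total_1, total_2 = sum(fpkm_1), sum(fpkm_2)
    mix_1 = [v / total_1 for v in fpkm_1]
    mix_2 = [v / total_2 for v in fpkm_2]
    return math.log2(total_2 / total_1), js_divergence(mix_1, mix_2)


# Hypothetical isoform FPKMs for one gene in two samples.
print(compare([6.0, 4.0], [12.0, 8.0]))  # everything doubled: (1.0, 0.0)
print(compare([6.0, 4.0], [4.0, 6.0]))   # same total, isoform mix swapped: fold change 0.0, divergence > 0
```

Doubling every isoform moves only the total-level measure, while swapping the isoform mix at a constant total moves only the distribution measure.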

score 0 · Answer 3 · 2011-11-02

Flashton, 13.1 years ago

Hi Stephen,

I'm afraid I can't help you with your question (other than to suggest there might be two streams of analysis, one for ORFs and another for CDSs).

However, I was hoping you could shed some light on why you used Cuffdiff for your analysis rather than DESeq, edgeR or baySeq. I'm about to embark on an RNA-seq analysis project, and any input you might have on the relative merits of these programs would be greatly appreciated.

Many thanks,

Phil

Cuffdiff was just the first thing I tried: I was helping someone with an analysis where all the data was already in Galaxy, and Cuffdiff was easy to run. I'm looking at DESeq now, as integrated into the ExpressionPlot suite (expressionplot.com), which has some nice functionality.

Reply by Stephen, 13.1 years ago

score 0 · Answer 4 · 2011-12-05

Josh, 13.0 years ago

I can't help with your analysis, but I have been using ExpressionPlot on our local server for several months and really like it. Just for what it's worth.