Question

DEXSeq Expression and Usage Metrics

3

Entering edit mode

9.8 years ago

andrew.j.skelton73 6.6k

Hi,

I've been running DEXSeq and I'm having trouble understanding the results that I'm getting. I've used the suggested model ~ sample + exon + condition:exon, which should give me differential exon usage between my two conditions.

When plotting the exon usage of my two conditions and the exon expression of my two conditions, I've generally found that they're the inverse of each other. For example, if I have an exon in a gene that looks to be differentially expressed between the two conditions, it doesn't look to have differential exon usage. Vice versa, if an exon in a gene appears to have no differential expression between two conditions, it's shown to have differential usage.

I'm struggling to understand the difference between these two metrics, does anyone have a good explanation?

Secondly, If I wanted to test for differential exon expression, what would I have to change my model to, to achieve this?

Thanks

EDIT

Sample Table

      condition    libType                                        countName
F1_S1         A paired-end raw_counts//sorted_Flowcell_A_1.bam.dexseq.count
F1_S2         B paired-end raw_counts//sorted_Flowcell_A_2.bam.dexseq.count
F1_S3         A paired-end raw_counts//sorted_Flowcell_A_3.bam.dexseq.count
F1_S4         A paired-end raw_counts//sorted_Flowcell_A_4.bam.dexseq.count
F1_S5         B paired-end raw_counts//sorted_Flowcell_A_5.bam.dexseq.count
F1_S6         A paired-end raw_counts//sorted_Flowcell_A_6.bam.dexseq.count
F1_S7         B paired-end raw_counts//sorted_Flowcell_A_7.bam.dexseq.count
F1_S8         A paired-end raw_counts//sorted_Flowcell_A_8.bam.dexseq.count
F2_S1         A paired-end raw_counts//sorted_Flowcell_B_1.bam.dexseq.count
F2_S2         B paired-end raw_counts//sorted_Flowcell_B_2.bam.dexseq.count
F2_S3         A paired-end raw_counts//sorted_Flowcell_B_3.bam.dexseq.count
F2_S4         A paired-end raw_counts//sorted_Flowcell_B_4.bam.dexseq.count
F2_S5         B paired-end raw_counts//sorted_Flowcell_B_5.bam.dexseq.count
F2_S6         B paired-end raw_counts//sorted_Flowcell_B_6.bam.dexseq.count
F2_S7         A paired-end raw_counts//sorted_Flowcell_B_7.bam.dexseq.count
F2_S8         A paired-end raw_counts//sorted_Flowcell_B_8.bam.dexseq.count

Code Used

dxd <- DEXSeqDataSetFromHTSeq(list.files("raw_counts/", full.names=T),
                              sampleData    = sampleTable,
                              design        = ~ sample + exon + condition:exon,
                              flattenedfile = "../scrips/genome.gff")

dxd  <- estimateSizeFactors(dxd)
dxd  <- estimateDispersions(dxd)
dxd  <- testForDEU(dxd)
dxd  <- estimateExonFoldChanges(dxd, fitExpToVar="condition")

dxr1 <- DEXSeqResults(dxd)

Example

< image not found >

DEXSeq exon-usage expression • 7.5k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by andrew.j.skelton73 6.6k

1

Entering edit mode

N.B., I asked Alejandro to have a look at this question (finally found him on twitter thanks to Mike Love).

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

could you put your code and your sampleTable please

ADD REPLY • link 9.8 years ago by Nicolas Rosewick 11k

0

Entering edit mode

Edited the post

ADD REPLY • link 9.8 years ago by andrew.j.skelton73 6.6k

0

Entering edit mode

Can you give an example of exons that seem to be differentially used but not differentially expressed? The point of differential use is to control for gene-level expression differences between samples that aren't then relevant to isoform usage/splicing.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

I've added an example image of what I mean, if that helps!

ADD REPLY • link 9.8 years ago by andrew.j.skelton73 6.6k

2

Entering edit mode

9.8 years ago

reyes1089 ▴ 20

This post has been addressed by Simon in the link below!

https://support.bioconductor.org/p/64007/

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by reyes1089 ▴ 20

Ram · Accepted Answer · 2015-01-09

Thanks for the updates. I have a feeling that the best answer you'll get is from Alejandro Reyes by posting to the bioconductor support site. One thing I should note is that I expect you can get weird results like this for genes that are also differentially expressed, as is the case in your example. The ~sample part of the model essentially corrects for group mean differences, so you then will get some weird results and fold-change directions if a gene is differentially expressed (e.g., that's probably why the "exon usage" plot looks like a shifted version of the normalized counts/expression...since the sample means are then more similar).

Now having said that, looking into the code for plotDEXSeq() is telling. The main difference is what happens if we specify splicing=T versus expression=T. If you specify expression=T, you end up invoking:

coeff <- as.matrix( t(getEffectsForPlotting(es, averageOutExpression=FALSE, groupingVar=fitExpToVar) )[featuresInGene,] )

whereas with splicing=T, you get:

coeff <- as.matrix( t( getEffectsForPlotting(es, averageOutExpression=TRUE, groupingVar=fitExpToVar) )[featuresInGene,] )

You can see that the only difference is the averageOutExpression parameter, which multiplies the fitted values by their means (well, it adds the mean, but since this is on the log2 scale this ends up being a scaling operation). Now if you're using normalized values to begin with, it would seem to me that doing that is just going to produce funky results.

Edit: I should add that this really emphasizes the importance of producing and looking at plots like this. Clearly the statistical results in this case are non-sense due to the differential expression of that gene...even though there's also legitimate differential exon usage going on. A better model might be to try to account for the sample nesting with groups, but that's a whole other can of worms and I suspect has a lot of issues of its own (as is often the case with mixed-effects models).

Edit2: Actually, the more I think about it, the "better model" I mentioned above wouldn't be better. Ignore that.

Edit3: To more directly answer your question: differential expression shows the fit normalized counts, whereas differential usage shows that after accounting for differences due to each fit coefficient (specifically, those interacting with exon). The former is closer to the raw data but the latter will better match the p-values and fold-changes. I would argue that it's best to compare the statistical results to the expression values (or just the normalized counts), for the reasons demonstrated in this example.