DEXSeq Expression and Usage Metrics
2
3
Entering edit mode
9.8 years ago

Hi,

I've been running DEXSeq and I'm having trouble understanding the results that I'm getting. I've used the suggested model ~ sample + exon + condition:exon, which should give me differential exon usage between my two conditions.

When plotting the exon usage of my two conditions and the exon expression of my two conditions, I've generally found that they're the inverse of each other. For example, if I have an exon in a gene that looks to be differentially expressed between the two conditions, it doesn't look to have differential exon usage. Vice versa, if an exon in a gene appears to have no differential expression between two conditions, it's shown to have differential usage.

I'm struggling to understand the difference between these two metrics, does anyone have a good explanation?

Secondly, If I wanted to test for differential exon expression, what would I have to change my model to, to achieve this?

Thanks

EDIT

Sample Table

      condition    libType                                        countName
F1_S1         A paired-end raw_counts//sorted_Flowcell_A_1.bam.dexseq.count
F1_S2         B paired-end raw_counts//sorted_Flowcell_A_2.bam.dexseq.count
F1_S3         A paired-end raw_counts//sorted_Flowcell_A_3.bam.dexseq.count
F1_S4         A paired-end raw_counts//sorted_Flowcell_A_4.bam.dexseq.count
F1_S5         B paired-end raw_counts//sorted_Flowcell_A_5.bam.dexseq.count
F1_S6         A paired-end raw_counts//sorted_Flowcell_A_6.bam.dexseq.count
F1_S7         B paired-end raw_counts//sorted_Flowcell_A_7.bam.dexseq.count
F1_S8         A paired-end raw_counts//sorted_Flowcell_A_8.bam.dexseq.count
F2_S1         A paired-end raw_counts//sorted_Flowcell_B_1.bam.dexseq.count
F2_S2         B paired-end raw_counts//sorted_Flowcell_B_2.bam.dexseq.count
F2_S3         A paired-end raw_counts//sorted_Flowcell_B_3.bam.dexseq.count
F2_S4         A paired-end raw_counts//sorted_Flowcell_B_4.bam.dexseq.count
F2_S5         B paired-end raw_counts//sorted_Flowcell_B_5.bam.dexseq.count
F2_S6         B paired-end raw_counts//sorted_Flowcell_B_6.bam.dexseq.count
F2_S7         A paired-end raw_counts//sorted_Flowcell_B_7.bam.dexseq.count
F2_S8         A paired-end raw_counts//sorted_Flowcell_B_8.bam.dexseq.count​

Code Used

dxd <- DEXSeqDataSetFromHTSeq(list.files("raw_counts/", full.names=T),
                              sampleData    = sampleTable,
                              design        = ~ sample + exon + condition:exon,
                              flattenedfile = "../scrips/genome.gff")

dxd  <- estimateSizeFactors(dxd)
dxd  <- estimateDispersions(dxd)
dxd  <- testForDEU(dxd)
dxd  <- estimateExonFoldChanges(dxd, fitExpToVar="condition")

dxr1 <- DEXSeqResults(dxd)

Example

< image not found >

DEXSeq exon-usage expression • 7.5k views
ADD COMMENT
1
Entering edit mode

N.B., I asked Alejandro to have a look at this question (finally found him on twitter thanks to Mike Love).

ADD REPLY
0
Entering edit mode

could you put your code and your sampleTable please

ADD REPLY
0
Entering edit mode

Edited the post

ADD REPLY
0
Entering edit mode

Can you give an example of exons that seem to be differentially used but not differentially expressed? The point of differential use is to control for gene-level expression differences between samples that aren't then relevant to isoform usage/splicing.

ADD REPLY
0
Entering edit mode

I've added an example image of what I mean, if that helps!

ADD REPLY
5
Entering edit mode
9.8 years ago

Thanks for the updates. I have a feeling that the best answer you'll get is from Alejandro Reyes by posting to the bioconductor support site. One thing I should note is that I expect you can get weird results like this for genes that are also differentially expressed, as is the case in your example. The ~sample part of the model essentially corrects for group mean differences, so you then will get some weird results and fold-change directions if a gene is differentially expressed (e.g., that's probably why the "exon usage" plot looks like a shifted version of the normalized counts/expression...since the sample means are then more similar).

Now having said that, looking into the code for plotDEXSeq() is telling. The main difference is what happens if we specify splicing=T versus expression=T. If you specify expression=T, you end up invoking:

coeff <- as.matrix( t(getEffectsForPlotting(es, averageOutExpression=FALSE, groupingVar=fitExpToVar) )[featuresInGene,] )

whereas with splicing=T, you get:

coeff <- as.matrix( t( getEffectsForPlotting(es, averageOutExpression=TRUE, groupingVar=fitExpToVar) )[featuresInGene,] )

You can see that the only difference is the averageOutExpression parameter, which multiplies the fitted values by their means (well, it adds the mean, but since this is on the log2 scale this ends up being a scaling operation). Now if you're using normalized values to begin with, it would seem to me that doing that is just going to produce funky results.

Edit: I should add that this really emphasizes the importance of producing and looking at plots like this. Clearly the statistical results in this case are non-sense due to the differential expression of that gene...even though there's also legitimate differential exon usage going on. A better model might be to try to account for the sample nesting with groups, but that's a whole other can of worms and I suspect has a lot of issues of its own (as is often the case with mixed-effects models).

Edit2: Actually, the more I think about it, the "better model" I mentioned above wouldn't be better. Ignore that.

Edit3: To more directly answer your question: differential expression shows the fit normalized counts, whereas differential usage shows that after accounting for differences due to each fit coefficient (specifically, those interacting with exon). The former is closer to the raw data but the latter will better match the p-values and fold-changes. I would argue that it's best to compare the statistical results to the expression values (or just the normalized counts), for the reasons demonstrated in this example.

ADD COMMENT
0
Entering edit mode

Thanks for the response, Devon. I'll repost this to the bioconductor forum and see if I get any response from Alejandro Reyes.

So I guess that begs the question, is my model correct for what I'm trying to see? I thought what I was looking with using DEXSeq was essentially differential expression of exons, but that appears not to be the case really. I'm still not clear what the difference between expression and usage is precisely. Other than the fact that that usage tries to take into account the number of isoforms of the gene.

Would using the model ~ sample + exon + condition yield me the results of differential exon expression?

hmm, the plotDEXSeq() function stuff is interesting. I'll play around with that.

ADD REPLY
1
Entering edit mode

Your original model was correct, it just fails on genes like the one shown. You're interested in finding significant condition:exon coefficients, since that's then telling you when there's differential usage. The problem arises with how the sample and condition-specific effects are controlled for. Using ~sample is generally a good idea, since it absorbs differences due to all treatment conditions. However, when there's a condition effect in one isoform and the base expression in the other is extremely low, this seems to be causing wrong results that might not occur if one instead used a ~condition*exon model. However, I've not thought about this enough to actually recommend that.

ADD REPLY
2
Entering edit mode
9.8 years ago
reyes1089 ▴ 20

This post has been addressed by Simon in the link below!

https://support.bioconductor.org/p/64007/

ADD COMMENT

Login before adding your answer.

Traffic: 1177 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6