DESeq2: Find differentially expressed genes specific to one condition
3
5
Entering edit mode
9.4 years ago
Christian ★ 3.1k

To all DESeq2 experts or biostatisticians out there:

Say I have an RNA-seq experiment with three conditions: A, B, and C. I want to know which genes are differentially expressed specifically in A; that is, different between A vs. B, A vs. C, but not B vs. C.

Solution 1) The naive approach of course is to perform pairwise comparisons and pull out genes specific to A via set overlaps of the three resulting gene lists. What I don't like about this is that I have to impose two arbitrary cut-offs: one to establish differential expression (say FDR < 0.01, |log2FoldChange| >= 1) and one to establish no differential expression (say FDR > 0.5, |log2FoldChange| < 1). Another disadvantage is that I am stuck with three different FDR and log2FoldChanges that I am not sure how to combine into a final gene ranking, for example for pre-ranked GSEA.

Solution 2) An alternative is to compare A with all non-A, but this would not guarantee me non-differential expression between B and C. Furthermore, I guess I sacrifice power by withholding the information that 'non-A' is actually composed of two conditions (B+C).

After implementing both solutions they still feel like a hack. Can this problem be solved in one nice integrated statistical framework that also allows me to nicely rank my resulting DEGs by statistical significance?

I read about contrasts and likelihood ratio tests in the DESeq2 vignette, but could not come up with an answer so far.

deseq2 RNA-Seq • 7.2k views
ADD COMMENT
0
Entering edit mode

I suspect that solution 1 is the only viable one. Things like a single fold-change aren't defined in the comparison you're trying to make.

ADD REPLY
2
Entering edit mode
7.5 years ago
TriS ★ 4.7k

I had a similar situation, I used limma, not DESeq2 but the concept is the same. the first thing I did was to use your approach #1, which worked fine. then I started working on the contrast matrix.

let's say your data look like

> data
                  A          A           B           B          C           C
gene_1   0.73119617  1.0334765 -0.98585078 -1.70966079 -1.2937223 -0.03300298
gene_2  -0.49758823  1.2640492  1.07019633 -0.52513986  0.7001930 -2.06107495
gene_3   0.71012607  1.1087370  0.14327705 -1.34193321  2.8695163  0.33020928
gene_4   0.87754370  0.2782034  1.06616370  0.98361920 -0.4598298 -0.37822077
gene_5   0.09154438 -0.9534070  0.52547509  3.08489922  0.9944367  2.11002666
gene_6   0.38530556  1.3154407 -0.54681818 -1.38463685 -0.3651117  0.33112452
gene_7   1.21923553 -0.6029883  0.57609204 -2.43241917 -0.4122289 -0.34803657
gene_8   0.57848398  0.5944403  0.49081718  0.35163585  1.1472741 -1.36472408
gene_9   0.06271539  0.5102718  0.06256104 -1.81262365 -0.1244413  0.01500785
gene_10  1.52295243  0.7743652 -0.09160387 -0.08225646 -0.2866435  1.49100980

now build your design + contrast matrix

d <- model.matrix(~0+colnames(data))
colnames(d) <- c("A","B","C")
contr.matrix <- makeContrasts(
  AvsB = B-A, 
  AvsC = C-A, 
  BvsC = C-B,
  AvsAll = A-(B+C)/2,
  levels = colnames(d)
)

then fit the model

fit <- lmFit(data,d)
fit <- contrasts.fit(fit,contr.matrix)
fit <- eBayes(fit)
summary(decideTests(fit))

and this should be pretty close to what you look for

ADD COMMENT
1
Entering edit mode
7.5 years ago
Martombo ★ 3.1k

This seems to be quite a common problem. The method I chose to use is to compare AvsB and AvsC p-values through a p-value combination method, since I was not interested in specifying that the results would not be significant in BvsC. Otherwise you can try to use the following contrast: A - (B+C)/2. This again only compares A to the average of B and C, but at least the other DEA steps (like dispersion estimation) will treat B and C separately.

The only other thing I can think of is to compare B and C for no differential expression and to combine these p-values with the one you obtain from A vs avg(B,C).

ADD COMMENT
0
Entering edit mode

How about an (A-(B+C)/2) - (B-C) model?

ADD REPLY
0
Entering edit mode

I think that would return a significant result if A=20, B=30, C=10? which is not what you want

ADD REPLY
0
Entering edit mode

(A - (B+C)/2) - (B-C) = (20 - (30+10)/2) - (30-10) = (20-20) -20 = -20

So yes, it would be significant, BUT what that would be telling you is that the gene is significantly less up in condition than other conditions, where as a positive value would be telling up it was more up.

ADD REPLY
0
Entering edit mode
7.5 years ago

Hi Christian,

I have the same issue. Did you resolve it?

Thanks,

Christian

ADD COMMENT
0
Entering edit mode

you can try this:

#dd is your DESeq matrix
 vsd <- varianceStabilizingTransformation(dd, blind=TRUE)
vst <- assay(vsd)
colnames(vst) <- colnames(count.table)
vst <- vst - min(vst)
dds <- DESeq(dd)

res.AvsB <- results(dds, contrast=c("treatment","A", "B"))
res.AvsC <- results(dds, contrast=c("treatment","A", "C"))

#select  expressed genes 

mat.A.B <- assay(vsd)[head(order(res.AvsB$padj)), ]
pheatmap(mat.A.B) 

mat.A.C <- assay(vsd)[head(order(res.AvsCt$padj)), ]
pheatmap(mat.A.C)
ADD REPLY
0
Entering edit mode

Unfortunately not. I went ahead with the set-overlap solution, although I am still not fully satisfied with it from a statistical point of view.

EDIT: should have been a comment to the previous answer, not a reply to this comment.

ADD REPLY
0
Entering edit mode

How about trying to building your linear model to do a difference of differences test? You want to ask if the difference between the A/notA samples is bigger than the B-C difference. I suspect Gordon Smyth is the person to ask hope to do this, although you might need to be happy using limma voom/edgeR rather than DESeq2.

ADD REPLY

Login before adding your answer.

Traffic: 1830 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6