Question

Differential gene expression analysis by DESeq2

0

Entering edit mode

5.5 years ago

umeshtanwar2 ▴ 30

Hi all, I am doing Differential Gene Expression Analysis using DESeq2. I have 8 samples in total (4 treated and 4 untreated) with 3 replicates of each. I am using the the code given below:

library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~genotype*treatment)

For extracting the results I tried 2 codes: Method A:

GO1 <- results(dds, name=c("genotype_B_vs_Col.0"), alpha=0.05, lfcThreshold=2)
GO1 = subset(GO1, padj<0.05)
summary(GO1)
out of 3 with nonzero total read count
adjusted p-value < 0.05
LFC > 2.00 (up)    : 3, 100%
LFC < -2.00 (down) : 0, 0%

Method B:

GO <- results(dds, name=c("genotype_B_vs_Col.0"), alpha=0.05)
GO <- subset(GO, log2FoldChange >1 | log2FoldChange <1)
GO = subset(GO, padj<0.05)
summary(GO)
out of 2287 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 1156, 51%
LFC < 0 (down)     : 1131, 49%

I am sorry this kind of question has been explained here many times but I am still confused. Question1: Which method is correct using lfcThreshold filtering (A) or only alpha value(B) and if its A what should be the lfcThreshold value to be used? Question2: Why there is difference in these 2 results? (log2FC 1 = FC 2 as I understand)

Could anyone help me in this please. Thank you

RNA-Seq DESeq2 lfcThreshold • 2.4k views

ADD COMMENT • link updated 5.5 years ago by Carlo Yague 8.9k • written 5.5 years ago by umeshtanwar2 ▴ 30

score 6 · Answer 1 · 2019-06-04

6

Entering edit mode

5.5 years ago

Carlo Yague 8.9k

There are two confusions here. First, log2FC 1 = FC 2 is true. However, lfcThresholdin the first instance is also in log (thats what the first letter "l" stand for) ! So if you want to compare your two methods, you should use the same value for the log2 threshold (1 for instance).

Now that this is out of the way, the main issue here is that you are not testing the same thing with both method, so the pvalues are different. In method A, you are testing for differences of expression significantly bigger than the lfcThreshold. In B, you are testing for differences of expression significantly bigger than 0, and subsequently filter for those with |log2FoldChange| > 1.

To illustrate the distinction, if a gene is overexpressed a bit more than two-folds (say, FC=2.1), it might be that this overexpression is not significantly bigger than 2 (so it doesn't pass the test in method A) while it is very significantly different than 0 (it passes the test in method B), depending on the number of reads supporting the overexpression. Overall, method A is more stringent than method B.

Hope this helps,

Carlo.

ADD COMMENT • link 5.5 years ago by Carlo Yague 8.9k

0

Entering edit mode

Thank you very much Carlo. This is really very helpful.

ADD REPLY • link 5.5 years ago by umeshtanwar2 ▴ 30

0

Entering edit mode

This also cleared up my confusion. But, at the same time, led me to another one: which method is more useful? I am under the impression that we the "right" way of doing it is to to test significance of expression vs 0, hence I would be inclined to use method B. I am not sure when to use the method A.

ADD REPLY • link 4.6 years ago by jfo ▴ 50

1

Entering edit mode

Good question ! According to DESeq2 original paper, it makes more sense to use method A:

"A common procedure is to disregard genes whose estimated LFC β ir is below some threshold, |β ir |≤θ [this is method B here]. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i.e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold [this is method A]. "

However, method B is more used in practice and it also makes sense. All in all, I don't think that there is a strong consensus on that yet.

ADD REPLY • link 4.6 years ago by Carlo Yague 8.9k