Hi all, I am doing differential gene expression analysis using DESeq2. I have 8 samples in total (4 treated and 4 untreated) with 3 replicates of each. I am using the code given below:
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~genotype*treatment)
dds <- DESeq(dds)  # fit the model; results() requires this step
To extract the results I tried two approaches.
Method A:
GO1 <- results(dds, name=c("genotype_B_vs_Col.0"), alpha=0.05, lfcThreshold=2)
GO1 = subset(GO1, padj<0.05)
summary(GO1)
out of 3 with nonzero total read count
adjusted p-value < 0.05
LFC > 2.00 (up) : 3, 100%
LFC < -2.00 (down) : 0, 0%
Method B:
GO <- results(dds, name=c("genotype_B_vs_Col.0"), alpha=0.05)
GO <- subset(GO, log2FoldChange >1 | log2FoldChange <1)
GO = subset(GO, padj<0.05)
summary(GO)
out of 2287 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 1156, 51%
LFC < 0 (down) : 1131, 49%
I am sorry, this kind of question has been explained here many times, but I am still confused.
Question 1: Which method is correct: filtering with lfcThreshold (A) or using only the alpha value (B)? If it is A, what lfcThreshold value should be used?
Question 2: Why is there a difference between these two results? (As I understand it, log2FC 1 = FC 2.)
Could anyone help me with this, please? Thank you.
Thank you very much Carlo. This is really very helpful.
This also cleared up my confusion. But, at the same time, it led me to another question: which method is more useful? I am under the impression that the "right" way of doing it is to test the significance of the expression change vs 0, hence I would be inclined to use method B. I am not sure when to use method A.
Good question! According to the original DESeq2 paper, it makes more sense to use method A:
However, method B is more commonly used in practice, and it also makes sense. All in all, I don't think there is a strong consensus on this yet.
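For anyone landing on this thread later, here is a minimal sketch of why the two calls in the question give such different numbers, and of what each method looks like with the same 2-fold cutoff. Two things stand out in the original code: lfcThreshold is on the log2 scale, so lfcThreshold=2 asks for more than a 4-fold change, and the Method B filter `log2FoldChange >1 | log2FoldChange <1` keeps every gene whose value is not exactly 1, so it removes essentially nothing. The toy values below are made up for illustration. The DESeq2 part runs on simulated counts from makeExampleDESeqDataSet() instead of the poster's countdata, and is skipped if DESeq2 is not installed. The cutoff of 1 is only an example, not a recommendation.

```r
# Toy log2 fold changes (made-up values, purely illustrative)
lfc <- c(-3.0, -1.5, -0.4, 0.2, 0.9, 1.0, 1.7, 2.5)

# Filter as written in Method B: TRUE for every value except exactly 1
keep_as_written <- lfc > 1 | lfc < 1
# Filter that was presumably intended: |log2FC| > 1 (more than 2-fold either way)
keep_intended <- abs(lfc) > 1

sum(keep_as_written)  # 7
sum(keep_intended)    # 4

# lfcThreshold is on the log2 scale
2^1  # 2: lfcThreshold = 1 corresponds to a 2-fold change
2^2  # 4: lfcThreshold = 2 corresponds to a 4-fold change

# Optional part: compare the two methods on simulated data (assumes DESeq2 is installed)
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated dataset with two conditions, A and B (not the poster's data)
  dds_toy <- makeExampleDESeqDataSet(n = 1000, m = 8, betaSD = 1)
  dds_toy <- DESeq(dds_toy)

  # Method A style: the threshold is part of the null hypothesis,
  # i.e. the test asks whether |log2FC| is greater than 1
  res_a <- results(dds_toy, name = "condition_B_vs_A",
                   alpha = 0.05, lfcThreshold = 1)
  n_a <- sum(res_a$padj < 0.05, na.rm = TRUE)

  # Method B style: test against log2FC = 0, then filter on the estimate
  res_b <- results(dds_toy, name = "condition_B_vs_A", alpha = 0.05)
  res_b <- subset(res_b, padj < 0.05 & abs(log2FoldChange) > 1)
  n_b <- nrow(res_b)

  cat("Method A (thresholded test):", n_a, "genes\n")
  cat("Method B (post hoc filter): ", n_b, "genes\n")
}
```

Method A is usually the more conservative of the two, because its adjusted p-values refer to the hypothesis that the change exceeds the threshold. In Method B the p-values only say that the change differs from zero, and the fold-change cutoff is applied to the estimates afterwards.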