I am using the limma package to model gene expression based on a continuous variable while controlling for technical variables. When applying traditional p-value cutoffs (FDR < 0.05), I do not identify any differentially expressed genes (DEGs). However, using a more relaxed FDR threshold of <0.1, I find over 1,700 DEGs.
This prompted me to investigate the p-value distribution, which reveals a substantial deviation from the null hypothesis, indicated by a peak near zero.
Upon further examining the FDR distribution, I observe the following pattern:
FDR < 0.05: 0 DEGs
FDR < 0.06: 196 DEGs
FDR < 0.07: 585 DEGs
FDR < 0.08: 884 DEGs
Has anyone encountered this pronounced increase in the number of DEGs within such a narrow FDR interval? What could be causing this spike?
Here is the structure of the code I am using:
design <- model.matrix(~1 + ContinuousVariable + TechnicalVariables, data = sample.info)
vwts <- voomWithQualityWeights(dge, design = design, plot = FALSE)
vfit2 <- lmFit(vwts)
vfit2 <- eBayes(vfit2)
res.p <- topTable(vfit2, coef = 2, sort.by = "logFC", number = "inf", p.value = 0.1)
Thank you very much!!
Is there a typo in your first paragraph? Your plot doesn't match up with their being 17,000 DEGs.
Yes, thanks. It was supposed to be 1,700 and has been fixed.