Hi,
To give some context, I've been trying to compare global versus absolute normalization techniques on my dataset. For absolute scaling, I've been normalizing to the ERCC spike-ins within each sample using two approaches: a DESeq2 method and an edgeR/limma+voom method.
The DESeq2 method uses RUVSeq first to remove unwanted technical variation (without running betweenLaneNormalization), followed by DESeq2 with the controlGenes argument of estimateSizeFactors() set to the ERCC spike-ins. Hopefully this is what I should be doing; the result is what I expected.
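Concretely, the pipeline looks roughly like the sketch below (simplified for this post; condition stands in for my actual sample grouping, and k = 1 for the number of unwanted factors):

library(RUVSeq)   # RUVg(); also loads EDASeq for newSeqExpressionSet()
library(DESeq2)

counts <- rbind(genes_expr, ercc_expr)          # genes plus ERCC spike-ins
spikes <- rownames(ercc_expr)                   # IDs of the negative-control rows
set <- newSeqExpressionSet(as.matrix(counts))   # no betweenLaneNormalization first
set <- RUVg(set, spikes, k = 1)                 # estimate unwanted variation from the spike-ins

dds <- DESeqDataSetFromMatrix(countData = counts(set),
                              colData = data.frame(pData(set), condition = condition),
                              design = ~ W_1 + condition)
dds <- estimateSizeFactors(dds, controlGenes = rownames(dds) %in% spikes)
dds <- DESeq(dds)                               # reuses the spike-in-based size factors
res <- results(dds)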
However, the edgeR/limma+voom padj values seem strange. My normalization followed the steps below:
> library(edgeR)   # calcNormFactors()
> library(limma)   # voom()
> N <- colSums(genes_expr)                       # total counts per sample
> nf <- calcNormFactors(ercc_expr, lib.size=N)   # TMM factors computed on the ERCC rows only
> voom.norm <- voom(genes_expr, design, lib.size = N * nf, plot=TRUE)
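(The idea is that calcNormFactors() run on the spike-in counts returns one TMM factor per sample, and N * nf is then the effective library size passed to voom(), so the scaling is anchored to the ERCCs rather than to the whole transcriptome.)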
The lowest padj values are all identical and only "nearly" significant (results table not shown), and the p-value histogram (also not shown) looks unusual.
I've never seen p-values behave this way, so I'm not sure whether it's normal. Is there a reason why these values are all somewhat close to 0 and nearly all identical? The most significant transcripts from edgeR/limma+voom do correspond with those found by DESeq2, so I believe it may just be down to a difference between the techniques.
This is commonly observed with BH-adjusted p-values: the Benjamini-Hochberg adjustment takes a running minimum over the ordered raw p-values, so consecutive tests often end up with exactly the same padj. For more details, see: https://stats.stackexchange.com/questions/476658/r-why-are-my-fgsea-adjusted-p-values-all-the-same
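You can see this with a toy example in base R (the p-values below are made up):

> p <- c(0.0010, 0.0012, 0.0013, 0.0200, 0.6000)   # toy raw p-values
> p.adjust(p, method = "BH")                       # p_adj[i] = min over j >= i of (n/j) * p_(j)
[1] 0.002166667 0.002166667 0.002166667 0.025000000 0.600000000

The running minimum makes the three smallest raw p-values collapse to a single adjusted value, which is likely what is happening to your top transcripts.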