Hello everyone,
Thank you in advance for your feedback and suggestions.
I am currently doing some RNA-Seq analysis using DESeq2, and when I plot a histogram of my raw p-values, most of the genes have a p-value close to 1. My histogram looks very similar to scenario D in the following link: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/. My first question is: why would this be happening? My first thought was the presence of lowly expressed genes; however, the DESeq2 guide says that lowly expressed genes are filtered out during the analysis, so any feedback on this is appreciated. In addition, I am making several comparisons, and only one of them results in a p-value histogram similar to scenario A in the link. My second question is: why would some comparisons give anticonservative p-values (scenario A) while other comparisons in the same analysis give conservative p-values (scenario D)?
Many thanks!!
Can you confirm that you are plotting the raw p-values (pvalue) and not the adjusted ones (padj)?
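To show why this matters, here is a minimal sketch in Python with simulated data (it is not DESeq2 and not your data). It assumes that no gene is truly differentially expressed, so the raw p-values are uniform, and it uses a hand-written Benjamini-Hochberg adjustment (the helper name bh_adjust is made up for this example). Under that assumption the raw histogram should come out roughly flat, while the adjusted values should pile up near 1, which can look a lot like scenario D.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper for this sketch)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                        # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)    # cap at 1, restore input order
    return adjusted

rng = np.random.default_rng(0)
raw = rng.uniform(size=10_000)  # simulated null p-values, NOT real data
adj = bh_adjust(raw)

# Ten equal-width bins on [0, 1], like a coarse p-value histogram
bins = np.linspace(0, 1, 11)
raw_counts, _ = np.histogram(raw, bins=bins)
adj_counts, _ = np.histogram(adj, bins=bins)
print("raw :", raw_counts)
print("padj:", adj_counts)
```

If your own histogram only looks like this for padj and not for pvalue, the spike at 1 is mostly a consequence of the multiple-testing adjustment rather than of the test itself.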
Hey Lisa, there can be any number of reasons for that. It would greatly help to go over your experimental design (including the number of samples per group) and the code that you've used so far. Note that the null hypothesis in DESeq2's testing is, for each gene, that it is not differentially expressed. You could also explain how you generated your raw counts, and show the dispersion plot and box-and-whisker plots of the regularised log counts.
I am not sure that the blog's sentiment is correct, because it implies that we should always feel good when we get many genes that are statistically significantly differentially expressed. On the contrary, I am skeptical when this happens, unless we're talking about a comparison of, say, neurons versus PBMCs, or some knockdown experiment.