I have an answer that subsumes most of what has been written: use the test statistic itself.
When it comes down to it, what are we doing when we filter by p-value, adjusted p-value, logFC, logFC std. error, etc.? We are building a heuristic intended to help pull out interesting data - that's all.
The problem is, these have different strengths and weaknesses - let's illustrate.
Problems with p-adj and p-val: For instance, say you download 3 studies of the same phenotype and find a gene that has a very low p-value in all of them. Great, right? Not necessarily... What if the p-value is low, but the logFC is positive in one study and negative in another?! Even though the p-value is significant, the gene might not mean anything if it is up in one study and down in the next - it may be a fluke.
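A minimal sketch of that consistency check - the gene names and logFC values below are invented for illustration, not from any real study:

```python
# Hypothetical sketch: flag genes whose direction of effect disagrees
# across studies, even when every per-study p-value looks significant.

def consistent_direction(logfcs):
    """True if all log fold-changes share the same sign."""
    return all(x > 0 for x in logfcs) or all(x < 0 for x in logfcs)

# Toy numbers: the same gene measured in three studies
geneA = [2.1, 1.8, 2.4]    # up in every study -> direction is trustworthy
geneB = [2.1, -1.9, 2.0]   # up in two, down in one -> possible fluke

print(consistent_direction(geneA))  # True
print(consistent_direction(geneB))  # False
```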
Problems with logFC: If you filter by logFC alone and don't include the logFC std. error, you may just be enriching for low-quality, highly variable data... a logFC of 10 is meaningless if the logFC std. error is 6. Indeed, in this case the p-value would not be significant: the ratio of the mean logFC to its std. error is 10/6, so the test statistic is only ~1.67, N.S.
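Redoing that arithmetic explicitly - this assumes the statistic is approximately standard normal under the null, so the two-sided p-value can be computed with the stdlib error function:

```python
import math

def wald(logfc, se):
    """Wald-style statistic (logFC / SE) and its two-sided normal p-value."""
    stat = logfc / se
    p = math.erfc(abs(stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|stat|))
    return stat, p

stat, p = wald(10.0, 6.0)
print(round(stat, 3), round(p, 3))  # ~1.667 with p ~0.096 -> not significant
```

Despite the huge logFC of 10, the p-value lands around 0.1 - exactly the low-quality, high-variance case the filter should catch.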
For these reasons, I don't use p-value, p-adj, or logFC alone - but I will build compound filters that use them together.
However, there is one metric that addresses all of these problems together: the test statistic itself (this could be the LRT, Wald, or score statistic).
A Wald statistic is calculated by dividing the estimate by its std. error (in this case logFC / logFC std. error), and it is then used to calculate the p-value directly. As such, it is a go-between that relates all of the other metrics:
- It gives the direction of effect, because logFC / logFC std. error can be positive or negative.
- It gives likelihood, because the value of the statistic alone is sufficient to calculate the p-value.
- It doesn't give the magnitude of the effect in real terms, but I usually eliminate results with abs(logFC) < 1 to begin with, so that matters less - I know I am at least looking at things with a ratio of at least 2:1...
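Putting the pieces together, a minimal sketch of such a compound filter - drop anything with |logFC| < 1, then rank by the absolute statistic while its sign still carries the direction. The gene names and numbers here are invented:

```python
# Toy results table: logFC and its standard error per gene (made-up values).
results = [
    {"gene": "A", "logfc":  2.0, "se": 0.4},   # strong, precise, up
    {"gene": "B", "logfc": 10.0, "se": 6.0},   # big effect but very noisy
    {"gene": "C", "logfc":  0.3, "se": 0.05},  # precise but tiny effect
    {"gene": "D", "logfc": -1.5, "se": 0.3},   # strong, precise, down
]

# Wald-style statistic: sign preserves direction of effect.
for r in results:
    r["stat"] = r["logfc"] / r["se"]

# Compound filter: require at least a 2:1 ratio, then rank by |statistic|.
kept = [r for r in results if abs(r["logfc"]) >= 1]
kept.sort(key=lambda r: abs(r["stat"]), reverse=True)

print([(r["gene"], round(r["stat"], 2)) for r in kept])
```

Note how gene B (logFC = 10, SE = 6) survives the logFC cutoff but sinks to the bottom of the ranking, while the precisely measured genes A and D rise to the top.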
I feel like "much higher significance" is impossible to prove. You would want to see whether the magnitude of the fold change increases/decreases, or whether a new set of genes is differentially regulated. But comparing p-values directly doesn't make sense to me.
Thank you, completely agree.
If you make a volcano plot for each of these experiments, you may be able to show them how the p-values cannot be compared, and at the same time perhaps identify some interesting, actually relevant comparison.