Are p-values obtained from two separate enrichment analyses on the same population of genes comparable?
For example, let's say I have two differentially expressed gene lists from the same population of genes. ListA is enriched for cell cycle with a p-value of 0.01, listB is enriched for cell cycle with a p-value of 0.001.
Would it be correct to say cell cycle is more significantly enriched in listB than listA? Are the p-values comparable?
I would apply some sort of multiple-testing correction before comparing multiple lists (especially if there are more than a handful). The usual caveats for gene enrichment apply: Bonferroni is too conservative, and most FDR methods probably are too. Check the DAVID EASE score for an alternative (though it is not really a multiple-testing correction).
See, for example: Huang DW, Sherman BT, Lempicki RA (2009). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37(1):1-13.
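To make the correction step concrete, here is a minimal sketch of Bonferroni and Benjamini-Hochberg (FDR) adjustment in plain Python. The function names and the example p-values are made up for illustration; they are not taken from any particular enrichment tool, and real pipelines would normally use an established statistics library.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw enrichment p-values for four terms tested on one list.
raw = [0.01, 0.001, 0.04, 0.2]
print("Bonferroni:", bonferroni(raw))
print("BH (FDR):  ", benjamini_hochberg(raw))
```

Bonferroni inflates every p-value by the full number of tests, while Benjamini-Hochberg scales each one by its rank, which is why it is usually the less conservative of the two.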
Thanks for the useful reference.
I agree with you that multiple-testing corrections are too conservative, and even if it does not seem to be the case with Dk lists, I suggest data-splitting techniques and a critical approach (e.g. 'Improving Validation Practices in “Omics” Research', http://www.sciencemag.org/content/334/6060/1230.full).
One important thing to keep in mind is that the p-value is not a quality measure. It is simply the probability of observing a result at least this extreme by chance, given the data at hand. The properties of the underlying data therefore determine the p-value (in your case, the number of GO terms that could be used factors in here as well); it is not a characteristic of the final observation.
IMO the purpose of the p-value is to accept the selection or reject it. In general I don't think it should be used to rank anything (though in reality just about everyone does it all the time). We (me included) tend to rank by p-value when we run out of options.
I would try to find a different measure/attribute to rank my genes and avoid comparing the p-values.
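As a rough illustration of why the p-value reflects the underlying data rather than the strength of the enrichment itself, here is a small sketch using a one-sided hypergeometric test. This is a common choice for enrichment analysis, but it is an assumption here, since the question does not say which test was used. All of the numbers (universe size, category size, list sizes) are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits, list_size, category_size, universe_size):
    """One-sided hypergeometric p-value: P(X >= hits).

    X is the number of category genes seen when drawing `list_size`
    genes at random from a universe of `universe_size` genes, of which
    `category_size` belong to the category.
    """
    upper = min(list_size, category_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical universe: 20000 genes, 500 of them annotated as cell cycle.
# Both lists contain 10% cell cycle genes, but the lists differ in size.
p_small = enrichment_pvalue(hits=10, list_size=100, category_size=500, universe_size=20000)
p_large = enrichment_pvalue(hits=40, list_size=400, category_size=500, universe_size=20000)
print("small list:", p_small)
print("large list:", p_large)
```

Both lists show the same fraction of cell cycle genes, yet the larger list gets a much smaller p-value simply because it is larger. Ranking the two lists by p-value would say more about list size than about the biology.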
Thanks. This is exactly my problem. I was trying to see whether I can say one is "more significant" than another. I think I'll take a different approach now and try to add another dimension to my data by looking at fold change.
I think the answer is yes. That is, as long as these two lists represent an analysis of the same experiment. When you say "the same population of genes" this seems to imply a single data set (from a single experiment) representing some "universe" of genes - e.g. all the mouse genes represented on an array, some of which can be classified as cell cycle genes. Given that a p-value represents a fractional area under a curve, since listB takes up a tenth of the area of listA, I would call this more significant - even though the curves (or the analysis process that generated them, which you haven't explicitly stated) may be different shapes.
I agree with Istvan. You can say that one p-value is more significant than the other, but you CANNOT say that they are significantly different. That requires a different test on the hypothesis that the fold-change for the two genes is different.
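One possible way to ask whether two enrichments actually differ, rather than comparing their p-values, is to test the two lists against each other directly. The sketch below is only one option and not necessarily the test meant above: it runs a two-sided Fisher's exact test on a 2x2 table of cell cycle versus other genes in listA and listB, and it assumes the two lists do not overlap. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: rows are listA and listB,
# columns are cell cycle genes and all other genes in the list.
p_diff = fisher_exact_two_sided(10, 90, 40, 360)
print("p-value for a difference between the lists:", p_diff)
```

With these made-up counts both lists contain 10% cell cycle genes, so the test finds no evidence of a difference, even though each list on its own could have a very different enrichment p-value against the background.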
I would say it is correct if you generate listA and listB following independent hypotheses and use the same statistic to evaluate enrichment.