I am doing Over-Representation pathway analysis as described here.
With this approach, the p-value for each pathway is calculated as follows:
$$ p = 1 - \sum_{i=0}^{x-1} \frac{\binom{K}{i}\binom{N-K}{M-i}}{\binom{N}{M}} $$
where N is the total number of genes across all pathways, K is the number of genes in a given pathway, M is the number of genes of interest, and x is the number of genes of interest that fall in that pathway.
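A minimal Python sketch of this kind of p-value. It assumes the usual upper-tail hypergeometric test, p = P(X >= x) = 1 - sum_{i=0}^{x-1} C(K,i)·C(N-K,M-i)/C(N,M). The helper names `hypergeom_pmf` and `ora_pvalue` are hypothetical and do not come from any library. The sketch uses only `math.comb` and exact `Fraction` arithmetic, so no rounding occurs.

```python
from fractions import Fraction
from math import comb


# Hypothetical helper names, for illustration only.
def hypergeom_pmf(i: int, N: int, K: int, M: int) -> Fraction:
    """P(X = i): exactly i of the M genes of interest fall in a pathway
    of K genes, when N genes are annotated in total."""
    # math.comb returns 0 when i > K, so impossible outcomes get probability 0.
    return Fraction(comb(K, i) * comb(N - K, M - i), comb(N, M))


def ora_pvalue(x: int, N: int, K: int, M: int) -> Fraction:
    """Upper-tail p-value P(X >= x) = 1 - sum_{i=0}^{x-1} P(X = i)."""
    # Terms with i > M are zero; stopping at M also avoids comb() with a
    # negative argument. For x = 0 the sum is empty, so the p-value is 1.
    upper = min(x, M + 1)
    return 1 - sum((hypergeom_pmf(i, N, K, M) for i in range(upper)), Fraction(0))


if __name__ == "__main__":
    # Toy numbers (hypothetical): 3 of 10 genes of interest hit a 50-gene pathway.
    print(float(ora_pvalue(3, N=1000, K=50, M=10)))
```

Under this form of the test, a pathway with no hits (x = 0) always gets p = 1.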
Assume we have 56340 genes across all pathways (N = 56340), a single gene of interest (M = 1), and 30 genes in the given pathway (K = 30). Moreover, none of the genes of interest falls in the pathway, so x = 0.
Calculating the p-value in R, we get
p = 1 - dhyper(0, 30, 56340 - 30, 1) = 0.0005324814
(Note that this is less than 0.01.)
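This checks where the number above comes from. In R, `dhyper(x, m, n, k)` gives P(X = x) for m pathway genes, n non-pathway genes and k genes drawn. So `1 - dhyper(0, 30, 56340-30, 1)` is 1 - P(X = 0) = P(X >= 1), the probability of at least one hit. With a single gene of interest (M = 1), this reduces to K/N. The Python reproduction below uses exact fractions and the standard library only.

```python
from fractions import Fraction
from math import comb

N, K, M = 56340, 30, 1  # values from the example above

# P(X = 0): none of the M genes of interest fall in the K-gene pathway;
# the same quantity as R's dhyper(0, K, N - K, M).
p_zero_hits = Fraction(comb(K, 0) * comb(N - K, M), comb(N, M))

# 1 - P(X = 0) = P(X >= 1): the probability of at least one hit.
p_at_least_one = 1 - p_zero_hits

print(p_at_least_one == Fraction(K, N))  # True (holds because M = 1)
print(f"{float(p_at_least_one):.10f}")   # 0.0005324814
```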
After calculating the p-values for all pathways, I get many p-values below 0.01.
My question is: how should I select enriched pathways based on these p-values? As shown above, even when none of the genes of interest falls in a pathway, the p-value is still below 0.01. Is it reasonable to call a pathway enriched when it shares no genes with my genes of interest?
Can anyone help me?
Thanks
For enriched pathways, you'd just look at a one-tailed probability...