I am doing a GO analysis for my gene sets and plan to use the Benjamini-Hochberg (BH) method to adjust the resulting p-values for multiple testing. Since the BH adjustment depends on the total number of tests (that is, the number of p-values calculated), I wonder whether it is OK to remove all GO terms with only 1 gene hit (or those with 1 or 2 gene hits) before calculating the p-values. That way, the total number of p-values will be reduced, which may produce more significant adjusted p-values. The logic is that GO terms with just 1 or 2 gene hits are unlikely to be significant.
So my plan is like this:
- Find out how many GO terms are included in my gene sets
- Remove those GO terms with just 1 or 2 gene hits
- Calculate enrichment p-values for the remaining GO terms, so the total number of tests equals the number of GO terms with >2 gene hits
- Use the BH method to adjust the p-values
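To make the steps above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the term names, the counts, the background size, the one-sided hypergeometric test as the enrichment test, and the helper names `hypergeom_sf`, `bh_adjust` and `enrich`. It only shows the mechanics of the plan (filter, test, adjust); it does not say whether filtering on hit counts is statistically valid.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of at least k hits when n genes are drawn
    from a background of N genes, K of which carry the GO term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(terms, N, n, min_hits=3):
    """terms maps a term name to (genes annotated in background, hits in my list).
    Terms with fewer than min_hits hits are dropped BEFORE testing,
    so they do not count toward the number of tests m used by BH."""
    kept = {t: (K, k) for t, (K, k) in terms.items() if k >= min_hits}
    names = list(kept)
    pvals = [hypergeom_sf(kept[t][1], N, kept[t][0], n) for t in names]
    adj = bh_adjust(pvals)
    return {t: (p, q) for t, p, q in zip(names, pvals, adj)}


# Hypothetical input: 1000 background genes, 50 genes in my list.
example_terms = {
    "term_A": (40, 8),   # 40 annotated genes, 8 hits
    "term_B": (100, 6),
    "term_C": (10, 1),   # dropped: only 1 hit
    "term_D": (25, 2),   # dropped: only 2 hits
}
results = enrich(example_terms, N=1000, n=50, min_hits=3)
for term, (p, q) in results.items():
    print(f"{term}\tp={p:.3g}\tBH-adjusted={q:.3g}")
```

Setting `min_hits=1` in this sketch keeps every term, which makes it easy to compare the adjusted p-values with and without the filter.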
Is this procedure ok or not? Are there any published papers with similar procedures? Any comments or references will be appreciated. Thank you!
It looks like you used a threshold of at least 5 genes or 5% of a pathway. How did you decide on those thresholds? Do you have a reference for that minimum size?
Thanks!
Hi amandastahlke,
I used the p-value cutoff. Just to make it more stringent, I added one more layer of cutoff on top of it. No particular rule was applied.