Hi All:
I have a question about the Bioconductor goseq package for GO enrichment analysis. The top-ranked categories are obtained by ranking on the "over_represented_pvalue" column of the goseq output. The same output also includes an "under_represented_pvalue" column. How are over- and under-representation determined?
My question can probably be generalized as follows: if a category contains more DE genes than expected by chance, can I say that the category is "enriched", i.e. its DE genes are "over-represented", and that the relevant p-value is the over-representation one? Conversely, if a category contains fewer DE genes than expected, is it "depleted", i.e. "under-represented"? And is the direction reflected in the sign (+/-) of some statistic?
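To make the question concrete, here is a minimal sketch of how I currently picture it. It uses a plain hypergeometric test in Python rather than goseq itself, so it ignores the length-bias correction (as far as I understand, goseq's default Wallenius method adjusts for that). The function names and all the numbers are made up for illustration. The "over" p-value is the upper tail P(X >= k), the "under" p-value is the lower tail P(X <= k), and the sign of observed minus expected gives the direction.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k DE genes fall in a category of K genes,
    when n of the N genes in total are DE."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def tail_pvalues(k, N, K, n):
    """Return (over, under) = (P(X >= k), P(X <= k))."""
    lo = max(0, n - (N - K))   # smallest possible count in the category
    hi = min(K, n)             # largest possible count in the category
    over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    under = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    return over, under

# Hypothetical numbers: 10000 genes, 500 DE, a category with 100 genes.
N, n, K = 10000, 500, 100
expected = n * K / N           # DE genes expected in the category by chance

for k in (15, 5, 0):           # observed DE genes in the category
    over, under = tail_pvalues(k, N, K, n)
    diff = k - expected        # positive: more than expected, negative: fewer
    print(f"k={k:2d} observed-expected={diff:+.1f} "
          f"over_p={over:.3g} under_p={under:.3g}")
```

If this picture is right, a category with a small over-representation p-value has a positive observed-minus-expected difference, and one with a small under-representation p-value has a negative one. Please correct me if goseq defines the two p-values differently.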
I am new to this area, so thank you very much for your help! The vignette of the goseq package can be found here.
Couldn't edit my own (old) post. Wanted to add that a GO-Elite paper has now been published. It is at: http://dx.doi.org/10.1093/bioinformatics/bts366