Dear Everyone,
I am using differentially expressed genes from an RNA dataset analysed with edgeR. From my "gene universe" (circa 800 genes) I picked the most significant up-regulated ones (circa 300 at the 1% threshold), which I use as the "geneList" object for the enrichment analysis with topGO. The vast majority of genes in the "gene universe" are annotated with 1 to 8 GO terms.
Now that the topGOdata object is set up, I am struggling with the choice of test statistic (either KS or Fisher). I have gone through the topGO manual so many times that I know the related paragraphs almost by heart, and I have visited every blog I could find on the subject.
To summarise my understanding, I should choose between the following tests (from p.5 of the manual):
- Fisher test: for count data
- KS test: for modified data (e.g. p-values)
So I concluded that, with my dataset, it seems logical to use the KS test. But in all the literature with the same kind of dataset, people publish results of the Fisher test.
The statistical test to use with topGO is still very cryptic to me. I probably do not understand well enough how the different tests work and produce their outputs.
Could anyone give me some advice on how to choose the test with topGO and/or how to set up the data (geneList) accordingly?
Thank you very much for any comment or link that clarifies this!
p.s. In more technical detail: my "geneList" data are the logFC-pruned FDR values output by the DE analysis with edgeR. From these I separate the most significant (by FDR) down-regulated genes from the up-regulated ones, and I want to perform the GO enrichment on each of the two subsets.
You use Fisher's exact test on a contingency table, i.e. a table of counts, which is normally what you get when doing a GO term analysis. The Kolmogorov-Smirnov test compares two distributions. What makes you think that your data are unsuitable for an enrichment test using Fisher's exact test?
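To make "compares two distributions" concrete, here is a minimal Python sketch of the plain two-sample Kolmogorov-Smirnov statistic applied to per-gene scores. The scores and the names `in_term` and `others` are made up for illustration. This is not topGO's own KS-like implementation, just the textbook statistic: the largest gap between the two empirical cumulative distributions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for x in sorted(set(a_sorted) | set(b_sorted)):
        ecdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        ecdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Hypothetical FDR values: genes annotated to one GO term vs. all other genes.
in_term = [0.001, 0.004, 0.02, 0.6]
others = [0.03, 0.2, 0.4, 0.5, 0.8, 0.9]

d = ks_statistic(in_term, others)
print(f"KS statistic: {d:.3f}")
```

Note that this kind of test needs a score for every gene in the universe, whereas a count-based test only needs to know which genes were selected.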
Most tools use Fisher's exact (hypergeometric) test to assess GO term enrichment.
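As a rough illustration of what that test computes, here is a minimal Python sketch of the one-sided Fisher's exact test written as a hypergeometric tail probability. The universe of 800 genes and the 300 selected genes come from the question; the GO term with 40 annotated genes, 25 of them among the selected ones, is a made-up example, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(universe, selected, annotated, overlap):
    """P(at least `overlap` annotated genes among `selected`) under random sampling.

    universe  -- total number of genes tested
    selected  -- number of significant genes (the "interesting" set)
    annotated -- number of genes in the universe annotated to the GO term
    overlap   -- number of selected genes annotated to the GO term
    """
    max_overlap = min(annotated, selected)
    # Sum the hypergeometric probabilities of all tables at least as extreme.
    favourable = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(universe, selected)


# 2x2 table behind the call below (the GO term counts are assumptions):
#                 in GO term   not in GO term
# selected            25            275
# not selected        15            485
p = enrichment_p_value(universe=800, selected=300, annotated=40, overlap=25)
print(f"one-sided p-value: {p:.3g}")
```

By chance you would expect about 40 * 300 / 800 = 15 annotated genes among the selected ones, so an overlap of 25 gives a small p-value. In a real analysis this is repeated for every GO term and followed by multiple-testing correction.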