As far as I know, sampleGOdata comes from:
sampleGOdata <- new("topGOdata",
    description = "Simple session",
    ontology = "BP",
    allGenes = genes.list,    # named numeric vector of gene scores
    geneSel = topDiffGenes,   # function that selects genes from the vector above according to a cutoff; in my case I used logFC, not p-values
    nodeSize = 10,            # prune GO terms with fewer than 10 annotated genes
    annot = annFUN.gene2GO,   # annotation function; change it depending on the annotation source
    gene2GO = geneID2GO)      # object with the gene-to-GO mapping, e.g. from readMappings(file)
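For reference, here is a minimal sketch of what the two inputs might look like. The gene IDs, the logFC values, and the cutoff of 1 are made up for illustration and are not my actual data. allGenes is a named numeric vector, and geneSel is a function that returns TRUE or FALSE for each gene:

```r
# Hypothetical named numeric vector: names are gene IDs, values are logFC.
genes.list <- c(geneA = 2.3, geneB = -1.8, geneC = 0.2, geneD = -0.1, geneE = 1.1)

# Selection function passed as geneSel: TRUE marks a gene as "interesting".
# The cutoff of 1 on |logFC| is an arbitrary illustrative choice.
topDiffGenes <- function(scores) {
  abs(scores) > 1
}

selected <- topDiffGenes(genes.list)
print(selected)
print(names(genes.list)[selected])
```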
If I remember correctly, the scores are kept in the sampleGOdata object,
and it uses those scores to calculate the importance of each GO term that each gene belongs to.
Changing the algorithm changes how the scores are used, but whatever they are and whatever order they are in, they will be used (this is where nodeSize comes into play) to calculate the GO term importance, so I don't see the point of using ascending scores. But maybe I didn't understand the vignette well.
But I agree the vignette could be improved. One point related to the question that I don't find clear enough is whether the getSigGroups function and runTest are exactly the same, or what their differences are...
So the vignette implies that if you run the Fisher test, it just wants a function that tells it whether a gene is in or out (geneSel); that part is clear. If you run the KS test, it will use the values in allGenes (a named vector). These values should indicate, for example, whether a gene is more or less differentially expressed. What those values can be (p-values, ranks, scores, counts) and which direction they must run is what I am asking.
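To make the distinction concrete, here is a rough base-R sketch of the two kinds of test for a single GO term. It is not topGO's internal code: the gene scores and the term membership are invented, and stats::fisher.test and stats::ks.test only stand in for whatever topGO does per term.

```r
# Invented p-value-like scores for 10 genes (lower = more significant).
scores <- c(g1 = 0.001, g2 = 0.004, g3 = 0.02, g4 = 0.03, g5 = 0.2,
            g6 = 0.35, g7 = 0.5, g8 = 0.6, g9 = 0.8, g10 = 0.95)

# Invented annotation: which genes belong to the GO term of interest.
in_term <- names(scores) %in% c("g1", "g2", "g3", "g6")

# Fisher-style test: only needs in/out from a selection function.
selected <- scores < 0.05
tab <- table(selected = selected, in_term = in_term)
fisher_p <- fisher.test(tab, alternative = "greater")$p.value

# KS-style test: uses the scores themselves, so their direction matters.
# alternative = "greater" asks whether the term's genes tend to have
# smaller scores than the remaining genes.
ks_p <- ks.test(scores[in_term], scores[!in_term], alternative = "greater")$p.value

print(c(fisher = fisher_p, ks = ks_p))
```

The Fisher-style test never looks at the scores once the in/out decision has been made, while the KS-style test depends entirely on how the scores are ordered.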
I thought the direction didn't matter, and that it automatically treated the lowest value as the best one. But according to your answer, it somehow affects the ordering used by the test, at least for p-values. But then I wonder what happened with my data and the logFC I used...
If you ran the Fisher test it would be OK, since your geneSel function determines which genes are in or out and that is the only criterion. If you ran the KS test, it would either throw an error because of the negative values or simply treat the most negative genes as the most relevant. That is probably not what you intended.
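Here is a small sketch of why raw logFC is a problem if the test treats the lowest value as the best one (that ordering is the assumption here; the gene names and values are made up). Sorting raw logFC in ascending order puts the most negative genes on top and the strongly up-regulated ones at the bottom, whereas a score such as the rank of -|logFC| puts the strongly changed genes first regardless of sign:

```r
# Hypothetical logFC values for five genes.
logFC <- c(geneA = 2.3, geneB = -1.8, geneC = 0.2, geneD = -0.1, geneE = 1.1)

# Ascending order of raw logFC: the most negative gene comes first,
# the most up-regulated gene comes last.
raw_order <- names(sort(logFC))

# One possible transformation: rank by decreasing |logFC|, so rank 1 is the
# gene with the largest change in either direction (lower = more relevant).
score <- rank(-abs(logFC))
score_order <- names(sort(score))

print(raw_order)
print(score_order)
```

If only up-regulated or only down-regulated genes matter, a one-sided score would be needed instead. The p-values from the differential expression test are the other obvious choice.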