Hi,
I have around 100 different gene sets (as tab-delimited files, no ranking) and would like to test whether my experimental gene list is enriched for any of them, using a hypergeometric test in R. (This is a custom genome, Toxoplasma gondii, so as far as I can tell I can't use tools such as PANTHER, DAVID or ReactomePA.) I'm an R novice and am lost as to how to run this test for multiple gene sets. To test one gene set, I've been using this (a bit long-winded) code:
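```r
# Probability of the overlap between myGeneList and a random gene set
# of the same size as myGeneSet, sampled from the genome
dhyperRandom <- function(myGeneList, myGeneSet, genome) {
  myRandomGS <- sample(genome, size = length(myGeneSet))
  myX <- length(which(myGeneList %in% myRandomGS))  # genes shared with the random set
  myM <- length(myRandomGS)                          # genes in the random set
  myN <- length(genome) - myM                        # genome genes not in the set
  myK <- length(myGeneList)                          # genes in my list
  # dhyper() is the probability of exactly myX shared genes, not a tail p-value
  return(dhyper(x = myX, m = myM, n = myN, k = myK))
}

pvalue <- numeric(1000)  # create the result vector before filling it
for (i in 1:1000) {
  pvalue[i] <- dhyperRandom(myGeneList, myGeneSet, genome)
}
mean(pvalue)
```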
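The loop above averages point probabilities from dhyper(), which is not itself a p-value. If the aim of the permutations is an empirical p-value, one common approach (a different technique from the one in the post) is to count how often random gene sets of the same size reach the observed overlap. A minimal sketch, assuming genome, myGeneList and myGeneSet are character vectors of gene IDs; permOverlapP() and the toy IDs are hypothetical:

```r
# Hypothetical helper: empirical (permutation) p-value for the overlap
# between a gene list and a gene set.
permOverlapP <- function(myGeneList, myGeneSet, genome, nPerm = 1000) {
  myGeneList <- unique(myGeneList)
  observed <- sum(myGeneList %in% myGeneSet)
  # overlaps of the gene list with random sets the same size as myGeneSet
  randomOverlap <- replicate(nPerm,
    sum(myGeneList %in% sample(genome, size = length(myGeneSet))))
  # add-one correction so the empirical p-value is never exactly 0
  (sum(randomOverlap >= observed) + 1) / (nPerm + 1)
}

# Toy example with made-up IDs (not real T. gondii genes)
set.seed(1)
genome <- paste0("gene", 1:1000)
myGeneSet <- genome[1:50]
myGeneList <- c(genome[1:20], genome[501:530])
permOverlapP(myGeneList, myGeneSet, genome)
```

With 1000 permutations the smallest reachable p-value is 1/1001, so more permutations are needed if very small p-values matter.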
I could go through each gene set individually, but there must be a way to automate this and report the results in a single table. I'd be very grateful for any suggestions!
Natalie
Just as you looped over permutations, write a similar loop over your gene sets. I would try something like this:
allGeneSets should contain the paths to your gene-set files. This is untested; suggestions for optimizing it are welcome :-)
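A minimal sketch of such a loop (not the original reply's code), under these assumptions: each file in allGeneSets is tab-delimited with gene IDs in the first column and no header line, and genome is a character vector of all gene IDs. testAllGeneSets() and the toy files are hypothetical. It uses the upper tail of the hypergeometric distribution via phyper() and adds Benjamini-Hochberg adjusted p-values, since around 100 sets are tested:

```r
# Hypothetical helper: one-sided hypergeometric test of myGeneList against
# every gene-set file in allGeneSets, returned as a single table.
# Assumption: each file is tab-delimited, gene IDs in column 1, no header.
testAllGeneSets <- function(allGeneSets, myGeneList, genome) {
  genome <- unique(genome)
  myGeneList <- intersect(unique(myGeneList), genome)
  k <- length(myGeneList)                          # size of the gene list
  res <- lapply(allGeneSets, function(path) {
    ids <- read.delim(path, header = FALSE, colClasses = "character")[[1]]
    geneSet <- intersect(unique(trimws(ids)), genome)
    x <- length(intersect(myGeneList, geneSet))    # overlap
    m <- length(geneSet)                           # genes in the set
    n <- length(genome) - m                        # genes not in the set
    data.frame(geneSet = basename(path), overlap = x, setSize = m,
               p.value = phyper(x - 1, m, n, k, lower.tail = FALSE),
               stringsAsFactors = FALSE)
  })
  tab <- do.call(rbind, res)
  # correct for testing many gene sets (Benjamini-Hochberg)
  tab$p.adj <- p.adjust(tab$p.value, method = "BH")
  tab[order(tab$p.value), ]
}

# Usage with two toy gene-set files (made-up IDs, written to a temp directory)
genome <- paste0("gene", 1:1000)
dir <- tempfile("genesets")
dir.create(dir)
writeLines(genome[1:50], file.path(dir, "setA.txt"))
writeLines(genome[400:480], file.path(dir, "setB.txt"))
allGeneSets <- list.files(dir, full.names = TRUE)
myGeneList <- c(genome[1:20], genome[901:930])
results <- testAllGeneSets(allGeneSets, myGeneList, genome)
results
```

The resulting data frame can be saved with write.table() or write.csv() to get the single summary table.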
Thanks a lot for your suggestion, Pgibas! I posted the random-gene-set iterations, but perhaps I should start with a regular hypergeometric test on each gene set. I tried what you suggested but I'm having some problems. This is the code I'm trying:
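For reference (this is an editorial sketch, not the code from this post), a regular, non-permutation hypergeometric test for one gene set computes P(overlap >= x) as the upper tail from phyper(), which gives the same value as a one-sided Fisher's exact test on the 2x2 table of "in list" vs "in set". hyperEnrich() and the toy IDs are hypothetical; genome, myGeneList and myGeneSet are assumed to be character vectors of gene IDs:

```r
# Hypothetical helper: one-sided hypergeometric test for a single gene set.
hyperEnrich <- function(myGeneList, myGeneSet, genome) {
  genome <- unique(genome)
  myGeneList <- intersect(unique(myGeneList), genome)
  myGeneSet <- intersect(unique(myGeneSet), genome)
  x <- length(intersect(myGeneList, myGeneSet))  # overlap
  m <- length(myGeneSet)                          # genes in the set
  n <- length(genome) - m                         # genes not in the set
  k <- length(myGeneList)                         # genes in the list
  # phyper(q, lower.tail = FALSE) is P(X > q), so q = x - 1 gives P(X >= x)
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

genome <- paste0("gene", 1:1000)
myGeneSet <- genome[1:50]
myGeneList <- c(genome[1:10], genome[601:640])
p <- hyperEnrich(myGeneList, myGeneSet, genome)

# Same test as a one-sided Fisher's exact test; TRUE is listed first so
# cell [1, 1] of the table is the overlap
inList <- factor(genome %in% myGeneList, levels = c(TRUE, FALSE))
inSet <- factor(genome %in% myGeneSet, levels = c(TRUE, FALSE))
pFisher <- fisher.test(table(inList, inSet), alternative = "greater")$p.value
all.equal(p, pFisher)
```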
Output:
When I run
```r
length(which(myGeneList %in% myGeneSet))
```
on its own, I get the wrong number... This is what the files I'm reading in look like:
Sorry if these are stupid questions! Like I said, R newbie :)
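Since the files and reading code are not shown, the following is only a guess at common causes of a wrong overlap count: read.table()/read.delim() return a data.frame rather than a vector of IDs, and IDs can carry stray whitespace, empty lines or duplicates. A minimal sketch, assuming the IDs sit in the first column; readGeneIds() and the toy file are hypothetical:

```r
# Hypothetical helper: read gene IDs from a tab-delimited file as a clean
# character vector. Set header = TRUE if the file has a header line.
readGeneIds <- function(path, header = FALSE) {
  tab <- read.delim(path, header = header, colClasses = "character")
  ids <- trimws(tab[[1]])         # first column as a vector, whitespace removed
  unique(ids[ids != ""])          # drop empty entries and duplicates
}

# Toy file with a trailing space, a duplicate and an empty row
f <- tempfile(fileext = ".txt")
writeLines(c("gene1\tdescA", "gene2 \tdescB", "gene2\tdescB", "\t"), f)

class(read.delim(f, header = FALSE))  # "data.frame": not usable directly with %in%

myGeneSet <- readGeneIds(f)
myGeneList <- c("gene2", "gene3")
sum(myGeneList %in% myGeneSet)        # overlap counted on clean IDs
```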