Hi, I'm running the fisher.test function in R.
My code is:
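```r
dfx <- input_fishers
res1 <- NULL
for (i in seq_len(nrow(dfx))) {
  # 2x2 table filled row-wise from the first four columns of row i
  table1 <- matrix(c(dfx[i, 1], dfx[i, 2], dfx[i, 3], dfx[i, 4]), ncol = 2, byrow = TRUE)
  # one-sided test for over-representation
  p1 <- fisher.test(table1, alternative = "greater")$p.value
  res1 <- c(res1, p1)
}
dfx$fishers <- res1
# Benjamini-Hochberg adjustment across all GO terms
x1 <- p.adjust(dfx$fishers, method = "BH", n = length(dfx$fishers))
dfx$p.adj <- x1
# keep terms with adjusted p < 0.05
y1 <- dfx[dfx$p.adj < 0.05, ]
```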
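For comparison, the same per-row test can be written with vapply instead of growing res1 inside the loop. This is only a sketch: it assumes the count columns are named Test.Set, Test.Pop, Ref.Set and Ref.Pop as in the input table shown in this post (so the result does not depend on whether GO.ID is the first column or the row names), and it uses the first three rows of that table as toy input. The 2x2 layout is left exactly as in the loop; whether that layout is right is the question being asked.

```r
# Toy input: the first three rows of the input table in this post
dfx <- data.frame(
  GO.ID    = c("GO:0000003", "GO:0000041", "GO:0000122"),
  Test.Set = c(1, 1, 3),
  Test.Pop = c(274, 274, 274),
  Ref.Set  = c(16, 44, 265),
  Ref.Pop  = c(19634, 19634, 19634)
)

# One-sided Fisher test per row, same 2x2 layout as the loop:
# row 1 = (Test.Set, Test.Pop), row 2 = (Ref.Set, Ref.Pop)
dfx$fishers <- vapply(seq_len(nrow(dfx)), function(i) {
  tab <- matrix(c(dfx$Test.Set[i], dfx$Test.Pop[i],
                  dfx$Ref.Set[i],  dfx$Ref.Pop[i]),
                ncol = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}, numeric(1))

# Benjamini-Hochberg adjustment; n already defaults to length(p)
dfx$p.adj <- p.adjust(dfx$fishers, method = "BH")
y1 <- dfx[dfx$p.adj < 0.05, ]
```

vapply with numeric(1) also guarantees that every call returns exactly one number, which the c(res1, p1) pattern does not check.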
My confusion is mainly about the input itself.
My input data frame is laid out as follows:
```
       GO.ID Test.Set Test.Pop Ref.Set Ref.Pop
1 GO:0000003        1      274      16   19634
2 GO:0000041        1      274      44   19634
3 GO:0000122        3      274     265   19634
4 GO:0000139       16      274     474   19634
5 GO:0000165        1      274     109   19634
6 GO:0000166       13      274    2654   19634
```
The columns are:
- Test.Set: number of differentially expressed (DE) genes annotated with the specific GO term of that row (GO.ID).
- Test.Pop: total number of DE genes annotated with any GO term.
- Ref.Set: number of genes in the entire transcriptome annotated with the specific GO term (this count includes the DE genes).
- Ref.Pop: total number of genes in the transcriptome annotated with any GO term.
However, I'm now having doubts. Should the input instead be split into genes with and without the GO term, like this:
```
       GO.ID DE.GO DE.NOTGO Exp.transcriptome.GO Exp.transcriptome.NOTGO
1 GO:0000003     1      273                   16                   19618
2 GO:0000041     1      273                   44                   19590
3 GO:0000122     3      271                  265                   19369
4 GO:0000139    16      258                  474                   19160
5 GO:0000165     1      273                  109                   19525
6 GO:0000166    13      261                 2654                   16980
```
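The second layout follows from the first by subtraction: DE.NOTGO = Test.Pop - Test.Set and Exp.transcriptome.NOTGO = Ref.Pop - Ref.Set (for GO:0000003, 274 - 1 = 273 and 19634 - 16 = 19618). A minimal sketch of building it and running the same one-sided test on it, using the first three rows as toy data; the object names alt and p_alt are just illustrative.

```r
# First three rows of the original input (columns as in the first table)
dfx <- data.frame(
  GO.ID    = c("GO:0000003", "GO:0000041", "GO:0000122"),
  Test.Set = c(1, 1, 3),
  Test.Pop = c(274, 274, 274),
  Ref.Set  = c(16, 44, 265),
  Ref.Pop  = c(19634, 19634, 19634)
)

# The "NOTGO" counts are the totals minus the GO-term counts
alt <- data.frame(
  GO.ID                   = dfx$GO.ID,
  DE.GO                   = dfx$Test.Set,
  DE.NOTGO                = dfx$Test.Pop - dfx$Test.Set,
  Exp.transcriptome.GO    = dfx$Ref.Set,
  Exp.transcriptome.NOTGO = dfx$Ref.Pop - dfx$Ref.Set
)

# Per-term 2x2 tables in the proposed layout (rows: DE, transcriptome)
p_alt <- vapply(seq_len(nrow(alt)), function(i) {
  tab <- matrix(unlist(alt[i, 2:5]), ncol = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}, numeric(1))
```

In this layout the transcriptome row still contains the DE genes, which is exactly the point of the follow-up question.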
Can someone also clarify whether the DE genes should indeed be counted in the expressed-transcriptome reference set?
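To make this concrete for GO:0000003, here are the two candidate tables side by side: one where the reference row includes the DE genes (as in the tables above), and one where they are removed so that the two rows are disjoint (DE genes vs. non-DE genes). This sketch only builds and tests both versions; it does not decide which one is correct, and the variable names are hypothetical.

```r
# Counts for GO:0000003, taken from the tables above
de_go     <- 1      # DE genes with the GO term
de_total  <- 274    # DE genes with any GO term
ref_go    <- 16     # transcriptome genes with the GO term (includes DE genes)
ref_total <- 19634  # transcriptome genes with any GO term (includes DE genes)

# Option A: reference row includes the DE genes
tab_incl <- matrix(c(de_go, de_total - de_go,
                     ref_go, ref_total - ref_go),
                   ncol = 2, byrow = TRUE)

# Option B: DE genes removed from the reference row (non-DE genes only)
tab_excl <- matrix(c(de_go, de_total - de_go,
                     ref_go - de_go, (ref_total - de_total) - (ref_go - de_go)),
                   ncol = 2, byrow = TRUE)

p_incl <- fisher.test(tab_incl, alternative = "greater")$p.value
p_excl <- fisher.test(tab_excl, alternative = "greater")$p.value
```

Option A gives the rows (1, 273) and (16, 19618); Option B gives (1, 273) and (15, 19345).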
Many thanks!