I am using GOseq on "mm9" as the genome and "geneSymbol" as the ID in my DE.genes and All.genes list. The input files are loaded fine, the named vector is fine and so is the calculated pwf and the nullp plot. The wallenius approximation also fetches the over-represented GO categories amongst the DE genes:
head(GO.wall)
category over_represented_pvalue under_represented_pvalue
1878 GO:0016491 2.278057e-09 1.0000000
220 GO:0003674 2.524681e-07 1.0000000
264 GO:0003824 3.039433e-07 1.0000000
159 GO:0001671 1.629699e-04 0.9999960
1911 GO:0016616 2.039002e-04 0.9999346
1909 GO:0016614 2.229153e-04 0.9999252
However, when I find the report GO categoris over enriched using the .05 FDR cutoff using the enriched.GO command, only the top over-represented GO ID (GO: 0016491) is reported repeatedly in the output:
head(enriched.GO)
[1] "GO:0016491" "GO:0016491" "GO:0016491" "GO:0016491" "GO:0016491" "GO:0016491"
I cant seem to debug what I am doing here here as I have followed the instructions from the vignette.
Here is the complete code for reference:
library(goseq)
assayed.genes<-scan("Allgenes.txt", what=character())
de.genes<-scan("DE.txt", what=character())
gene.vector=as.integer(assayed.genes %in% de.genes)
names(gene.vector)=assayed.genes
print(table(gene.vector))
pwf=nullp(gene.vector,"mm9", "geneSymbol")
print(head(pwf))
GO.MF=goseq(pwf, "mm9","geneSymbol", test.cats=c("GO:MF"))
print(head(GO.MF))
enriched.GO=GO.MF$category[p.adjust(GO.MF$over_represented_pvalue,method="BH")]
print(head(enriched.GO))
I will really appreciate any tips for those who have had more experience with GOseq.
Thanks!
I have the same problem using this function in multiple datasets. Have you find which was the error?
Thanks!