I'm running enrichment analyses with clusterProfiler and WebGestaltR. In clusterProfiler:
library(org.Bt.eg.db)
x <- unique(unlist(as.list(org.Bt.egGO2ALLEGS)))
length(x)  # 5586
So org.Bt.eg.db has 5586 genes with GO annotation.
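The WebGestaltR numbers below are broken out by ontology (BP, CC, MF), while `x` pools all three, so a per-ontology count from the same OrgDb makes the comparison like-for-like. This is only a sketch, assuming the standard `GO` and `ONTOLOGY` columns of org.*.eg.db packages; `count_go_genes` is a hypothetical helper. It uses direct annotations, but in principle the overall gene set should match the propagated `GO2ALLEGS` one, since a gene only gets an ancestor term through some direct annotation.

```r
# Hypothetical helper: count genes with at least one GO annotation,
# per ontology and overall. go_tab needs columns ENTREZID, GO, ONTOLOGY
# (as returned by AnnotationDbi::select() on an org.*.eg.db package).
count_go_genes <- function(go_tab) {
  go_tab <- go_tab[!is.na(go_tab$GO), ]                 # drop genes without GO
  per_ont <- vapply(split(go_tab$ENTREZID, go_tab$ONTOLOGY),
                    function(ids) length(unique(ids)), integer(1))
  c(per_ont, ALL = length(unique(go_tab$ENTREZID)))
}

# Only runs if the Bioconductor package is installed
if (requireNamespace("org.Bt.eg.db", quietly = TRUE)) {
  db <- org.Bt.eg.db::org.Bt.eg.db
  go_tab <- AnnotationDbi::select(db,
                                  keys = AnnotationDbi::keys(db, keytype = "ENTREZID"),
                                  columns = c("GO", "ONTOLOGY"),
                                  keytype = "ENTREZID")
  print(count_go_genes(go_tab))
}
```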
In WebGestaltR:
library(WebGestaltR)
enrichD_BP <- loadGeneSet(organism = "btaurus", enrichDatabase = "geneontology_Biological_Process_noRedundant")
geneSet_BP <- enrichD_BP$geneSet
length(unique(geneSet_BP$gene))  # 9011
enrichD_CC <- loadGeneSet(organism = "btaurus",enrichDatabase = "geneontology_Cellular_Component_noRedundant")
geneSet_CC <- enrichD_CC$geneSet
length(unique(geneSet_CC$gene))  # 6224
enrichD_MF <- loadGeneSet(organism = "btaurus",enrichDatabase = "geneontology_Molecular_Function_noRedundant")
geneSet_MF <- enrichD_MF$geneSet
length(unique(geneSet_MF$gene))  # 7960
geneSet <- unique(c(geneSet_BP$gene, geneSet_CC$gene, geneSet_MF$gene))
length(geneSet)  # 10085
So WebGestaltR has at least 10085 genes with GO annotation.
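The two totals only say which universe is bigger; they don't say whether one contains the other, or whether both use the same identifier type. A direct set comparison answers that. A minimal sketch, assuming `x` (org.Bt.eg.db) and `geneSet` (WebGestaltR) are both NCBI Entrez Gene IDs; if WebGestaltR returns another ID type for btaurus, map the IDs first. `compare_universes` is a hypothetical helper.

```r
# Hypothetical helper: compare two gene universes given as vectors of gene IDs.
# Returns how many IDs are only in a, in both, and only in b.
compare_universes <- function(a, b) {
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  c(only_a = length(setdiff(a, b)),
    shared = length(intersect(a, b)),
    only_b = length(setdiff(b, a)))
}

# Usage with the objects above: compare_universes(x, geneSet)
```

If `shared` turns out very small, a mismatch in ID types is a more likely explanation than a real difference in annotation depth.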
WebGestaltR covers more genes than clusterProfiler. Which one is more up to date and consistent with the online GO database? And how can I get the gene sets from the online GO database? This has puzzled me for days, because I expected these two powerful tools to give the same results.
Thanks, I will try to create my own OrgDb package.
Do you know how to find out the data source of the org.Bt.eg.db package and of the WebGestaltR package?
If you download the tar.gz source of the org.Bt.eg.db package from Bioconductor, you'll find an SQLite file with the data (this file should also have been installed somewhere when you installed the package). There is not much information on where the original data were downloaded from, or when. The BioC page shows a citation from 2019, but the files in the tar.gz package were modified in April 2022. It may be worth emailing the maintainer of that package; in the worst case, you could update it yourself and contribute it to BioC. For WebGestalt, maybe you can ask on their mailing list.
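You may not need to unpack the tarball to find that SQLite file: every org.*.eg.db package exports helpers such as `org.Bt.eg_dbfile()`, and `AnnotationDbi::metadata()` returns the build metadata stored in the database. A sketch of that; which source rows are present (e.g. `EGSOURCEDATE`, `GOSOURCEDATE`) depends on the package build, so filtering on names containing "SOURCE" is an assumption, and `source_rows` is a hypothetical helper.

```r
# Hypothetical helper: keep only the metadata rows that describe data sources
# (names such as EGSOURCEDATE or GOSOURCEURL, depending on the build).
source_rows <- function(meta) {
  meta[grepl("SOURCE", meta$name), , drop = FALSE]
}

# Only runs if the Bioconductor package is installed
if (requireNamespace("org.Bt.eg.db", quietly = TRUE)) {
  print(org.Bt.eg.db::org.Bt.eg_dbfile())   # path of the installed SQLite file
  print(source_rows(AnnotationDbi::metadata(org.Bt.eg.db::org.Bt.eg.db)))
}
```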
Thanks. I downloaded the annotation from http://current.geneontology.org/products/pages/downloads.html and found more than 15000 genes with GO annotation in goa_cow.gaf (gzip). I think I will use my own TERM2GENE. The same issue comes up with KEGG analysis: the KEGG background gene set has 9040 genes in clusterProfiler but 8311 in WebGestaltR. Do you know how to download the gene sets from KEGG (https://www.genome.jp/kegg/)? Thanks.
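Both sources can be turned into a two-column TERM2GENE table for `clusterProfiler::enricher()`. This is a sketch under a few assumptions. It assumes the GAF 2.x layout (tab-separated, `!` header lines; column 3 = symbol, 4 = qualifier, 5 = GO ID, 9 = aspect P/F/C). It also assumes the KEGG REST `link` endpoint returns pathway/gene pairs such as `path:bta00010` / `bta:<id>`, where cattle gene IDs are expected to be NCBI Gene IDs. The helper names are hypothetical. One caveat: a GAF holds direct annotations only, so unlike `org.Bt.egGO2ALLEGS`, genes are not propagated to ancestor terms, and term sizes and p-values will differ unless you propagate them yourself.

```r
# Hypothetical helper: GO TERM2GENE from GAF lines (direct annotations only).
# aspect: any of "P" (BP), "F" (MF), "C" (CC), as in GAF column 9.
gaf_to_term2gene <- function(lines, aspect = c("P", "F", "C")) {
  lines <- lines[!startsWith(lines, "!")]              # drop header lines
  f <- strsplit(lines, "\t", fixed = TRUE)
  # keep the requested aspect(s) and drop negated ("NOT") annotations
  keep <- vapply(f, function(x) x[9] %in% aspect &&
                   !grepl("NOT", x[4], fixed = TRUE), logical(1))
  f <- f[keep]
  unique(data.frame(term = vapply(f, `[`, character(1), 5),   # GO ID
                    gene = vapply(f, `[`, character(1), 3),   # DB Object Symbol
                    stringsAsFactors = FALSE))
}

# Hypothetical helper: KEGG "link" output (pathway/gene pairs) to TERM2GENE.
# The column whose IDs start with "path:" is taken as the term column.
kegg_link_to_term2gene <- function(link) {
  p <- if (all(startsWith(link[[1]], "path:"))) 1L else 2L
  data.frame(term = sub("^path:", "", link[[p]]),
             gene = sub("^[a-z]+:", "", link[[3L - p]]),      # strip "bta:"
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper: download pathway/gene links via the KEGG REST API.
kegg_term2gene <- function(org = "bta") {
  url <- paste0("https://rest.kegg.jp/link/", org, "/pathway")
  link <- read.delim(url, header = FALSE, quote = "", colClasses = "character")
  kegg_link_to_term2gene(link)
}

if (file.exists("goa_cow.gaf.gz")) {
  t2g_bp <- gaf_to_term2gene(readLines(gzfile("goa_cow.gaf.gz")), aspect = "P")
  message(length(unique(t2g_bp$gene)), " genes with direct BP annotation")
}
# kegg <- kegg_term2gene("bta")   # needs network access
# res <- clusterProfiler::enricher(my_genes, TERM2GENE = kegg)   # my_genes: your IDs
```

The GAF keys are gene symbols here, so your input list and universe must use symbols too. The KEGG table is keyed by the IDs after the `bta:` prefix.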