Question

How to get the gene set from the online GO database

2.4 years ago

pengmin.wang.1 ▴ 30

I'm running enrichment analyses with clusterProfiler and WebGestaltR. In clusterProfiler:

library(org.Bt.eg.db)
x <- unique(unlist(as.list(org.Bt.egGO2ALLEGS)))
length(x)  # 5586

There are 5586 genes with GO annotation.

In WebGestaltR:

library(WebGestaltR)

# Biological Process
enrichD_BP <- loadGeneSet(organism = "btaurus", enrichDatabase = "geneontology_Biological_Process_noRedundant")
geneSet_BP <- enrichD_BP$geneSet
length(unique(geneSet_BP$gene))  # 9011

# Cellular Component
enrichD_CC <- loadGeneSet(organism = "btaurus", enrichDatabase = "geneontology_Cellular_Component_noRedundant")
geneSet_CC <- enrichD_CC$geneSet
length(unique(geneSet_CC$gene))  # 6224

# Molecular Function
enrichD_MF <- loadGeneSet(organism = "btaurus", enrichDatabase = "geneontology_Molecular_Function_noRedundant")
geneSet_MF <- enrichD_MF$geneSet
length(unique(geneSet_MF$gene))  # 7960

# Union of genes annotated in any of the three ontologies
geneSet <- unique(c(geneSet_BP$gene, geneSet_CC$gene, geneSet_MF$gene))
length(geneSet)  # 10085

There are at least 10085 genes with GO annotation.

WebGestaltR has more annotated genes than clusterProfiler. Which one is more up to date and consistent with the online GO database? And how can I get the gene set from the online GO database directly? This has puzzled me for days, because I think both packages are powerful and should give the same results.

gene WebGestalR set clusterprofiler background • 1.6k views


score 1 · Answer 1 · 2022-07-01

2.4 years ago

Giovanni M Dall'Olio 28k

It is likely a difference in the annotation version used for this organism, although we would need to look into the details of the two packages to be sure.

You could reach out to the maintainer of the org.Bt.eg.db package and ask them to verify if there is a new version of the data upstream, or you could have a go at creating an OrgDb package yourself.

The Bos taurus annotation likely comes from the Gene Ontology download site, I think: http://current.geneontology.org/products/pages/downloads.html

Note that enricher and other functions from clusterProfiler accept a custom data frame of term-to-gene annotations (the TERM2GENE argument), which you could use for your analysis.
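
As a rough illustration of the custom-annotation route, here is a minimal sketch. The GO-to-gene pairs, the gene IDs (g1 to g5) and the gene list of interest are all made up for the example; only enricher and its gene, universe, TERM2GENE, minGSSize, pvalueCutoff and qvalueCutoff arguments come from clusterProfiler. The size and cutoff settings are relaxed only so that the toy data is not filtered out.

```r
# Hypothetical toy annotation: TERM2GENE expects term IDs in column 1
# and gene IDs in column 2.
term2gene <- data.frame(
  term = c("GO:0006915", "GO:0006915", "GO:0006915",
           "GO:0003677", "GO:0003677", "GO:0005634"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5"),
  stringsAsFactors = FALSE
)

# Background: every gene that has at least one annotation in the custom table.
background <- unique(term2gene$gene)

# Hypothetical list of genes of interest.
genes_of_interest <- c("g1", "g2")

# Run the over-representation test only if clusterProfiler is installed.
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(
    gene         = genes_of_interest,
    universe     = background,
    TERM2GENE    = term2gene,
    minGSSize    = 1,   # relaxed for the toy data
    pvalueCutoff = 1,   # keep every term so the toy result is visible
    qvalueCutoff = 1
  )
  print(as.data.frame(res))
}
```

With a real annotation, the same table would simply have many more rows, and the background would be whichever set of annotated genes you decide to trust.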


Thanks, I will try to create my own OrgDb package.

Do you know how to find the data source for the org.Bt.eg.db package and for the WebGestaltR package?

Reply • 2.4 years ago by pengmin.wang.1 ▴ 30

If you download the tar.gz source of the org.Bt.eg.db package from Bioconductor, you'll find an SQLite file with the data (this file should also have been installed somewhere when you installed the package). There is not much information on where the original data was downloaded from, or when. The BioC page shows a citation from 2019, but the tar.gz package contains files modified in April 2022. It may be worth sending an email to the maintainer of that package; in the worst case you could update it yourself and contribute to BioC. For WebGestalt, maybe you can ask on their mailing list.
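
One place that may help when checking provenance is the metadata table stored inside the installed package. The sketch below assumes org.Bt.eg.db is installed and that its metadata table has the usual name and value columns with entries whose names contain "SOURCE"; that layout is what I have seen in other org.*.eg.db packages, so treat it as an assumption for this one.

```r
# Inspect provenance information shipped with the annotation package,
# if the package is available on this machine.
if (requireNamespace("org.Bt.eg.db", quietly = TRUE)) {
  library(org.Bt.eg.db)

  # Metadata table: one row per key, e.g. source names, URLs and dates
  # (assumed columns: name, value).
  md <- metadata(org.Bt.eg.db)
  print(md[grepl("SOURCE", md$name), ])

  # Path of the SQLite file installed with the package.
  print(org.Bt.eg_dbfile())
}
```

The source dates listed there can then be compared with the release date of the files on the GO download page.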

Reply • 2.4 years ago by Giovanni M Dall'Olio 28k

Thanks. I downloaded the annotation from http://current.geneontology.org/products/pages/downloads.html and found more than 15000 genes with GO annotation (goa_cow.gaf (gzip)). I think I will use my own TERM2GENE. The same issue applies to KEGG analysis: the KEGG background gene set has 9040 genes in clusterProfiler and 8311 in WebGestaltR. Do you know where to download the gene sets from KEGG (https://www.genome.jp/kegg/)? Thanks.
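
For reference, turning a GAF file into a TERM2GENE table could look roughly like the sketch below. It uses only base R and a tiny made-up file in place of goa_cow.gaf; the helper names (toy_row, read_gaf) and all rows are hypothetical. It relies on the GAF 2.x layout: comment lines start with "!", and the 17 tab-separated columns carry the object ID in column 2, the qualifier in column 4, the GO ID in column 5 and the aspect in column 9.

```r
# Build one made-up GAF row with all 17 tab-separated fields.
toy_row <- function(id, symbol, qualifier, go_id, aspect) {
  paste(c("UniProtKB", id, symbol, qualifier, go_id, "PMID:0", "IEA", "",
          aspect, "", "", "protein", "taxon:9913", "20220101", "UniProt",
          "", ""), collapse = "\t")
}

# Write a toy file standing in for the uncompressed goa_cow.gaf.
gaf_path <- tempfile(fileext = ".gaf")
writeLines(c(
  "!gaf-version: 2.2",
  toy_row("P00001", "GENE1", "involved_in",     "GO:0006915", "P"),
  toy_row("P00001", "GENE1", "involved_in",     "GO:0006915", "P"),
  toy_row("P00002", "GENE2", "enables",         "GO:0003677", "F"),
  toy_row("P00003", "GENE3", "NOT|involved_in", "GO:0006915", "P"),
  toy_row("P00002", "GENE2", "located_in",      "GO:0005634", "C")
), gaf_path)

# Read a GAF file: drop "!" comment lines, then parse the tab-separated body.
read_gaf <- function(path) {
  lines <- readLines(path)
  lines <- lines[!startsWith(lines, "!")]
  gaf <- read.delim(text = lines, header = FALSE, sep = "\t", quote = "",
                    comment.char = "", colClasses = "character")
  names(gaf) <- c("db", "id", "symbol", "qualifier", "go_id", "reference",
                  "evidence", "with_from", "aspect", "name", "synonym",
                  "type", "taxon", "date", "assigned_by", "extension",
                  "product_form")
  gaf
}

gaf <- read_gaf(gaf_path)

# Drop negated ("NOT") annotations, then keep unique term/gene pairs.
gaf <- gaf[!grepl("NOT", gaf$qualifier, fixed = TRUE), ]
term2gene <- unique(data.frame(term = gaf$go_id, gene = gaf$id,
                               stringsAsFactors = FALSE))

# Number of distinct annotated gene products in the toy file.
n_genes <- length(unique(term2gene$gene))
print(term2gene)
print(n_genes)
```

Two caveats when comparing counts: as far as I know, goa_cow.gaf identifies gene products by UniProtKB accession rather than Entrez Gene ID, so the IDs need mapping before they can be matched against org.Bt.eg.db; and a GAF file lists direct annotations only, whereas org.Bt.egGO2ALLEGS also includes genes annotated to child terms.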

Reply • 2.4 years ago by pengmin.wang.1 ▴ 30