5.2 years ago
lihe.liu
Hi community,
What is a good way or package to get all the GO terms for a given species in R?
I work with Bos taurus and tried both Ensembl (via biomaRt) and the org.Bt.eg.db database, but they give me quite different numbers of GO terms.
Ensembl seems to have far more GO terms than org.Bt.eg.db.
org.Bt.eg.db has its own unique ones, though.
library(biomaRt)
library(org.Bt.eg.db)
# connect to the Ensembl BioMart and select the Bos taurus gene dataset
database = useMart("ensembl")
genome = useDataset("btaurus_gene_ensembl", mart = database)
gene = getBM(attributes = c("ensembl_gene_id", "go_id", "name_1006"), mart = genome)
# all GO IDs from biomaRt; drop missing annotations (empty strings or NA) by value rather than by position
all_go1 = unique(gene$go_id[!is.na(gene$go_id) & gene$go_id != ""])
length(all_go1) # total 15118
# all GO IDs from org.Bt.eg.db
all_go2 = AnnotationDbi::keys(org.Bt.eg.db, keytype = "GO")
length(all_go2) # total 9032
# overlap: how many IDs from each source also appear in the other
table(all_go2 %in% all_go1)
table(all_go1 %in% all_go2)
Thank you so much!
Best.
The difference between Ensembl and org.Bt.eg.db has already been explained here. Because of these differences, it is usually not recommended to mix references. Just pick one or the other for your project and stick to it. If you start mixing references, you will run into various kinds of trouble down the line.
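Before committing to one reference, it can help to see how far the two sets diverge. The following is a minimal sketch using only base R set operations. The vectors `go_ensembl` and `go_orgdb` are hypothetical toy stand-ins for `all_go1` and `all_go2` from the question, not real query results.

```r
# Toy stand-ins for the two GO ID vectors (hypothetical subsets, not real query results)
go_ensembl <- c("GO:0008150", "GO:0003674", "GO:0005575", "GO:0006915")
go_orgdb   <- c("GO:0008150", "GO:0003674", "GO:0016020")

shared       <- intersect(go_ensembl, go_orgdb)  # IDs present in both sources
only_ensembl <- setdiff(go_ensembl, go_orgdb)    # IDs found only in the Ensembl set
only_orgdb   <- setdiff(go_orgdb, go_ensembl)    # IDs found only in the org.Bt.eg.db set

# Summarize the overlap as named counts
overlap <- c(shared       = length(shared),
             only_ensembl = length(only_ensembl),
             only_orgdb   = length(only_orgdb))
print(overlap)
```

With the real vectors substituted in, the two `setdiff()` results list exactly which terms would be lost by choosing one source over the other, which may make the choice easier to justify.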