I've been performing GO enrichment using limma::goana on RNAseq data analyzed with DESeq2. The results is a list of enriched GOIDs. Additionally, I wanted to get all the ENTREZIDs for each differentially enriched GO term found in the results. I went about doing this by trying to get all GOIDs and all ENTREZIDs that correspond to them and left_join these to the results.
This is the code that I used to obtain all of the GO IDs and all the ENTREZIDs for human (in this case). Note that I use org.Hs.eg.db:
go_list <- mapIds(org.Hs.eg.db, keys(org.Hs.eg.db, "GO"), "ENTREZID", "GO", multiVals = "list") ;
go_vector <- lapply(go_list, as.vector);
ezs <- sapply(go_vector, paste0, collapse = ";") ;
go_df <- data.frame(GOID = names(go_vector), ENTREZID = ezs) ;
My problem is that (for example) GOIDs GO:0032501 and GO:0048731 don't appear in this list of all GOIDs, but these GOIDs appear in my GO enrichment results using limma::goana. This is the code that I use for the results:
go_a <- goana(resdata1_entrez$ENTREZID[(resdata1_entrez$padj < thresh_padj & abs(resdata1_entrez$log2FoldChange) > l2FC)], resdata1_entrez$ENTREZID, organism) %>% as_tibble()
go_res <- go_a %>% mutate(GOID = mapIds(GO.db, .$Term, "GOID", "TERM") %>%
unname()) %>% dplyr::select(GOID, everything()) %>%
arrangeP.DE) %>%
mutateADJ.P.DE = p.adjustP.DE, method = "BH"))
Basically, I want to know why GOIDs are able to be mapped to ENTREZIDs of the GO enrichment results using the mapIDs() function, but when I try to get all GOIDs (again using mapIDs function), not all of the GOIDs appear. Presumably, it's because I'm using mapIDs() querying GO.db to get the GOIDs based on the ENTREZIDs of the GO enrichment results, and using org.Hs.eg.db to get all GOIDS and corresponding ENTREZIDs. Does anyone know which database I should use?
I don't think I totally get the issue, but to address your question at the end of the post, you can pull up the help of
limma::alias2Symbol()
, which specifies the DB packages that limma uses.