I am working on a project using Bioconductor that requires me to look up which GO ontology a given GO term belongs to (i.e. Molecular Function, Biological Process, or Cellular Component). I need to do this tens of thousands of times, in the inner loop of a larger program. My current solution is to use the GO.db Bioconductor package to create three predicates like this one:
library(GO.db)
isMF <- function(term) {
  !is.null(GOMFPARENTS[[term]])
}
Unsurprisingly, however, this is prohibitively slow when invoked tens of thousands of times. Is there a Bioconductor package out there somewhere that would give me a faster way to look up this data, or will I need to implement a faster data structure for this purpose myself? I'm just learning R, so I'd like to just use an existing function, if possible.
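If you're just learning, you might want to explore a bit more. This is a perfect place to use hash tables: build the term-to-ontology mapping once, outside the loop, and each lookup inside the loop becomes a cheap constant-time operation. In R, an environment can serve as the hash table.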
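Here is a minimal sketch of the idea, using an R environment as the hash table. The three root GO terms stand in for the real data so the example runs without GO.db, and the names go_hash and lookupOntology are made up for illustration. With GO.db loaded, the table could probably be filled once from something like select(GO.db, keys = keys(GO.db), columns = "ONTOLOGY", keytype = "GOID"), but I haven't timed that.

```r
# Toy data: the three GO root terms and their ontologies (stand-in for GO.db).
go_ids   <- c("GO:0008150", "GO:0003674", "GO:0005575")
ontology <- c("BP", "MF", "CC")

# An environment created with hash = TRUE behaves as a hash table.
go_hash <- new.env(hash = TRUE, parent = emptyenv())
for (i in seq_along(go_ids)) {
  assign(go_ids[i], ontology[i], envir = go_hash)
}

# Return the ontology code for a term, or NA if the term is unknown.
lookupOntology <- function(term) {
  if (exists(term, envir = go_hash, inherits = FALSE)) {
    get(term, envir = go_hash, inherits = FALSE)
  } else {
    NA_character_
  }
}

# Predicates in the style of the question, now backed by the hash table.
isMF <- function(term) identical(lookupOntology(term), "MF")
isBP <- function(term) identical(lookupOntology(term), "BP")
isCC <- function(term) identical(lookupOntology(term), "CC")
```

The table is built once up front; calls such as isMF("GO:0003674") inside the loop then only do an environment lookup instead of querying the annotation package each time.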