Bioconductor -- Quickly Look Up Aspect Of GO Term
11.2 years ago
cclark ▴ 10

I am working on a project using Bioconductor that requires me to look up which GO ontology a given GO term belongs to (i.e., Molecular Function, Biological Process, or Cellular Component). I need to do this tens of thousands of times, in the inner loop of a larger program. My current solution is to use the GO.db Bioconductor package to create three predicates like this one:

library(GO.db)
# Predicate: does `term` have an entry in the Molecular Function parents map?
isMF <- function(term) {
  !is.null(GOMFPARENTS[[term]])
}

Unsurprisingly, however, this is prohibitively slow when invoked tens of thousands of times. Is there a Bioconductor package out there somewhere that would give me a faster way to look up this data, or will I need to implement a faster data structure for this purpose myself? I'm just learning R, so I'd like to just use an existing function, if possible.

go bioconductor r • 4.4k views

If you're just learning, you might want to explore a bit more. This is a perfect place to use hash tables.
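As a rough sketch of the hash-table idea: in R, an environment created with hashing enabled can serve as a hash table keyed by GO ID. The tiny `vals` table below is a stand-in built from three rows shown elsewhere in this thread, and the names `go_hash`, `aspect_of`, and `isMF` are illustrative only. With GO.db installed, the table would presumably come from the database instead.

```r
# Stand-in lookup table (three GO IDs and their ontologies, for illustration).
# With GO.db, the full table could be built from the database instead.
vals <- data.frame(
  GOID     = c("GO:0000001", "GO:0000006", "GO:0000009"),
  ONTOLOGY = c("BP", "MF", "MF"),
  stringsAsFactors = FALSE
)

# Build the hash table once, outside the inner loop:
# a hashed environment mapping GO ID -> ontology code.
go_hash <- list2env(
  as.list(setNames(vals$ONTOLOGY, vals$GOID)),
  hash = TRUE, parent = emptyenv()
)

# Look up a single term; unknown IDs give NA instead of an error.
aspect_of <- function(term) {
  get0(term, envir = go_hash, inherits = FALSE, ifnotfound = NA_character_)
}

# Hypothetical replacement for the original predicate.
isMF <- function(term) identical(aspect_of(term), "MF")
```

Each call is then a single environment lookup rather than a database query, which is usually the point of moving the data into a hash table before a tight loop.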

11.2 years ago
Martin Morgan ★ 1.6k

Please ask questions about Bioconductor packages on the Bioconductor mailing list (no subscription required). As with most things in R, it's better to use vectorized operations than to iterate. Also, the interface to GO and other databases has been simplified. You could instead do the following:

> vals = select(GO.db, keys(GO.db, "GOID"), c("TERM", "ONTOLOGY"))
> dim(vals)
[1] 37391     3
> head(vals)
        GOID                                                         TERM ONTOLOGY
1 GO:0000001                                    mitochondrion inheritance       BP
2 GO:0000002                             mitochondrial genome maintenance       BP
3 GO:0000003                                                 reproduction       BP
4 GO:0000006 high affinity zinc uptake transmembrane transporter activity       MF
5 GO:0000007     low-affinity zinc ion transmembrane transporter activity       MF
6 GO:0000009                       alpha-1,6-mannosyltransferase activity       MF

and then do standard R operations, e.g., vals[vals$ONTOLOGY == "MF",]. The Annotation workflow provides some additional material.
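To connect this back to the original question, here is one possible sketch of a vectorized lookup: turn the table into a named character vector and index it with all the query terms at once. The small `vals` below copies three rows from the output above so the snippet runs without GO.db. The names `ontology_by_id`, `terms`, `aspects`, and `is_mf` are made up for the example.

```r
# Stand-in for the result of select(GO.db, ...) shown above (first rows only).
vals <- data.frame(
  GOID     = c("GO:0000001", "GO:0000002", "GO:0000006"),
  ONTOLOGY = c("BP", "BP", "MF"),
  stringsAsFactors = FALSE
)

# Named vector: names are GO IDs, values are ontology codes.
ontology_by_id <- setNames(vals$ONTOLOGY, vals$GOID)

# Look up many terms in one step; IDs not in the table come back as NA.
terms   <- c("GO:0000006", "GO:0000001", "GO:9999999")
aspects <- unname(ontology_by_id[terms])

# Vectorized version of the isMF predicate (NA is treated as "not MF").
is_mf <- !is.na(aspects) & aspects == "MF"
```

The table is built once, and the per-term work becomes a single subsetting operation instead of tens of thousands of separate calls.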
