I'm struggling using topGO to do some GO enrichment. Apologies in advance for the trivial nature of this question: I must be misunderstanding something.
I have 200 probesets, and I'd like to see if they're significantly represented in any GO terms. The names of the probesets are in a character vector called interesting_genes. I have an ExpressionSet object called exprset which contains all the data from the microarray experiment I'm working on.
I'd like to use the R package topGO. From one of Khader's answers to a previous post, I know that I should be able to use this package to perform enrichment without having to pass in data of any kind (indeed, most of the web-tools I can use for this only require a list of genes, and to select a background set). It's just I'm having trouble persuading the API to let me do what I want, and I'm having trouble interpreting the errors that result. Here is my code:
genes = factor(as.integer(rownames(exprs(exprset)) %in% interesting_genes))
names(genes) <- rownames(exprs(exprset))
all_genes = factor(rep(0, nrow(exprset)))
names(all_genes) <- rownames(exprs(exprset))
levels(all_genes) <- levels(genes)
GOdata <- new("topGOdata", description = "getGO", ontology = "BP",
allGenes = all_genes, geneSel = genes, nodeSize = 10,
GO2gene = list(mogene10sttranscriptclusterGO2PROBE), annot = annFUN.GO2genes
)
Note the awful construction of all_genes: this must be wrong. But topGO requires a "named object of type numeric or factor", and subsequently demands two factor levels, even though all that really makes sense to pass, in this simple approach to enrichment, is a list of names.
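For reference, a minimal sketch of how a named two-level factor can be built in a single step, using base R only. The probe IDs and the `universe` vector are toy stand-ins (assumptions); in the real analysis the universe would be rownames(exprs(exprset)).

```r
# Toy stand-ins (assumptions): in the real analysis `universe` would be
# rownames(exprs(exprset)) and `interesting_genes` the 200 probeset IDs.
universe <- c("10351026", "10463751", "10537347", "10602176", "10426648")
interesting_genes <- c("10463751", "10426648")

# One named factor: level "1" marks an interesting probeset, "0" the background.
# Setting `levels` explicitly keeps both levels even if one of them is empty.
gene_list <- factor(as.integer(universe %in% interesting_genes), levels = c(0, 1))
names(gene_list) <- universe

# Quick sanity check of how many probesets fall in each group.
table(gene_list)
```

As far as I understand the topGO documentation, a factor like this already carries both the gene universe (its names) and the selection (its levels), and geneSel is only meant for the case where allGenes is a numeric vector of scores.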
Currently the error I'm getting is
Error in if (is.na(index) || index < 0 || index > length(nd)) stop(paste("selected vertex", :
  missing value where TRUE/FALSE needed
but I don't really know how to interpret this. I'm sure it has something to do with the way I'm forming the object. So my question is:
How do I form a topGO object without using any expression data or scoring information? Specifically, what am I misunderstanding here?
Just after I asked this question I realised something about how to pass in the annotations, so I have edited the question a bit to cure that particular tidbit of naivete. Still stuck, though.
Can you post a few IDs from your list?
@Khader - yep! "10351026" "10463751" "10537347" "10602176" "10426648" "10462752" ...
Despite the two helpful answers from Brad and lGautier below, I still can't get this guy to work. It's starting to feel more like a Bioconductor list question now, though.
If you can map your probe IDs to genes, then you can easily do the analysis without expression data or scoring information. See this link for working code: http://bit.ly/bk6Ylp (posted by Chuangye earlier)
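A minimal sketch of what such a membership-only analysis can look like in topGO. It is not the code behind the link above. It assumes a named 0/1 factor over the gene universe and a named list mapping GO IDs to gene (or probe) IDs. The helper name `run_go_enrichment` and its arguments are hypothetical; the topGO calls (`annFUN.GO2genes`, `runTest`, `score`, `GenTable`) are used as I understand them from the package documentation.

```r
# Sketch only: `run_go_enrichment` is a hypothetical helper, not part of topGO.
# gene_list: named factor with levels "0"/"1" (names = all genes in the universe)
# go2genes:  named list, GO ID -> character vector of gene/probe IDs
run_go_enrichment <- function(gene_list, go2genes, ontology = "BP", node_size = 10) {
  # topGO is attached here because it needs to be on the search path.
  suppressPackageStartupMessages(library(topGO))

  go_data <- new("topGOdata",
                 description = "membership-only enrichment",
                 ontology = ontology,
                 allGenes = gene_list,      # factor input, so no geneSel is passed
                 nodeSize = node_size,
                 annot = annFUN.GO2genes,
                 GO2genes = go2genes)

  # Classic Fisher's exact test on counts; no scores or p-values are required.
  fisher <- runTest(go_data, algorithm = "classic", statistic = "fisher")

  # Report at most 10 terms, fewer if the graph has fewer scored nodes.
  GenTable(go_data, classicFisher = fisher,
           topNodes = min(10, length(score(fisher))))
}

# Possible usage (assumes the annotation package for this array is installed):
# library(mogene10sttranscriptcluster.db)
# go2genes <- as.list(mogene10sttranscriptclusterGO2PROBE)
# run_go_enrichment(gene_list, go2genes)
```

Note the use of as.list() rather than list() in the usage comment: list() would wrap the whole annotation map in a single-element list instead of producing one list entry per GO term.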
@Khader - thanks! That provided the clues I needed. It turns out that I was maybe making life a bit more complicated than necessary. Shall I put what I did in the end as an answer, or just delete the question? I guess this turned out to be a bit more of a Bioconductor mailing list question...
Please don't delete the question. As you already posted a final solution, let this remain as a source for future reference on enrichment without p-values.