(which means it beats DAVID imho, whose web API seems a little shaky)
I was wondering if anyone knew of an R (or Python) based wrapper for the API?
If not, does anyone have any experience with GeneCoDis, and would there be any interest in an R/Python wrapper for GeneCoDis were I to go ahead and make one?
As a side note, as this API appears to be SOAP based, does anyone know any other services I can use, programmatically, to do (at least) GO enrichment? I thought this would be an easy job, but it's proving a pain.
@Khader - the problem with most of the answers there is that they require you to use web UIs. There's only so many times I can click on a drop-down saying "mus musculus" before I need to write a wrapper.
GeneCODIS looks interesting, especially the options beyond GO. I just tried a list of genes and was surprised that it gave no enriched pathway results, although I do have 7 enriched KEGG pathways in my list. They provide an open port with WSDL, so you can write a wrapper script using R / Python. They provide an example using the Perl module SOAP::Lite; I think the Python equivalent will be SOAP.py (you may also check this discussion on StackOverflow about Python based SOAP modules) and the R equivalent will be RSOAP / RWebServices.
It is not clear to me why you need to use a SOAP based framework for enrichment calculation. You can perform GO enrichment using Bioconductor packages, for example GOstats / topGO.
To be honest I'm having trouble with GOstats in R - for some reason I can't quite figure it out. It seems to be requiring annotation files, which I guess is for a background comparison? My lists of genes are going to be from multiple arrays, and so I'm not sure how to proceed. I had forgotten topGO, though. Thanks!
The impression I'm getting about SOAP so far is that it's a pain to use. I'm definitely not desperate to write a wrapper, but I thought that in the end, having one wrapper to do all my GO and KEGG enrichment would be worth the investment.
@Khader - sorry! I mean, I've used RankProd to assemble a list of genes from experiments using different types of arrays. Rather than compare their expression directly, I've looked at their position across a set of ranked lists.
I guess this also raises the question of whether I should rely on online services (I suppose DAVID etc. provide a standard background list for each organism) or keep a copy of GO, the KEGG pathways etc. on my machine and do all my analysis offline. I'm tempted to rely on services as much as possible, as I'm guessing they're better at curating databases than I am.
If you are looking for an alternative way to get programmatic access to GO enrichment that can deal with a flexible background, I would recommend GO::TermFinder from Gavin Sherlock's group.
I'm not sure if there is any Python equivalent of GO-TermFinder. I am sure there is an R/BioC package that allows for flexible background definitions. You could modify this question or post another one about that, or you could also check the BioC mailing list.
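If it helps to see what a "flexible background" means in practice, here is a minimal Python sketch of the one-sided hypergeometric test that term-enrichment tools are typically built on. This is not GO::TermFinder's code or any package's API; the function name, the argument names and the gene IDs are all made up for illustration, and a real analysis would also need multiple-testing correction across terms.

```python
from math import comb  # Python 3.8+


def enrichment_p_value(study_genes, background_genes, annotated_genes):
    """One-sided hypergeometric p-value for a single term (hypothetical helper).

    Returns the probability of seeing at least the observed number of
    annotated genes in the study list, if the study list had been drawn
    at random from the user-supplied background.
    """
    background = set(background_genes)
    # Only genes present in the background can be counted.
    study = set(study_genes) & background
    annotated = set(annotated_genes) & background

    N = len(background)          # background size
    K = len(annotated)           # background genes carrying the term
    n = len(study)               # study list size
    k = len(study & annotated)   # study genes carrying the term

    if n == 0 or K == 0:
        return 1.0

    total = comb(N, n)
    # Sum the upper tail P(X >= k); comb() returns 0 for impossible draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up identifiers: the background is whatever was measurable on
    # every array, not the whole genome.
    background = [f"gene{i}" for i in range(10)]
    term_genes = ["gene0", "gene1", "gene2", "gene3"]
    study = ["gene0", "gene1", "gene2"]
    print(enrichment_p_value(study, background, term_genes))
```

For lists assembled from several array types, one reasonable choice of background is the set of genes measured on all of the arrays, since those are the only genes that could have ended up in the list.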
You may also check the related questions about GO enrichment on BioStar: Tools To Find Gene Ontology Term Enrichment