I have the following: a list of UniProtKB protein identifiers and their associated GO terms in the form GO:XXXXXX.
Given a subset of my UniProt identifiers (and their associated GO terms), I want to check whether any function is overrepresented within that subset. If the program only takes identifiers and fetches the GO terms by itself, that is fine with me :)
Most tools I have found so far require a gene list, which I do not have at hand; I only have protein IDs.
Any suggestion for a tool would be much appreciated. Ideally it would work in either R or Python :)
Thank you! T.
I second DAVID. There are other tests (like those mentioned in the other answer involving Bioconductor) that are more statistically rigorous, but for exploratory analysis I like DAVID. It's fairly simple and straightforward to use.
It's also possible to use DAVID's API:
http://david.abcc.ncifcrf.gov/content.jsp?file=DAVID_API.html
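If you already have the GO terms for each UniProt ID, the overrepresentation test itself is small enough to sketch by hand. Below is a minimal sketch in plain Python, assuming a one-sided hypergeometric test with your full annotated list as the background. This is not what DAVID does internally; the function names and the toy protein IDs and GO terms are made up for illustration, and the p-values are not corrected for multiple testing.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n proteins from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def go_enrichment(annotations, subset):
    """Return (GO term, p-value) pairs sorted by ascending p-value.

    annotations: dict mapping protein ID -> iterable of GO terms (background)
    subset: iterable of protein IDs; IDs missing from the background are ignored
    """
    subset = [p for p in set(subset) if p in annotations]
    N, n = len(annotations), len(subset)
    # Count how many proteins carry each term, in the background and the subset.
    background = Counter(t for terms in annotations.values() for t in set(terms))
    observed = Counter(t for p in subset for t in set(annotations[p]))
    results = [(t, hypergeom_sf(k, N, background[t], n)) for t, k in observed.items()]
    return sorted(results, key=lambda item: item[1])


# Toy data: placeholder IDs and GO terms, not real annotations.
annotations = {
    "P1": {"GO:0000001", "GO:0000002"},
    "P2": {"GO:0000001", "GO:0000002"},
    "P3": {"GO:0000001", "GO:0000002"},
    "P4": {"GO:0000002"},
    "P5": {"GO:0000002"},
    "P6": {"GO:0000002"},
}
results = go_enrichment(annotations, ["P1", "P2", "P3"])
for term, p in results:
    print(term, p)
```

With real data you would want to adjust the p-values (for example with Benjamini-Hochberg) before interpreting them, since every GO term seen in the subset is tested.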
I have long been a big fan of DAVID; the interface and the features are great. However, the GO annotations are very old! We discovered this both through our own comparisons and analysis and from discussions within the biocuration community. With new annotations being added and ontologies being updated as fast as they are, stale data is a huge issue! For these reasons I've stopped recommending DAVID to my colleagues...
I didn't know they were using an old version of GO. That is good to know. I sort of assumed they were keeping that up to date.