Hi there!
I'm sorry if this question has already been addressed somewhere else, but I couldn't find anything yet. My problem arises when I'm trying to create categories out of the Gene Ontology (GO) for a list of genes obtained from a microarray experiment. To be more precise, I'm only looking to fit my ~500 genes into a reasonable number of categories. I already know they all belong, more or less, to cell cycle pathways, so statistical enrichment is not what I'm looking for.

What I've done so far is map my list of genes to every Biological Process GO_ID (the list grew from ~500 genes to ~1700 GOBP_IDs, excluding IEA annotations) and then, for each GOBP_ID, map all the GO_ANCESTORS (from ~1700 to ~56000 rows). My strategy was then to find which of the GO_ANCESTORS were most represented, as these would ideally make good categories, bearing in mind that very frequent GO_IDs would correspond to overly generic GO terms and very infrequent GO_IDs would be associated with too few genes. After a bit of tweaking of the filters on too-high and too-low GO_ID frequencies, what I get is around 80 categories for ~250 genes, which is not really great.

At this point I'm wondering whether GO is any good for a job like this, or whether there are other, less redundant ontologies out there that would work for me. I know about GO slims, but I don't understand how I could generate my own GO slim, or whether that would work in my case. Sorry to be so long and confusing. Thanks for your time! I hope someone can help me!
Cheers
Bruno
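For readers who want to try the ancestor-counting strategy described above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the term hierarchy in `PARENTS` is a made-up toy graph (not the real GO structure), the gene names are hypothetical, and the `min_genes` / `max_fraction` thresholds are arbitrary. It also counts distinct genes per term rather than rows, which is one possible way to avoid double counting a gene that reaches the same ancestor through several annotations.

```python
from collections import defaultdict

# Toy child -> parents map. Hypothetical hierarchy, NOT the real GO graph.
PARENTS = {
    "biological_process": [],
    "cell cycle": ["biological_process"],
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle checkpoint": ["cell cycle"],
    "DNA replication": ["biological_process"],
    "mitotic checkpoint": ["mitotic cell cycle", "cell cycle checkpoint"],
}

# Toy gene -> direct annotations. Gene names are hypothetical.
ANNOTATIONS = {
    "geneA": ["mitotic checkpoint"],
    "geneB": ["mitotic cell cycle"],
    "geneC": ["DNA replication"],
    "geneD": ["cell cycle checkpoint", "DNA replication"],
}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def genes_per_term(annotations, parents):
    """Map every term (direct or ancestor) to the set of genes reaching it."""
    term_genes = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for anc in ancestors(term, parents):
                term_genes[anc].add(gene)
    return term_genes


def candidate_categories(term_genes, n_genes, min_genes, max_fraction):
    """Keep terms that are neither too rare nor too generic."""
    return {
        term: genes
        for term, genes in term_genes.items()
        if min_genes <= len(genes) <= max_fraction * n_genes
    }


term_genes = genes_per_term(ANNOTATIONS, PARENTS)
categories = candidate_categories(
    term_genes, n_genes=len(ANNOTATIONS), min_genes=2, max_fraction=0.5
)
covered = set().union(*categories.values())

for term in sorted(categories):
    print(term, sorted(categories[term]))
print("genes covered:", len(covered), "of", len(ANNOTATIONS))
```

On this toy data the root and the broad "cell cycle" term are dropped as too generic and the single-gene leaf as too specific. With real annotations the same two thresholds control the trade-off between the number of categories and the number of genes covered.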
Which background are you using for the enrichment test? I'm not really a GO expert, but you should probably use the list of all the genes detectable by your chip as the background, and then calculate the enrichment of those that give you a signal. That should make the results easier to interpret.
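To make the role of the background concrete, here is a rough sketch of a one-sided hypergeometric enrichment test for a single GO term, written with the Python standard library only. The function name `hypergeom_enrichment_p` and the numbers in the example are made up for illustration; this is one common way to set up such a test, not necessarily what any particular GO tool does.

```python
from math import comb


def hypergeom_enrichment_p(background_size, background_hits, sample_size, sample_hits):
    """P(X >= sample_hits) for X ~ Hypergeometric.

    background_size: genes detectable on the chip (the background)
    background_hits: background genes annotated with the term
    sample_size:     genes in the list of interest
    sample_hits:     genes in the list annotated with the term
    """
    upper = min(background_hits, sample_size)
    total = comb(background_size, sample_size)
    tail = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, sample_size - i)
        for i in range(sample_hits, upper + 1)
    )
    return tail / total


# Hypothetical tiny example: 10 genes on the chip, 4 annotated with the term,
# 3 genes in the list, 2 of them annotated with the term.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_value)
```

Changing `background_size` and `background_hits` (for example, whole genome versus genes on the chip) changes the p-value for the same gene list, which is why the choice of background matters.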