Hi, I have a gene expression file for yeast. I want to cluster yeast genes by GO biological process, so I need to generate a file listing each gene with its associated GO terms (this is not an enrichment analysis; I just need a lookup over all genes). My plan is to download the complete GO annotation file (or something like it) and then cluster: for example, find all unique GO biological process terms and create N clusters, one per unique term. I don't know where to find the GO file. Is there a database where I can submit my gene list and get the GO terms back? One issue is that I have 6100 ORF names (not gene IDs), so I need to search by those.
Sample ORFs: YAL001C YAL002W YAL003W YAL004W YAL005C YAL007C YAL008W
Any help is appreciated :)
You can download all the GO annotations for yeast from the Gene Ontology website. But what are you actually trying to accomplish with this analysis?
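Once the annotation file is downloaded, the ORF lookup could be sketched roughly as below. This is only a minimal sketch, assuming the file is in the tab-separated GAF format (aspect in column 9, GO ID in column 5, symbol in column 3, synonyms in column 11) and assuming the systematic ORF name appears in the symbol or synonym column, which is worth checking against a few lines of your file. The function names `orf_to_bp_terms` and `group_by_term` are made up for illustration.

```python
from collections import defaultdict

def orf_to_bp_terms(gaf_lines, orfs):
    """Map each ORF in `orfs` to its GO biological process term IDs.

    Assumes GAF layout: col 3 symbol, col 4 qualifier, col 5 GO ID,
    col 9 aspect (P/F/C), col 11 pipe-separated synonyms.
    """
    wanted = set(orfs)
    mapping = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!"):              # header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11 or cols[8] != "P":  # keep biological process only
            continue
        if "NOT" in cols[3].split("|"):       # skip negated annotations
            continue
        # Assumption: the ORF name is the symbol or one of the synonyms.
        names = {cols[2], *cols[10].split("|")}
        for orf in names & wanted:
            mapping[orf].add(cols[4])
    return mapping

def group_by_term(mapping):
    """Invert ORF -> terms into term -> ORFs (one group per GO term)."""
    clusters = defaultdict(set)
    for orf, terms in mapping.items():
        for term in terms:
            clusters[term].add(orf)
    return clusters

# Usage, with a hypothetical path to the decompressed file:
# with open("yeast_annotations.gaf") as handle:
#     mapping = orf_to_bp_terms(handle, ["YAL001C", "YAL002W"])
#     clusters = group_by_term(mapping)
```

Note that a gene usually carries several biological process terms, so the groups produced this way overlap rather than forming a strict partition.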
Thank you for your reply. I want to run gene regulatory network inference algorithms on yeast gene expression data. In some papers the authors report clustering the genes before inference, and I want to do the same: cluster genes by biological process, then run network inference on each cluster.
Since inference of gene regulatory networks is based on the gene expression data, would it not make more sense to first identify the subset of genes that show clear regulation under the conditions studied? You cannot infer regulatory interactions for non-regulated genes anyway, so I would suggest reducing your analysis to the relevant subset of genes first.
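As a rough illustration of reducing the data to a subset first, one could keep only genes whose expression varies across conditions. This is only a sketch: variance is a crude stand-in for "clear regulation" (a proper differential expression test would be the more rigorous choice), and the function name `regulated_subset`, the input layout (ORF mapped to a list of expression values), and the threshold are all assumptions made for the example.

```python
from statistics import pvariance

def regulated_subset(expression, min_variance):
    """Keep genes whose expression variance across conditions is at
    least `min_variance`.

    `expression` maps an ORF name to its list of expression values,
    one per condition. Variance is only a proxy for regulation here.
    """
    return {
        orf: values
        for orf, values in expression.items()
        if len(values) > 1 and pvariance(values) >= min_variance
    }

# Usage with made-up numbers:
# kept = regulated_subset({"YAL001C": [1.0, 1.0, 1.0],
#                          "YAL002W": [0.0, 2.0, 4.0]}, min_variance=1.0)
```

The inference would then run only on the genes that survive the filter.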
Do you mean I should find the differentially expressed genes under each condition and then run the inference only on that subset?