Question

cluster Yeast ORFs based on GO biological process term

8.6 years ago

Zara ▴ 20

Hi, I have a gene expression file for yeast and I want to cluster the yeast genes by GO biological process. To do that, I need to generate a file listing each gene with its associated GO terms (this is not an enrichment analysis; I just need a lookup over all genes). My plan is to download the complete GO annotation file and then cluster from it: for example, find all unique GO biological process terms and create N clusters, one per unique term. I don't know where to find the GO file. Is there a database where I can submit my gene list and get the GO terms back? One complication is that I have 6100 ORF names (not gene IDs), so I need to search by those.

sample ORF: YAL001C YAL002W YAL003W YAL004W YAL005C YAL007C YAL008W

Any help is appreciated :)

Yeast GO biological process • 3.3k views

ADD COMMENT • link updated 8.6 years ago by Andrzej Zielezinski 11k • written 8.6 years ago by Zara ▴ 20

You can download all the GO annotations for yeast from the Gene Ontology website. But what are you actually trying to accomplish with this analysis?

ADD REPLY • link 8.6 years ago by Lars Juhl Jensen 11k

Thank you for your reply. I want to run gene regulatory network inference algorithms on yeast gene expression data. In some papers the authors say they cluster the genes before inference, and I want to do the same: cluster genes by biological process, then run network inference on each cluster.

ADD REPLY • link 8.6 years ago by Zara ▴ 20

Since inference of gene regulatory networks is based on the gene expression data, would it not make more sense to first identify the subset of genes that show clear regulation under the conditions studied? You cannot infer regulatory interactions for non-regulated genes anyway, so I would suggest reducing your analysis to the relevant subset of genes first.

ADD REPLY • link 8.6 years ago by Lars Juhl Jensen 11k

Do you mean I should find the differentially expressed genes under each condition and then run the inference only on that subset?

ADD REPLY • link 8.6 years ago by Zara ▴ 20

score 3 · Accepted Answer · 2017-01-02

Use yeastgenome.org.

To get GO terms for your list of ORFs, you can use one of two options:

1. From the menu: Function > Gene Ontology, choose GO Term Finder. Upload a file with your ORF IDs or paste them into the text area.
2. Parse the file that maps gene products to GO-Slim terms: http://downloads.yeastgenome.org/curation/literature/go_slim_mapping.tab

YAL001C TFC3    S000000001  C   chromosome  GO:0005694  ORF|Verified
YAL001C TFC3    S000000001  C   cytoplasm   GO:0005737  ORF|Verified
YAL001C TFC3    S000000001  C   mitochondrion   GO:0005739  ORF|Verified
YAL001C TFC3    S000000001  C   nucleus GO:0005634  ORF|Verified
YAL001C TFC3    S000000001  F   DNA binding GO:0003677  ORF|Verified
YAL001C TFC3    S000000001  F   nucleic acid binding transcription factor activity  GO:0001071  ORF|Verified
YAL001C TFC3    S000000001  F   transcription factor activity, protein binding  GO:0000988  ORF|Verified
YAL001C TFC3    S000000001  P   transcription from RNA polymerase III promoter  GO:0006383  ORF|Verified

YAL002W VPS8    S000000002  C   cytoplasm   GO:0005737  ORF|Verified
YAL002W VPS8    S000000002  C   cytoplasmic vesicle GO:0031410  ORF|Verified
YAL002W VPS8    S000000002  C   endomembrane system GO:0012505  ORF|Verified
YAL002W VPS8    S000000002  C   membrane    GO:0016020  ORF|Verified
YAL002W VPS8    S000000002  F   enzyme binding  GO:0019899  ORF|Verified
YAL002W VPS8    S000000002  P   endosomal transport GO:0016197  ORF|Verified
YAL002W VPS8    S000000002  P   protein targeting   GO:0006605  ORF|Verified

[..]
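
For the second option, here is a minimal Python sketch of how the mapping file could be turned into clusters, one per biological process term. It assumes the file is tab-delimited with the columns in the order shown in the excerpt above (ORF, gene name, SGD ID, aspect, GO-Slim term, GO ID, feature type) and that aspect `P` marks biological process. The function name `cluster_orfs_by_process` and the `SAMPLE` data are made up for illustration; `SAMPLE` just reuses a few rows from the excerpt.

```python
from collections import defaultdict


def cluster_orfs_by_process(lines, orfs_of_interest=None):
    """Group ORFs by GO-Slim biological process term.

    lines: iterable of tab-delimited rows (assumed column order:
           ORF, gene, SGD ID, aspect, term, GO ID, feature type).
    orfs_of_interest: optional set of ORF names to keep; None keeps all.
    Returns a dict mapping term -> set of ORF names.
    """
    clusters = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        orf, aspect, term = fields[0], fields[3], fields[4]
        if aspect != "P":
            continue  # keep only biological process annotations
        if orfs_of_interest is not None and orf not in orfs_of_interest:
            continue
        clusters[term].add(orf)
    return dict(clusters)


# A few rows copied from the excerpt above, rebuilt with tab separators.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("YAL001C", "TFC3", "S000000001", "C", "nucleus", "GO:0005634", "ORF|Verified"),
    ("YAL001C", "TFC3", "S000000001", "P",
     "transcription from RNA polymerase III promoter", "GO:0006383", "ORF|Verified"),
    ("YAL002W", "VPS8", "S000000002", "C", "cytoplasm", "GO:0005737", "ORF|Verified"),
    ("YAL002W", "VPS8", "S000000002", "P", "endosomal transport", "GO:0016197", "ORF|Verified"),
    ("YAL002W", "VPS8", "S000000002", "P", "protein targeting", "GO:0006605", "ORF|Verified"),
])

if __name__ == "__main__":
    my_orfs = {"YAL001C", "YAL002W"}
    clusters = cluster_orfs_by_process(SAMPLE.splitlines(), my_orfs)
    for term in sorted(clusters):
        print(term, sorted(clusters[term]), sep="\t")
```

With the real file, you would pass an open file handle instead of `SAMPLE.splitlines()`, e.g. `with open("go_slim_mapping.tab") as fh: clusters = cluster_orfs_by_process(fh, my_orfs)`. Note that a gene annotated with several process terms ends up in several clusters, so these clusters overlap rather than partition the gene list.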