Hello, I have a question regarding GO enrichment analysis. I've identified DEGs with DESeq2 and would like to perform a GO enrichment analysis on them with topGO or a similar tool. I have a transcriptome annotated with Trinotate, but it contains GO term information for several transcripts per gene. According to the topGO vignette, I have to prepare a list of gene IDs and their respective GO terms. My questions are:
a) Should I merge the GO terms of all transcripts of a gene into a single entry for that gene?
b) Do I have to de-duplicate GO terms that occur in several transcripts of the same gene?
gene / transcripts / GO
1 / 1a / GO:123, GO:456, GO:987
1 / 1b / GO:123, GO:000
1 / 1c / GO:123, GO:456
--> Prepared list (case b). Is this right?
gene / GO
1 / GO:123, GO:456, GO:987, GO:000
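For reference, case b) amounts to taking the union of the GO terms of all transcripts per gene. Below is a minimal Python sketch, assuming the Trinotate annotation has already been parsed into (gene, transcript, GO terms) rows. The function names are hypothetical, and the output layout (gene ID, a tab, then comma-separated GO IDs) is an assumption modeled on the gene-to-GO file that, as far as I know, topGO's readMappings() reads by default; check it against the vignette before relying on it.

```python
from collections import defaultdict

# Transcript-level annotation mirroring the example table:
# (gene ID, transcript ID, GO terms). IDs are the question's placeholders.
transcript_annotations = [
    ("1", "1a", ["GO:123", "GO:456", "GO:987"]),
    ("1", "1b", ["GO:123", "GO:000"]),
    ("1", "1c", ["GO:123", "GO:456"]),
]


def collapse_to_genes(rows):
    """Union the GO terms of all transcripts of each gene; a set drops repeats."""
    gene2go = defaultdict(set)
    for gene_id, _transcript_id, go_terms in rows:
        gene2go[gene_id].update(term.strip() for term in go_terms)
    # Sort the terms so the output is stable and reproducible.
    return {gene: sorted(terms) for gene, terms in gene2go.items()}


def to_mapping_lines(gene2go):
    """One line per gene: gene ID, a tab, then comma-separated GO IDs (assumed layout)."""
    return [f"{gene}\t{','.join(terms)}" for gene, terms in sorted(gene2go.items())]


gene2go = collapse_to_genes(transcript_annotations)
print(to_mapping_lines(gene2go))  # ['1\tGO:000,GO:123,GO:456,GO:987']
```

Because each gene ends up with exactly one entry and a set of terms, this answers a) and b) in one step.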
Thank you for any comments or links that clarify this!
Thank you for your answer! I agree that it is probably not a good idea to have duplicated genes as input for topGO. What I was wondering is whether I have to keep the duplicated GO terms for one gene:
gene / GO
1 / GO:123, GO:456, GO:987, GO:123, GO:000, GO:123, GO:456
or whether I have to de-duplicate the terms as in b) above.
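Whatever topGO does with repeated terms, removing them is cheap. A short Python sketch (the helper name is hypothetical) that collapses the concatenated list above while keeping the first occurrence of each term, which reproduces the ordering of the case b) list:

```python
def dedupe_go_terms(terms):
    """Drop repeated GO IDs, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(term.strip() for term in terms))


# Gene 1 with every transcript's terms concatenated, duplicates included.
concatenated = ["GO:123", "GO:456", "GO:987", "GO:123", "GO:000", "GO:123", "GO:456"]
print(dedupe_go_terms(concatenated))  # ['GO:123', 'GO:456', 'GO:987', 'GO:000']
```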
To clarify, my answer to b) was that I don't know whether duplicated GO terms are an issue, but removing duplicates doesn't cost you anything. I added the warning about duplicated genes because it is not only a bad idea: topGO would throw an error.
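Since duplicated gene IDs are what topGO actually errors on, it can be worth checking for them before exporting the mapping. A minimal Python sketch, assuming the gene-level table is held as (gene ID, GO terms) rows; the function name and the example data are hypothetical:

```python
from collections import Counter


def duplicated_gene_ids(rows):
    """Return the gene IDs that occur on more than one row of a gene-to-GO table."""
    counts = Counter(gene_id for gene_id, _go_terms in rows)
    return sorted(gene for gene, n in counts.items() if n > 1)


# Hypothetical table in which gene 1 was accidentally kept once per transcript.
rows = [
    ("1", ["GO:123", "GO:456", "GO:987"]),
    ("1", ["GO:123", "GO:000"]),
    ("2", ["GO:456"]),
]
dups = duplicated_gene_ids(rows)
if dups:
    print("duplicated gene IDs:", dups)  # duplicated gene IDs: ['1']
```

If the check reports anything, merging those rows per gene (union of their terms) before export avoids the problem.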
Thank you for your help!