How to analyze multi-letter codes in COG IDs?
21 months ago
Constanza • 0

Hi Biostars,

I'm trying to analyze undescribed genes by their functional categories (eggNOG), and I would appreciate suggestions on how to deal with "chimeric" codes such as 'IQ', 'ET', and 'EGIPQ'. I've read that researchers often keep just one ID per gene, but I aim to keep as much information as possible.

[Image: example of chimeric annotations]

Thanks in advance.

COG • 817 views
21 months ago
Mensur Dlakic ★ 28k

The meaning of each individual letter is given by the standard COG functional categories:

https://ecoliwiki.org/colipedia/index.php/Clusters_of_Orthologous_Groups_(COGs)

For the rest I can only guess, but it shouldn't be difficult to find that information on the eggNOG website. I think letters separated by commas refer to individual domains, each with its own function. For example, in the case of aapm that would mean two domains, one involved in amino acid transport and metabolism (E) and the other in inorganic ion transport and metabolism (P). When the letters are not separated by a comma, it could mean that two or more functions reside in the same domain.
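If that reading is right, a multi-letter code can be broken down mechanically. The Python sketch below is only an illustration under that assumption: commas are treated as separators between groups (possibly domains), and every letter inside a group is treated as one COG category. The function name `split_cog_code` is made up for this example and is not part of eggNOG or any library.

```python
def split_cog_code(code):
    """Split a COG category string into groups of single-letter categories.

    Assumption (not confirmed by eggNOG documentation here): commas separate
    groups, and each letter within a group is one functional category.
    """
    # Split on commas and drop empty pieces such as those from a trailing comma.
    groups = [part.strip() for part in code.split(",") if part.strip()]
    # Keep only alphabetic characters, upper-cased, one list per group.
    return [[ch for ch in group.upper() if ch.isalpha()] for group in groups]


print(split_cog_code("E,P"))    # [['E'], ['P']]
print(split_cog_code("EGIPQ"))  # [['E', 'G', 'I', 'P', 'Q']]
```

Keeping the groups separate preserves the comma information in case it turns out to matter; flattening the result gives the plain set of categories assigned to the gene.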


Thanks for answering so fast, it was very helpful. So I shouldn't pay much attention to whether the COG letters are separated by a comma or not. As long as they are assigned to a gene, for basic enrichment analyses it's irrelevant whether they were detected in the same or in different domains. Have a great day!
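For anyone who wants to do the same, here is a rough Python sketch of that counting scheme. It assumes each gene should be counted once for every category letter assigned to it, with commas ignored. The gene names and the `count_categories` helper are hypothetical, invented for this example.

```python
from collections import Counter


def count_categories(gene_to_code):
    """Count how many genes carry each COG category letter.

    Assumption: a gene contributes once to every letter in its code,
    whether or not the letters are separated by commas.
    """
    counts = Counter()
    for code in gene_to_code.values():
        # A set ensures a gene is not counted twice for a repeated letter.
        letters = {ch for ch in code.upper() if ch.isalpha()}
        counts.update(letters)
    return counts


# Hypothetical genes annotated with codes like the ones in the question.
example = {"gene1": "IQ", "gene2": "ET", "gene3": "EGIPQ", "gene4": "E,P"}
counts = count_categories(example)
for letter in sorted(counts):
    print(letter, counts[letter])
```

Note that with this scheme the per-category counts add up to more than the number of genes, which is worth keeping in mind when choosing the background for an enrichment test.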
