Hi there!
I have a large list of EC numbers and would like to do a functional analysis of what they are actually doing in the cell.
For that, which would be the best/most direct method to convert each EC to a COG functional category?
I have results for KEGG pathways, but a large part of my ECs (30%) can't be mapped to any pathway, and I get 85 different categories that I can't group into broader ones myself. KEGG modules do even worse: 83 categories, with 66% of ECs not in any module. UniProt Pathway is also poor (it can't annotate 57% of the ECs, although it yields only 48 categories).
The worst part is that the KEGG and UniProt results don't match at all (the categories are very different).
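In case it helps anyone reproduce this kind of comparison, here is a minimal sketch of how the coverage and category counts quoted above could be tallied. The function name `coverage_report` and the dict layout (EC number mapped to a list of category names) are assumptions made for illustration, not any database's actual format.

```python
from collections import Counter


def coverage_report(ec_numbers, ec_to_categories):
    """Summarise how well one annotation source covers a list of EC numbers.

    ec_to_categories is a hypothetical dict: EC number -> iterable of
    category names (built beforehand from KEGG, UniProt, etc.).
    """
    ecs = set(ec_numbers)
    # An EC counts as unmapped if it is missing or has an empty category list.
    unmapped = {ec for ec in ecs if not ec_to_categories.get(ec)}
    # Count each category once per EC, even if listed twice for that EC.
    counts = Counter(
        cat for ec in ecs - unmapped for cat in set(ec_to_categories[ec])
    )
    return {
        "n_ecs": len(ecs),
        "unmapped_fraction": len(unmapped) / len(ecs) if ecs else 0.0,
        "n_categories": len(counts),
        "category_counts": counts,
    }
```

Running this once per source (pathways, modules, UniProt) on the same EC list gives numbers that are directly comparable.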
I would like to try COG, as the number of functional categories is smaller, and maybe I could cover more EC numbers.
Is biochemistry really this diverse and unknown? Does anyone have more ideas for annotating EC numbers with biological processes or broad functions (e.g. amino acid metabolism, cofactor metabolism, etc.)?
This is a somewhat unusual problem. If you think about it in GO terms, EC numbers are mostly a subset of molecular functions (i.e. the enzymatic subset), whereas KEGG maps are closely related to biological processes. There is a reason why GO has two separate, parallel "hierarchies" for these: you cannot really map between them because it is a many-to-many relationship. This is probably also why you have a hard time trying to map EC numbers to KEGG maps. I think the best question I can ask at this stage is "why"? How have you ended up needing to do this (at best) very difficult mapping? Is there perhaps a way you can avoid it altogether?
Hi. Thank you for your comment. In fact, the only reason I have low coverage of ECs in KEGG pathways is that those ECs are not in any KEGG pathway. Within KEGG, their biological processes are unknown: the pathway (or module) field is empty for that enzyme entry.
I am currently using BRENDA (although their SOAP access is behaving terribly).
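For what it's worth, the check described above (does an EC have any pathway link in KEGG at all?) can be sketched roughly as follows. This assumes the two-column, tab-separated text that the KEGG REST `link` operation is generally understood to return (something like `https://rest.kegg.jp/link/pathway/enzyme`; please verify the exact URL and format yourself). The function names are hypothetical, and treating the last five characters of the pathway id as the map number is also an assumption.

```python
from collections import defaultdict


def parse_kegg_link(text):
    """Parse assumed 'ec:X<TAB>path:Y' lines into EC -> set of map numbers."""
    ec_to_pathways = defaultdict(set)
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        if not source.startswith("ec:") or not target.startswith("path:"):
            continue
        # Assumption: ids like path:map00010 and path:ec00010 denote the
        # same map, so keep only the trailing 5-digit number.
        ec_to_pathways[source[len("ec:"):]].add(target[-5:])
    return dict(ec_to_pathways)


def ecs_without_pathway(ec_numbers, ec_to_pathways):
    """Return the sorted EC numbers that have no pathway link at all."""
    return sorted(ec for ec in set(ec_numbers) if ec not in ec_to_pathways)
```

Saving the response text to a file once and parsing it locally avoids querying the server repeatedly.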