I have a large RNA-Seq dataset with reads about 200 bp long. I've already used other tools to annotate these reads with gi numbers, but now I want to associate them with KEGG/COG IDs for pathway analysis. Can someone please help me understand the difference between KEGG and COG IDs (and their different uses), and what the best way is to do large-batch annotation (on the scale of millions of reads)?
Thanks!
Great, very helpful post!
@Neilfws, I see this post is two years old. I was wondering whether you could update it with new perspectives. While reading up on KEGG vs. COG, I found this publication: PMID 25428365. So is it now better to use KEGG or COG? Thanks.
An update for anyone who happens to find this topic: the COG database was updated in 2021.
Galperin, Michael Y., Yuri I. Wolf, Kira S. Makarova, Roberto Vera Alvarez, David Landsman, and Eugene V. Koonin. "COG database update: focus on microbial diversity, model organisms, and widespread pathogens." Nucleic Acids Research 49, no. D1 (2021): D274-D281.