I was previously using KEGG-MAPLE to compute module completion ratios for metagenome-assembled genomes. However, this tool was web-only (very frustrating) and has since been discontinued.
Now I'm trying to do something similar, but I would like to automate the task.
I want to do the following:
(1) Get KEGG orthologs [KO] for each gene;
(2) Get a table of all the KEGG modules each KO is associated with;
(3) Calculate the module completion ratio for groups of genes in my query.
I found a way to identify KOs using kofamscan.
What I'm stuck on is how to do (2) from above. This should allow me to calculate (3).
Is calculating the module completion ratio as simple as the following, or is it more complicated?
p = KOs in KEGG module set of interest
q = KOs in query set
r = p ∩ q # This is (p intersect q)
MCR = len(r)/len(p)
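For reference, here is a minimal Python sketch of steps (2) and (3) as written above. It assumes the KO-to-module table is a two-column, tab-separated list with `ko:` and `md:` prefixes, which is roughly what the KEGG REST `link/module/ko` operation returns; the function names and the example identifiers are made up for illustration. Note that this naive ratio treats every KO listed for a module as required, whereas KEGG module definitions list alternative KOs for the same step, so it will tend to underestimate completeness.

```python
from collections import defaultdict


def parse_ko_module_links(text):
    """Parse 'ko:Kxxxxx<TAB>md:Mxxxxx' lines into {module: set of KOs}.

    The input format is an assumption; adjust the splitting if your table differs.
    """
    module_to_kos = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ko, module = line.split("\t")
        # Drop the 'ko:' / 'md:' prefixes if present.
        module_to_kos[module.split(":")[-1]].add(ko.split(":")[-1])
    return dict(module_to_kos)


def naive_mcr(module_kos, query_kos):
    """Fraction of the module's KOs that also occur in the query set."""
    if not module_kos:
        return 0.0
    return len(module_kos & set(query_kos)) / len(module_kos)


# Made-up identifiers, not real KEGG data.
example = (
    "ko:K00001\tmd:M90001\n"
    "ko:K00002\tmd:M90001\n"
    "ko:K00003\tmd:M90001\n"
    "ko:K00002\tmd:M90002\n"
)
table = parse_ko_module_links(example)
query = {"K00001", "K00003", "K09999"}
for module, kos in sorted(table.items()):
    print(module, naive_mcr(kos, query))
```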
I came across your post on Bioconductor, which addresses the problem in more depth. Did you come up with a solution in the meantime? I have been unable to find anything so far.
KEGG Mapper Reconstruct is able to identify complete modules, but I am looking for a tool that I can run from the command line and that produces a file I can easily process. There is also no way to get information about incomplete modules using Reconstruct.
Yeah, I used MicrobeAnnotator, but I think I ended up doing it in Python.
This thread has all my hacks: https://github.com/cruizperez/MicrobeAnnotator/issues/9
Thank you! I will look into that. I tried building a solution myself, but parsing the module strings is giving me a headache. Either I can't fully grasp the underlying logic or there are some inconsistencies in the use of parentheses.
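In case it helps anyone else reading, below is a rough sketch of one way to read the definition strings, not a reference implementation. It assumes the usual conventions: a space separates sequential steps, a comma separates alternatives, `+` joins components of a complex, `-` marks an optional component, and `--` stands for a step with no KO assigned. It also assumes that a comma binds more loosely than a space inside parentheses, and that spaces outside all parentheses separate the steps that get counted. Function names and the example definition are made up; multi-line definitions and steps that begin with `-` are not handled.

```python
import re

TOKEN = re.compile(r"K\d{5}|--|[(),+\- ]")


def split_steps(definition):
    """Split a definition into steps at spaces outside any parentheses."""
    steps, depth, current = [], 0, []
    for ch in definition.strip():
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == " " and depth == 0:
            if current:
                steps.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        steps.append("".join(current))
    return steps


def step_present(step, kos):
    """Evaluate one step against a set of KOs (raises IndexError if malformed)."""
    tokens = TOKEN.findall(step)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # alternatives: a,b
        result = parse_seq()
        while peek() == ",":
            take()
            rhs = parse_seq()  # always consume the tokens before combining
            result = result or rhs
        return result

    def parse_seq():  # sequential sub-steps inside parentheses: a b
        result = parse_complex()
        while peek() == " ":
            take()
            rhs = parse_complex()
            result = result and rhs
        return result

    def parse_complex():  # complex components: a+b, optional: a-b
        result = parse_atom()
        while peek() in ("+", "-"):
            op = take()
            rhs = parse_atom()
            if op == "+":
                result = result and rhs
            # '-' marks an optional component: parsed, then ignored
        return result

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            take()  # closing ')'
            return result
        if tok == "--":
            return False  # no KO assigned to this step
        return tok in kos

    return parse_or()


def module_completion(definition, kos):
    """Fraction of top-level steps satisfied by the KO set."""
    steps = split_steps(definition)
    if not steps:
        return 0.0
    return sum(step_present(s, kos) for s in steps) / len(steps)


# Made-up definition and query, not a real KEGG module.
definition = "(K00001,K00002) K00003 ((K00004,K00005) K00006,K00007) K00008+K00009-K00010"
query = {"K00002", "K00007", "K00008", "K00009"}
print(module_completion(definition, query))
```

Counting completed steps instead of individual KOs means a module is not penalised for the alternatives it does not use, which is the main difference from the simple set-intersection ratio.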