Do you know any resource or software (Bioconductor, Biopython, etc.) that can be used to summarise KAAS output?
While conducting a mock research project that follows the Trinity-TransDecoder-KAAS pipeline described in the following paper, I submitted the .pep file generated by TransDecoder to the KEGG Automatic Annotation Server (KAAS), which returned a list of KO (KEGG Orthology) numbers. The list is useful, but on its own the raw data tells me very little. I would like to summarise it further. For example, it would be great to get the number of pathways in each category (20 metabolism pathways, 30 genetic information processing pathways, and so on) or something similar. What I really want is to run a gene set analysis on this list of KO numbers and see how the hits are categorised. Once organised correctly, the data could be plotted (as pie charts, for instance) to give an idea of the kinds of pathways and biological entities present in the sample.
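A minimal sketch of the counting step in plain Python, assuming the KAAS result is two-column, tab-separated text (`query_id<TAB>K number`, with an empty second column for queries that got no assignment) and that a KO-to-pathway mapping and a pathway-to-category mapping have already been obtained from KEGG. The function names and the toy mapping used to exercise it are hypothetical illustrations, not KEGG data.

```python
from collections import Counter
import re

# KO identifiers are "K" followed by five digits, e.g. K00844.
KO_PATTERN = re.compile(r"^K\d{5}$")


def parse_kaas(text):
    """Return the set of distinct KO identifiers found in KAAS output.

    Assumes each line is "query_id<TAB>KO"; lines whose KO column is
    missing or empty (unassigned queries) are skipped.
    """
    kos = set()
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) >= 2 and KO_PATTERN.match(fields[1].strip()):
            kos.add(fields[1].strip())
    return kos


def pathways_per_category(kos, ko_to_pathways, pathway_to_category):
    """Count the distinct pathways hit by `kos` within each category.

    ko_to_pathways: dict mapping a KO to an iterable of 5-digit pathway numbers
    pathway_to_category: dict mapping a pathway number to a category name
    """
    # Collect every pathway touched by at least one KO (each counted once).
    hit = set()
    for ko in kos:
        hit.update(ko_to_pathways.get(ko, ()))
    # Tally those pathways by category; pathways without a category are ignored.
    counts = Counter()
    for pathway in hit:
        category = pathway_to_category.get(pathway)
        if category is not None:
            counts[category] += 1
    return counts
```

Counting distinct pathways rather than KOs keeps a category from being inflated when many KOs fall into the same pathway map.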
I have done some research on Biostars, and similar questions like this, this, this or this have been asked, but they either fail to answer this question or the proposed answers no longer work. I have also searched the KEGG website and haven't found anything better than KEGG Reconstruct Pathway, which reports a count of pathways within each subcategory; those counts, however, have to be painstakingly parsed from the HTML.
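Instead of scraping the Reconstruct Pathway HTML, the two mappings can in principle be downloaded as plain text from the KEGG REST API. A sketch, assuming that `https://rest.kegg.jp/link/pathway/ko` returns lines such as `ko:K00844<TAB>path:map00010`, and that `https://rest.kegg.jp/get/br:br08901` returns the pathway hierarchy in BRITE htext form, where lines beginning with `A`, `B` and `C` hold the category, the subcategory and the pathway map respectively. The helper names are hypothetical, and the exact text layout should be checked against the live files.

```python
import re
import urllib.request
from collections import defaultdict

KEGG_REST = "https://rest.kegg.jp"


def fetch(path):
    """Download a plain-text KEGG REST resource, e.g. fetch("/link/pathway/ko")."""
    with urllib.request.urlopen(KEGG_REST + path) as response:
        return response.read().decode("utf-8")


def parse_ko_pathway_links(text):
    """Map each KO to the set of 5-digit numbers of the pathways it belongs to.

    Expects "ko:K00844<TAB>path:map00010" style lines; the "map" and "ko"
    variants of the same pathway collapse onto one number.
    """
    links = defaultdict(set)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue
        ko = fields[0].split(":")[-1]
        match = re.search(r"(\d{5})$", fields[1].strip())
        if match:
            links[ko].add(match.group(1))
    return dict(links)


def parse_pathway_hierarchy(text):
    """Map pathway numbers to (category, subcategory) from htext.

    Assumes A/B/C-prefixed lines; other lines (headers starting with
    "+", "#" or "!") are skipped, and HTML tags such as <b> are removed.
    """
    tag = re.compile(r"<[^>]+>")
    category = subcategory = None
    mapping = {}
    for line in text.splitlines():
        if not line:
            continue
        level, body = line[0], tag.sub("", line[1:]).strip()
        if level == "A":
            category = body
        elif level == "B":
            subcategory = body
        elif level == "C" and body:
            # The first token of a C line is taken to be the map number.
            mapping[body.split()[0]] = (category, subcategory)
    return mapping
```

The resulting dictionaries give, for each KO, its pathway numbers and, for each pathway number, its category and subcategory, which is all that is needed to count pathways per category or per subcategory.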
To sum up, I want to go from a huge list of KO numbers to general statistics and counts. How can I get this done? Thanks in advance.
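Once pathway counts per category exist, turning them into the kind of summary statistics described above takes only a few lines. A sketch with hypothetical helper names; the 20/30 figures used to exercise it are the illustrative numbers from the question, not real results.

```python
def summarise(counts):
    """Turn {category: count} into (category, count, percent) rows, largest first."""
    total = sum(counts.values())
    rows = []
    # Sort by descending count, then alphabetically to break ties.
    for category, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        percent = 100.0 * n / total if total else 0.0
        rows.append((category, n, round(percent, 1)))
    return rows


def format_table(rows):
    """Render summary rows as a plain-text table."""
    width = max([len("Category")] + [len(row[0]) for row in rows])
    lines = [f"{'Category':<{width}}  Count  Percent"]
    for category, n, percent in rows:
        lines.append(f"{category:<{width}}  {n:>5}  {percent:>6.1f}%")
    return "\n".join(lines)
```

The same rows can feed a pie chart, for example with matplotlib's `plt.pie([r[1] for r in rows], labels=[r[0] for r in rows])`.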
It's not KO, it's K0..., K1..., etc. (the identifiers are K numbers such as K00844).