Hi,
I'm working on RNA-seq analysis of a non-model plant and I need to generate a KO list to use iPath2.
I used "KEGG Mapper" to obtain a KO annotation, but 20% of the hits fall under Human Diseases, even though I selected only Solanaceae. How can I deal with these?
If anyone can help, I'd really appreciate it.
I actually ran into the same thing today when I was wondering why my clover had diabetes. The KEGG pathways in ko00001.keg are split into six level A groups, which in turn are split into several subgroups:
Metabolism
Genetic Information Processing
Environmental Information Processing
Cellular Processes
Organismal Systems
Human Diseases
K-numbers can be in several of those at once. For example, my clover has K01580, which is part of "Butanoate metabolism [PATH:ko00650]", which is part of "Carbohydrate metabolism", which sounds alright.
However, K01580 is also part of "04940 Type I diabetes mellitus [PATH:ko04940]" in "Endocrine and metabolic diseases" under "Human Diseases", since the same genes are involved in human diseases.
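One way to act on this is to look up, for every K-number, which level A groups it sits under, and then ignore the "Human Diseases" entries. Below is a rough Python sketch of that idea, not a tested pipeline. It assumes the ko00001.keg layout where each line starts with its level letter (A, B, C, D), level A names may be wrapped in `<b>` tags or prefixed with a numeric ID, and D lines begin with the K-number. The function names `parse_keg` and `non_disease_pathways` are made up for this example, the sample lines are hand-written to mimic that layout, and K99999 is a placeholder K-number, not a real entry.

```python
import re
from collections import defaultdict


def parse_keg(lines):
    """Map each K-number to the set of (level A group, level C pathway) it appears under."""
    ko_to_groups = defaultdict(set)
    level_a = level_c = None
    for line in lines:
        # Only A (top group), C (pathway) and D (KO entry) lines are needed here.
        if not line or line[0] not in "ACD":
            continue
        # Drop the level letter and any HTML tags such as <b>...</b>.
        text = re.sub(r"<[^>]+>", "", line[1:]).strip()
        if not text:
            continue
        if line[0] == "A":
            # Strip a leading numeric ID if the file has one.
            level_a = re.sub(r"^\d+\s+", "", text)
            level_c = None
        elif line[0] == "C":
            level_c = text
        elif level_a and level_c:
            ko = text.split()[0]
            if re.fullmatch(r"K\d{5}", ko):
                ko_to_groups[ko].add((level_a, level_c))
    return ko_to_groups


def non_disease_pathways(ko_to_groups, exclude=("Human Diseases",)):
    """Keep only pathways outside the excluded level A groups; drop KOs left with none."""
    kept = {}
    for ko, groups in ko_to_groups.items():
        remaining = sorted(c for a, c in groups if a not in exclude)
        if remaining:
            kept[ko] = remaining
    return kept


# Hand-written excerpt in the assumed layout (descriptions omitted, K99999 is a placeholder).
sample = """\
A<b>Metabolism</b>
B  <b>Carbohydrate metabolism</b>
C    00650 Butanoate metabolism [PATH:ko00650]
D      K01580  description omitted
A<b>Human Diseases</b>
B  <b>Endocrine and metabolic diseases</b>
C    04940 Type I diabetes mellitus [PATH:ko04940]
D      K01580  description omitted
D      K99999  placeholder entry found only under Human Diseases
""".splitlines()

groups = parse_keg(sample)
kept = non_disease_pathways(groups)
for ko, pathways in kept.items():
    print(ko, pathways)
```

For a real run you would pass the lines of your downloaded ko00001.keg instead of `sample` and then restrict `kept` to the K-numbers in your own annotation. Note that this drops K-numbers that only occur under "Human Diseases", which may or may not be what you want for a plant.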
Yes, it is possible that many genes are associated with human diseases. There is a bias in these annotation databases, since most research is done on human diseases.
How to deal with it? Use your common sense, I would say. A computer does not understand the difference between human and plant genes, but you as a researcher do!
Thank you so much! I appreciate your attention!