In the KEGG metabolic pathway networks, not all compounds participating in a reaction are identified as substrates or products. Some compounds, like H2O, ATP and NAD, are often not considered substrates/products of a reaction. However, I haven't found any KEGG documentation that describes why and how these compounds are identified/removed. Does anyone know a source that explains this process?
I ran into the same problem before, and the only solution I can think of at the moment is to first filter by size (i.e. anything smaller than 60 Da, such as metals, Cl- and so on) and then filter by number of associations (anything that has more than 100 associations across all reactions is most likely a currency metabolite, such as ATP, NADH and so on). If anyone can think of a better way, that would be great, so that there is an established set of currency metabolites.
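A minimal sketch of that two-step filter in Python, assuming the reactions have already been parsed into sets of compound IDs and that a mass (in Da) is available for each compound. The function name, the input structures and the toy data are hypothetical; the 60 Da and 100-association cut-offs are the ones suggested above and will probably need tuning for a given network.

```python
from collections import Counter


def find_currency_metabolites(masses, reactions, max_mass=60.0, max_degree=100):
    """Flag likely currency metabolites with two heuristics.

    masses: dict mapping compound ID to mass in Da
    reactions: iterable of collections of compound IDs (one per reaction)
    """
    # Step 1: very small compounds (water, small ions, ...).
    small = {cid for cid, mass in masses.items() if mass < max_mass}

    # Step 2: count in how many reactions each compound participates.
    degree = Counter(cid for reaction in reactions for cid in set(reaction))
    hubs = {cid for cid, n in degree.items() if n > max_degree}

    return small | hubs


# Toy data (made up): "ATP" takes part in 150 reactions,
# "H2O" is below the mass cut-off, "glucose" is neither.
masses = {"H2O": 18.0, "ATP": 507.2, "glucose": 180.2}
reactions = [{"ATP", f"substrate_{i}"} for i in range(150)]
reactions += [{"H2O", "glucose"} for _ in range(10)]

print(sorted(find_currency_metabolites(masses, reactions)))  # ['ATP', 'H2O']
```

Compounds without a known mass are only caught by the degree criterion, which is one reason a purely automatic list will still need manual review.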
I don't really know how KEGG does it, but on WikiPathways it usually works the other way around. People add the main reactions and main metabolites simply because that is easier and gives a better overview. They can, however, add any metabolite they wish, which indeed sometimes leads to inconsistencies.
Adding small, ubiquitous metabolites is sometimes a problem, though. That depends a bit on what you use the pathways for.
If you use the pathways for gene-related analysis (the most typical use case), it doesn't really matter.
If you use them for metabolomics data analysis, including pathway statistics, these small metabolites can cause some problems, especially when the same metabolite appears several times and when their inclusion is inconsistent. But since pathway statistics are usually computed for metabolites that were actually measured, the effect will not be large. It will mostly result in lower coverage of the pathway content by the analytical method, which makes the measurement method used look artificially worse.
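To make the coverage effect concrete, here is a small Python sketch with made-up metabolite sets. Coverage is taken here as the fraction of pathway metabolites that were measured, computed once over all metabolites and once with a hypothetical list of small, ubiquitous metabolites excluded. Neither the sets nor the function come from any particular tool.

```python
def pathway_coverage(pathway, measured, exclude=frozenset()):
    """Fraction of pathway metabolites (minus excluded ones) that were measured."""
    considered = set(pathway) - set(exclude)
    if not considered:
        return 0.0
    return len(considered & set(measured)) / len(considered)


# Made-up example: a 10-metabolite pathway in which the 4 currency
# metabolites were not measured.
pathway = {"glucose", "G6P", "F6P", "FBP", "pyruvate", "lactate",
           "ATP", "ADP", "NAD+", "H2O"}
measured = {"glucose", "G6P", "F6P", "pyruvate", "lactate"}
currency = {"ATP", "ADP", "NAD+", "H2O"}

print(pathway_coverage(pathway, measured))            # 0.5
print(pathway_coverage(pathway, measured, currency))  # 0.8333333333333334
```

The same measurement looks considerably better once the currency metabolites are left out of the denominator.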
For network analysis, the main problem occurs when these small metabolites are included and appear more than once in the same network. That, however, is easily solved by merging nodes; the WikiPathways app for Cytoscape, for instance, does that automatically.
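Merging duplicated nodes essentially means collapsing all nodes that refer to the same metabolite identifier and rewiring their edges. The Python sketch below shows the idea on a plain node/edge list; the data layout (node key mapped to metabolite ID) and the function name are assumptions for illustration and are not taken from the WikiPathways app for Cytoscape.

```python
def merge_duplicate_nodes(node_ids, edges):
    """Collapse graph nodes that share a metabolite identifier.

    node_ids: dict mapping a node key (e.g. a drawing element) to a metabolite ID
    edges: list of (source_key, target_key) pairs
    Returns the set of directed edges expressed in metabolite IDs.
    """
    merged = set()
    for source, target in edges:
        a, b = node_ids[source], node_ids[target]
        if a != b:  # design choice: drop self-loops created by the merge
            merged.add((a, b))
    return merged


# Two separate "ATP" boxes drawn in the same pathway diagram.
node_ids = {"n1": "ATP", "n2": "glucose", "n3": "G6P",
            "n4": "ATP", "n5": "F6P", "n6": "FBP"}
edges = [("n1", "n3"), ("n2", "n3"), ("n4", "n6"), ("n5", "n6")]

print(sorted(merge_duplicate_nodes(node_ids, edges)))
# [('ATP', 'FBP'), ('ATP', 'G6P'), ('F6P', 'FBP'), ('glucose', 'G6P')]
```

After merging, ATP is a single node connected to both reactions, which is exactly what makes it a hub that may distort network statistics.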
Finally, if you want to use the pathways to describe the actual reaction model, including the stoichiometry, completeness is of course essential. But up to now, that is not what pathways are mostly used for.
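As a small illustration of why completeness matters for a reaction model, the Python sketch below checks the element balance of ATP hydrolysis (ATP + H2O -> ADP + orthophosphate) using neutral molecular formulas. Dropping H2O, as a filtered pathway would, leaves the reaction unbalanced. The sketch ignores charge and protonation state, and its parser only handles simple formulas without parentheses; the function names are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C10H16N5O13P3' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(substrates, products):
    """Per-element difference (products minus substrates); empty dict if balanced."""
    left = sum((parse_formula(f) for f in substrates), Counter())
    right = sum((parse_formula(f) for f in products), Counter())
    diff = {el: right[el] - left[el] for el in set(left) | set(right)}
    return {el: d for el, d in diff.items() if d != 0}


ATP, H2O, ADP, PI = "C10H16N5O13P3", "H2O", "C10H15N5O10P2", "H3PO4"

print(imbalance([ATP, H2O], [ADP, PI]))  # {} (balanced)
print(imbalance([ATP], [ADP, PI]))       # H: +2, O: +1 (key order may vary)
```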
I could imagine that some analytical plugins in PathVisio and Cytoscape, as well as some statistical methods in R, could benefit from filtering small, ubiquitous metabolites out of pathways. I think a manually curated, adaptable blacklist would work best for that.
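A manually curated blacklist can be as simple as a set of compound identifiers that users extend or trim for their own analysis. The Python sketch below assumes pathways are represented as sets of KEGG compound IDs; the starting blacklist (C00001 H2O, C00002 ATP, C00003 NAD+, C00004 NADH, C00008 ADP, C00009 orthophosphate) is only an example, not an established list, and the function name and its add/keep parameters are hypothetical.

```python
# Example starting blacklist of KEGG compound IDs; meant to be edited per analysis.
DEFAULT_BLACKLIST = frozenset({
    "C00001",  # H2O
    "C00002",  # ATP
    "C00003",  # NAD+
    "C00004",  # NADH
    "C00008",  # ADP
    "C00009",  # Orthophosphate
})


def filter_pathway(compounds, blacklist=DEFAULT_BLACKLIST, add=(), keep=()):
    """Remove blacklisted compounds from a pathway.

    add: extra IDs to treat as blacklisted for this analysis
    keep: IDs to retain even if they are on the blacklist
    """
    effective = (set(blacklist) | set(add)) - set(keep)
    return {c for c in compounds if c not in effective}


# D-Glucose (C00031) + ATP -> D-glucose 6-phosphate (C00092) + ADP
hexokinase_step = {"C00031", "C00002", "C00092", "C00008"}

print(sorted(filter_pathway(hexokinase_step)))                   # ['C00031', 'C00092']
print(sorted(filter_pathway(hexokinase_step, keep={"C00002"})))  # ['C00002', 'C00031', 'C00092']
```

The add/keep arguments are what make the list adaptable: an energy-metabolism study might keep ATP, while a network analysis would drop it.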