I give PICRUSt2 a list of 16S ASVs that map to biota at various taxonomic levels. PICRUSt2 creates a table of inferred per-sample abundances of KEGG orthologs (KO) and another of Enzyme Commission (EC) numbers. The main output is a table of per-sample inferred pathway abundances that seems to take both of these into account.
Could I have a basic description of how this works and why? I did RTFM, but I'm inexperienced with pathway analysis and totally green in the context of the microbiome. So, while I can recite the steps PICRUSt2 uses to go from ASVs to pathways, I humbly admit that I don't really understand what I'm doing or what this information ultimately means.
Which part of the process are you having problems with? Taxa -> KO or KO -> pathways?
Both, I think. Going from taxa -> KO, am I ending up with all possible gene orthologs in that specific organism, or is it the orthologs represented across all the samples in the pool? Maybe that is the difference between "stratified" and "unstratified"? Then, is it just KO -> pathway, or is it KO + EC -> pathway? If it's just KO, what is EC for?
KEGG pathways contain more than enzymes and are built using KOs. I guess the EC numbers are there if you want to use them, but they are not used to construct the pathways. That said, the EC <-> KO mapping is pretty tight and should lead to the same pathways.
After reading more docs, it turns out that PICRUSt2 uses ONLY the EC numbers for pathway inference and gets the pathways from the MetaCyc database. It gives me the KO numbers, but does not match them to KEGG pathways.
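For anyone else puzzling over the "stratified" vs. "unstratified" question above, here is a toy sketch of how I understand the ASV -> function step. It is not PICRUSt2's code: the ASV names, counts, and copy numbers are made up, and the function names are hypothetical. The assumption, based on my reading of the docs, is that each ASV gets a predicted number of gene copies per genome, that its read count in each sample is normalized by its predicted 16S copy number, and that the two are multiplied together.

```python
# Toy illustration only. All numbers and names below are invented.

# Read counts of each ASV in each sample (the input abundance table).
asv_counts = {
    "sample1": {"ASV_a": 10, "ASV_b": 30},
    "sample2": {"ASV_a": 20, "ASV_b": 0},
}

# Predicted 16S copies per genome for each ASV (assumed values).
copies_16s = {"ASV_a": 2, "ASV_b": 3}

# Predicted copies of each EC number per genome for each ASV (assumed values).
ec_copies = {
    "ASV_a": {"EC:1.1.1.1": 1, "EC:2.7.1.1": 2},
    "ASV_b": {"EC:1.1.1.1": 0, "EC:2.7.1.1": 1},
}


def stratified(asv_counts, copies_16s, ec_copies):
    """Contribution of each ASV to each function in each sample."""
    table = {}
    for sample, counts in asv_counts.items():
        for asv, reads in counts.items():
            # Normalize reads by 16S copy number to approximate organism abundance.
            organisms = reads / copies_16s[asv]
            for ec, copies in ec_copies[asv].items():
                table[(sample, ec, asv)] = organisms * copies
    return table


def unstratified(strat):
    """Sum the per-ASV contributions to get one total per sample and function."""
    totals = {}
    for (sample, ec, _asv), value in strat.items():
        totals[(sample, ec)] = totals.get((sample, ec), 0.0) + value
    return totals


strat = stratified(asv_counts, copies_16s, ec_copies)
unstrat = unstratified(strat)
```

So under this reading, the function table is neither "everything the organism could have" nor one pooled list: it is the predicted gene content of each ASV, weighted by how abundant that ASV is in each sample. The stratified table keeps the per-ASV breakdown, and the unstratified table is just its sum over ASVs.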
Thanks for the update, that's good to know.