I've been using HUMAnN2 to go from human-removed microbiome shotgun metagenomic reads to HUMAnN2 attribute vectors with coverage and abundance values. The resulting attributes have identifiers that have the following structure: HISDEG-PWY: L-histidine degradation I|g__Streptococcus.s__Streptococcus_sanguinis
where Streptococcus sanguinis is the taxonomic unit and HISDEG-PWY is the metabolic unit.
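As a minimal sketch, an identifier of this shape can be split into its metabolic and taxonomic parts as shown below. The function name `split_feature_id` is hypothetical, and the format (pathway ID, then `: `, then a description, then `|`, then `g__Genus.s__Species`) is assumed from the single example above rather than taken from the HUMAnN2 documentation.

```python
def split_feature_id(feature_id):
    """Split a stratified feature identifier into pathway and taxon parts.

    Assumed format (inferred from one example, not from a specification):
        <pathway_id>: <description>|g__<genus>.s__<species>
    """
    # Everything before the first '|' is the metabolic unit; the rest is the taxon.
    function_part, _, taxon_part = feature_id.partition("|")

    # The pathway ID and its human-readable description are separated by ': '.
    pathway_id, _, description = function_part.partition(": ")

    # Split genus and species on the '.s__' marker rather than on every '.',
    # since a species name may itself contain a period.
    genus_part, _, species = taxon_part.partition(".s__")
    genus = genus_part[3:] if genus_part.startswith("g__") else None

    return {
        "pathway_id": pathway_id,
        "description": description or None,
        "genus": genus,
        "species": species or None,
    }


example = "HISDEG-PWY: L-histidine degradation I|g__Streptococcus.s__Streptococcus_sanguinis"
parts = split_feature_id(example)
print(parts["pathway_id"], parts["species"])
```

Identifiers without a `|` (no taxonomic stratification) come back with `genus` and `species` set to `None`.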
My question is actually two separate questions:
(1) Is there a database available through MetaCyc that contains orthologs one can BLAST against, like KEGG orthologs? If so, which one would I download for this type of functionality?
From the URL https://metacyc.org/download.shtml:
We provide the BioCyc databases (such as EcoCyc and MetaCyc) as collections of data files in several alternative formats including the following.
BioPAX format
Pathway Tools attribute-value format
Pathway Tools tabular format
SBML format
Gene Ontology annotations (EcoCyc only)
(2) How does HUMAnN2 go from read -> MetaCyc pathway and from read -> species?
I know they use MetaPhlAn2 in the backend, which is how they get the species, but how do they know which MetaCyc pathway to assign a protein to?