8 months ago
O.rka
I’m looking for the identifier mapping tables used in the backend for HUMAnN.
More specifically, the following if they are available:
- UniRef50 -> EC
- UniRef50 (or EC) -> KEGG KO
- EC -> MetaCyc pathway
- UniRef50 (EC or KO) -> KEGG pathway
I found some files here: site-packages/humann/data/pathways/
I have a few questions:
- Is it expected for one UniRef50 ID to map to more than one EC in some cases?
UniRef50_G6EMD2 {5.4.2.6, 2.7.1.41}
UniRef50_Q1J7L4 {5.4.2.6, 2.7.1.41}
UniRef50_T0UKK6 {5.4.2.6, 2.7.1.41}
UniRef50_X5NX36 {5.4.2.6, 2.7.1.41}
How is this handled in the backend? Does a UniRef50 hit for these count towards both or only one?
- I got ID mappings between pathways and rxns from data/pathways/metacyc_pathways. Many of these rxns do not have ECs and are not present in data/pathways/metacyc_reactions_level4ec_only.uniref.bz2. For example, the following:
list(pwy_to_rxns["PWY-2681"])
# ['RXN-4308',
# 'RXN-4305',
# 'RXN-4317',
# 'RXN-4310',
# 'RXN-4306',
# 'RXN-4314',
# 'RXN-4303',
# 'RXN-4307',
# 'RXN-4313',
# 'RXN-4312',
# 'RXN-4304']
pd.Series(rxn_to_ec)[list(pwy_to_rxns["PWY-2681"])]
# RXN-4308 {}
# RXN-4305 {2.5.1.112}
# RXN-4317 {}
# RXN-4310 {}
# RXN-4306 {}
# RXN-4314 {}
# RXN-4303 {2.5.1.112}
# RXN-4307 {2.5.1.27}
# RXN-4313 {}
# RXN-4312 {}
# RXN-4304 {}
I ran the following to confirm that RXN-4308 is not in the file:
grep "RXN-4308" metacyc_reactions_level4ec_only.uniref
Should some of these rxns have ECs associated with them, since they're part of the pathway?
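For anyone who wants to quantify this per pathway, here is a minimal sketch. It assumes `pwy_to_rxns` and `rxn_to_ec` are plain Python dicts shaped like the ones printed above; the toy values are copied from that output, and the helper name `ec_coverage` is hypothetical. Parsing the HUMAnN files is not shown.

```python
# Toy stand-ins for the dictionaries used in the post; in practice they would
# be parsed from the HUMAnN data files (parsing not shown here).
pwy_to_rxns = {
    "PWY-2681": [
        "RXN-4308", "RXN-4305", "RXN-4317", "RXN-4310", "RXN-4306", "RXN-4314",
        "RXN-4303", "RXN-4307", "RXN-4313", "RXN-4312", "RXN-4304",
    ],
}
rxn_to_ec = {
    "RXN-4305": {"2.5.1.112"},
    "RXN-4303": {"2.5.1.112"},
    "RXN-4307": {"2.5.1.27"},
}


def ec_coverage(pathway_id, pwy_to_rxns, rxn_to_ec):
    """Split a pathway's reactions into those with and without EC numbers."""
    with_ec, without_ec = [], []
    for rxn in pwy_to_rxns[pathway_id]:
        # A reaction that is absent from rxn_to_ec, or mapped to an empty set,
        # is treated as having no EC.
        if rxn_to_ec.get(rxn):
            with_ec.append(rxn)
        else:
            without_ec.append(rxn)
    return with_ec, without_ec


with_ec, without_ec = ec_coverage("PWY-2681", pwy_to_rxns, rxn_to_ec)
print(f"{len(with_ec)} of {len(with_ec) + len(without_ec)} reactions have an EC")
print("Missing ECs:", sorted(without_ec))
```

Running the same function over every key in `pwy_to_rxns` would show whether PWY-2681 is an outlier or whether EC-less reactions are common across pathways.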
Those two entries appear to be dismutases, and that word seems to refer to many reactions: https://en.wikipedia.org/wiki/Disproportionation

That would make a lot of sense. In that instance, I'm less concerned because, after a closer look, I realized that's only the case for 4 UniRef50 IDs and it's the same 2 ECs for all 4. Do you have any insight on the rxns missing ECs by any chance?
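The multi-EC check mentioned above can be sketched roughly like this. It assumes an EC -> UniRef50 mapping has already been loaded into a dict called `ec_to_uniref` (a hypothetical name); the four IDs and two ECs come from the question, and `UniRef50_EXAMPLE1` is a made-up placeholder. It says nothing about how HUMAnN itself treats these hits.

```python
from collections import defaultdict

# Hypothetical EC -> UniRef50 mapping. The four shared IDs are the ones listed
# in the question; UniRef50_EXAMPLE1 is a made-up placeholder, not a real ID.
shared_ids = {
    "UniRef50_G6EMD2", "UniRef50_Q1J7L4", "UniRef50_T0UKK6", "UniRef50_X5NX36",
}
ec_to_uniref = {
    "5.4.2.6": set(shared_ids),
    "2.7.1.41": set(shared_ids),
    "2.5.1.27": {"UniRef50_EXAMPLE1"},
}

# Invert the mapping so each UniRef50 ID points at all of its ECs.
uniref_to_ecs = defaultdict(set)
for ec, uniref_ids in ec_to_uniref.items():
    for uniref_id in uniref_ids:
        uniref_to_ecs[uniref_id].add(ec)

# Keep only the IDs with more than one EC, then collect the distinct EC sets.
multi_ec = {uid: ecs for uid, ecs in uniref_to_ecs.items() if len(ecs) > 1}
distinct_ec_sets = {frozenset(ecs) for ecs in multi_ec.values()}

print(len(multi_ec), "UniRef50 IDs map to more than one EC")
print(len(distinct_ec_sets), "distinct EC combination(s)")
```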