Question

Does MetaCyc or BioCyc have an equivalent of KEGG orthologs that could be blasted against locally?

6.4 years ago

O.rka ▴ 740

I've been using HUMAnN2 to go from human-removed microbiome shotgun metagenomic reads to HUMAnN2 attribute vectors with coverage and abundance values. The resulting attributes have identifiers that have the following structure: HISDEG-PWY: L-histidine degradation I|g__Streptococcus.s__Streptococcus_sanguinis where Streptococcus sanguinis is the taxonomic unit and HISDEG-PWY is the metabolic unit.
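As an illustration of the identifier structure described above, here is a minimal Python sketch that splits such a label into its metabolic and taxonomic parts. The `Humann2Feature` type and `parse_feature` function are hypothetical names made up for this example and are not part of HUMAnN2. It assumes the layout seen in the one example above (`ID: name|g__Genus.s__Species`) and may need adjusting for other rows.

```python
from typing import NamedTuple, Optional


class Humann2Feature(NamedTuple):
    """Hypothetical container for one stratified pathway label."""
    pathway_id: str
    pathway_name: str
    genus: Optional[str]
    species: Optional[str]


def parse_feature(label: str) -> Humann2Feature:
    # Split the metabolic unit from the (optional) taxonomic unit.
    pathway_part, _, taxon_part = label.partition("|")
    # The pathway ID and its description are separated by ": ".
    pathway_id, _, pathway_name = pathway_part.partition(": ")

    genus = species = None
    # Taxonomic ranks are assumed to be dot-separated with g__ / s__ prefixes.
    for rank in taxon_part.split("."):
        if rank.startswith("g__"):
            genus = rank[3:]
        elif rank.startswith("s__"):
            species = rank[3:].replace("_", " ")

    return Humann2Feature(pathway_id.strip(), pathway_name.strip(), genus, species)


example = (
    "HISDEG-PWY: L-histidine degradation I"
    "|g__Streptococcus.s__Streptococcus_sanguinis"
)
feature = parse_feature(example)
print(feature)
```

Labels without a `|` part simply come back with `genus` and `species` set to `None`.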

My question is actually two separate questions:

(1) Is there a database available through MetaCyc that contains orthologs one can BLAST against, like the KEGG orthologs? If so, which file would I download for this?

From the url: https://metacyc.org/download.shtml

We provide the BioCyc databases (such as EcoCyc and MetaCyc) as collections of data files in several alternative formats including the following.

BioPAX format

Pathway Tools attribute-value format

Pathway Tools tabular format

SBML format

Gene Ontology annotations (EcoCyc only)

(2) How does HUMAnN2 go from read -> MetaCyc pathway & read -> species?

I know they use MetaPhlAn2 in the backend, which is how they get the species, but how do they know which MetaCyc pathway to assign the protein to?

metacyc biocyc humann2 orthologs metagenomics • 3.4k views

updated 6.4 years ago by biouser ▴ 30 • written 6.4 years ago by O.rka ▴ 740

score 0 · Answer 1 · 2018-12-18

For the 1st question, according to their webpage (https://bitbucket.org/biobakery/humann2/wiki/Home):

* The UniRef database provides gene family definitions
* MetaCyc provides pathway definitions by gene family

It seems they map to UniProt, get the "gene family" name from there, and then obtain the MetaCyc annotation. Say you have already mapped to UniProt and got the protein ID "P01189" (https://www.uniprot.org/uniprot/P01189). If you take the gene name from that entry ("Name: POMC"), you can easily look up its description in HumanCyc (rather than MetaCyc, because this example is a human protein): https://biocyc.org/gene?orgid=HUMAN&id=ENSG00000115138-MONOMER

I believe it works like that, so I guess the way to go would be to map/BLAST to UniProt first. That's my opinion, at least.
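To make the lookup chain I am guessing at concrete (read -> UniRef gene family -> MetaCyc pathway), here is a toy Python sketch. Everything in it is an assumption for illustration: the read names, the `UniRef50_EXAMPLE_*` identifiers, both tables, and the `pathway_read_counts` function are made up, and HUMAnN2's real pathway quantification is very likely more involved than counting lookups.

```python
from collections import defaultdict

# Hypothetical result of a translated search: best gene-family hit per read.
READ_TO_GENE_FAMILY = {
    "read_1": "UniRef50_EXAMPLE_A",
    "read_2": "UniRef50_EXAMPLE_A",
    "read_3": "UniRef50_EXAMPLE_B",
}

# Hypothetical gene family -> MetaCyc pathway table (a family may map to none).
GENE_FAMILY_TO_PATHWAYS = {
    "UniRef50_EXAMPLE_A": ["HISDEG-PWY"],
    "UniRef50_EXAMPLE_B": [],
}


def pathway_read_counts(read_hits, family_to_pathways):
    """Count reads per pathway by chaining the two lookups."""
    counts = defaultdict(int)
    for _read_id, family in read_hits.items():
        # Families absent from the table contribute to no pathway.
        for pathway in family_to_pathways.get(family, []):
            counts[pathway] += 1
    return dict(counts)


counts = pathway_read_counts(READ_TO_GENE_FAMILY, GENE_FAMILY_TO_PATHWAYS)
print(counts)
```

The point is only that the pathway is never searched against directly: the search target is the protein database, and the pathway comes from a table lookup afterwards.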

You can download the UniRef50 database, for instance, by following their tutorial: https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-download-a-translated-search-database

Then you get the file "uniref50_annotated.1.1.dmnd", which you can convert to FASTA: diamond getseq -d uniref50_annotated.1.1.dmnd

Or download the FASTA file directly from UniProt: https://www.uniprot.org/downloads

Anyway, for a more accurate answer you could also post your question on the HUMAnN forum: https://groups.google.com/forum/#!forum/humann-users
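Once you have a FASTA file from either route, a local BLAST database could be built from it. The Python sketch below only assembles the command lines and prints them without running anything. Some assumptions: `diamond getseq` is taken to write the sequences to standard output (hence the redirect), `makeblastdb` from NCBI BLAST+ is my addition and is not mentioned in the HUMAnN2 docs, and the output names `uniref50_annotated.fasta` and `uniref50_blastdb` are hypothetical.

```python
import shlex


def export_and_index_commands(dmnd_db, fasta_out, blast_db_name):
    """Build (not run) the commands to export a DIAMOND db and index it for BLAST."""
    # Export sequences from the DIAMOND database (assumed to go to stdout).
    getseq = ["diamond", "getseq", "-d", dmnd_db]
    # Build a protein BLAST database from the exported FASTA file.
    makeblastdb = [
        "makeblastdb",
        "-in", fasta_out,
        "-dbtype", "prot",
        "-out", blast_db_name,
    ]
    return getseq, makeblastdb


getseq_cmd, makeblastdb_cmd = export_and_index_commands(
    "uniref50_annotated.1.1.dmnd",
    "uniref50_annotated.fasta",  # hypothetical output file name
    "uniref50_blastdb",          # hypothetical BLAST database name
)

# Print shell-ready command lines; the redirect captures getseq's output.
print(shlex.join(getseq_cmd) + " > " + shlex.quote("uniref50_annotated.fasta"))
print(shlex.join(makeblastdb_cmd))
```

Check the flags against the versions of DIAMOND and BLAST+ you have installed before running the printed commands.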

Greetings