Hello,
I have a list of protein IDs for a clade of Bacteria (obtained from a metagenome), and I am trying to work out the metabolism of that clade: heterotrophic or not, which pathways are used, presence or absence of respiration or fermentation, etc. How can I do that? Is there an online tool or software that would make this analysis easier?
Thank you in advance!
I believe what you are trying to do is an enrichment analysis. There are many tools for this, but I personally use http://www.webgestalt.org/. There may be better tools for enrichment analysis of prokaryotic genes/proteins.
That does not appear to be the case based on the original question. OP simply wants to know which pathways are represented by the list of protein IDs.
zoe.meziere : Can you post a few examples of the protein IDs? Are they from one of the standard databases (e.g. UniProt)?
Yes sure, here are a few examples:
OFW53612.1 WP_090599220.1 OGD16911.1 OGD16900.1 PKP54795.1 PKP54793.1 WP_090721821.1 PKP58768.1 PKP59194.1 OGD17541.1
These are EMBL/GenBank/DDBJ CDS identifiers.
OFW53612.1 OGD16911.1 OGD16900.1 PKP54795.1 PKP54793.1 PKP58768.1 PKP59194.1 OGD17541.1 are indeed EMBL/GenBank/DDBJ CDS identifiers, and I can map them without any problem. Maybe the service wasn't functional when you tried? Can you do it now?
WP_090599220.1 and WP_090721821.1 are RefSeq Protein identifiers. When you try to map them to UniProt, the service tells you that they can only be mapped to UniParc.
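Since the two kinds of identifier have to be submitted under different source databases, it can help to split the list first. A minimal sketch, assuming the usual accession shapes (a two-letter prefix plus underscore for RefSeq proteins, three letters followed by five or seven digits for GenBank protein_ids); the function names are hypothetical and the patterns are a heuristic, not an official validator:

```python
import re

# Assumed accession shapes (heuristic): RefSeq proteins carry a two-letter
# prefix and an underscore; GenBank protein_ids are 3 letters + 5 or 7 digits.
REFSEQ_PROTEIN = re.compile(r"^(?:AP|NP|WP|XP|YP|ZP)_\d+(?:\.\d+)?$")
GENBANK_PROTEIN = re.compile(r"^[A-Z]{3}\d{5}(?:\d{2})?(?:\.\d+)?$")


def classify_accession(acc):
    """Return a label for the likely source database of one accession."""
    acc = acc.strip()
    if REFSEQ_PROTEIN.match(acc):
        return "RefSeq_Protein"
    if GENBANK_PROTEIN.match(acc):
        return "EMBL-GenBank-DDBJ_CDS"
    return "unknown"


def group_accessions(accessions):
    """Group accessions by their likely source database, keeping input order."""
    groups = {}
    for acc in accessions:
        groups.setdefault(classify_accession(acc), []).append(acc)
    return groups


# The example identifiers posted in this thread.
ids = (
    "OFW53612.1 WP_090599220.1 OGD16911.1 OGD16900.1 PKP54795.1 "
    "PKP54793.1 WP_090721821.1 PKP58768.1 PKP59194.1 OGD17541.1"
).split()

groups = group_accessions(ids)
for label, members in groups.items():
    print(label, len(members), " ".join(members))
```

Anything that lands in the "unknown" group would need to be checked by hand before mapping.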
Thanks! I'm having trouble converting my protein IDs into KEGG IDs... They are NCBI IDs, but that database is not in the conversion tool of UniProt...
What type of NCBI IDs do you have? The UniProt IDmapping tool accepts EMBL/GenBank/DDBJ identifiers, EMBL/GenBank/DDBJ CDS (protein_ids), RefSeq and GeneID (Entrez Gene) identifiers.
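The same mapping can also be submitted programmatically. The sketch below only builds the request and does not send it. It assumes the UniProt REST ID mapping endpoint at https://rest.uniprot.org/idmapping/run and the database names "EMBL-GenBank-DDBJ_CDS" and "UniProtKB"; please check both against the current UniProt documentation. The function name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the UniProt ID mapping service (verify before use).
IDMAPPING_RUN_URL = "https://rest.uniprot.org/idmapping/run"


def build_idmapping_request(ids, from_db, to_db="UniProtKB"):
    """Build (but do not send) a POST request that submits a mapping job."""
    payload = {"from": from_db, "to": to_db, "ids": ",".join(ids)}
    data = urlencode(payload).encode("utf-8")
    return Request(IDMAPPING_RUN_URL, data=data, method="POST")


# Two of the GenBank protein_ids from this thread; the source database
# name is an assumption about how the service labels this ID type.
req = build_idmapping_request(
    ["OFW53612.1", "OGD16911.1"], "EMBL-GenBank-DDBJ_CDS"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen should return a JSON document with a job identifier, which is then used to poll for the results. RefSeq identifiers would go in a separate job with their own source database name.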
I actually managed to get the UniProt IDs but I can't retrieve the KEGG IDs. I tried both the UniProt and the KEGG tools... do you know why it doesn't work?
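To see exactly which accessions fail, the conversion can be checked one batch at a time against the KEGG REST service. A minimal sketch, assuming the conv operation at https://rest.kegg.jp/conv/genes/ accepts entries prefixed with "uniprot:" and answers with tab-separated lines; the function names, the accessions and the sample response text are hypothetical placeholders, not real KEGG output:

```python
# Assumed base URL of the KEGG REST "conv" operation (verify before use).
KEGG_CONV_URL = "https://rest.kegg.jp/conv/genes/"


def build_kegg_conv_url(uniprot_accessions):
    """Build the URL that asks KEGG to convert UniProt accessions to gene IDs."""
    entries = "+".join(f"uniprot:{acc}" for acc in uniprot_accessions)
    return KEGG_CONV_URL + entries


def parse_kegg_conv(text):
    """Parse tab-separated 'source<TAB>target' lines into {accession: [gene IDs]}."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines (an empty response means no hits)
        source, target = line.split("\t")
        accession = source.split(":", 1)[1]  # drop the database prefix
        mapping.setdefault(accession, []).append(target)
    return mapping


def unmapped(accessions, mapping):
    """Return the accessions for which no KEGG gene ID came back."""
    return [acc for acc in accessions if acc not in mapping]


# Hypothetical placeholder accessions and response, for illustration only.
query = ["ACC0001", "ACC0002"]
sample_response = "up:ACC0001\torg:gene_0001\n"

url = build_kegg_conv_url(query)
mapping = parse_kegg_conv(sample_response)
missing = unmapped(query, mapping)
print(url)
print(mapping, missing)
```

An accession that shows up in the unmapped list simply has no KEGG gene entry linked to it, which would be worth ruling out before suspecting the tools themselves.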