I have a list of Ensembl protein IDs (ENSP) that I need to convert to Entrez gene symbols. So far I haven't found a straightforward way to convert between these two formats, as none of the platforms I've looked at supports it directly. This is my current tentative strategy:
Step 1: Convert ENSP protein IDs to HGNC gene symbols via the R package EnsDb.Hsapiens.v86
Step 2: Convert HGNC gene symbols to UniProtKB format via the UniProt Protein Conversion tool ( https://www.uniprot.org/uploadlists/ ). For some reason, UniProtKB is the only format that is available when converting from HGNC format.
Step 3: Convert UniProtKB protein IDs to Entrez Gene ID numbers via the UniProt Protein Conversion tool ( https://www.uniprot.org/uploadlists/ ); this platform offers conversion to Entrez Gene ID numbers, but not to Entrez gene symbols...
Step 4: Convert Entrez Gene ID Numbers to Entrez Gene Symbols via the R package org.Hs.eg.db, with reference to this thread: Gene symbol convert to Entrez ID
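A multi-step chain like this can silently lose IDs at each hop, so it may help to track where each input drops out. The following is only a minimal Python sketch of that idea, not the workflow of any of the tools named above: `chain_lookup` and the four toy tables are hypothetical, the table values are taken from the single IPI example record quoted further down in this post (their interpretation as Entrez ID and symbol is an assumption), and `ENSP_PLACEHOLDER` is a made-up ID used only to show a miss.

```python
def chain_lookup(ids, *tables):
    """Pass each ID through the lookup tables in order.

    Returns (converted, dropped): `converted` maps each input ID to its final
    value; `dropped` maps each lost input ID to the 1-based step where the
    lookup failed.
    """
    converted, dropped = {}, {}
    for original in ids:
        current = original
        for step, table in enumerate(tables, start=1):
            if current not in table:
                dropped[original] = step
                break
            current = table[current]
        else:
            # Only reached when no step failed.
            converted[original] = current
    return converted, dropped


# Toy tables standing in for the output of Steps 1-4 (ASSUMED values,
# copied from the IPI example record in this post).
ensp_to_symbol = {"ENSP00000411070": "LCE6A"}   # Step 1
symbol_to_uniprot = {"LCE6A": "A0A183"}         # Step 2
uniprot_to_entrez_id = {"A0A183": "448835"}     # Step 3
entrez_id_to_symbol = {"448835": "LCE6A"}       # Step 4

converted, dropped = chain_lookup(
    ["ENSP00000411070", "ENSP_PLACEHOLDER"],
    ensp_to_symbol, symbol_to_uniprot, uniprot_to_entrez_id, entrez_id_to_symbol,
)
print(converted)
print(dropped)
```

This sketch assumes one-to-one mappings; real conversion tables often return several targets per ID, in which case each table value would need to be a list.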
Strategies reviewed:
I reviewed the BioMart portal, but am not seeing relevant ID conversion tools (e.g. going to http://biomart.org/ --> Tools --> ID Conversion takes me to a general notice that the community portal is unavailable).
Referencing this related thread: Make List Of All Human Gene Ids (Ens, Hgnc, Entrez) To Ease Conversion Of Ids
...The International Protein Index (IPI) platform provides an ipi.HUMAN.xrefs file: ftp://ftp.ebi.ac.uk/pub/databases/IPI/last_release/current/
...with the initial content:
Protein cross-references file for IPI human release 3.87
SP A0A183 IPI00807623 ENSP00000411070; VALIDATED:NP_001122072; HIT000394684; ABJ55982; 31824,LCE6A; 448835,LCE6A; UPI0000D83229 Hs.62927; CCDS44227.1; GI:190610047; OTTHUMP00000210240;
However, the columns in this file aren't labeled, and I don't know the format of each column, or whether Entrez-format gene symbols are present. The ReadMe file does not provide this information.
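Since the columns are unlabeled, one way to explore the file is to pull out recognizable patterns rather than rely on column positions. The sketch below is Python and works only on the example record quoted above; `parse_xrefs_line` is a hypothetical helper. It assumes that fields of the form "digits,SYMBOL" are (numeric ID, gene symbol) pairs. In the example, both pairs carry the symbol LCE6A, and the two numbers appear to be an HGNC ID and an Entrez Gene ID respectively, but that reading is an assumption that should be checked against NCBI Gene before relying on it.

```python
import re

# Example record copied from the ipi.HUMAN.xrefs excerpt above.
RECORD = (
    "SP A0A183 IPI00807623 ENSP00000411070; VALIDATED:NP_001122072; "
    "HIT000394684; ABJ55982; 31824,LCE6A; 448835,LCE6A; UPI0000D83229 "
    "Hs.62927; CCDS44227.1; GI:190610047; OTTHUMP00000210240;"
)

# Ensembl human protein IDs: "ENSP" followed by 11 digits.
ENSP_RE = re.compile(r"\bENSP\d{11}\b")

# ASSUMPTION: "<digits>,<SYMBOL>" fields are (numeric ID, gene symbol) pairs.
PAIR_RE = re.compile(r"\b(\d+),([A-Za-z0-9][A-Za-z0-9.\-]*)")


def parse_xrefs_line(line):
    """Extract ENSP IDs and (numeric ID, symbol) pairs from one record."""
    return {
        "ensp": ENSP_RE.findall(line),
        "id_symbol_pairs": [(int(num), sym) for num, sym in PAIR_RE.findall(line)],
    }


parsed = parse_xrefs_line(RECORD)
print(parsed["ensp"])
print(parsed["id_symbol_pairs"])
```

Running this over the whole file and comparing a handful of the extracted pairs against known genes would show whether the assumed pattern holds beyond this one record.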
With reference to this thread: Gene Id Conversion Tool
...I tried to use DAVID: http://david.abcc.ncifcrf.gov/conversion.jsp ... but it doesn't seem to recognize ENSP-formatted inputs, as my test inputs generate error messages.
bioDBnet ( https://biodbnet-abcc.ncifcrf.gov/db/db2db.php ) doesn't permit ENSP conversion to Entrez format.
The biodb.jp Hyperlink Management System ( http://biodb.jp/ ) has tools related to Ensembl protein IDs, but I don't see tools for converting to Entrez format.
If there is any way to simplify the 4-step strategy described above, I would appreciate any suggestions. Thanks in advance for your input.
Why not just download the mapping from BioMart? That'd be a single step and vastly simpler.
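As a rough illustration of that single step, here is a Python sketch that builds a BioMart XML query and the matching request URL, assuming the answer refers to Ensembl's BioMart. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the filter/attribute names (`ensembl_peptide_id`, `entrezgene_id`, `external_gene_name`) are assumptions based on recent Ensembl releases; names have changed between releases, so check them in the BioMart web interface first. The helper functions are hypothetical, and the code only constructs the request without sending it.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# ASSUMED endpoint for the Ensembl BioMart web service.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(ensp_ids):
    """Build a BioMart XML query mapping ENSP IDs to Entrez Gene ID and gene name."""
    # ASSUMED attribute names; verify against the current Ensembl release.
    attributes = ["ensembl_peptide_id", "entrezgene_id", "external_gene_name"]
    attr_xml = "".join(f'<Attribute name="{name}"/>' for name in attributes)
    # quoteattr escapes the value and wraps it in quotes.
    filter_value = quoteattr(",".join(ensp_ids))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f'<Filter name="ensembl_peptide_id" value={filter_value}/>'
        f"{attr_xml}"
        "</Dataset></Query>"
    )


def build_request_url(ensp_ids):
    """Return the GET URL that would submit the query to the service."""
    return MART_URL + "?" + urlencode({"query": build_biomart_query(ensp_ids)})


query = build_biomart_query(["ENSP00000411070"])
url = build_request_url(["ENSP00000411070"])
print(query)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) should return a tab-separated table if the assumed names are valid. For long ID lists a POST request, or downloading the full mapping without a filter, is likely more practical than a GET URL.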