I have proteomic data that lists protein names, and I want to overlay it with RNA-seq data based on gene IDs (Entrez Gene IDs, Ensembl IDs, etc.). I have tried BioMart, DAVID and ID Converter, but all of them return very few hits or none. Is there a good tool for converting protein names to gene IDs?
Thanks
Which organism? And could you post an example of a "protein name"?
It is mouse. A few of the protein names are: RAB7A, TBB2A, KPYR, NDKA.
It is strange that you get no results with BioMart. Be careful: for Mus musculus, gene symbols usually have only the first letter capitalized (Tnf instead of TNF), and the search may be case-sensitive. By the way, I tried your 4 protein names and found no match among MGI symbols; a UniProt gene-name search found 1 / ??. Since this is proteomic data, perhaps you have the UniProt accession IDs (something like "Q8K2Q7")? If so, it might be easier to use those to get Ensembl IDs via BioMart. Julien
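To try the capitalization point quickly, all-caps names can be converted to the usual mouse style before searching. This is a minimal heuristic sketch in Python; the function name `to_mouse_symbol_case` is hypothetical, real MGI symbols have exceptions (e.g. hyphenated symbols), and it cannot help when the names are not gene symbols at all (for instance UniProt entry-name mnemonics).

```python
def to_mouse_symbol_case(name: str) -> str:
    """Convert an all-caps name such as 'RAB7A' to mouse style ('Rab7a').

    Heuristic only (hypothetical helper): MGI symbols have exceptions, so
    treat the result as a candidate to search for, not a confirmed symbol.
    """
    name = name.strip()
    if not name:
        return name
    # Keep the first letter upper-case and lower-case everything after it.
    return name[0].upper() + name[1:].lower()


if __name__ == "__main__":
    for raw in ["RAB7A", "TBB2A", "KPYR", "NDKA"]:
        print(raw, "->", to_mouse_symbol_case(raw))
```

The converted names can then be pasted into a BioMart filter; if they still return nothing, the input is probably not a list of gene symbols.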
No, this is exactly what the proteomics core gave me. One mistake I made: the names are actually CPSM_MOUSE, HBB1_MOUSE, ALBU_MOUSE, GSTM1_MOUSE, FTHFD_MOUSE.
If it makes any difference, the file says it is from the Sprot_54.0 database.
Those look like UniProtKB entry names.
The other ones you mentioned (RAB7A, TBB2A, KPYR, NDKA) are missing the species suffix, but can also be found once it is added (e.g. RAB7A_MOUSE, NDKA_MOUSE).
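Entry names have the form MNEMONIC_SPECIES, where MOUSE is the UniProt species code for Mus musculus, so a mixed list can be normalized before looking it up. A minimal sketch in Python; `to_entry_name` is a hypothetical helper, and it assumes any name already containing an underscore carries a species code.

```python
def to_entry_name(name: str, species: str = "MOUSE") -> str:
    """Return a UniProtKB-style entry name (MNEMONIC_SPECIES).

    Assumption: a name that already contains '_' has a species code and is
    only upper-cased; otherwise the species suffix is appended.
    """
    name = name.strip().upper()
    if "_" in name:
        return name
    return f"{name}_{species.upper()}"


if __name__ == "__main__":
    names = ["RAB7A", "TBB2A", "KPYR", "NDKA", "CPSM_MOUSE", "FTHFD_MOUSE"]
    print([to_entry_name(n) for n in names])
```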
Unfortunately, UniProtKB entry names are not stable, so although you should be able to find most of them in UniProtKB without any problems, some will have changed and will be harder to find (e.g. FTHFD_MOUSE). This is why UniProt recommends using accession numbers rather than the more human-friendly entry names.
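One way to see which names have changed is to ask the current UniProtKB for each one and set aside those that return nothing. A minimal Python sketch using only the standard library; the search endpoint URL, the `id:` query field and the return fields `accession,id,gene_primary` are my assumptions about the current UniProt REST API, and `build_search_url`, `parse_tsv` and `lookup` are hypothetical helper names.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed UniProt REST search endpoint and return fields; verify against the docs.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"
FIELDS = "accession,id,gene_primary"


def build_search_url(entry_name: str) -> str:
    """URL asking current UniProtKB for one entry name (assumed 'id:' query field)."""
    params = {"query": f"id:{entry_name}", "fields": FIELDS, "format": "tsv"}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_tsv(text: str) -> list[dict[str, str]]:
    """Parse a TSV response (header row + data rows) into dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def lookup(entry_names: list[str]) -> tuple[dict[str, list[dict[str, str]]], list[str]]:
    """Return (found, missing); 'missing' names may have been renamed since release 54.0."""
    found: dict[str, list[dict[str, str]]] = {}
    missing: list[str] = []
    for name in entry_names:
        with urllib.request.urlopen(build_search_url(name)) as resp:
            rows = parse_tsv(resp.read().decode("utf-8"))
        if rows:
            found[name] = rows
        else:
            missing.append(name)
    return found, missing
```

Calling `lookup` needs network access and Python 3.9 or later; the names it reports as missing are the candidates to resolve through UniSave.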
Luckily, you know which version of UniProtKB/Swiss-Prot these came from (Sprot_54.0 => UniProtKB/Swiss-Prot 54.0, released 2007-07-24). That means you can use the UniProtKB Sequence/Annotation Version Archive (UniSave) to resolve the entry names to the specific UniProtKB entries that were used to generate the annotations, and from those entries get the UniProtKB accessions and the associated gene names. With those in hand, mapping to Entrez Gene and/or Ensembl is simple: either query those resources directly or use a mapping service such as the UniProt Database identifier mapping service.
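Once you have accessions, the identifier mapping step can also be scripted. A minimal Python sketch using only the standard library; the endpoint paths (`/idmapping/run`, `/status/{jobId}`, `/stream/{jobId}`), the database names `UniProtKB_AC-ID`, `GeneID` and `Ensembl`, and the JSON keys `jobId`, `jobStatus`, `results`, `from` and `to` are my reading of UniProt's REST documentation, so check them against the current docs before relying on it. All function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed base URL of the UniProt ID mapping REST service.
API = "https://rest.uniprot.org/idmapping"


def build_run_payload(ids: list[str], to_db: str = "GeneID") -> bytes:
    """Form body for POST /idmapping/run (assumed 'from'/'to'/'ids' fields)."""
    form = {"from": "UniProtKB_AC-ID", "to": to_db, "ids": ",".join(ids)}
    return urllib.parse.urlencode(form).encode("ascii")


def parse_mapping(result: dict) -> dict[str, list[str]]:
    """Collapse the assumed 'results' list into {uniprot_id: [target ids]}."""
    mapping: dict[str, list[str]] = {}
    for pair in result.get("results", []):
        mapping.setdefault(pair["from"], []).append(pair["to"])
    return mapping


def _get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def map_ids(ids: list[str], to_db: str = "GeneID", poll_seconds: float = 3.0) -> dict[str, list[str]]:
    """Submit a mapping job, poll until it finishes, and return the parsed mapping."""
    req = urllib.request.Request(f"{API}/run", data=build_run_payload(ids, to_db))
    with urllib.request.urlopen(req) as resp:
        job_id = json.load(resp)["jobId"]
    while True:
        # Assumed: NEW/RUNNING while in progress; a finished job may redirect
        # to its results, which then carry no 'jobStatus' key.
        status = _get_json(f"{API}/status/{job_id}").get("jobStatus")
        if status in ("NEW", "RUNNING"):
            time.sleep(poll_seconds)
            continue
        if status not in (None, "FINISHED"):
            raise RuntimeError(f"ID mapping job {job_id} ended with status {status}")
        break
    return parse_mapping(_get_json(f"{API}/stream/{job_id}"))
```

For example, `map_ids(accessions, to_db="Ensembl")` would return a dict from each accession to its Ensembl identifiers, which can then be joined against the gene IDs in the RNA-seq table. Accessions with no mapping are simply absent from the dict.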