UniProt references to SwissProt references?
7.0 years ago

Hello there

I have some identifiers coming from a search against the Swissprot database that have the following structure (example 1): SYDND_PSEFS

And I want them to be in the UniProt format, just like the following (example 2):

I4L7P1_9PSED

But I cannot do this with the Retrieve/ID mapping tool, because I do not know which name the tool uses for the Swissprot database. If the reverse mapping (from UniProt to Swissprot) is possible, I would also like to know how to do it.

Thanks a lot

uniprot swissprot metaproteomics database • 2.7k views
7.0 years ago
mobiusklein ▴ 180

UniProt's HTTP API is very accommodating about translating identifiers.

A request to http://www.uniprot.org/uniprot/SYDND_PSEFS is redirected to http://www.uniprot.org/uniprot/C3JYT1. If you're comfortable with Python, you could use the following approach:

from lxml import etree

uri_template = "http://www.uniprot.org/uniprot/{0}.xml"
nsmap = {"up": "http://uniprot.org/uniprot"}

your_ids = ["SYDND_PSEFS"]  # replace with your own list of IDs
translated = []

for swiss_id in your_ids:
    # Fetch and parse the XML record for this identifier
    tree = etree.parse(uri_template.format(swiss_id))
    # Every full name listed under the <protein> element
    names = [el.text for el in tree.findall(
        ".//up:protein/*/up:fullName", nsmap)]
    recommended_name_tag = tree.find(
        ".//up:protein/*/up:recommendedName", nsmap)
    if recommended_name_tag is not None:
        # .text can be None when the element only has children
        text = (recommended_name_tag.text or "").strip()
        if text:
            recommended_name = text
        else:
            # No direct text: join the text of the child elements
            recommended_name = ' '.join(
                c.text for c in recommended_name_tag if c.text)
    else:
        # Fall back to the first full name, if there is one
        recommended_name = names[0] if names else ""
    # Text of the <name> element directly under <entry>
    gene_name_tag = tree.find(".//up:entry/up:name", nsmap)
    gene_name = gene_name_tag.text if gene_name_tag is not None else ""

    translated.append((names, recommended_name, gene_name))

This will collect all the names that UniProt has for each identifier and store them in the list translated. You can then iterate over your_ids and translated in parallel with zip and decide which identifier to retain.
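As a rough sketch of that last step, the snippet below walks the two lists with zip and keeps one name per queried ID. The sample data and the helper pick_names are hypothetical, made up only to show the shape of the loop, and the "recommended name first" preference is just one possible rule.

```python
# Hypothetical sample data shaped like the script's output:
# each tuple is (names, recommended_name, gene_name).
your_ids = ["SYDND_PSEFS"]
translated = [(["Example protein name"], "Example protein name", "SYDND_PSEFS")]


def pick_names(ids, results):
    """Pair each queried ID with the single name chosen to keep."""
    chosen = {}
    for swiss_id, (names, recommended_name, gene_name) in zip(ids, results):
        # Prefer the recommended name, then the first full name,
        # and fall back to the queried ID if nothing was found.
        chosen[swiss_id] = recommended_name or (names[0] if names else swiss_id)
    return chosen


chosen = pick_names(your_ids, translated)
print(chosen)
```

Swap the selection rule inside the loop for whatever criterion suits your data.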

7.0 years ago

First of all, a short note on terminology.

The UniProt Knowledgebase (UniProtKB) consists of two sections: UniProtKB/Swiss-Prot for reviewed entries and UniProtKB/TrEMBL for unreviewed entries (see http://www.uniprot.org/help/uniprotkb_sections, http://www.uniprot.org/help/entry_status).

Since Swiss-Prot is part of UniProtKB, it does not make sense to map from Swiss-Prot to UniProtKB. If an entry is in UniProtKB/Swiss-Prot, it has been reviewed, while a UniProtKB/TrEMBL entry is not reviewed, but in both cases, entries have a UniProtKB identifier (accession number and entry name).
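To make the distinction concrete, here is a small heuristic sketch that tells accession numbers and entry names apart. The regular expression is the accession pattern as I understand it from UniProt's help pages. The function classify and its labels are hypothetical, and the rule that unreviewed entry names start with the accession is an assumption based on how such names usually look.

```python
import re

# Accession number pattern (assumed from UniProt's documentation).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify(identifier):
    """Guess whether a UniProtKB identifier is an accession or an entry name."""
    if ACCESSION_RE.match(identifier):
        return "accession"
    prefix, sep, species = identifier.partition("_")
    if sep and prefix and species:
        # Assumption: unreviewed entry names look like <accession>_<species>.
        if ACCESSION_RE.match(prefix):
            return "entry name (TrEMBL-style)"
        return "entry name (Swiss-Prot-style)"
    return "unknown"


for example in ["SYDND_PSEFS", "I4L7P1_9PSED", "C3JYT1"]:
    print(example, "->", classify(example))
```

Under these assumptions, both identifiers from the question are entry names, and C3JYT1 is an accession number.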

However, if your goal is to map from entry name to accession number, you can indeed use the ID mapping tool http://www.uniprot.org/uploadlists, map from UniProtKB to UniProtKB, and then download the results in "List" format. Or you can use our REST API to map the identifiers one at a time, with a URL of the form

http://www.uniprot.org/uniprot/?query=SYDND_PSEFS&format=list
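For a list of entry names, the same URL can be built programmatically. The sketch below only constructs the URLs and sends no requests. The helper mapping_url is hypothetical, and the query and format parameters are simply the ones from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.uniprot.org/uniprot/"


def mapping_url(entry_name):
    """Build the query URL for one entry name, in the form shown above."""
    return BASE_URL + "?" + urlencode({"query": entry_name, "format": "list"})


entry_names = ["SYDND_PSEFS"]  # replace with your own entry names
urls = [mapping_url(name) for name in entry_names]
for url in urls:
    print(url)
```

Each URL can then be fetched with the HTTP client of your choice, one identifier at a time.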
