Accessing UNIPROT using REST API
7.0 years ago
Natasha ▴ 40

Hello everyone, I would like to programmatically access the entries (UniProt ID, entry name, protein name, gene name, kinetics) for a given EC number and organism of interest, using Python.

import urllib.parse
import urllib.request

# Legacy ID-mapping endpoint (since retired, as noted in the replies below)
url = 'http://www.uniprot.org/uploadlists/'

params = {
'from':'ACC',
'to':'P_REFSEQ_AC',
'format':'tab',
'query':'P13368 P20806 Q9UM73 P97793 Q17192'
}

# In Python 3 the POST body must be bytes
data = urllib.parse.urlencode(params).encode('ascii')
request = urllib.request.Request(url, data)
contact = "" # Please set your email address here to help us debug in case of problems.
request.add_header('User-Agent', 'Python %s' % contact)
response = urllib.request.urlopen(request)
page = response.read(200000)

I had a look at the Python code above, given here. However, I couldn't really understand how to modify it to download the search result (here) in XML format.

In my version of the code above, the query is 'query': '3.1.3.9 2.7.1.2' and the format is 'format': 'xml'. How do I add the organism filter ('Organism': 'Homo sapiens') to the code and download the search result as an XML file?

Many thanks,

Deepa


The UniProt ID mapping service doesn't actually support EC numbers. For performance reasons, databases where the mapping relationship to UniProtKB identifiers is one-to-many, e.g. GO, InterPro or PubMed, are not supported. There is a note about this on the help page http://www.uniprot.org/help/uploadlists.

You can, however, build RESTful queries of the form:

http://www.uniprot.org/uniprot/?query=(ec%3A+3.1.3.9+or+ec%3A2.7.1.2)+organism%3A9606&format=xml

You could also use the tab-delimited format:

http://www.uniprot.org/uniprot/?query=(ec%3A+3.1.3.9+or+ec%3A2.7.1.2)+organism%3A9606&format=tab&columns=id,entry_name,protein_names,genes,comment(KINETICS)
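The legacy endpoint above has since been retired (see the replies below). As a rough sketch only, the same tab-delimited request might be expressed against the current rest.uniprot.org search endpoint with nothing but the standard library. The query uses the new syntax from the later reply, and the mapping from the old `columns` names to the new `fields` names is my assumption, to be checked against UniProt's list of return fields. `LEGACY_TO_FIELDS` and `tsv_search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode, quote

# ASSUMED mapping from legacy `columns` names to current UniProtKB return fields;
# verify against UniProt's return-fields documentation before relying on it.
LEGACY_TO_FIELDS = {
    "id": "accession",            # legacy "id" was the accession (Entry)
    "entry_name": "id",           # new "id" is the entry name
    "protein_names": "protein_name",
    "genes": "gene_names",
    "comment(KINETICS)": "kinetics",
}


def tsv_search_url(query, legacy_columns):
    """Build a search-endpoint URL asking for TSV output with the mapped fields."""
    fields = ",".join(LEGACY_TO_FIELDS[c] for c in legacy_columns)
    params = {"query": query, "format": "tsv", "fields": fields}
    # quote_via=quote encodes spaces as %20 instead of '+'
    return "https://rest.uniprot.org/uniprotkb/search?" + urlencode(params, quote_via=quote)


url = tsv_search_url(
    "((ec:3.1.3.9) OR (ec:2.7.1.2)) AND (organism_id:9606)",
    ["id", "entry_name", "protein_names", "genes", "comment(KINETICS)"],
)
print(url)
```

The URL is only built here, not fetched; opening it (in a browser or with `urllib.request.urlopen`) should return the first page of TSV rows.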


This solution no longer works as noted by @roder.thomas.

Elisabeth Gasteiger - Is there an update that can be posted instead? Otherwise this answer should be moved to a comment for historical reference.

Note: A new answer has been added, so this originally accepted answer has been moved to a comment for reference. It is no longer valid.

2.2 years ago
Wayne ★ 2.1k

As of summer 2022, there is a Python package for querying UniProt's new REST API called Unipressed, by Michael Milton (multimeric).

Announcement:

Unipressed GitHub repo.
Unipressed documentation.

Demonstration Code Using Unipressed (consistent with examples in earlier posts):

from unipressed import UniprotkbClient

for record in UniprotkbClient.search(
    query={
        "or_": [
            {"ec": "3.1.3.9"},
            {"ec": "2.7.1.2"},
        ],
        "and_": [
            {"organism_id": "9606"},
        ]
    },
    #fields=["length", "gene_names"]
).each_record():
    print(record)  # display(record) also works, but only in Jupyter/IPython

According to the Unipressed documentation (currently under 'Advantages'), it supports the json, tsv, list, and xml formats.

Here is an example choosing the tsv format:

from unipressed import UniprotkbClient

for record in UniprotkbClient.search(
    query={
        "or_": [
            {"ec": "3.1.3.9"},
            {"ec": "2.7.1.2"},
        ],
        "and_": [
            {"organism_id": "9606"},
        ]
    },
    format="tsv",
    fields=["accession", "gene_names", "length"]
).each_record():
    print(record)  # display(record) also works, but only in Jupyter/IPython

That results in:

{'Entry': 'Q9NQR9', 'Gene Names': 'G6PC2 IGRP', 'Length': '355'}
{'Entry': 'P35575', 'Gene Names': 'G6PC1 G6PC G6PT', 'Length': '357'}
{'Entry': 'Q9BUM1', 'Gene Names': 'G6PC3 UGRP', 'Length': '346'}
{'Entry': 'P35575-2', 'Gene Names': 'G6PC1 G6PC G6PT', 'Length': '176'}
{'Entry': 'Q9NQR9-2', 'Gene Names': 'G6PC2 IGRP', 'Length': '102'}
{'Entry': 'Q9NQR9-3', 'Gene Names': 'G6PC2 IGRP', 'Length': '154'}
{'Entry': 'A0A024R1U9', 'Gene Names': 'G6PC hCG_16953', 'Length': '359'}

(I kept the output very simple here to show human-readable results. To actually save the data as TSV-formatted text, you can adapt the approach used at the end of Michael Milton's (multimeric) reply to this post below, as I did with the example code above.)

This gives seven hits, as opposed to the four shown in the website results in the August 31, 2022 post by @roder.thomas. The difference is that these query results list isoforms as separate primary accessions, so in addition to the four shown in that post:

Q9NQR9
P35575
Q9BUM1
A0A024R1U9

You also see listed:

P35575-2
Q9NQR9-2
Q9NQR9-3

Those isoforms are listed under the 'Sequence & Isoforms' section of the entry pages accessible from the screenshot in the August 31, 2022 post by @roder.thomas.

You can filter out those isoforms, leaving the four seen on the website, by dropping any entry whose accession contains a dash, like so:

from unipressed import UniprotkbClient

collected=[]
for record in UniprotkbClient.search(
    query={
        "or_": [
            {"ec": "3.1.3.9"},
            {"ec": "2.7.1.2"},
        ],
        "and_": [
            {"organism_id": "9606"},
        ]
    },
    fields=["length", "gene_names"]
).each_record():
    collected.append(record)
collected = [x for x in collected if "-" not in x["primaryAccession"]]
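The same dash test works on the TSV-format records, where the accession sits under the `Entry` column rather than `primaryAccession`. A small offline sketch using the seven records printed above (`is_isoform` is a hypothetical helper name):

```python
# The seven TSV records printed above (keys are the TSV column headers)
records = [
    {'Entry': 'Q9NQR9', 'Gene Names': 'G6PC2 IGRP', 'Length': '355'},
    {'Entry': 'P35575', 'Gene Names': 'G6PC1 G6PC G6PT', 'Length': '357'},
    {'Entry': 'Q9BUM1', 'Gene Names': 'G6PC3 UGRP', 'Length': '346'},
    {'Entry': 'P35575-2', 'Gene Names': 'G6PC1 G6PC G6PT', 'Length': '176'},
    {'Entry': 'Q9NQR9-2', 'Gene Names': 'G6PC2 IGRP', 'Length': '102'},
    {'Entry': 'Q9NQR9-3', 'Gene Names': 'G6PC2 IGRP', 'Length': '154'},
    {'Entry': 'A0A024R1U9', 'Gene Names': 'G6PC hCG_16953', 'Length': '359'},
]


def is_isoform(accession):
    # Isoform accessions carry a "-N" suffix, e.g. P35575-2
    return "-" in accession


canonical = [r for r in records if not is_isoform(r["Entry"])]
print([r["Entry"] for r in canonical])
# ['Q9NQR9', 'P35575', 'Q9BUM1', 'A0A024R1U9']
```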

XML Format Example:

The original post specifically asked about downloading the results in XML format, and Unipressed already has that built in. Here, some of the data stored in each XML record object is accessed and printed to show something human-readable:

from unipressed import UniprotkbClient

# Namespace prefix that ElementTree puts in front of every UniProt XML tag
NS = "{http://uniprot.org/uniprot}"

for record in UniprotkbClient.search(
    query={
        "or_": [
            {"ec": "3.1.3.9"},
            {"ec": "2.7.1.2"},
        ],
        "and_": [
            {"organism_id": "9606"},
        ]
    },
    format="xml",
).each_record():
    #Show XML object as string by uncommenting out the next two lines & deleting everything after those lines
    #from xml.etree import ElementTree # from https://stackoverflow.com/a/48671499/8508004
    #print(ElementTree.tostring(record, encoding='unicode'))
    #Below based on [Processing XML in Python — ElementTree:A Beginner’s Guide](https://towardsdatascience.com/processing-xml-in-python-elementtree-c8992941efd2)
    # slice `[len(NS):]` (i.e. `[28:]`) removes `{http://uniprot.org/uniprot}` from the front of tags
    #for elem in record.iter(): print(elem.tag[len(NS):])
    #for child in record: print(child.tag, child.attrib)
    for elem in record.iter(NS + "fullName"):
        print(elem.tag[len(NS):], elem.attrib, elem.text)
    for elem in record.iter(NS + "ecNumber"):
        print(elem.tag[len(NS):], elem.attrib, elem.text)
    for elem in record.iter(NS + "proteinExistence"):
        print(elem.tag[len(NS):], elem.attrib)
    print("*" * 60)

Results in:

fullName {} Glucose-6-phosphatase 2
fullName {} Islet-specific glucose-6-phosphatase catalytic subunit-related protein
ecNumber {} 3.1.3.9
proteinExistence {'type': 'evidence at protein level'}
************************************************************
fullName {'evidence': '36'} Glucose-6-phosphatase catalytic subunit 1
fullName {} Glucose-6-phosphatase
fullName {} Glucose-6-phosphatase alpha
ecNumber {'evidence': '9 12 16 25'} 3.1.3.9
proteinExistence {'type': 'evidence at protein level'}
************************************************************
fullName {} Glucose-6-phosphatase 3
fullName {} Glucose-6-phosphatase beta
fullName {} Ubiquitous glucose-6-phosphatase catalytic subunit-related protein
ecNumber {} 3.1.3.9
proteinExistence {'type': 'evidence at protein level'}
************************************************************
fullName {'evidence': '5'} Isoform 2 of Glucose-6-phosphatase catalytic subunit 1
fullName {} Glucose-6-phosphatase
fullName {} Glucose-6-phosphatase alpha
ecNumber {'evidence': '1 2 3 4'} 3.1.3.9
proteinExistence {'type': 'evidence at protein level'}
************************************************************
fullName {} Isoform 2 of Glucose-6-phosphatase 2
fullName {} Islet-specific glucose-6-phosphatase catalytic subunit-related protein
ecNumber {} 3.1.3.9
proteinExistence {'type': 'evidence at protein level'}
************************************************************
fullName {} Isoform 3 of Glucose-6-phosphatase 2
fullName {} Islet-specific glucose-6-phosphatase catalytic subunit-related protein
ecNumber {} 3.1.3.9
proteinExistence {'type': 'evidence at protein level'}
************************************************************
fullName {'evidence': '4'} Glucose-6-phosphatase
ecNumber {'evidence': '4'} 3.1.3.9
proteinExistence {'type': 'inferred from homology'}
************************************************************
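As an alternative to slicing off the first 28 characters of each tag, ElementTree can match namespaced tags through a `namespaces` mapping. The sketch below runs offline on a hand-trimmed XML fragment built from values shown above; it is an illustration, not a complete UniProt record, and `summarize` and `local_name` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Prefix "up" is an arbitrary local alias for the UniProt namespace
UNIPROT_NS = {"up": "http://uniprot.org/uniprot"}

# Hand-trimmed illustration of one entry (NOT a complete UniProt record)
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>Q9BUM1</accession>
    <protein>
      <recommendedName>
        <fullName>Glucose-6-phosphatase 3</fullName>
        <ecNumber>3.1.3.9</ecNumber>
      </recommendedName>
    </protein>
    <proteinExistence type="evidence at protein level"/>
  </entry>
</uniprot>"""


def local_name(tag):
    # "{http://uniprot.org/uniprot}fullName" -> "fullName"
    return tag.rsplit("}", 1)[-1]


def summarize(entry):
    """Pull a few fields out of one <entry> element."""
    return {
        "accession": entry.findtext("up:accession", namespaces=UNIPROT_NS),
        "full_names": [e.text for e in entry.iterfind(".//up:fullName", UNIPROT_NS)],
        "ec_numbers": [e.text for e in entry.iterfind(".//up:ecNumber", UNIPROT_NS)],
        "existence": entry.find("up:proteinExistence", UNIPROT_NS).get("type"),
    }


root = ET.fromstring(SAMPLE)
for entry in root.iterfind("up:entry", UNIPROT_NS):
    print(summarize(entry))
```

With Unipressed's `format="xml"`, the same `summarize` function could presumably be applied to each `record` yielded by `each_record()`, since those are ElementTree elements as the example above shows.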

That is absolutely awesome, thank you for sharing!


Incidentally, if you want the raw XML so you can save it to .xml files, you can use the each_page method, which returns a file object for each page:

from unipressed import UniprotkbClient
import shutil

for i, page in enumerate(UniprotkbClient.search(
    query={
        "or_": [
            {"ec": "3.1.3.9"},
            {"ec": "2.7.1.2"},
        ],
        "and_": [
            {"organism_id": "9606"},
        ]
    },
    format="xml",
).each_page()):
    with open(f"{i}.xml", "wb") as dest:
        shutil.copyfileobj(page, dest)
2.2 years ago

Unfortunately, the REST API changed considerably with the recent website redesign.

In the new query syntax, the query would be:

((ec:3.1.3.9) OR (ec:2.7.1.2)) AND (organism_id:9606)

https://www.uniprot.org/uniprotkb?query=%28%28ec%3A3.1.3.9%29%20OR%20%28ec%3A2.7.1.2%29%29%20AND%20%28organism_id%3A9606%29
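The query string and the website URL above can also be built programmatically. A minimal standard-library sketch; `build_query` and `website_url` are hypothetical names, and `quote(..., safe="")` is used so that `:` is percent-encoded exactly as in the URL above.

```python
from urllib.parse import quote


def build_query(ec_numbers, organism_id):
    """OR together the EC numbers, then AND with the organism (new UniProt syntax)."""
    ec_clause = " OR ".join(f"(ec:{ec})" for ec in ec_numbers)
    return f"({ec_clause}) AND (organism_id:{organism_id})"


def website_url(query):
    # safe="" makes quote() percent-encode ':' and '/' as well
    return "https://www.uniprot.org/uniprotkb?query=" + quote(query, safe="")


q = build_query(["3.1.3.9", "2.7.1.2"], 9606)
print(q)               # ((ec:3.1.3.9) OR (ec:2.7.1.2)) AND (organism_id:9606)
print(website_url(q))
```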

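If you want the results for this query in XML format, you can use the website's "Generate URL for API" link (the API query generator mentioned by @roder.thomas), which shows the following. Note that the generated URLs request format=fasta; change this to format=xml to get XML: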

API URL using the streaming endpoint

This endpoint is resource-heavy but will return all requested results.

https://rest.uniprot.org/uniprotkb/stream?compressed=true&format=fasta&query=%28%28%28ec%3A3.1.3.9%29%20OR%20%28ec%3A2.7.1.2%29%29%20AND%20%28organism_id%3A9606%29%29
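To get XML rather than FASTA, a minimal standard-library sketch that streams the whole result set straight to disk might look like this. It swaps `format=fasta` for `format=xml` and sets `compressed=false` so the file is plain XML rather than gzip; `stream_url` and `download` are hypothetical helper names, and the parameter values are assumptions based on the generated URL above.

```python
import shutil
import urllib.request
from urllib.parse import urlencode, quote

STREAM_ENDPOINT = "https://rest.uniprot.org/uniprotkb/stream"


def stream_url(query, fmt="xml"):
    # compressed=false asks for plain output instead of gzip
    params = {"compressed": "false", "format": fmt, "query": query}
    return STREAM_ENDPOINT + "?" + urlencode(params, quote_via=quote)


def download(query, path, fmt="xml"):
    """Copy the complete result set to `path` without holding it all in memory."""
    with urllib.request.urlopen(stream_url(query, fmt)) as response, open(path, "wb") as dest:
        shutil.copyfileobj(response, dest)
```

Usage would be something like `download("((ec:3.1.3.9) OR (ec:2.7.1.2)) AND (organism_id:9606)", "results.xml")`. The stream endpoint fits this case because the whole result arrives in a single response, so there is no pagination to handle.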

API URL using the search endpoint

This endpoint is lighter; it returns results in chunks of 500 and requires pagination.

https://rest.uniprot.org/uniprotkb/search?compressed=true&format=fasta&query=%28%28%28ec%3A3.1.3.9%29%20OR%20%28ec%3A2.7.1.2%29%29%20AND%20%28organism_id%3A9606%29%29&size=500
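For the search endpoint, pagination is driven by a `Link` response header that carries the URL of the next page (`rel="next"`), as described in UniProt's REST API documentation. Here is a rough standard-library sketch of following it. `next_page_url` and `iter_pages` are hypothetical names, and with `compressed=true` each page body is gzip-compressed bytes.

```python
import re
import urllib.request

# The search endpoint announces the next page in a header such as:
#   Link: <https://rest.uniprot.org/uniprotkb/search?...&cursor=...>; rel="next"
NEXT_LINK = re.compile(r'<(.+)>; rel="next"')


def next_page_url(link_header):
    """Return the next-page URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = NEXT_LINK.match(link_header)
    return match.group(1) if match else None


def iter_pages(first_url):
    """Yield the raw body of each page, following Link headers until none remain."""
    url = first_url
    while url:
        with urllib.request.urlopen(url) as response:
            body = response.read()
            url = next_page_url(response.headers.get("Link"))
        yield body
```

With `format=xml`, each page is a separate XML document, so it is simplest to save each one to its own file (as in the each_page example above) rather than concatenating them.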
2.2 years ago
roder.thomas ▴ 30

These pages no longer work, but UniProt has added an API query generator to the website!

(Screenshot: how to generate an API query)

