Mapping PDB ID + chain ID to UniProt ID
2.2 years ago
johnnytam100 ▴ 110

This question was addressed here.

Unfortunately, the suggested solutions, including bioservices, pypdb and map_pdb_to_uniprot, no longer work.

Also, I do not need the residue-level mapping that one of the suggested solutions provides.

Does the failure of the known methods have anything to do with the recent update of the UniProt database?

Please let me know if you know a workaround.

Thank you!

Tags: uniprot, pdb
2.1 years ago
Jiyao Wang ▴ 380

You can use the SIFTS mappings API at PDBe and parse the JSON output, e.g., https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/1kq2
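A minimal sketch of this approach using only the Python standard library is below. It assumes the JSON layout that endpoint returned at the time of writing: the top-level key is the PDB ID (lowercase), and under it "UniProt" maps each accession to a record whose "mappings" list carries a "chain_id" for every mapped segment. The function names are hypothetical, and the endpoint URL or format may change.

```python
import json
import urllib.request

# URL pattern taken from the answer above; it may move or change format over time.
SIFTS_UNIPROT_URL = "https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/{pdb_id}"


def fetch_sifts_uniprot(pdb_id):
    """Download the SIFTS PDB -> UniProt mapping JSON for one PDB entry."""
    url = SIFTS_UNIPROT_URL.format(pdb_id=pdb_id.lower())
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def chain_to_uniprot(payload, pdb_id):
    """Return {chain_id: set_of_uniprot_accessions} from a SIFTS JSON payload.

    Assumed layout: payload[pdb_id]["UniProt"][accession]["mappings"] is a
    list of segments, each with a "chain_id" key.
    """
    entry = payload.get(pdb_id.lower()) or payload.get(pdb_id.upper(), {})
    mapping = {}
    for accession, info in entry.get("UniProt", {}).items():
        for segment in info.get("mappings", []):
            mapping.setdefault(segment["chain_id"], set()).add(accession)
    return mapping


# Usage (needs network access):
# payload = fetch_sifts_uniprot("1kq2")
# print(chain_to_uniprot(payload, "1kq2"))
```

Splitting the download from the parsing keeps the parser testable on a saved JSON file and lets you cache responses when mapping many entries.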


Thanks Wang! I have now written a Python script based on your method and posted it as an answer to this question.

2.1 years ago
johnnytam100 ▴ 110

Based on Jiyao Wang's suggestion, I have now written a pdb2uniprot Python script: https://github.com/johnnytam100/pdb2uniprot

Usage examples

1) csv

python pdb2uniprot_tam.py --input pdb_chain_table.csv --pdb_col PDB_ID --chain_col CHAIN_ID

2) tab-delimited table (with header in output)

python pdb2uniprot_tam.py --input pdb_chain_table

3) tab-delimited table (no header in output)

python pdb2uniprot_tam.py --input pdb_chain_table --no_header
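The core of such a table lookup can be sketched as follows. This is a hypothetical illustration, not the code in pdb2uniprot_tam.py: it reads a CSV with PDB_ID and CHAIN_ID columns (the names used in the first usage example), looks up each PDB entry only once through a caller-supplied function, and appends a UNIPROT_ID column. The lookup callable and the output column name are assumptions; a small wrapper around the SIFTS API from the answer above is one way to supply the lookup.

```python
import csv
import io


def annotate_chain_table(csv_text, lookup, pdb_col="PDB_ID", chain_col="CHAIN_ID"):
    """Append a UNIPROT_ID column to a CSV of PDB/chain pairs.

    `lookup` is a hypothetical callable: it takes a lowercase PDB ID and
    returns {chain_id: set_of_accessions}. Results are cached so each
    PDB entry is queried only once, however many chains it has.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cache = {}
    for row in rows:
        pdb_id = row[pdb_col].strip().lower()
        if pdb_id not in cache:
            cache[pdb_id] = lookup(pdb_id)
        accessions = cache[pdb_id].get(row[chain_col].strip(), set())
        # Several accessions for one chain are joined with ";"; unmapped chains stay empty
        row["UNIPROT_ID"] = ";".join(sorted(accessions))

    out = io.StringIO()
    fieldnames = list(rows[0].keys()) if rows else [pdb_col, chain_col, "UNIPROT_ID"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Passing the lookup in as a function keeps the table handling separate from the web service, so the same code works with a live API call, a cached JSON file, or a stub for testing.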
2.1 years ago
Wayne ★ 2.1k

Issue with pypdb route

The pypdb route fails because the API that pypdb uses has changed, so the information it returns no longer includes chain IDs and apparently not the UniProt accession either.

You can see this by running the following code from one of the answers there and noting that the output is now very different:

import pypdb

all_info = pypdb.get_all_info('1kf6')
print(all_info)

There are several routes that work right now for what you want to do.

An option:

PDBrenum (GitHub repo & associated 2021 publication) uses the SIFTS data referenced in some answers to the question you referenced (and in the answer by Jiyao Wang here) to renumber residues in PDB files so that they match the corresponding UniProt entries. Among the output PDBrenum generates is a table that maps each chain ID in the PDB file to a UniProt ID. With the version of PDBrenum currently here, that table is written to a file named log_corrected.txt in the working directory once the process completes. You can follow my demo page, found in sessions launched here, to point PDBrenum at a PDB ID; the log_corrected.txt file it produces will include the mapping summarized at the chain level. You can easily read the information in log_corrected.txt back into Python objects using pandas with the following code:

import pandas as pd

# log_corrected.txt is a fixed-width text table; read_fwf infers the column boundaries
df = pd.read_fwf("log_corrected.txt")  # based on https://stackoverflow.com/a/41509522/8508004
# If you prefer each row as a dictionary:
df_dict = df.to_dict(orient='records')
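Once the table is loaded, the records can be collapsed into a chain-level lookup. The sketch below works on the list of dictionaries produced by df.to_dict(orient='records'). The actual column names in log_corrected.txt are not listed here, so they are passed in as parameters (check df.columns to pick the right ones); the function name is hypothetical. Rows with missing values, which pandas reads as NaN, are skipped.

```python
import math


def chain_uniprot_lookup(records, pdb_key, chain_key, uniprot_key):
    """Map (pdb_id, chain_id) -> set of UniProt accessions from table records.

    pdb_key, chain_key and uniprot_key are placeholders for whatever
    column names log_corrected.txt actually uses.
    """
    lookup = {}
    for rec in records:
        values = (rec.get(pdb_key), rec.get(chain_key), rec.get(uniprot_key))
        # pandas stores empty cells as float NaN; skip incomplete rows
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            continue
        pdb_id, chain_id, accession = (str(v).strip() for v in values)
        lookup.setdefault((pdb_id.lower(), chain_id), set()).add(accession)
    return lookup


# Usage with hypothetical column names:
# lookup = chain_uniprot_lookup(df_dict, "PDB_id", "chain_PDB", "UniProt")
# print(lookup.get(("1kf6", "A")))
```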

To get to the demo page, click on 'Launch a PDBrenum demo in your browser via MyBinder'. A session will spin up and the demo will open. You can change the first command that runs to %run PDBrenum.py -rfla 1kf6 -PDB to run the example used on the older page you referenced about this conversion. You should see log_corrected.txt among the files produced, and you can then run the code block above in a Jupyter cell and examine the dataframe or dictionary it creates.

On the page you referenced, some people had pointed to the SIFTS information that PDBrenum is based on. However, PDBrenum parses that data using code published with a paper, so it is probably a much more reliable way to get the information than the ad hoc SIFTS-based examples currently highlighted on that page.

UPDATE: I have now made a notebook demonstrating how to use that information to map chain identifiers in PDB files to the corresponding UniProt IDs. It uses johnnytam100's repository and script as the demo input and runs that code as part of the preparation, so the two approaches can be compared and contrasted. You can launch directly into this notebook in an active Jupyter session served by MyBinder by clicking here.
That demonstration thus gives those interested a number of options.
Static view of the demo.

Another option, if you don't need the per-chain mapping and just want the UniProt IDs for a PDB code:

The Unipressed package by Michael Milton (multimeric) helps with accessing the current UniProt API. You can try it by running %pip install unipressed in a Jupyter notebook inside the session I directed you to above.
What you want is basically the reverse of the example here, where I used it to go from a UniProt identifier to a PDB code.

The basic code for the reverse direction is:

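import time
from unipressed import IdMappingClient

# Submit an ID-mapping job from PDB IDs to UniProtKB accessions
request = IdMappingClient.submit(
    source="PDB", dest="UniProtKB", ids={"1kf6"}
)
# The job runs on the UniProt server; wait a moment (increase this if no results come back)
time.sleep(1)
results = list(request.each_result())
print(results)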
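Because one PDB entry can map to several UniProt entries, it can help to group the results by the submitted ID. The sketch below assumes each result is a dictionary with "from" and "to" keys, as in the UniProt ID-mapping output; since the shape of "to" may depend on the target database, it also accepts a full entry record and takes its "primaryAccession" field. The function name is hypothetical.

```python
def group_by_source(results):
    """Group ID-mapping results as {source_id: set_of_target_accessions}.

    Assumes each result looks like {"from": ..., "to": ...}, where "to" is
    either an accession string or an entry record with "primaryAccession".
    """
    grouped = {}
    for item in results:
        target = item.get("to")
        # For UniProtKB targets the value may be a full entry record
        if isinstance(target, dict):
            target = target.get("primaryAccession")
        if target:
            grouped.setdefault(item.get("from"), set()).add(target)
    return grouped


# Usage with the request object from the code above:
# print(group_by_source(request.each_result()))
```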

However, you also want the chain ID mapping. Looking at what a direct UniProt query returns for the example entry, using the following code, it isn't obvious to me that UniProt keeps that chain-specific information:

from unipressed import UniprotkbClient

for record in UniprotkbClient.search(
    query={"xref": "pdb-1kf6"},
    #fields=["length", "gene_names"]
).each_record():
    display(record)

However, maybe I'm missing an association in the giant list of details in the record.
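One place worth checking is the list of cross-references in each record. In the UniProtKB REST JSON format, PDB cross-references appear under "uniProtKBCrossReferences" and typically carry a "Chains" property with values such as "A/B=1-100"; treat that layout as an assumption to verify against your own records. The sketch below pulls the chain IDs for one PDB ID out of a single record; both function names are hypothetical.

```python
def parse_chains(value):
    """Split a 'Chains' string such as 'A/B=1-100, C=5-20' into chain IDs."""
    chain_ids = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # Chain IDs come before "=", separated by "/"; the residue range follows
        ids = part.split("=", 1)[0]
        chain_ids.extend(c.strip() for c in ids.split("/") if c.strip())
    return chain_ids


def pdb_chains_in_record(record, pdb_id):
    """Return (accession, chain_ids) for one PDB entry in a UniProtKB JSON record.

    Assumed layout: record["uniProtKBCrossReferences"] is a list of
    {"database": ..., "id": ..., "properties": [{"key": ..., "value": ...}]}.
    """
    chain_ids = []
    for xref in record.get("uniProtKBCrossReferences", []):
        if xref.get("database") != "PDB" or str(xref.get("id", "")).lower() != pdb_id.lower():
            continue
        for prop in xref.get("properties", []):
            if prop.get("key") == "Chains":
                chain_ids.extend(parse_chains(prop.get("value", "")))
    return record.get("primaryAccession"), chain_ids


# Usage with the search loop above:
# for record in UniprotkbClient.search(query={"xref": "pdb-1kf6"}).each_record():
#     print(pdb_chains_in_record(record, "1kf6"))
```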


Thank you Wayne for the detailed method! I had trouble finding the log_corrected.txt file, so in the end I adopted Wang's method.


I updated my main demonstration notebook about PDBrenum to better highlight the location and contents of the file log_corrected.txt, which is created as a by-product of the PDBrenum process.

I also added a separate notebook demonstrating using that information to map chain identifiers in PDB files to the corresponding UniProt ids. It uses your repository and script as the demo input and includes running your code as part of the preparation & so the use of each can be compared. You can go directly to this notebook in an active Jupyter session served by MyBinder by clicking here. You can view it in static form here.
