Question

Cross-referencing multiple databases with PDB

10 months ago

Mariana ▴ 50

Hello!

I have a csv file which comprises a database of hundreds of proteins, with various details about each protein. One of the columns contains the accession codes for the proteins, but the problem is that those codes are not standardized: for some it's the Uniprot id, for others the Genbank id, others the RefSeq id and, for very few, the PDB id. My goal is to get the PDB ids for all proteins in my database, if available.

I know Uniprot has a good online tool for id mapping, and EMBL-EBI search could also be useful. Crossing Uniprot with PDB is likely easy (through the REST API, for example), but I would like to know: is there a way of doing these cross-references from Genbank and RefSeq to PDB programmatically? I understand that I might need to cross them with Uniprot first and then with PDB, but I would like some suggestions.

Thank you so much in advance!

genbank PDB uniprot embl-ebi refseq • 490 views

ADD COMMENT • link updated 10 months ago by Mensur Dlakic ★ 28k • written 10 months ago by Mariana ▴ 50

for some it's the Uniprot id, for others the Genbank id, others the RefSeq id and, for very few, the PDB id.

You realize that this will be a challenge. There may not always be a 1-to-1 relationship between these IDs.
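Since the column mixes ID types, a practical first step is probably to sort the accessions by type so that each group can be sent to the appropriate mapping service. Below is a minimal sketch of that idea in Python. The function name `classify_accession` and the pattern table are hypothetical; the UniProt pattern follows the accession format UniProt documents, while the RefSeq, GenBank and PDB patterns are simplified assumptions that may miss some valid IDs.

```python
import re

# Ordered list of (label, pattern). All patterns are approximations:
# the RefSeq, GenBank and PDB ones are simplified assumptions and
# will not cover every valid accession.
PATTERNS = [
    # RefSeq protein, e.g. NP_000509.1 (prefixes AP_, NP_, WP_, XP_, YP_)
    ("refseq", re.compile(r"^[ANWXY]P_\d+(\.\d+)?$")),
    # UniProtKB accession, e.g. P69905 or A0A024R161
    ("uniprot", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    # GenBank protein, e.g. AAA12345.1 (3 letters + 5 or 7 digits)
    ("genbank", re.compile(r"^[A-Z]{3}\d{5}(\d{2})?(\.\d+)?$")),
    # Classic 4-character PDB ID, e.g. 1ABC
    ("pdb", re.compile(r"^[0-9][A-Z0-9]{3}$")),
]


def classify_accession(acc):
    """Return a best-guess ID type for one accession string."""
    acc = acc.strip().upper()
    for label, pattern in PATTERNS:
        if pattern.match(acc):
            return label
    return "unknown"
```

Anything that comes back as "unknown" would need a manual look, and a single accession can still map to several PDB entries (or none).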

ADD REPLY • link 10 months ago by GenoMax 147k

score 0 · Answer 1 · 2024-01-17

There is a file named pdb_seqres.txt here that contains sequences of all structurally solved proteins in PDB. It is updated weekly, so fairly current. Assuming that you have sequences for your proteins of interest, you can do a simple BLASTP search against this database. It should tell you if the protein has been solved, or if there are sufficiently close relatives with a known structure.
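The BLASTP approach could be scripted roughly as follows. This is only a sketch under several assumptions: that BLAST+ is installed, that the query sequences are in a file called `proteins.fasta` (hypothetical name), that the headers in pdb_seqres.txt look like `>101m_A mol:protein ...` so the subject ID splits into PDB ID and chain, and that the identity and coverage cutoffs (95% and 0.9) are arbitrary placeholders to tune. The function `best_pdb_hits` is a hypothetical helper, not part of BLAST.

```python
import csv

# Assumed commands to produce the input (run in a shell beforehand).
# pdb_seqres.txt also holds nucleic acid records, so filtering it to
# "mol:protein" entries first is probably a good idea.
#   makeblastdb -in pdb_seqres.txt -dbtype prot -out pdb_seqres
#   blastp -query proteins.fasta -db pdb_seqres -evalue 1e-5 \
#          -outfmt "6 qseqid sseqid pident length qlen evalue bitscore" \
#          -out hits.tsv

COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen", "evalue", "bitscore"]


def best_pdb_hits(lines, min_identity=95.0, min_coverage=0.9):
    """Map each query to its top-scoring PDB chain passing both cutoffs.

    `lines` is any iterable of tab-separated BLAST rows (e.g. an open file).
    Returns {query_id: (pdb_id, chain, percent_identity)}.
    """
    best = {}
    for row in csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t"):
        pident = float(row["pident"])
        # Alignment length includes gaps, so this is only approximate coverage.
        coverage = int(row["length"]) / int(row["qlen"])
        if pident < min_identity or coverage < min_coverage:
            continue
        bitscore = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or bitscore > best[query][3]:
            # Assumed subject ID layout: "<pdbid>_<chain>", e.g. "101m_A"
            pdb_id, _, chain = row["sseqid"].partition("_")
            best[query] = (pdb_id.upper(), chain, pident, bitscore)
    return {query: hit[:3] for query, hit in best.items()}
```

With the output file from the commands above, usage would look like `with open("hits.tsv") as fh: hits = best_pdb_hits(fh)`. Lowering `min_identity` would surface close relatives with a known structure rather than only the protein itself.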