Hello!
I have a CSV file containing a database of hundreds of proteins, with various details about each one. One of the columns contains the accession codes for the proteins, but the codes are not standardized: for some it's the UniProt ID, for others the GenBank ID, for others the RefSeq ID and, for very few, the PDB ID. My goal is to get the PDB IDs for all proteins in my database, where available.
I know UniProt has a good online tool for ID mapping, and the EMBL-EBI search could also be useful. Cross-referencing UniProt with PDB is likely easy (through the REST API, for example), but I would like to know: is there a way to cross-reference GenBank and RefSeq IDs to PDB programmatically? I understand that I might need to map them to UniProt first and then to PDB, but I would appreciate some suggestions.
Thank you so much in advance!
You realize that this will be a challenge. There may not always be a 1-to-1 relationship between these IDs: a single protein can have several PDB entries, or none at all.
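Whatever mapping service you end up using, a practical first step is to sort the mixed column by ID type, so that each group can be submitted with the right source database. Here is a minimal sketch of that step. The regular expressions are approximations of the usual accession formats (protein accessions only; GenBank nucleotide accessions, GI numbers and UniProt isoform suffixes are not covered), and the function names `classify_accession` and `group_by_type` are made up for illustration.

```python
import re
from collections import defaultdict

# Approximate patterns for each accession family (assumptions, not a spec).
# UniProt: 6 or 10 characters, following the documented accession format.
UNIPROT = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# RefSeq: two-letter prefix, underscore, digits, optional version (e.g. NP_000509.1).
REFSEQ = re.compile(r"^[A-Z]{2}_[0-9]+(?:\.[0-9]+)?$")
# PDB: four characters, a digit followed by three alphanumerics (e.g. 4HHB).
PDB = re.compile(r"^[0-9][A-Z0-9]{3}$")
# GenBank protein: three letters plus 5 or 7 digits, optional version.
GENBANK_PROTEIN = re.compile(r"^[A-Z]{3}[0-9]{5}(?:[0-9]{2})?(?:\.[0-9]+)?$")


def classify_accession(raw):
    """Return 'refseq', 'pdb', 'uniprot', 'genbank' or 'unknown'."""
    acc = raw.strip().upper()
    if REFSEQ.match(acc):
        return "refseq"
    if PDB.match(acc):
        return "pdb"
    if UNIPROT.match(acc):
        return "uniprot"
    if GENBANK_PROTEIN.match(acc):
        return "genbank"
    return "unknown"


def group_by_type(accessions):
    """Group accessions by detected type, preserving input order."""
    groups = defaultdict(list)
    for acc in accessions:
        groups[classify_accession(acc)].append(acc.strip())
    return dict(groups)


if __name__ == "__main__":
    column = ["P12345", "NP_000509.1", "4HHB", "AAA12345.1", "not-an-id"]
    for kind, ids in group_by_type(column).items():
        print(kind, ids)
```

Anything that lands in the `unknown` group is worth checking by hand. When you collect the mapping results, store them as a list of PDB IDs per input accession rather than a single value, since the relationship is often one-to-many or empty.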