Converting between drug identifier formats
7.2 years ago

I have two CSV files of drug-related data. One has the drug info specified with CHEMBL identifiers, whereas the second file contains DrugBank and PubChem IDs. I need to compare these two files for overlap in their drug contents. Both files contain drug names in string format, but working with those is tricky, since often a single row/drug will contain several synonyms, and accurately matching between the two files seems like it will be challenging, especially since both files are unlikely to contain the same synonyms for a particular drug.

I'm looking for a simple way (e.g. an existing function or website) to convert between the CHEMBL IDs in my first file and the DrugBank and PubChem IDs in my second file. I have searched fairly extensively, and I'm surprised not to have found an R or Python function, or a web-based tool, that does this. This site is close to what I need, with lots of options for the "From" format but unfortunately no useful options for the "To" format: http://cts.fiehnlab.ucdavis.edu/conversion/batch. I also found this Jupyter Notebook (http://nbviewer.jupyter.org/url/git.dhimmel.com/drugbank/unichem-map.ipynb), which matches DrugBank compounds to external resources using UniChem, but it seems far too complex for the simple conversion I'm after.

Any suggestions about resources that might assist with this drug ID conversion task will be much appreciated. Thanks!!

conversion database drug • 8.7k views
7.2 years ago
Zhilong Jia ★ 2.2k

Convert the PubChem IDs to ChEMBL IDs with the PubChem Identifier Exchange Service at https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi (in the Output IDs section, choose Registry IDs - ChEMBL).
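Once the converted list has been downloaded, the overlap check itself is a few lines of Python. The following is only a minimal sketch: it assumes the exchange output is a tab-separated text file with the PubChem CID in the first column and the ChEMBL ID in the second, and the function names `load_cid_to_chembl` and `chembl_overlap` are made up for illustration. The sample IDs are placeholders, apart from aspirin (CID 2244, CHEMBL25).

```python
import csv
import io


def load_cid_to_chembl(mapping_file):
    """Read a tab-separated mapping (assumed layout: PubChem CID, ChEMBL ID)."""
    mapping = {}
    for row in csv.reader(mapping_file, delimiter="\t"):
        if len(row) < 2 or not row[1].strip():
            continue  # skip CIDs for which no ChEMBL ID was returned
        # A CID may map to more than one registry ID, so collect a set.
        mapping.setdefault(row[0].strip(), set()).add(row[1].strip())
    return mapping


def chembl_overlap(chembl_ids, pubchem_cids, cid_to_chembl):
    """Return the ChEMBL IDs of file 1 that are also reachable from the CIDs of file 2."""
    translated = set()
    for cid in pubchem_cids:
        translated |= cid_to_chembl.get(str(cid).strip(), set())
    return set(chembl_ids) & translated


# Illustrative data only; in practice open the downloaded file instead.
demo_mapping = load_cid_to_chembl(io.StringIO("2244\tCHEMBL25\n"))
shared = chembl_overlap(["CHEMBL25"], ["2244"], demo_mapping)
```

Reading the ID columns out of the two CSV files (for example with `csv.DictReader`) is left out because the column names depend on your files.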


Many thanks, Zhilong! That is exactly what I needed!


If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.

7.2 years ago

This is easily done with the Cactvs Cheminformatics Toolkit (free academic packages are available at www.xemistry.com/academic; they include both a loadable Python module and a stand-alone Python interpreter with chemistry extensions). The toolkit can decode the three IDs you are using (and many more) into structure objects, and the fastest way to compare these is to compute a structure hash code. No name or synonym matching is involved; the comparison works purely on structural connectivity.

Here are some interactive commands in the Python version, comparing aspirin via its different database IDs and also directly computing the database IDs for structures from a different source:

cspy
pycactvs>e1=Ens('CID:2244')
pycactvs>e2=Ens('CHEMBL:25')
pycactvs>e3=Ens('DRUGBANK:DB00945')
pycactvs>e1.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e2.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e3.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e1.E_CHEMBL_ID
'CHEMBL:25'
pycactvs>e1.E_DRUGBANK_ID
'DB00945'
pycactvs>e2.E_CID
2244
pycactvs>e1.E_SMILES
'CC(=O)OC1=CC=CC=C1C(=O)O'

There is a chemistry-aware table object which helps you with the processing of table data files. I'd be surprised if this required more than 10 lines of script code.
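Independent of the toolkit's table object, the matching step can be sketched in plain Python. This is not Cactvs code: `index_by_structure` and `match_by_structure` are hypothetical helper names, and `structure_key` stands for any function that turns an ID string into a structure hash (for instance a small wrapper around the `Ens(...)` and `E_ISOTOPE_STEREO_HASH128` calls shown in the session above). The demo replaces that function with a lookup table holding the aspirin hash from the session.

```python
from collections import defaultdict


def index_by_structure(ids, structure_key):
    """Group ID strings by their structure key; IDs without a key are skipped."""
    index = defaultdict(set)
    for drug_id in ids:
        key = structure_key(drug_id)
        if key is not None:
            index[key].add(drug_id)
    return index


def match_by_structure(ids_a, ids_b, structure_key):
    """Return {key: (IDs from file A, IDs from file B)} for structures found in both."""
    index_a = index_by_structure(ids_a, structure_key)
    index_b = index_by_structure(ids_b, structure_key)
    shared_keys = index_a.keys() & index_b.keys()
    return {key: (index_a[key], index_b[key]) for key in shared_keys}


# Stand-in for real hash computation: the aspirin hash from the session above.
ASPIRIN_HASH = "8e1a0233-a328-045d-e61d-32db15c50d00"
DEMO_HASHES = {
    "CHEMBL:25": ASPIRIN_HASH,
    "DRUGBANK:DB00945": ASPIRIN_HASH,
    "CID:2244": ASPIRIN_HASH,
}

matches = match_by_structure(
    ["CHEMBL:25"],
    ["DRUGBANK:DB00945", "CID:2244"],
    DEMO_HASHES.get,  # returns None for IDs it does not know
)
```

Grouping by hash rather than comparing pairwise keeps the work linear in the number of rows, and it also shows when several IDs in one file refer to the same structure.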


Thanks, Wolf! That resource looks very useful; I will check it out.

4.5 years ago
hsiaoyi0504 ▴ 70

Alternatively, use the ID mapping provided by UniChem (https://www.ebi.ac.uk/unichem/). More than 50 databases are processed to provide a full source mapping.
