How to obtain DrugBank Drug Name and SMILES Data from DrugBank ID?
7.2 years ago

I have downloaded the complete set of target data from the DrugBank site (available here: https://www.drugbank.ca/releases/latest#protein-identifiers). Each row contains data for one target; the final column of each row lists N DrugBank IDs for the N drugs associated with that target.
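A minimal Python sketch of the first step: collecting the unique DrugBank IDs from that target file. It assumes a header row and that the last column holds the IDs separated by semicolons (for example `DB00001; DB00002`); check your download, since the column layout may differ between releases.

```python
import csv
from typing import Iterable, List


def drug_ids_from_targets(lines: Iterable[str]) -> List[str]:
    """Return the unique DrugBank IDs found in the last column of a target CSV.

    Assumes a header row and semicolon-separated IDs in the last column,
    e.g. "DB00001; DB00002". Order of first appearance is preserved.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    seen = {}  # dict used as an insertion-ordered set
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        for raw_id in row[-1].split(";"):
            drug_id = raw_id.strip()
            if drug_id:
                seen.setdefault(drug_id, None)
    return list(seen)
```

An open file handle works as `lines`, e.g. `with open("targets.csv", newline="") as fh: ids = drug_ids_from_targets(fh)`, where `targets.csv` is a hypothetical name standing in for the file you downloaded.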

For each DrugBank ID, I need to find the associated drug name and SMILES. I have searched Biostars and elsewhere online, but so far haven't found a way to automate this. One idea is to mine the 600 MB DrugBank XML database file and extract the drug name and SMILES associated with each DrugBank ID. However, if there is a simpler way to obtain the names and SMILES without needing to deal with that huge file, any recommendations will be much appreciated. Thanks in advance for any advice you can provide.
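If you do go the XML route, the file does not have to be loaded into memory in one piece. The sketch below streams it with `xml.etree.ElementTree.iterparse` and keeps only the name and SMILES for each drug. The schema details are assumptions based on recent DrugBank full-database releases, so verify them against your copy: a root namespace of `http://www.drugbank.ca`, top-level `<drug>` elements containing a `<drugbank-id primary="true">` and a `<name>`, and the SMILES stored under `<calculated-properties>` as a `<property>` whose `<kind>` is `SMILES`. Nested `<drug>` elements (for example inside pathways) are skipped by tracking element depth, and biotech entries, which usually have no SMILES, come back as `None`.

```python
import io
import xml.etree.ElementTree as ET
from typing import Container, Dict, Optional, Tuple

# Assumed DrugBank namespace; check the xmlns on the root <drugbank> element.
NS = "{http://www.drugbank.ca}"

Record = Tuple[Optional[str], Optional[str]]  # (name, SMILES)


def _primary_id(drug: ET.Element) -> Optional[str]:
    """Return the <drugbank-id primary="true">, else the first <drugbank-id>."""
    ids = drug.findall(f"{NS}drugbank-id")  # direct children only
    for el in ids:
        if el.get("primary") == "true":
            return (el.text or "").strip()
    return (ids[0].text or "").strip() if ids else None


def _smiles(drug: ET.Element) -> Optional[str]:
    """Return the value of the calculated property whose <kind> is SMILES."""
    for prop in drug.iterfind(f"{NS}calculated-properties/{NS}property"):
        if prop.findtext(f"{NS}kind") == "SMILES":
            return prop.findtext(f"{NS}value")
    return None


def drug_names_and_smiles(source, wanted: Optional[Container[str]] = None) -> Dict[str, Record]:
    """Stream a DrugBank XML file (a path or a binary file object).

    Returns {drugbank_id: (name, smiles)}, optionally limited to IDs in `wanted`.
    """
    result: Dict[str, Record] = {}
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # depth == 1 means this <drug> closed directly under the root <drugbank>;
        # nested <drug> elements (e.g. inside pathways) close deeper and are ignored.
        if elem.tag == f"{NS}drug" and depth == 1:
            drug_id = _primary_id(elem)
            if drug_id and (wanted is None or drug_id in wanted):
                result[drug_id] = (elem.findtext(f"{NS}name"), _smiles(elem))
            elem.clear()  # drop the finished subtree; only an empty shell stays on the root


    return result


def parse_drugbank_bytes(data: bytes, wanted: Optional[Container[str]] = None) -> Dict[str, Record]:
    """Convenience wrapper for small in-memory XML fragments (e.g. quick tests)."""
    return drug_names_and_smiles(io.BytesIO(data), wanted)
```

Typical use would be `drug_names_and_smiles("drugbank.xml", wanted=set(ids))`, where `drugbank.xml` is a hypothetical path to the unpacked full-database file and `ids` is your list of DrugBank IDs.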

DrugBank R

7.2 years ago
cannin ▴ 350

Try the PubChem PUG REST API: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html

Using topotecan (DB01030) as an example:

Get data by DrugBank ID: https://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceid/drugbank/DB01030/XML

This XML contains the PubChem Compound ID (CID); look for: <PC-CompoundType_id_cid>60700</PC-CompoundType_id_cid>

Then get the names (PubChem's preferred name is listed first): https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/60700/synonyms/XML

and the SMILES (canonical and isomeric): https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/60700/XML
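The same lookup can be scripted with the Python standard library. This is a sketch, not a tested client. Step 1 is the substance URL from the answer; for step 2 it swaps in the PUG REST `property` operation (requesting `Title`, `IsomericSMILES` and `CanonicalSMILES` as JSON) instead of parsing the full compound XML. `Title` is PubChem's preferred compound name, which is not necessarily the DrugBank name. It takes the first CID found, as the answer suggests, which may not be the right choice for salts or mixtures. Newer PubChem responses may label the SMILES fields `SMILES` and `ConnectivitySMILES`, so the parser accepts either spelling; treat that mapping as an assumption to check against a live response.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional, Union

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Older responses use the first key of each pair; newer ones may use the second.
SMILES_KEYS = {
    "isomeric": ("IsomericSMILES", "SMILES"),
    "canonical": ("CanonicalSMILES", "ConnectivitySMILES"),
}

Props = Dict[str, Optional[str]]


def substance_url(drugbank_id: str) -> str:
    """Step 1: the DrugBank-deposited substance record, as XML."""
    return f"{PUG}/substance/sourceid/drugbank/{urllib.parse.quote(drugbank_id)}/XML"


def properties_url(cid: int) -> str:
    """Step 2 (swapped in): name and SMILES via the PUG REST property operation."""
    return f"{PUG}/compound/cid/{cid}/property/Title,IsomericSMILES,CanonicalSMILES/JSON"


def cids_from_substance_xml(xml_text: Union[str, bytes]) -> List[int]:
    """Collect every <PC-CompoundType_id_cid> value, ignoring the XML namespace."""
    cids: List[int] = []
    for el in ET.fromstring(xml_text).iter():
        if el.tag.rsplit("}", 1)[-1] == "PC-CompoundType_id_cid" and el.text:
            cid = int(el.text)
            if cid not in cids:
                cids.append(cid)
    return cids


def parse_properties(json_text: Union[str, bytes]) -> Props:
    """Pull the name and both SMILES strings out of a PropertyTable response."""
    props = json.loads(json_text)["PropertyTable"]["Properties"][0]
    out: Props = {"name": props.get("Title")}
    for label, keys in SMILES_KEYS.items():
        out[label] = next((props[k] for k in keys if k in props), None)
    return out


def _get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def lookup(drugbank_id: str) -> Optional[Props]:
    """Name and SMILES for one DrugBank ID via its first linked CID, or None."""
    cids = cids_from_substance_xml(_get(substance_url(drugbank_id)))
    if not cids:
        return None
    return parse_properties(_get(properties_url(cids[0])))


def lookup_many(drugbank_ids: Iterable[str], pause: float = 0.25) -> Dict[str, Optional[Props]]:
    """Look up several IDs in sequence, pausing between them."""
    results: Dict[str, Optional[Props]] = {}
    for drug_id in drugbank_ids:
        try:
            results[drug_id] = lookup(drug_id)
        except urllib.error.HTTPError:
            results[drug_id] = None  # e.g. a 404 when PubChem has no record for this ID
        time.sleep(pause)  # be gentle with PubChem's request-rate limits
    return results
```

`lookup_many` is deliberately sequential with a pause, since PubChem throttles clients that send requests too quickly; the URL builders and parsers are kept separate from the network calls so they can be checked offline.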

Thank you, cannin! I will try your strategy.

7.2 years ago
Björn ▴ 670

Mining the XML is one way, but probably the most complicated one. Instead, you can download the DrugBank SDF or SMILES file and map your IDs against it, for example to filter it down to just the drugs you need.
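A minimal sketch of the SDF route in plain Python; no cheminformatics toolkit is needed because only the data fields are read, not the structures. The field names `DRUGBANK_ID`, `GENERIC_NAME` and `SMILES` are assumptions about DrugBank's structure SDF: check the `> <...>` header lines in your copy and pass different names if they differ.

```python
from typing import Container, Dict, Iterable, Iterator, List, Optional, Tuple


def iter_sdf_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the "> <FIELD>" data items of each SDF record as a dict."""
    fields: Dict[str, str] = {}
    in_data = False  # becomes True after the molblock's "M  END" line
    current: Optional[str] = None  # data field currently being read
    values: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":  # record separator
            if current is not None:
                fields[current] = "\n".join(values)
            yield fields
            fields, in_data, current, values = {}, False, None, []
        elif not in_data:
            in_data = line.startswith("M  END")
        elif current is None:
            if line.startswith(">") and "<" in line:
                start = line.index("<") + 1
                end = line.find(">", start)
                if end != -1:
                    current, values = line[start:end], []
        elif line.strip() == "":
            fields[current] = "\n".join(values)  # a blank line ends the value
            current = None
        else:
            values.append(line)
    if in_data:  # tolerate a missing final "$$$$"
        if current is not None:
            fields[current] = "\n".join(values)
        yield fields


def names_and_smiles(
    lines: Iterable[str],
    wanted: Container[str],
    id_field: str = "DRUGBANK_ID",
    name_field: str = "GENERIC_NAME",
    smiles_field: str = "SMILES",
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each wanted DrugBank ID to (name, SMILES) using the SDF data fields."""
    out: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for rec in iter_sdf_records(lines):
        drug_id = rec.get(id_field)
        if drug_id is not None and drug_id in wanted:
            out[drug_id] = (rec.get(name_field), rec.get(smiles_field))
    return out
```

An open text file can be passed as `lines`, e.g. `with open("structures.sdf") as fh: table = names_and_smiles(fh, set(ids))`, where `structures.sdf` is a hypothetical name for the downloaded file. Reading line by line keeps memory use low even for a large SDF.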

Thanks for your suggestion, Björn; I will pursue that approach.

Could you clarify where the SMILES file and similar DrugBank files are available? I have reviewed the DrugBank site and see the full database available for download, as well as this page: https://www.drugbank.ca/releases/latest#protein-identifiers, which offers the target, enzyme, carrier and transporter datasets, but I haven't found drug names or SMILES in any of these files...

5.4 years ago
mohfcis ▴ 20

You can use the dbparser package (https://github.com/Dainanahan/dbparser). It is designed to parse the DrugBank database and return R data frames, including the SMILES.
