Get Chemical Structures From Data In Comparative Toxicogenomics Database (CTD)
12.2 years ago

The Comparative Toxicogenomics Database (CTD) has curated information on chemicals and their effects on the human body. I'm very interested in the actual chemical structures of these chemicals, but unfortunately CTD does not provide them.

Instead, they provide the CAS registry number for some of the chemicals, but the primary ID is the MeSH ID. For 10,10'-dimethyl-9,9'-biacridinium, for instance, they provide MESH:C033472 and CAS 2315-97-1, while for other chemicals they give only the MeSH ID.

I need to retrieve the chemical structures (SMILES notation is fine) from CTD programmatically, preferably from the MeSH ID. How would you guys approach this?

12.2 years ago

For the record,

PubChem REST 1.0 has been released, which makes the search proposed by cdsouthan even more handy for bioinformaticians. Here is a single-query example.

To get the PubChem CID from a CTD ID (for instance, C533207), request:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceid/Comparative%20Toxicogenomics%20Database/C533207/cids/TXT/

This returns

57336509
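The same lookup can be scripted. A minimal Python sketch, assuming the PUG REST path layout shown in the URL above (source name, source ID, then cids/TXT); the helper names are hypothetical, and the sketch uses https rather than the http of the original URLs.

```python
from urllib.parse import quote
from urllib.request import urlopen

# PUG REST base URL (https assumed here; the URLs in this answer use http).
PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
CTD_SOURCE = "Comparative Toxicogenomics Database"


def cids_url(ctd_id: str) -> str:
    """Build the substance/sourceid -> cids URL for one CTD ID (e.g. C533207)."""
    # quote() encodes the spaces in the source name as %20, as in the URL above.
    return f"{PUG}/substance/sourceid/{quote(CTD_SOURCE)}/{quote(ctd_id)}/cids/TXT"


def parse_cid_txt(body: str) -> list[int]:
    """TXT output holds one CID per line; split on whitespace to skip blanks."""
    return [int(tok) for tok in body.split()]


def lookup_cids(ctd_id: str) -> list[int]:
    """Network call: fetch the CIDs for a CTD ID (raises HTTPError if none)."""
    with urlopen(cids_url(ctd_id), timeout=30) as resp:
        return parse_cid_txt(resp.read().decode("utf-8"))
```

For example, lookup_cids("C533207") issues the request above, which per this answer returned 57336509.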

Now, to get the SMILES string, access:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/57336509/property/CanonicalSMILES/TXT/

This gives you

CN(C)CC1=CC(=CC=C1)NC(=C2C3=C(C=C(C=C3)C(=O)N(C)C)NC2=O)C4=CC=CC=C4
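For many CIDs at once, the property call can take a comma-separated CID list and return a CSV table instead of TXT. A minimal Python sketch, assuming that PUG REST layout (compound/cid/list/property/name/CSV); the helper names are hypothetical, and CanonicalSMILES is simply the property name used in the URL above, so it is kept as a parameter in case PubChem has renamed it since (check the PUG REST documentation if the request is rejected).

```python
import csv
import io
from urllib.request import urlopen

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_url(cids: list[int], prop: str = "CanonicalSMILES") -> str:
    """Property URL for several CIDs at once (comma-separated list, CSV output)."""
    return f"{PUG}/compound/cid/{','.join(map(str, cids))}/property/{prop}/CSV"


def parse_property_csv(body: str, prop: str = "CanonicalSMILES") -> dict[int, str]:
    """Map CID -> property value from the CSV table (header row: CID,<prop>)."""
    reader = csv.DictReader(io.StringIO(body))
    return {int(row["CID"]): row[prop] for row in reader}


def fetch_smiles(cids: list[int], prop: str = "CanonicalSMILES") -> dict[int, str]:
    """Network call; split long CID lists into smaller batches to keep the URL short."""
    with urlopen(smiles_url(cids, prop), timeout=30) as resp:
        return parse_property_csv(resp.read().decode("utf-8"), prop)
```

Combined with the CID lookup, fetch_smiles([57336509]) should return a one-entry dict holding the SMILES shown above.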
12.2 years ago
cdsouthan ★ 1.9k

Running this query in PubChem Compound

http://www.ncbi.nlm.nih.gov/pccompound?term=%22Comparative%20Toxicogenomics%20Database%22[SourceName]&cmd=DetailsSearch

returns 9989 CIDs, which you can download as SMILES.

Note that the MeSH query

http://www.ncbi.nlm.nih.gov/pccompound?term=has_mesh[filt]&cmd=DetailsSearch

returns 82682 CIDs, but the intersection is 9147, i.e. about 92% of the 9989 CTD CIDs and only about 11% of the MeSH-linked CIDs.
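The overlap can be recomputed from the two downloaded result sets. A minimal sketch, assuming each set has been saved as a plain-text file of CIDs, one per line; the file and helper names are hypothetical. With the counts quoted above (9989 CTD CIDs, 82682 MeSH-linked CIDs, 9147 shared), the two shares come out at roughly 0.92 and 0.11.

```python
from pathlib import Path


def read_cids(path: str) -> set[int]:
    """Read a CID list saved from PubChem, one integer per line."""
    return {int(tok) for tok in Path(path).read_text().split()}


def overlap_report(ctd: set[int], mesh: set[int]) -> dict[str, float]:
    """Size of the intersection and its share of each input set."""
    both = ctd & mesh
    return {
        "intersection": len(both),
        "share_of_ctd": len(both) / len(ctd),
        "share_of_mesh": len(both) / len(mesh),
    }
```

Usage: overlap_report(read_cids("ctd_cids.txt"), read_cids("mesh_cids.txt")), with the two hypothetical file names replaced by wherever the lists were saved.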


Great! So, if I want to map CTD IDs to chemical structures, I first have to download the PubChem Substance files, fetch the corresponding PubChem Compound IDs, and then get the structure. Is this right? With the link you provided (directly from PubChem Compound), none of the download options seems to include the CTD IDs. Thanks!!!

12.2 years ago
cdsouthan ★ 1.9k

Hmm, not so easy then...

In the SIDs I can find a (repeated) field > <PUBCHEM_EXT_DATASOURCE_REGID> with the value C533207, but this circles back to MeSH in CTD, and it is not clear whether this is their primary ID.

The SIDs also have a > <PUBCHEM_SUBS_AUTO_STRUCTURE> field stating that the deposited substance structure was generated via the synonyms "BIX 02189", "BIX02189" and "BIX-02189" and, by synonym consistency, resolved to CID 57336509.

Parsing this from the MeSH tree seems rather confusing.

You can still extract the SMILES from the SIDs, though.
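Pulling the CTD ID out of the SID records can be scripted. Below is a minimal SDF data-field parser, a sketch assuming the SID records were downloaded as SDF, where each record ends with $$$$ and each data item is a "> <FIELD>" header line followed by value lines and a blank line. PUBCHEM_EXT_DATASOURCE_REGID is the field named above; PUBCHEM_SUBSTANCE_ID is assumed to be present in the substance SDF. The helper names are hypothetical.

```python
import re

# Matches an SDF data header such as "> <PUBCHEM_EXT_DATASOURCE_REGID>".
FIELD_RE = re.compile(r"^>\s*.*?<([^>]+)>")


def sdf_records(text: str):
    """Yield one dict per SDF record, mapping field name -> list of values."""
    for block in text.split("$$$$"):
        if not block.strip():
            continue
        fields: dict[str, list[str]] = {}
        current = None
        for line in block.splitlines():
            m = FIELD_RE.match(line)
            if m:
                current = m.group(1)
                fields.setdefault(current, [])  # repeated fields accumulate
            elif current is not None:
                if line.strip():
                    fields[current].append(line.strip())
                else:
                    current = None  # a blank line ends the data item
        yield fields


def regid_to_sids(text: str) -> dict[str, set[str]]:
    """Map CTD registry IDs (REGID field) to the PubChem SIDs that carry them."""
    mapping: dict[str, set[str]] = {}
    for rec in sdf_records(text):
        for sid in rec.get("PUBCHEM_SUBSTANCE_ID", []):
            for regid in rec.get("PUBCHEM_EXT_DATASOURCE_REGID", []):
                mapping.setdefault(regid, set()).add(sid)
    return mapping
```

The molfile lines before the first data header are skipped. The resulting SIDs can then be resolved to CIDs and structures with PUG REST.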

I was curious why they have more CIDs than MeSH, but if you look at
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=53790388 it looks like they are mapping the MeSH stems rather than the full chemical names. This might explain why there are more SIDs than CIDs.

Sorry I can't offer much more just now. If you can expand on what you ultimately want to achieve, there may be other ways.

12.2 years ago
wdiwdi ▴ 380

For a quick look-up of a structure from a MESH ID, or getting the MESH ID for a structure, go to

http://www.xemistry.com/edit/frame.html

and enter the prefixed MeSH ID (here: MESH:67033472, see below) or the CAS number (as cited above; no prefix needed) in the entry field on the upper right, then hit Return. After a moment (it can take a few seconds; Entrez is not always a speed demon) the structure is displayed, and its SMILES appears in the entry field on the upper right.

Conversely, for a structure entered by other means, the menu item Data/database ids/drugs/mesh gives its MeSH ID, Data/database urls/drugs/mesh gives an access URL for its MeSH page, and Data/Web pages/drugs/mesh displays the MeSH data page directly.

Btw, MESH:C033472 is the MeSH unique identifier (a supplementary concept record), but Entrez, which this lookup goes through, uses simple integer UIDs. For your compound, that UID is 67033472.

For a full dataset cross-reference, NCBI ELink is your friend.
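A minimal sketch of that ELink cross-reference in Python, assuming the E-utilities elink.fcgi endpoint with dbfrom=mesh and db=pccompound, and one id parameter per MeSH UID so that each gets its own LinkSet in the reply. The C -> 67 prefix rule is inferred only from the single example in this thread (C033472 -> 67033472); the D -> 68 rule for descriptor IDs is an assumption, and all helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Assumed MeSH UI letter -> Entrez MeSH UID prefix (C from this thread, D assumed).
PREFIX = {"C": "67", "D": "68"}


def mesh_ui_to_entrez_uid(mesh_ui: str) -> str:
    """Convert 'MESH:C033472' or 'C033472' to '67033472' (7-character UIs only)."""
    ui = mesh_ui.split(":")[-1].strip().upper()
    if len(ui) != 7 or ui[0] not in PREFIX or not ui[1:].isdigit():
        raise ValueError(f"unsupported MeSH UI: {mesh_ui}")
    return PREFIX[ui[0]] + ui[1:]


def elink_url(uids: list[str], dbfrom: str = "mesh", db: str = "pccompound") -> str:
    """ELink URL with one id parameter per UID (separate link set for each)."""
    params = [("dbfrom", dbfrom), ("db", db)] + [("id", u) for u in uids]
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


def parse_elink(xml_text) -> dict:
    """Map each source UID to its linked UIDs (all LinkSetDb blocks merged)."""
    root = ET.fromstring(xml_text)
    result = {}
    for linkset in root.iter("LinkSet"):
        src = linkset.findtext("IdList/Id")
        result[src] = [e.text for e in linkset.findall("LinkSetDb/Link/Id")]
    return result


def mesh_to_cids(mesh_uis: list[str]) -> dict:
    """Network call: MeSH UIs -> linked PubChem CIDs, keyed by Entrez UID."""
    uids = [mesh_ui_to_entrez_uid(m) for m in mesh_uis]
    with urlopen(elink_url(uids), timeout=30) as resp:
        return parse_elink(resp.read())
```

If the reply contains several link names for the same database pair, filter on the LinkName element inside each LinkSetDb instead of merging them.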
