I want to convert KEGG compound id into smiles format. How do I do this?
You cannot convert them directly: KEGG identifiers carry no structural information. But you can use them to look up SMILES strings in databases that support KEGG identifiers, such as BridgeDb, Wikipedia, and of course KEGG itself (which provides MDL molfiles that you can convert to SMILES with the CDK).
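As a rough illustration of the "look it up in KEGG itself" route, here is a minimal Python sketch. It assumes the public KEGG REST service at rest.kegg.jp and its `get/<entry>/mol` operation for fetching the molfile. The function names are made up for this example. The conversion step uses RDKit rather than the CDK mentioned above, purely because it is easy to call from Python; that swap is my assumption, and any toolkit that reads molfiles would do.

```python
import re
import urllib.request

# Assumption: the public KEGG REST endpoint; adjust if you use a mirror.
KEGG_REST_BASE = "https://rest.kegg.jp"


def kegg_molfile_url(compound_id: str) -> str:
    """Build the REST URL that returns the MDL molfile for a KEGG compound ID."""
    cid = compound_id.strip()
    # KEGG compound IDs are the letter C followed by five digits, e.g. C00747.
    if not re.fullmatch(r"C\d{5}", cid):
        raise ValueError(f"not a KEGG compound ID: {compound_id!r}")
    return f"{KEGG_REST_BASE}/get/cpd:{cid}/mol"


def fetch_molfile(compound_id: str, timeout: float = 30.0) -> str:
    """Download the molfile text for one compound (needs network access)."""
    with urllib.request.urlopen(kegg_molfile_url(compound_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def molfile_to_smiles(molblock: str) -> str:
    """Convert a molfile to SMILES.

    Assumption: RDKit is installed. It stands in for the CDK here; the
    resulting canonical SMILES is RDKit's own flavour.
    """
    from rdkit import Chem  # imported lazily so the rest works without RDKit

    mol = Chem.MolFromMolBlock(molblock)
    if mol is None:
        raise ValueError("could not parse molfile")
    return Chem.MolToSmiles(mol)


if __name__ == "__main__":
    print(molfile_to_smiles(fetch_molfile("C00747")))
```

The URL building and ID check work offline; only `fetch_molfile` touches the network, and only `molfile_to_smiles` needs a cheminformatics toolkit.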
The simplest method to do this is with a small script for the Cactvs chemoinformatics toolkit (see www.xemistry.com/academic for free academic versions), which has built-in support for this:
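# Configure the E_SMILES property: unique SMILES, aromatic notation
prop setparam E_SMILES unique 1 usearo 1
# Create a structure from the KEGG compound ID and print its SMILES
echo [ens new [ens create C00747] E_SMILES]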
In the developer version (to be released soon), there is also a mechanism to force the look-up to go exclusively to the KEGG database proper:
prop setparam E_SMILES unique 1 usearo 1
echo [ens new [ens create KEGG:C00747] E_SMILES]
This has the advantage of directly accessing the database's native KCF files, which avoids at least one additional conversion round to or from Molfiles or other secondary formats by software of unknown reliability. The result is probably the most faithful representation of the actual KEGG content.
The current release of the toolkit already provides similar mechanisms for looking up structures via PubChem CIDs and SIDs, CAS numbers, and ChEMBL, ChEBI, ZINC, MeSH, or ChemSpider IDs.
The usual caveats regarding "unique" SMILES apply. Don't use them if you can avoid it, and do not expect interoperability with any other "unique" SMILES. This software implements the original Daylight spec, with all its shortcomings, but with the advantage of actually being a documented algorithm. Newer Daylight programs, and almost any other software claiming to compute "unique" SMILES, have modified the original algorithm in undocumented and disparate ways.