Question

Smiles String Comparison Algorithms

5

13.9 years ago

Biogeek ▴ 170

What similarity algorithms are normally used to compare slightly different but related SMILES strings (e.g. Oc1ccc(cc1)\C=C\C(=O)c2ccc(O)cc2O vs O=C(/C=C/c1ccccc1)c2ccccc2)?

chemoinformatics similarity • 9.4k views

updated 13.9 years ago by Gilleain ▴ 30 • written 13.9 years ago by Biogeek ▴ 170

score 3 · Answer 1 · 2011-06-29

13.9 years ago

brentp 24k

See this by Andrew Dalke.

In it, he references:

"Lingos, Finite State Machines, and Fast Similarity Searching", J. A. Grant, J. A. Haigh, B. T. Pickup, A. Nicholls, and R. A. Sayle, J. Chem. Inf. Model. 46(5) (2006) pp. 1912-1918.
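To give a rough feel for the LINGO idea, here is a minimal sketch in Python. It assumes LINGOs are the overlapping substrings of length q = 4 taken straight from the SMILES string, and it scores two strings with a Tanimoto coefficient over the substring counts. The function names are made up for illustration, and any preprocessing or finite-state-machine speedups described in the paper are left out, so treat it as an approximation rather than the published method.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Count the overlapping length-q substrings of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(a, b, q=4):
    """Multiset Tanimoto: shared substring counts over combined counts."""
    ca, cb = lingos(a, q), lingos(b, q)
    shared = sum((ca & cb).values())  # per-substring minimum count
    total = sum((ca | cb).values())   # per-substring maximum count
    # Strings shorter than q yield no substrings; report 0.0 in that case.
    return shared / total if total else 0.0


# The two SMILES strings from the question (raw string keeps the backslashes).
query = r"Oc1ccc(cc1)\C=C\C(=O)c2ccc(O)cc2O"
target = "O=C(/C=C/c1ccccc1)c2ccccc2"
print(lingo_tanimoto(query, target))
```

Note that this works on the raw text, so two different SMILES strings for the same molecule can still score below 1.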

Dalke also looks at zlib compression as a way to compare the strings.
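One common way to turn a compressor into a similarity measure is the normalized compression distance (NCD). Whether that is exactly what the linked post does is an assumption on my part; the sketch below only shows the general idea with Python's zlib module, and the helper names are hypothetical.

```python
import zlib


def compressed_size(text):
    """Length in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a, b):
    """Normalized compression distance: smaller means more similar."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


# The two SMILES strings from the question (raw string keeps the backslashes).
query = r"Oc1ccc(cc1)\C=C\C(=O)c2ccc(O)cc2O"
target = "O=C(/C=C/c1ccccc1)c2ccccc2"
print(ncd(query, query), ncd(query, target))
```

SMILES strings are short, so the fixed overhead of the zlib format makes the numbers noisy; they are better used for ranking than as absolute values.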

score 3 · Answer 2 · 2011-06-30

13.8 years ago

Egon Willighagen 5.4k

Comparing SMILES strings directly only makes sense when you use canonical SMILES. It is more common to parse the SMILES into a chemical graph and compare the actual graphs, so that it does not matter that there can be multiple SMILES for the same molecule. From there, I suggest using a fingerprint as the representation, for which you can calculate the similarity with the Tanimoto coefficient.

Example code using the CDK and R can be found in this vignette using the rcdk package.
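For readers who just want to see what the Tanimoto step computes, here is a minimal Python sketch. It assumes each fingerprint is represented as the set of its "on" bit positions; the bit positions below are made up for illustration, and in practice they would come from a cheminformatics toolkit (such as the CDK) after parsing the SMILES into a molecule.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        # Two empty fingerprints: treated as dissimilar here by convention.
        return 0.0
    return len(fp_a & fp_b) / len(union)


# Hypothetical on-bit positions, not real fingerprints of any molecule.
fp_query = {1, 5, 9, 12}
fp_target = {1, 5, 7, 12, 20}

similarity = tanimoto(fp_query, fp_target)  # 3 shared bits out of 6 distinct
distance = 1.0 - similarity
print(similarity, distance)
```

The coefficient is a similarity between 0 and 1; if a distance is needed, 1 minus the coefficient is the usual choice.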

0

To expand: there can be many SMILES strings for the same chemical structure, so it doesn't make sense to compare the strings themselves.

13.8 years ago by Noel O'Boyle ▴ 40

score 1 · Answer 3 · 2011-06-30

13.8 years ago

Gilleain ▴ 30

You can use SMSD (Small Molecule Subgraph Detector) to compare molecules given as SMILES; it reports several similarity measures, including Tanimoto.
