Hello BioStar Community,
I've run into a curious situation in the PubChem database and would like the community's insight. The same compound can be listed under different Compound IDs (CIDs) even though the entries share the same IUPAC name, InChI Key, and chemical properties.
Example 1: The compound with the IUPAC name "stannous fluoride" (tin(II) fluoride) appears under two different CIDs:
CID 24550 with Canonical SMILES: F[Sn]F
CID 10197804 with Canonical SMILES: [F-].[F-].[Sn+2]
Both entries have the same molecular formula (F2Sn) and InChI Key, but differ in their structural representation.
Example 2: Another case involves the compound with the name Sulfaguanol, which appears under three different CIDs:
CID 65756 - https://pubchem.ncbi.nlm.nih.gov/compound/65756
CID 9571041 - https://pubchem.ncbi.nlm.nih.gov/compound/9571041
CID 5464101 - https://pubchem.ncbi.nlm.nih.gov/compound/5464101
In this case, all three entries share the same molecular formula, IUPAC name, InChI Key, and chemical properties.
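One way to double-check that the entries really share an InChI Key is to request the same properties for all CIDs in a single call to PubChem's PUG REST service. The following is a minimal sketch, assuming the PUG REST property endpoint and the property names MolecularFormula, InChIKey, and IUPACName; the helper name `property_url` is hypothetical, and the code only builds the request URL rather than fetching it.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(cids, properties=("MolecularFormula", "InChIKey", "IUPACName")):
    """Build a PUG REST URL asking for the given properties of several CIDs.

    Hypothetical helper for illustration; the URL layout is an assumption
    based on the PUG REST compound/property pattern.
    """
    cid_part = ",".join(str(int(c)) for c in cids)
    prop_part = ",".join(properties)
    return f"{PUG_REST}/compound/cid/{cid_part}/property/{prop_part}/JSON"

# The CIDs from the two examples above.
stannous_url = property_url([24550, 10197804])
sulfaguanol_url = property_url([65756, 9571041, 5464101])
print(stannous_url)
print(sulfaguanol_url)
```

Opening either URL in a browser, or fetching it with `urllib.request.urlopen`, and comparing the returned InChIKey values side by side would confirm or refute the assumption that they are identical.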
These examples raise a question: why does PubChem list the same chemical entity under different CIDs, given that their core properties are identical? Is this differentiation based on structural representation (ionic vs covalent for the first example), or is there another reason for the distinction? How does PubChem generally handle such cases where the differences are mainly in representation rather than in chemical composition?
Thank you for any insights or explanations you can provide on this matter.
ChatGPT had the following two things to say regarding different CIDs. See if they apply.
Thank you for the answer! However, my background is not in chemistry, so I am unable to fully verify these points on my own.
I am working on a project where the goal is to find PubChem CIDs for as many ChEBI IDs as possible. I have identified UniChem, ChEBI, and PubChem as potential data sources. However, in the data provided by UniChem, I have noticed that thousands of ChEBI IDs are mapped to more than one CID. I want to understand why this happens, because I'm looking for the most suitable data source.
Any advice on how to best match ChEBI IDs with the correct PubChem CIDs would be really helpful.
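To compare sources, it may help to first count how many ChEBI IDs map to more than one CID in each of them. Below is a minimal sketch, assuming the mapping has already been exported as (ChEBI ID, CID) pairs; the function name `multi_mapped` and the sample pairs are made up for illustration and are not real UniChem records.

```python
from collections import defaultdict

def multi_mapped(pairs):
    """Return {chebi_id: sorted CIDs} for ChEBI IDs mapped to more than one CID."""
    cids_by_chebi = defaultdict(set)
    for chebi_id, cid in pairs:
        cids_by_chebi[chebi_id].add(cid)  # a set ignores duplicate rows
    return {k: sorted(v) for k, v in cids_by_chebi.items() if len(v) > 1}

# Placeholder pairs for illustration only, not real identifiers.
example_pairs = [
    ("CHEBI:A", 101),
    ("CHEBI:A", 102),
    ("CHEBI:B", 201),
    ("CHEBI:A", 101),  # duplicate row, counted once
]
ambiguous = multi_mapped(example_pairs)
print(ambiguous)  # {'CHEBI:A': [101, 102]}
```

Running the same function over the mapping from each candidate source would show which one leaves the fewest ambiguous ChEBI IDs to resolve by hand.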
If you have a ChatGPT account (you can get a free one) then ask it to "match CHEBI IDs with the correct PubChem CIDs". That should get you moving until someone answers.