During an audit of our PDB ligand links in http://www.guidetopharmacology.org/ we have been looking at the intersect between PDBe ligands (via UniChem) and PubChem CIDs for what should be the same structures in NCBI MMDB. While the comparison is preliminary, from our total of ~900 (curated, real lead-like structures) we see a peculiar discordance of well over 100 in both directions (i.e. PDBe with no exact match in MMDB and vice versa). We have seen this issue before for individual cases, but I wonder if anyone has done a systematic comparison (e.g. via InChIKeys)? The numbers don't add up for starters, with 19713 in PDBe and 27973 for MMDB.
Some boil down to a difference of just two hydrogens but are still a mismatch (e.g. AWJ):
http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=8137
https://pubchem.ncbi.nlm.nih.gov/compound/72710568
http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/AWJ
Others are more serious, such as 35Q:
http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=8231
https://pubchem.ncbi.nlm.nih.gov/compound/86290235
http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/35Q
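For anyone wanting to try the systematic InChIKey comparison asked about above, here is a minimal sketch of how it could be set up. It assumes the two InChIKey lists have already been exported from each source; the function names and the keys in the example are hypothetical placeholders, not real ligands. Besides exact matches, it flags unmatched keys that share the first 14-character block (the connectivity skeleton), which may help separate differences in layers such as stereochemistry from genuinely different structures.

```python
from collections import defaultdict


def first_block(inchikey: str) -> str:
    """Return the first (14-character) block of an InChIKey."""
    return inchikey.split("-")[0]


def compare_inchikeys(pdbe_keys, mmdb_keys):
    """Compare two collections of InChIKeys (hypothetical helper).

    Returns exact matches, keys unique to each side, and "near" matches:
    unmatched PDBe keys whose first block also occurs among the
    unmatched MMDB keys.
    """
    pdbe, mmdb = set(pdbe_keys), set(mmdb_keys)
    exact = pdbe & mmdb
    pdbe_only = pdbe - mmdb
    mmdb_only = mmdb - pdbe

    # Index the unmatched MMDB keys by their first block.
    by_block = defaultdict(set)
    for key in mmdb_only:
        by_block[first_block(key)].add(key)

    # Pair each unmatched PDBe key with MMDB keys sharing its first block.
    near = {
        key: sorted(by_block[first_block(key)])
        for key in pdbe_only
        if first_block(key) in by_block
    }
    return {
        "exact": exact,
        "pdbe_only": pdbe_only,
        "mmdb_only": mmdb_only,
        "near": near,
    }


# Placeholder keys for illustration only (not real structures).
pdbe_example = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "BBBBBBBBBBBBBB-UHFFFAOYSA-N",
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
]
mmdb_example = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "BBBBBBBBBBBBBB-ZZZZZZZZSA-N",
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
]

result = compare_inchikeys(pdbe_example, mmdb_example)
print(len(result["exact"]), "exact matches")
print(len(result["pdbe_only"]), "PDBe keys with no exact match")
print(len(result["mmdb_only"]), "MMDB keys with no exact match")
print(result["near"])
```

Note that, as far as I understand the InChIKey layout, a change in hydrogen count alters the formula and therefore the first block too, so cases like AWJ would most likely land in the "no match" bins rather than the "near" one and would still need structure-level inspection.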
Just received a tweet reply (appreciated): PDB ligands include unobserved atoms and idealised geometry, whereas it looks like (e.g. for 35Q) PubChem extracts from the coordinates?
This is the (we know what we put) "in" versus the (let's see what we can density-fit) "out" problem.
Let's see what else gets pitched in (check Twitter if interested - the exchange seems to have moved there!)
I have expanded the topic at http://cdsouthan.blogspot.se/2015/05/will-real-pdb-ligands-please-stand-up.html