PDB and MMDB differences in ligand structure
9.6 years ago
cdsouthan ★ 1.9k

During an audit of our PDB ligand links in http://www.guidetopharmacology.org/ we have been looking at the intersects between PDBe ligands (via UniChem) and PubChem CIDs for what should be the same structures in NCBI MMDB. While the comparison is preliminary, from our total of ~900 (curated, real lead-like structures) we see a peculiar discordance of well over 100 in both directions (i.e. PDBe with no exact match in MMDB and vice versa). We have seen this issue before for individual cases, but I wonder if anyone has done a systematic comparison (e.g. via InChIKeys)? The numbers don't add up for starters, with 19713 in PDBe and 27973 for MMDB.
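A systematic comparison of this kind could be sketched roughly as below. This is only an illustration, assuming the two InChIKey lists have already been exported from each resource; the function name `compare_inchikeys` and the example keys are hypothetical placeholders, not real identifiers. Besides exact matches, it reports keys that agree only on the first (14-character) block, which may help separate same-skeleton cases from completely different structures.

```python
def compare_inchikeys(pdbe_keys, mmdb_keys):
    """Compare two collections of InChIKeys in both directions.

    Returns a dict of sets:
      exact         - keys present in both sources
      pdbe_only     - keys with no exact match in MMDB
      mmdb_only     - keys with no exact match in PDBe
      skeleton_only - PDBe-only keys whose first block still occurs
                      among the MMDB-only keys
    """
    # Normalise whitespace and case so trivial formatting does not cause misses.
    pdbe = {k.strip().upper() for k in pdbe_keys if k.strip()}
    mmdb = {k.strip().upper() for k in mmdb_keys if k.strip()}

    exact = pdbe & mmdb
    pdbe_only = pdbe - mmdb
    mmdb_only = mmdb - pdbe

    # The first block of an InChIKey is the part before the first hyphen.
    mmdb_first_blocks = {k.split("-")[0] for k in mmdb_only}
    skeleton_only = {k for k in pdbe_only if k.split("-")[0] in mmdb_first_blocks}

    return {
        "exact": exact,
        "pdbe_only": pdbe_only,
        "mmdb_only": mmdb_only,
        "skeleton_only": skeleton_only,
    }


# Placeholder keys for illustration only (not real compounds).
pdbe_example = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "CCCCCCCCCCCCCC-DDDDDDDDSA-N",
    "EEEEEEEEEEEEEE-FFFFFFFFSA-N",
]
mmdb_example = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
    "GGGGGGGGGGGGGG-HHHHHHHHSA-N",
]
result = compare_inchikeys(pdbe_example, mmdb_example)
for label, keys in result.items():
    print(label, len(keys))
```

Keys left in `pdbe_only` or `mmdb_only` without a first-block partner are the ones most worth inspecting by hand.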

Some boil down to just two hydrogens but are still a mismatch (e.g. AWJ)

http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=8137

https://pubchem.ncbi.nlm.nih.gov/compound/72710568

http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/AWJ

Others are more serious such as 35Q

http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=8231

https://pubchem.ncbi.nlm.nih.gov/compound/86290235

http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/35Q

MMDB PDB • 3.9k views

Just received a tweet reply (appreciated): PDB ligand definitions include unobserved atoms and idealise geometry, whereas it looks like PubChem extracts structures from the coordinates (e.g. 35Q)?

This is the (we know what we put) "in" versus the (let's see what we can density-fit) "out" problem.

Let's see what else gets pitched in (check Twitter if interested; the exchange seems to have moved there!)

9.6 years ago
conroy ▴ 20

To expand, when PDB annotators make a PDB chemical dictionary definition, it is as per the author's definition (as far as it is given) of what the molecule _should_ be. What is built into the coordinates may deviate significantly from that.

Due to disorder, there might be bits of the ligand which are not observed in the crystal structure (e.g. 35Q), but the PDB definition should include the unobserved bit.
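As a rough illustration of how such unobserved atoms could be listed, the sketch below compares the atom names in a dictionary definition with the atom names actually modelled in the coordinates. The function name `unobserved_atoms` and the atom names in the example are made up for illustration; reading the dictionary and coordinate files is assumed to have been done elsewhere.

```python
def unobserved_atoms(dictionary_atoms, observed_names, include_hydrogens=False):
    """Return dictionary atom names that are absent from the modelled coordinates.

    dictionary_atoms: iterable of (atom_name, element) pairs from the definition
    observed_names:   iterable of atom names present in the coordinate file
    Hydrogens are skipped by default because they are rarely modelled.
    """
    observed = set(observed_names)
    missing = []
    for name, element in dictionary_atoms:
        if element.upper() == "H" and not include_hydrogens:
            continue
        if name not in observed:
            missing.append(name)
    return missing


# Hypothetical ligand: the definition has six heavy atoms and one hydrogen,
# but only four heavy atoms were built into the coordinates.
definition = [
    ("C1", "C"), ("C2", "C"), ("N1", "N"),
    ("O1", "O"), ("C3", "C"), ("C4", "C"), ("H1", "H"),
]
modelled = ["C1", "C2", "N1", "O1"]
print(unobserved_atoms(definition, modelled))
```

A non-empty result means a structure derived purely from the coordinates would be a fragment of the dictionary molecule and would not share its InChIKey.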

In some cases the geometry in the coordinates is improbable. I can think of a ligand where (in a sugar ring) the C-O-C bonds were 1.2 and 1.7 Å. The dictionary, though, would have fixed these to ideal values. Deviation of the coordinates from ideal is listed in the validation reports distributed with each PDB entry.
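To make the bond-length point concrete, here is a minimal sketch that measures bonds from XYZ coordinates and flags those far from an ideal value. The coordinates are invented to reproduce the 1.2 and 1.7 Å example, the ideal C-O length of 1.43 Å and the 0.1 Å tolerance are assumptions chosen for illustration, and `flag_bond_deviations` is a hypothetical helper, not part of any validation pipeline.

```python
import math


def flag_bond_deviations(coords, bonds, tolerance=0.1):
    """Flag bonds whose observed length differs from the ideal by more than tolerance.

    coords: dict mapping atom name -> (x, y, z) in Å
    bonds:  iterable of (atom_a, atom_b, ideal_length) tuples
    Returns a list of (atom_a, atom_b, observed, deviation), rounded to 0.01 Å.
    """
    flagged = []
    for atom_a, atom_b, ideal in bonds:
        observed = math.dist(coords[atom_a], coords[atom_b])
        deviation = observed - ideal
        if abs(deviation) > tolerance:
            flagged.append((atom_a, atom_b, round(observed, 2), round(deviation, 2)))
    return flagged


# Invented coordinates giving C1-O5 = 1.2 Å and O5-C5 = 1.7 Å.
coords = {
    "C1": (0.0, 0.0, 0.0),
    "O5": (1.2, 0.0, 0.0),
    "C5": (1.2, 1.7, 0.0),
}
IDEAL_C_O = 1.43  # assumed typical C-O single bond length in Å
bonds = [("C1", "O5", IDEAL_C_O), ("O5", "C5", IDEAL_C_O)]

for bond in flag_bond_deviations(coords, bonds):
    print(bond)
```

With distortions this large, bond perception from coordinates alone can easily assign the wrong bond order, which is one route to a different structure in a coordinate-derived record.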

Covalently bound molecules may be another source of difference. The PDB definition may include a leaving group which has left; I don't know how PubChem handles such cases.

If a molecule definition is made entirely from the XYZ coordinates in a PDB file (with a modal resolution of about 2.3Å, and rarely with hydrogens) it will be prone to error, though I'm not suggesting all PDB definitions are correct by any means; TP7 is currently built incorrectly and is about to be fixed.


Thanks, very useful (BTW any chance of those 19K InChIKeys?) Next up should be someone from the MMDB team I hope.


What is TP7?


Coenzyme B. It has been built in error with an OH rather than a carbonyl, but now that PDB annotators have noticed, it is being fixed and will be updated at http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/TP7 next week.

https://en.wikipedia.org/wiki/Coenzyme_B
