Can anyone recommend some methods for parsing data from PubChem Compound records? I can get a complete dump of the database from the PubChem FTP site.
The data is available in ASN, SDF, and XML formats.
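If you go the SDF route, each record is a molfile block followed by named data items such as `> <PUBCHEM_COMPOUND_CID>` and `> <PUBCHEM_IUPAC_NAME>`, terminated by `$$$$`. The following is a minimal streaming sketch (plain Python, no RDKit or Open Babel assumed) that turns each record into a dict of those data items. As far as I know the depositor synonyms are not among the SDF data items, so this covers names like the IUPAC name but not the synonym list.

```python
import gzip
from typing import Dict, Iterator, List, Optional, TextIO


def iter_sdf_records(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per SDF record, mapping each data-item name
    (e.g. PUBCHEM_COMPOUND_CID) to its value. The molfile block is skipped."""
    fields: Dict[str, str] = {}
    current_name: Optional[str] = None
    value_lines: List[str] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line == "$$$$":
            # End of record: flush a pending field, then emit the record.
            if current_name is not None:
                fields[current_name] = "\n".join(value_lines)
            yield fields
            fields, current_name, value_lines = {}, None, []
        elif line.startswith(">"):
            # Data header such as "> <PUBCHEM_IUPAC_NAME>"; the name sits between < and >.
            start = line.find("<")
            end = line.find(">", start + 1)
            current_name = line[start + 1:end] if start != -1 and end != -1 else None
            value_lines = []
        elif current_name is not None:
            if line == "":
                # A blank line ends the current data item's value.
                fields[current_name] = "\n".join(value_lines)
                current_name = None
            else:
                value_lines.append(line)


def iter_sdf_gz(path: str) -> Iterator[Dict[str, str]]:
    """Stream records from a gzipped SDF file without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from iter_sdf_records(handle)
```

Streaming keeps memory flat, which matters because the full Compound dump is far too large to load at once.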
For demonstration purposes, imagine that I want to reproduce a subset of the information displayed on the website for a particular drug, for example the record for Sunitinib.
More specifically, imagine that for this CID (5329102), I want to determine the drug name, the names listed under 'also known as', and the 'Depositor-Supplied Synonyms'.
I ultimately want to be able to run this kind of query for every record in PubChem, not just that one.
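For synonyms across all of PubChem, a bulk file is probably easier than per-record lookups. The sketch below assumes a local copy of `Compound/Extras/CID-Synonym-filtered.gz` from the PubChem FTP site, which I understand to hold one `CID<TAB>synonym` pair per line; check the file header and format before relying on it.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def collect_synonyms(lines: Iterable[str], wanted: Set[int]) -> Dict[int, List[str]]:
    """Group synonyms by CID from 'CID<TAB>synonym' lines.

    Only CIDs in `wanted` are kept; pass an empty set to keep every CID
    (which needs a lot of memory for all of PubChem)."""
    result: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        cid_text, sep, synonym = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # skip lines without a tab separator
        cid = int(cid_text)
        if not wanted or cid in wanted:
            result[cid].append(synonym)  # file order is preserved per CID
    return dict(result)


def synonyms_from_ftp_dump(path: str, wanted: Set[int]) -> Dict[int, List[str]]:
    """Read the (assumed) gzipped CID-Synonym-filtered file and group its synonyms."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return collect_synonyms(handle, wanted)
```

For example, `synonyms_from_ftp_dump("CID-Synonym-filtered.gz", {5329102})` would return the synonym list for Sunitinib in a single pass over the file.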
It sounds like the PubChem Power User Gateway (PUG) might help. If so, can someone describe how I would get started on the example problem outlined above?
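For a single CID, PUG REST is the simplest entry point. This sketch uses only the standard library; the URL patterns (`/compound/cid/{cid}/synonyms/JSON` and `/compound/cid/{cid}/property/Title/JSON`) and the response shapes noted in the comments are what I believe PUG REST returns, so verify them against a live response. I'm also assuming the "also known as" names on the web page come from the same synonym list; I haven't confirmed how the page selects them.

```python
import json
import urllib.request
from typing import List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonyms_url(cid: int) -> str:
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"


def title_url(cid: int) -> str:
    return f"{PUG_REST}/compound/cid/{cid}/property/Title/JSON"


def parse_synonyms(payload: dict) -> List[str]:
    # Assumed shape: {"InformationList": {"Information": [{"CID": ..., "Synonym": [...]}]}}
    info = payload["InformationList"]["Information"][0]
    return info.get("Synonym", [])


def parse_title(payload: dict) -> str:
    # Assumed shape: {"PropertyTable": {"Properties": [{"CID": ..., "Title": "..."}]}}
    return payload["PropertyTable"]["Properties"][0]["Title"]


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

Usage: `parse_title(fetch_json(title_url(5329102)))` for the drug name and `parse_synonyms(fetch_json(synonyms_url(5329102)))` for the synonyms. PubChem rate-limits these services (a few requests per second), so for every record the FTP dumps are the better fit.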
I'm having great difficulty relating the CID (5329102) to any file in the FTP site. That would be my starting point.
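The full Compound SDF dump is split into files that each cover a contiguous CID range, so a CID can be mapped to its file by arithmetic. The sketch below assumes the current `Compound/CURRENT-Full/SDF/` layout of 500,000 CIDs per file with 9-digit zero-padded bounds (e.g. `Compound_000000001_000500000.sdf.gz`); older releases used a different chunk size, so confirm against the directory listing and adjust `chunk` if needed.

```python
def sdf_chunk_name(cid: int, chunk: int = 500_000) -> str:
    """Return the assumed CURRENT-Full SDF file name that should contain `cid`.

    Assumes files cover consecutive blocks of `chunk` CIDs starting at 1,
    named Compound_<start>_<end>.sdf.gz with 9-digit zero padding."""
    if cid < 1:
        raise ValueError("CIDs start at 1")
    start = (cid - 1) // chunk * chunk + 1
    end = start + chunk - 1
    return f"Compound_{start:09d}_{end:09d}.sdf.gz"
```

Under these assumptions, CID 5329102 maps to `Compound_005000001_005500000.sdf.gz`.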
Hi Malachi, you asked this question three years ago but updated it a few weeks ago, so apparently you're still working on it. Can you tell us how you solved the problem in the end? Thanks!
Yes, I also had trouble relating records viewed on the web to the information on the FTP site.
How did you eventually deal with this problem? I want to get the "Therapeutic Uses" and "Pharmacology and Biochemistry" sections for some CIDs from PubChem. Thank you.
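Annotation sections like these come from PUG View rather than PUG REST. A minimal sketch, assuming the `pug_view/data/compound/{cid}/JSON?heading=...` endpoint and its nested `Record` / `Section` / `Information` / `Value` / `StringWithMarkup` JSON layout; I haven't checked whether every heading name (top-level ones in particular) is accepted by the `heading` filter, so test each one.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, List

PUG_VIEW = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound"


def heading_url(cid: int, heading: str) -> str:
    query = urllib.parse.urlencode({"heading": heading})
    return f"{PUG_VIEW}/{cid}/JSON?{query}"


def iter_strings(node) -> Iterator[str]:
    """Walk nested Record/Section/Information entries and yield every StringWithMarkup text."""
    if isinstance(node, dict):
        value = node.get("Value")
        if isinstance(value, dict):
            for item in value.get("StringWithMarkup", []):
                if "String" in item:
                    yield item["String"]
        for key in ("Record", "Section", "Information"):
            if key in node:
                yield from iter_strings(node[key])
    elif isinstance(node, list):
        for child in node:
            yield from iter_strings(child)


def fetch_heading_text(cid: int, heading: str) -> List[str]:
    """Fetch one heading for one CID and return its text entries in document order."""
    with urllib.request.urlopen(heading_url(cid, heading), timeout=30) as response:
        return list(iter_strings(json.load(response)))
```

For example, `fetch_heading_text(5329102, "Therapeutic Uses")` should return the text entries under that heading, if the record has one.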