I placed this question in StackOverflow and now I'm reposting it here since I just learned from this site. I am a cell biologist (postdoc) and amateur programming, looking to build an app for our students. As such, my knowledge of bioinformatics tools is limited.
Original question:
I'm building a database of chemical compounds. I need all the synonyms (IUPAC and common names) as well as safety data for each.
I'll be using the freely available data at PubChem (http://pubchem.ncbi.nlm.nih.gov/)
There's an easy way of querying each compound with simple HTTP gets. For example, to obtain glycerol data, the URL is:
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=753
And the following URL would return an easy to parse format:
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=753&disopt=DisplaySDF
but it will respond only very basic info, lacking safety data and only a few common names.
There is one public domain API for JAVA that seems a very complete, developed by a group at Scripps (citation). The code is here.
Unfortunately, this API is not very well documented and it's quite difficult to follow due to the complexity of the data involved. For what I gathered, pubchemdb is using the PubChem Power User Gateway (PUG) XML API
Has anyone used this API (or any other one available)? I would appreciate a short description or tutorial on how to start with it.
You can also post this question at this Q&A website: http://blueobelisk.shapado.com/ (PubChem developers tend to hang out there...)
@Egon thanks for the link. I had no idea about that site.