I am currently looking to put together a comprehensive list of chemical data serialised in RDF. To be included in the list, the data has to be (a) open, (b) related to chemical entities and (c) available in an RDF serialisation. Without wanting to be too prescriptive, when I say "chemical" I mean small molecular entities or synthetic polymers - I am less interested in proteins, DNA, etc. My question is: which data providers does the community know of that fit these criteria? So far, I have
OpenTox has an increasing amount of chemistry data. I also have RDF solubility data from the Open Notebook Science project, and an RDF version of the ChemPedia substances (on Science 3.0). And there is also Chem2Bio2RDF.
What you are looking for is also one of the core goals of the OpenPhacts project, part of the Innovative Medicines Initiative (IMI), which has just started (no website yet, but I could put you in contact with the developers). It is also important for the Large Knowledge Collider project (LarKC), which is in fact one of the partners in OpenPhacts. One of the other partners is RSC, the Royal Society of Chemistry; the intent of the OpenPhacts project is to also provide (part of) the ChemSpider content in RDF.
As far as I know, there is no RDF serialisation for HMDB, but I would be surprised if they were not interested in providing one. So it might be worth contacting David Wishart about that.
I don't know about LarKC, but for OpenPhacts the answer is no. License development is expected to be part of developing the sustainability plan. After all, this is a multi-million investment in an open reasoning and knowledge environment for (primarily) drug development. You don't want to see that go away when the project finishes. I personally expect that the answer will be different for different parts of the project:
The toolbox is developed as an open source project. There might be parts that will be developed under a dual licensing model, but since a lot of that builds on things that are already available under, for instance, Creative Commons and Apache licenses, those are likely to persist.
The available content (the collected knowledge in the form of concepts and nanopublications/semantic triples) will also be available, but it may very well be dual licensed (meaning that if you want to make money using it, you will have to pay to keep it maintained).
The questions used to reason with (so not the content, but the questions you want to answer using that content) will be closed, and the aim is to offer a secure environment for that. After all, this is for drug development, and you don't want to see drugs not being developed because they are no longer patentable.
Hi Egon, why not - should we maybe convert this into a community wiki thing and try and keep it up-to-date?
The Open in bioinformatics does not really seem to stretch all the way down to small molecules :(
Nico, would you like to extend this to half-open data, such as data under the GNU or CC licenses?