A running meme is that bioinformatics data is more open than chemoinformatics data (e.g. mentioned in this blog). It is my feeling too that this is the case, but a look at this spreadsheet with NAR-listed databases (resulting from this BioStar question) already shows many instances where the data cannot be downloaded. Moreover, the sheets give no clue on whether I can modify and redistribute the data, two core rights for Open Data. I therefore added two columns to allow annotation with these two aspects. Any non-commercial clause would make it non-open data too. General info can be found at Is It Open Data?
So, my question is basically how Open is bioinformatics data? What is the percentage of data that is in fact Open?
Daniel is right in principle. The practical comparison we can make is between available but non-open data and real open data, which indeed is what the table is about.
I am not sure about your evaluation of a non-commercial clause. Larger databases are expensive structures that are often paid for from community-funded research projects. They need to be maintained after those projects end, which is still expensive. It makes sense that you pay for the maintenance if you make a profit from using the data. Also, it is just not fair to take free open data, wrap it up in a nicely colored website or tool, and sell it. As long as these clauses allow fair usage, I think that is fine.
Chris, I understand your arguments (and have heard them many times), but isn't the whole idea of making data freely available that people in fact use it? How does a fancy website make maintenance more difficult or more expensive for you, if others help you share it? What defines a profit? Profit is one of the virtues of western civilization; what's wrong with that? How does it hurt the community if the data becomes more accessible because others start distributing it? I don't understand your point... (If you really just worry about attribution, given you mention 'fair', that's a whole other clause.)
I work mostly with genomics data (plant genomics to be precise), and I have never had any problems accessing, using, or reusing genomics data. I guess an exception would be when I have been given access to data that is not yet published, but that is quite understandable. In each case when the data was published, everything I had and more became accessible to myself and the general public.
Maybe it's just the nature of the beast: as an academic, you can't spend that much money and effort sequencing, assembling, and annotating a genome unless you plan to make it available as a public information resource.
@Egon I think I have to agree with you that (prote|metabol)omics data is much more scarce. Perhaps this scarcity is more of an issue than data licensing?
Yes, genomics data is OK... but how many freely available proteomics and metabolomics databases are there? The latter is my field, really, and data are pretty scarce...
You cannot sell biomarkers? The thing is indeed that chemical structures can be patented... I think we will see biomarkers patented, if that is not already happening... they patent genes too...
I just want to say that chemoinformatics data is more valuable than bioinformatics data. Yep, we still cannot deal with huge amounts of biodata; we still need real bioinformatics tools.
Gene and biomarker patenting is a big issue.
Gene patents had a vague legal status in Europe and the US: "It argues that isolated and altered DNA should be patentable, whereas DNA that is simply isolated should not be patentable."
Talking about biomarkers: yes, you can sell them, but in most cases this info is really odd. And in my opinion biomarker patents will also have no legal status.
Seeing as you can never quantify how much information is closed, tombed or silo'd away - I'd say this is almost impossible to answer.
I guess this NAR database spreadsheet would be a reasonable approximation, no?