Out of curiosity I would like to classify the databases in the 2011 NAR Database Issue by accessibility, specifically which ones offer:
- A complete download of the data
- A web service (REST or SOAP) which allows automated queries from robots
- A bookmarkable website that allows links to individual records (i.e., plain HTTP GET URLs) without necessarily going through a search form
For example:

| Database | Complete download? | Web service? | Bookmarkable? | If yes, example |
|---|---|---|---|---|
| COMBREX | No | No | Yes | http://combrex.bu.edu/DAI?command=SciBay&fun=proteinCluster&pClusterID=419683 |
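One quick way to judge whether a site is "bookmarkable" is to check whether a record link is just a GET URL whose query parameters identify the record. The sketch below takes the COMBREX example apart with Python's standard `urllib.parse` and rebuilds it. It assumes, without having verified it, that swapping `pClusterID` yields a link to another cluster; the helper names `record_params` and `record_url` and the ID `12345` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

EXAMPLE = "http://combrex.bu.edu/DAI?command=SciBay&fun=proteinCluster&pClusterID=419683"


def record_params(url: str) -> dict:
    """Return the query parameters of a GET record link as a flat dict."""
    query = urlsplit(url).query
    # parse_qs maps each key to a list of values; assume one value per key here
    return {key: values[0] for key, values in parse_qs(query).items()}


def record_url(template_url: str, **overrides: str) -> str:
    """Build a new record link by swapping query parameters in a known-good link."""
    parts = urlsplit(template_url)
    params = record_params(template_url)
    params.update(overrides)  # e.g. a different (hypothetical) record ID
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment))


# Hypothetical: point the same URL pattern at another cluster ID
print(record_url(EXAMPLE, pClusterID="12345"))
```

If the rebuilt URL resolves directly to a record page, without a session cookie or a prior form submission, the database would get a "Yes" in the bookmarkable column.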
But I can't do this alone.
So if you are interested, please visit one of the databases and report your findings (or corrections) here. I'll compile the responses into a spreadsheet and post a summary.
Here is a Google Docs spreadsheet if you wish to edit it directly (it's wide open): https://spreadsheets.google.com/ccc?key=tZQGRMg24BHKgO4vUjYT5TA&hl=en#gid=0
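Compiling the replies mostly means keeping one row per database with three yes/no answers and an optional example link. A minimal sketch, assuming the column names from the example table and a plain CSV file that can be imported into a spreadsheet; the `DatabaseAccess` class and `to_csv` function are hypothetical names, not part of any existing tool.

```python
import csv
import io
from dataclasses import dataclass

# Column headers mirror the example table above
HEADER = ["Database", "Complete download?", "Web service?", "Bookmarkable?", "If yes, example"]


@dataclass
class DatabaseAccess:
    """One reported database and its accessibility answers (hypothetical record type)."""
    name: str
    complete_download: bool
    web_service: bool
    bookmarkable: bool
    example_url: str = ""


def yes_no(flag: bool) -> str:
    return "Yes" if flag else "No"


def to_csv(rows: list) -> str:
    """Serialize the rows as CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow([
            row.name,
            yes_no(row.complete_download),
            yes_no(row.web_service),
            yes_no(row.bookmarkable),
            row.example_url,
        ])
    return buffer.getvalue()


rows = [DatabaseAccess("COMBREX", False, False, True,
                       "http://combrex.bu.edu/DAI?command=SciBay&fun=proteinCluster&pClusterID=419683")]
print(to_csv(rows))
```

Using `csv.writer` rather than joining strings by hand means any field that happens to contain a comma is quoted correctly.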
Per Andra's suggestion, I have decided to put this up on Amazon Mechanical Turk (MTurk):
Jeremy, wouldn't it be better to create a shared Google spreadsheet for this?
You should check for an open data licence as well. More here: http://www.isitopendata.org/
I agree with Pierre. The first step IMHO would be to extract all the URLs from the abstracts (according to the NAR Database Issue guidelines, each abstract should include a URL) and dump them into a Google spreadsheet.
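Pulling the URLs out of the abstract text can be done with a simple regular expression. The sketch below is an assumption about what the abstracts look like: it only catches links that start with `http://` or `https://`, so bare `www.` addresses would be missed, and it trims trailing punctuation, which could wrongly strip a closing parenthesis that belongs to the URL. The function name `extract_urls` and the sample text are hypothetical.

```python
import re

# Deliberately simple: scheme, then everything up to whitespace or an angle bracket/quote
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")
# Sentence punctuation that often sticks to the end of a URL in running text
TRAILING = ".,;:)]}'"


def extract_urls(abstract: str) -> list:
    """Return the distinct http(s) URLs in an abstract, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(abstract):
        url = match.group(0).rstrip(TRAILING)
        if url not in urls:  # keep first occurrence only
            urls.append(url)
    return urls


sample = "COMBREX is freely available at http://combrex.bu.edu/. Data (http://combrex.bu.edu/) are updated weekly."
print(extract_urls(sample))
```

The resulting list, paired with each article title, is what would go into the spreadsheet, one row per database.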
Yep, I just wanted to keep things in this forum.
I've just added a row for each article...
Jeremy, I've suggested creating a Wikipedia article for each DB: http://goo.gl/5jUoK . The infobox would contain the information about the web services.
Shocking how many broken links (HTTP 404) and busted web apps (HTTP 500) I've encountered already.
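Broken links and crashing web apps like these can be flagged automatically before anyone visits a site by hand. A minimal sketch using only the standard library, assuming a plain GET request is representative of what a visitor sees (some servers answer robots differently); `classify_status` and `check_url` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse label for the spreadsheet."""
    if 200 <= code < 400:
        return "ok"
    if code == 404:
        return "broken link"
    if 500 <= code < 600:
        return "server error"
    return "other"


def check_url(url: str, timeout: float = 10.0) -> Tuple[Optional[int], str]:
    """Fetch a URL with GET and return (status code or None, label)."""
    try:
        # urlopen follows redirects, so a 3xx usually surfaces as the final 200
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, classify_status(response.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError, which carries the code
        return err.code, classify_status(err.code)
    except (OSError, ValueError) as err:
        # DNS failures, refused connections, timeouts, malformed or unsupported URLs
        return None, f"unreachable ({err})"
```

Running `check_url` over the extracted URL list would separate the 404s from the 500s, which matters because a 500 may be a temporary outage worth rechecking later.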
In the 2011 issue?? Wow!
Thanks Jeremy! I didn't understand exactly what Amazon MTurk was until now :-)
Hey Jeremy, it is a fun question.
Did you make the data available?
I was not able to make the MTurk thing work for some reason; I don't even know if that is still a thing.
The spreadsheet link is still active.