We recently made a bioinformatics tool that predicts interactomes from a certain kind of proteomics dataset. We'd like to make it easy to use so it reaches as many people as possible. Unfortunately, we've written it in Matlab, which we're now realizing is a significant barrier for some people. To avoid Matlab, we've been told we might want to translate it to R or give it a web interface.
Offering our data analysis as a web service sounds like the more attractive choice. However, I've never set something like that up, so I'm wondering what I should consider before deciding. Some things I'd like answered:
- How big a job is this, generally? Should I be thinking about weeks, months... years?
- What am I probably not thinking about?
- How should I approach security?
- Is our bioinformatics tool (details below) a suitable candidate for a web service?
The details about our bioinformatics tool:
- Written in Matlab.
- Expected total number of users is low (dozens of users would be good).
- Operates on datasets between 1 and 25 MB, i.e. that's what the user would have to upload.
- Runs in a couple of hours on a typical desktop machine with a typical dataset.
In my experience, web services are offline more often than not when I try to use them. My impression is that most people just want a paper published and do not care to maintain the server afterwards. It is OK to set up a web server, but if you want your tools to have a long life, publishing on GitHub or some equivalent is better.
Provide the code in a form useful to the intended target community. Matlab is not widely used in the bioinformatics community. I don't see why you would want users of your code to pay a third party for using it, unless, of course, you have financial interests in the company owning Matlab. Publishing Matlab code will force potential users to rewrite it into something they can use. Since most people won't do this, you're going to lose potential users. Web services require a long-term commitment. I personally think it shows a total disregard for the users when a service disappears two years after publication. Although you now think your code operates on small, uploadable datasets, it won't be long before someone wants to run it on something larger, or in parallel, or in some other way you didn't anticipate. Also consider that people may not be allowed to upload their data to some random web server. So my free advice: rewrite your code in R, Python, Perl or even C/C++ (for speed) and make the source code available to the community.
From what you (and reviewers) say, our choice of Matlab is a major problem. I thought it seemed reasonable, but it clearly clashes with the current bioinformatics community. Regarding the web service, it sounds like rewriting in R is the best option if we want anything lasting.
Take a look at the Shiny package for R. There's a lot of documentation on it. I've used it to write a few web apps.