We've developed some nifty data analysis software for a specific kind of proteomics experiment. We're trying to publish it and get it out there. Unfortunately we wrote it in Matlab. How screwed are we?
So far we have unanimous feedback that Matlab was the wrong choice, i.e. all reviewers complained about it and feedback on this biostars question suggested the same. However, I had a naive understanding that Matlab was one of many bioinformatics tools, and our sample size is pretty small (half a dozen), so I'm wondering whether these views are representative of the bioinformatics community. Is there indeed a consensus that bioinformatics software should be written in something other than Matlab?
I seem to remember MATLAB having the ability to export to compilable C code (not sure what flavour)? Perhaps you could just export, then you can release the C code as open source. But yeah, a MATLAB academic subscription is several thousand £/$ so you'll struggle to attract users if left as-is.
That said, as others have mentioned, it is very sub-field dependent - many of my colleagues who work in systems biology (basically mathematical models of biological systems) use MATLAB extensively. Even then, however, it often loses out to Mathematica, which I believe is free at least to some degree.
This is probably difficult to answer, but can you say how much it would limit its use (a little, a lot)? We're weighing publishing the Matlab code against rewriting in something like R. The latter is a big job, and it sounds like the way to go, but I want to be sure.
Depending on the exact subpopulation of bioinformatics/biology you're targeting, it could well be a lot. At least for those of us involved in NGS, Matlab is simply never used for anything. I used it a bit to try out some signal deconvolution methods, and other than that the only time I see it used is in metabolomics (there seems to be no consensus about what to use in that relatively small community).
If the proteomics community you're targeting is already using matlab on occasion then you should be fine. Otherwise, you're going to have a hard time selling them on the idea.
Thanks. I believe Matlab is not standard in proteomics. One option is making a standalone executable (e.g. using Matlab deploytool) meaning no Matlab license is required to run our software, although I'm not sure whether people are any more likely to use it (it's still not R/python).
It might be a pain at first, but I think rewriting in R will pay off in the long run. (As far as I know there is also Octave, a rarely used but free alternative to Matlab.) I'm pretty sure that R has a bigger user base in bioinformatics than Matlab, making adoption and incorporation of your software easier.
There are some nuances to consider: which (sub)field of bioinformatics, which journal you submitted to, how widely adopted is your language of choice, what is your goal and who is your target audience.
For example, I have the impression (I may be wrong here) that neuroinformatics has a good number of Matlab-based tools. Open access journals are more likely to demand some kind of open source license for the software and its dependencies. Nowadays, Python and R are the languages of choice; even other open source languages may receive criticism for not being "widely adopted". Finally, if you want your software to be widely adopted by the academic community, using an open source language would be the most sensible choice.
Anyway, have you tried running your script with Octave? Octave is by no means a common program for bioinformatics, but it is open source and may answer the reviewers' concerns.
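A quick way to try this is to run the script headlessly under Octave and see whether it exits cleanly. The sketch below is only an illustration in Python, not a tested recipe for your code: it assumes an `octave` binary on the PATH that accepts the `--no-gui`, `--quiet` and `--eval` options, and the function names are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_octave_command(script, octave="octave"):
    """Return the argument list that runs a .m script non-interactively.

    Assumption: the script path contains no single quotes, since it is
    embedded in an Octave string literal.
    """
    script = Path(script)
    # Octave's run() executes the script file at the given path.
    code = f"run('{script.as_posix()}');"
    return [octave, "--no-gui", "--quiet", "--eval", code]


def runs_under_octave(script, timeout=600):
    """Return True or False for success, or None if Octave is not installed."""
    octave = shutil.which("octave")
    if octave is None:
        return None
    result = subprocess.run(
        build_octave_command(script, octave),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Show Octave's error output, e.g. a function Octave does not provide.
        print(result.stderr)
    return result.returncode == 0
```

Calling `runs_under_octave("analysis.m")` (a hypothetical file name) would then tell you whether the script completes. A failure usually points at toolbox functions that Octave lacks, which is useful to know before answering the reviewers.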
Matlab is a standard in many CS fields, in particular in engineering and some areas that deal with images. One of the reasons is that these university departments get comparatively cheap licenses. One aspect to consider is interoperability. This is important if your tool is going to be run as part of a larger analysis pipeline.
Matlab and R have similarities, but Matlab has its origin in numerical analysis and engineering while R has its origins in statistics. R thus has many functions that are not found elsewhere. Matlab covers some of the same functionality as R with toolboxes, some of which you also have to buy, thus increasing the cost for a lab not covered by some institutional license agreement.
One of the big differences is in graphics capabilities. Matlab outputs crappy figures whereas you can make paper-quality figures in R (at least I've never seen Matlab figures done up to the standard of biological journals; CS journals are pretty happy with crappy figures and plots compared to biological journals). Matlab used to have an advantage with its IDE, but that's now gone with the advent of RStudio.
So while everything is possible, if you're targeting a particular audience, you'll get better uptake by using tools that are widely available to that audience.
In short, Matlab (or Octave for that matter) is not a common tool in the area of sequence processing/analysis, but R, Python and, to some extent, Perl are.
EDIT: One thing I forgot that's quite important for many bioinformaticians is that R has packages providing easy access to data in biological databases. Matlab has a barebone ODBC driver interface in a separate toolbox that you have to buy.
What if we converted our code to a standalone executable? This would get around the expense issue (no Matlab license is needed to run the executable) although not the "uncommon tool" issue.
That would be the perfect alternative. You can also make the original code available on the side for those who are lucky enough to have access to Matlab licenses. You would not have to re-implement/re-code the software again.
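If you go the standalone route, a thin wrapper can make the executable easier to drop into a scripted pipeline. The following is a rough sketch only, not your tool's actual interface: the executable name `protein_tool` and its `--input`/`--output` flags are hypothetical placeholders. Keep in mind that executables built with the Matlab compiler still need the free MATLAB Runtime installed on the user's machine.

```python
import subprocess
from pathlib import Path


def build_command(executable, input_file, output_dir):
    """Assemble the command line for the compiled tool.

    The --input/--output flags are hypothetical; replace them with
    whatever arguments the real executable accepts.
    """
    return [str(executable), "--input", str(input_file), "--output", str(output_dir)]


def run_tool(executable, input_file, output_dir):
    """Run the standalone executable as one step of a larger pipeline."""
    input_file = Path(input_file)
    if not input_file.is_file():
        # Fail early with a clear message instead of a cryptic tool error.
        raise FileNotFoundError(f"input file not found: {input_file}")
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run(build_command(executable, input_file, output_dir), check=True)
    return output_dir
```

A pipeline written in Python (or a workflow manager that can call Python) could then use `run_tool("protein_tool", "sample.mzML", "results")` without the user ever opening Matlab.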
A reviewer who rejects your paper because of the choice of the programming language does not understand the role of peer review. Having said that, there is a lot of benefit for both you and your users in providing your code in a public repository with automated builds and unit tests.
Make it as easy as possible for users to download, install and test run the software, and support standard file formats. Benchmark your tool against competitors and make your tool better. At the end of the day, this is what matters most. Make sure the niche you are filling will be relevant in the coming years and is worth spending your time on (most often, if there are no good tools, it's for a reason).
Of course Matlab is not open source and that is a drawback, but I know of many publications that are based on Matlab experiments, so I feel like this should not be the reason to stop you from publishing your results. Also, I assume you developed a workflow, a method, and Matlab is 'just' the implementation. So if you publish it, it doesn't stop you (or anyone else) from re-implementing it in different languages. I once worked with an algorithm that ended up having Matlab, C, Python and Fortran implementations. If your method is good, the same thing could happen to you, or you could actively take it in that direction, e.g. by starting to implement an R solution. So, don't let this stop you!
The drawback is not so much that it's not open source as the fact that it could be quite expensive. I've also seen a lot of matlab code in papers but, although I am not saying this doesn't exist, I don't remember seeing matlab code in bioinformatics papers I read. In addition, a lot of papers that have matlab code just provide it as some sort of proof-of-concept that requires quite some more work to turn into a real-world application.
Of course people can write bioinformatics tools in matlab. The questions are: would users need this tool so badly that they would either buy matlab to run it or rewrite it into another language? In the latter case, who gets most of the credit if the corresponding R package ends up being the one everybody uses? Sure, the original paper and matlab code will be cited in the R package but who would be remembered for providing the tool?
I completely agree with what you are saying. I like matlab in a way but of course, it is not open source and if your institute does not provide a license, that is a problem.
The questions are: would users need this tool so badly that they would either buy matlab to run it or rewrite it into another language? In the latter case, who gets most of the credit if the corresponding R package ends up being the one everybody uses?
Yes, also agreed. That is why I said he might push in that direction, work on it himself and also provide an R package etc. in the near future. However, if this method is close to being published and the only thing holding it back is the fact that it's in matlab at the moment, I think it shouldn't.
I agree that Matlab is not a good choice to develop bioinformatics tools.
The most used programming languages in this area are R, Python, Perl, Java and C/C++.
I believe R + Python will become the most commonly used, while C/C++ will still be the first choice for writing high-performance software. But I think Perl and Java will be adopted by fewer developers, since Perl is not friendly to software engineering and Java is not as flexible as Python-like languages.
Can you give more detail about translating from Matlab to R being easy? Do you mean that the two languages are similar, or is there an automated tool, or ...?
...and you could then write an R module using that C code. You are a bad dude, that is a great idea lol, I like it.