The recent popularity of the article discussed here has led me to debate what many of my pure computer science friends see as big problems in bioinformatics—poor code quality, lack of testing and documentation, and unreliability. I said that things generally seem to be getting better over time, but they said they keep hearing about vague "improvement" in bioinformatics with no concrete examples.
As just an undergrad (in CS and Molecular Biology) with a short history in the field, I couldn't come up with much to say. So I turn to you.
Can you give any specific examples of how bioinformatics has improved over time? From specific tools and file formats to general practices?
Thanks for your time!
I am wondering whether it really makes sense for bioinformatics to universally take on the overhead associated with professional-quality code. In cases where the software is being used by hundreds, if not thousands, of people, the answer is obviously yes. However, a lot of ideas in science never go anywhere, so it seems reasonable to write most code in whatever way suits your small research group (as small proofs of concept) and then refactor it later if the idea proves viable. There is an opportunity cost between exploring new ideas and exploiting those ideas to their fullest through professional-quality implementations.
For myself, I would find it hard to keep up with documentation, commenting, version control, testing, etc. for more than maybe one or two projects.
Anyway, I think measuring bioinformatics by the quality of its code is maybe not a great measure. It misses the science, which is the accumulated knowledge about which algorithms and approaches apply to various biological problems. It's a bit like measuring mathematical physics by the quality of its mathematics while leaving out the advances in physics.
I think in addition to talking about the quality of the informatics, we should ask, what biology have we learned from bioinformatics that we would not have learned otherwise? How much (if at all) has bioinformatics increased the rate at which our knowledge about biology grows?
Note that Kate raised the question in the context of Fred Ross' rants on bioinformatics and the related HackerNews discussions. I guess one purpose of those discussions is to persuade more CS students and dedicated programmers to work on bioinformatics. For many of them, their interest is not in biology, at least not initially; their interest in bioinfo is in applying their skill sets to practical problems. If the bioinfo community thinks algorithms, code quality, etc. are not important, what is the point for a CS student or a professional programmer to join the field? Wouldn't a biologist with basic training in programming be able to solve most of the problems? From Fred's posts and CV, his primary interest seems to be programming. My feeling is that his frustration mainly comes from the failure to pursue that interest. He is not alone: although most of the answers/comments here rightfully indicate that his post is severely biased, the post still got 11 upvotes (personally, I only upvote a post to support its content), and his opinion on the programming aspect is also echoed by several people on HackerNews.
Maybe we can say "haters gonna hate", which is probably true, but Fred's post and, more importantly, the responses still make me wonder: how much room is there for CS students and dedicated programmers? Do we want to get more of them involved? Regarding interaction with computer scientists and professional programmers, is our bioinfo community moving in the right direction? If not, can we do something on our side (without asking to reform the funding system, which some argued is the root of all evil) to improve the current situation? I haven't found answers myself.
PS: When I finished this comment, I realized it was too long. But since it does not answer Kate's question, I will leave it here. I am sorry for the long mess without a clear conclusion.
But isn't maintaining proper code analogous to a wet-lab biologist sending out reagents/organisms? Is there really that much more overhead in supporting/refactoring code than in keeping hundreds of stocks of flies/zebrafish/hybridomas?
I realized when I finished writing my response that I don't have an answer to the question of who is going to do the refactoring. I worry that fixing or improving old code has no obvious reward in the "publish or perish" ecosystem.