https://www.biostars.org/t/software%20error/ shows you what users of Biostars tag as 'software error'. Based on these case reports, it is not always easy to tell whether the PEBKAC sits on the developer side or the user side.
I am going to expand this answer slightly, step by step, as I think 'software error' only scratches the surface of the problem of correctness and validity in computer science. These aspects are not specific to bioinformatics; they apply to computer science in general. First, consider the following quote (unfortunately I am unable to find the source, if any):
An error is the outcome of a creative process that becomes an error because it is not accepted.
Does this sound acceptable? Now imagine this is the attitude of the engineers who constructed the bridge you have to take every morning. Does it still sound acceptable? Yet this is pretty much how a lot of bioinformatics software is written, and the error message is just the tip of the iceberg. It is as if the bridge crumbles because you drive over it in an orange car, when it has only been tested with blue and yellow cars crossing. A lot of bioinformatics software is written without formal specification, validation, or testing, and without adhering to established design principles; most bioinformatics software is a hack. The obvious software error message tells you immediately: "Sorry, I have never seen this input data or parameter combination, and I definitely cannot do anything relevant with it". Also, a core dump is rarely a biologically sensible prediction.
Therefore, I would like to rephrase the question. Instead of looking at software errors only I would like to ask:
What can be done to improve correctness of bioinformatics software?
There are at least two aspects related to the quality of an implementation that we should consider separately, and that have been mixed up too much in the previous discussion in this thread:
- validity (are we doing the right thing?)
- correctness (are we doing it right? this includes bugs, and the correctness of the algorithm)
The Application Domain of Bioinformatics is very complex
Bioinformatics deals with complex tasks, because it is a scientific field.
Sometimes it deals with tasks that are known or believed to be intractable, such as predictions. Look at software like I-TASSER: it takes on the impossible challenge of predicting the protein 3D structure from the sequence, and works well for many proteins. Of course it uses existing 3D structures (these have errors as well), and it integrates a multitude of different algorithms and heuristics into its workflow. Such complex models are hard to build and hard to evaluate. There is little independent data for testing. This again results in ill-specified or underspecified problems.
The example in brentp's post also nicely illustrates the specification issue for a rather 'simple problem'.
Proving the correctness of an algorithm or implementation is hard
While studying computer science, one might get an introduction to algorithms and data. This course might also include a short introduction to proving the correctness of an implementation, but the overall impression is that such proofs are too complicated for anything except the simplest of problems. So even for well-specified problems, the solutions will involve a lot of trial and error. What is more, it is undecidable in general whether a given program will even terminate (the halting problem), so there is no general, fully automatic way of checking correctness in sight. Automated testing cannot prove correctness, but it can catch many errors, and it is possibly underused.
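To make the point about automated testing a bit more concrete, here is a minimal sketch in Python. The function `reverse_complement` and the test names are made up for illustration and are not taken from any particular package. The assumption is that only A, C, G, T and N (in either case) count as valid input; anything else is rejected loudly instead of being passed through silently.

```python
# Hypothetical example: a small sequence utility plus automated tests.
# Assumption: valid input consists of A, C, G, T, N in upper or lower case.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_ALLOWED = set("ACGTNacgtn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Raises ValueError for any character outside the allowed alphabet,
    so unexpected input fails early instead of producing nonsense.
    """
    bad = set(seq) - _ALLOWED
    if bad:
        raise ValueError(f"unsupported characters in sequence: {sorted(bad)}")
    return seq.translate(_COMPLEMENT)[::-1]


def test_known_example():
    # Hand-checked case: complement of ATGC is TACG, reversed is GCAT.
    assert reverse_complement("ATGC") == "GCAT"


def test_empty_sequence():
    # Edge case that is easy to forget.
    assert reverse_complement("") == ""


def test_applying_twice_returns_input():
    # Property: reverse complement is its own inverse.
    seq = "ACGTNNacgt"
    assert reverse_complement(reverse_complement(seq)) == seq


def test_rejects_unknown_characters():
    # The "orange car": input the function was never designed for.
    try:
        reverse_complement("ACGU")
    except ValueError:
        return
    raise AssertionError("expected ValueError for unsupported character")


if __name__ == "__main__":
    test_known_example()
    test_empty_sequence()
    test_applying_twice_returns_input()
    test_rejects_unknown_characters()
    print("all tests passed")
```

The test functions follow the naming convention that a runner such as pytest discovers, and they also run directly as a script. None of this proves the function correct, but it documents what it is supposed to do and fails loudly if a later change breaks that.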
Writing bioinformatics code is difficult because developers need to understand biology and programming
People who develop bioinformatics software have to understand both worlds well, the application domain and programming, but they often have a focus on one of them. For example, they might be biologists with limited training in computer science, or vice versa. The teams that develop software might not represent both worlds equally either, and smaller programs are often developed by a single person. There is always a trade-off in how good one can be in multiple domains. Developers might or might not be trained in modern concepts of software development and computer programming, such as test-driven development, pair programming, design patterns, and revision control.
Sustainability, documentation and maintenance are not rewarded for scientific software
This refers to the aspect Istvan has mentioned already. If software is developed by a single PhD student or post-doc, there is not always much capacity, funding, or incentive to maintain the software or fix errors after that person moves on. Also, fixing bugs or improving existing software doesn't yield publications, so there might be more incentive to develop new software with -other- but not necessarily fewer flaws.
Dear harne.priyanka, welcome to Biostars. Discussion is welcome in the Forum section, but please don't post invitations to take the discussion 'off-line'. Also, as a new user, you can't use this site as a booster for your LinkedIn profile. Thank you.
Dear Michael, thanks for letting me know. Noted!
See also: What Are The Most Common Stupid Mistakes In Bioinformatics? Otherwise, it would be interesting if you could tell us a bit more about your research project. I also think that the sources of errors in software development and coding are quite different from errors in software application, but that might or might not be important.
Yes, that makes sense, errors in development and application would be different and I would like to know more about both.
More on my project: I am a content developer for conferences/workshops, and I am currently researching which areas on the software side of bioinformatics need updates. I am planning a hands-on workshop & conference in October this year. The reason for posting is to ensure the 'real' issues are addressed in the conference/workshop.
Please help with your perspective.
You should have this written as answer :)
Not a direct answer, but I'm sure this could help to understand the areas where mistakes/errors can occur: A review of bioinformatic pipeline frameworks.
Again, can you please disclose your affiliations?
What meeting are you organizing? Where is the location? Can you add your academic/commercial affiliations to your profile?