I have often heard people in bioinformatics complain that it is very difficult to write tests for programs and scripts, and that beyond a certain point it is not even useful, since it is impossible to write a test for every possible wrong behaviour of your scripts. If you ask any bioinformatics student, it is very likely that they won't even know what a test is.
However, if you think about it, biologists had to face the problem of designing tests long before bioinformatics was invented, at a higher level of difficulty, and they came up with some sort of solution. My former Molecular Biology prof taught me that the most difficult part of an experiment is not formulating a hypothesis, but designing the right tests and controls to prove that the method is correct.
It is probably impossible to achieve real reproducibility for a wet-lab experiment. Even the same cell line, after some passages in one lab, can accumulate enough mutations to differ significantly from the same line in another lab. And there are a lot of variables that make reproducibility really hard to achieve in a wet lab: the climate of the lab, the experience of the researcher, the reagents used... And even when you have demonstrated a result in one cell line, you can't deduce that the same happens in another line or in vivo.
However, even if they can't eliminate the problem of reproducibility, scientists have been able to find a compromise. They have a range of well-documented best practices to follow, like including a positive and a negative control in every western blot, calibrating their machines first, using blinded controls, etc. These practices enable scientists to run experiments that can reasonably be assumed to be reproducible. Moreover, they have an international organization in charge of that.
In bioinformatics, there is no such thing as best practices. If two students write two parsers for the FASTA format, there is no way to tell whether their results will be the same, because there is no standard practice they have to follow. It would be better if, for example, they had to follow some guidelines, or if they were required to write parsers that pass a standard set of FASTA use-cases without giving errors.
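To make the idea of a shared set of use-cases concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `parse_fasta`, `SHARED_CASES` and `check_parser` are hypothetical names, and the handful of cases is far from a real standard. The point is only that any student's parser could be run against the same inputs and expected outputs.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Hypothetical minimal parser, written only to have something to test.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical shared use-cases: (input text, expected records).
SHARED_CASES = [
    (">seq1\nACGT\n", [("seq1", "ACGT")]),                 # single record
    (">seq1\nAC\nGT\n", [("seq1", "ACGT")]),               # wrapped sequence
    (">seq1\nACGT\n\n>seq2\nTTGA\n",
     [("seq1", "ACGT"), ("seq2", "TTGA")]),                # blank line between records
    ("", []),                                              # empty input
]


def check_parser(parser):
    """Run any parser against the shared cases; fail loudly on a mismatch."""
    for text, expected in SHARED_CASES:
        result = parser(text)
        assert result == expected, f"{text!r}: expected {expected}, got {result}"


check_parser(parse_fasta)
```

The cases where the parser must succeed play the role of positive controls. A negative control would be a malformed file, such as sequence data with no header, where the parser is expected to raise an error rather than silently return something.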
So, what is your approach to testing? What is your strategy for deciding which tests to write for your analysis, and which controls to use? Do you have any recommendations for an inexperienced bioinformatician?
I always think I should use assert statements, I just never remember... although if I assume a sorted list, then I'll make sure I have a unit test which fails if the function is passed an unsorted list.
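Something like the following is what I mean, as a rough sketch. The function `count_at_most` and the helper `is_sorted` are made-up names for illustration, not from any library. The assert documents the assumption inside the function, and the second test plays the role of a negative control. Keep in mind that Python skips assert statements when run with `-O`, so for input you don't control an explicit `raise` is the safer choice.

```python
import bisect


def is_sorted(values):
    """Return True if values are in ascending (non-decreasing) order."""
    return all(a <= b for a, b in zip(values, values[1:]))


def count_at_most(sorted_values, threshold):
    """Count the items <= threshold. Assumes the list is sorted ascending."""
    # State the assumption explicitly instead of trusting the caller.
    assert is_sorted(sorted_values), "input list must be sorted"
    return bisect.bisect_right(sorted_values, threshold)


def test_counts_on_sorted_input():
    # Positive control: a known input with a known answer.
    assert count_at_most([1, 2, 2, 5], 2) == 3


def test_rejects_unsorted_input():
    # Negative control: unsorted input must be refused, not silently processed.
    try:
        count_at_most([3, 1, 2], 2)
    except AssertionError:
        return
    raise RuntimeError("unsorted input was accepted")


test_counts_on_sorted_input()
test_rejects_unsorted_input()
```

The test functions are written so that a runner such as pytest would pick them up by name, but they also run as plain function calls, as in the last two lines.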