I am curious to hear how everyone performs quality assurance (QA) testing for updates to pipeline versions. It is reported that developers of bioinformatics pipelines spend 60% or more of their time testing their software. What strategies or tools do you currently use? How long does it take you to perform exhaustive QA testing? Is there anything you wish existed that you could use for QA testing? What would make this less time-intensive without sacrificing software quality?
Are you trying to sell us something?
No, just trying to start a discussion. It is well documented that testing bioinformatics software is an important problem the community is still navigating. See a reference here: https://peerj.com/articles/cs-839/
I come from a bit of a software engineering background myself. There are a lot of computationally intensive teams that try to focus on software development practices, but the emphasis is always low. It used to be that people wanted things done and didn't care about code quality, since no one gets paid to maintain code. That has changed: there is a lot more emphasis on reproducibility and good coding practices now, but progress is extremely slow. CI is more prevalent, but software is still built in silos, so the best I see right now is containerization plus minimal test files shipped with the code base to ensure the software compiles and passes at least a few basic tests.
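To make that concrete, here is a minimal sketch of the kind of "compiles and passes a few basic tests" check I mean, written with pytest; the CLI name mypipeline and the tiny FASTQ under tests/data/ are hypothetical placeholders, not a real package:

    # test_smoke.py - minimal smoke test meant to run in CI on every change.
    # "mypipeline" and the tiny FASTQ under tests/data/ are hypothetical placeholders.
    import subprocess
    from pathlib import Path

    TEST_FASTQ = Path(__file__).parent / "data" / "tiny.fastq"

    def test_pipeline_runs_on_tiny_input(tmp_path):
        out_dir = tmp_path / "out"
        result = subprocess.run(
            ["mypipeline", "--input", str(TEST_FASTQ), "--outdir", str(out_dir)],
            capture_output=True,
            text=True,
        )
        # The bar is deliberately low: the tool starts, finishes cleanly,
        # and writes *something* - enough to catch broken installs and imports.
        assert result.returncode == 0, result.stderr
        assert out_dir.exists() and any(out_dir.iterdir()), "no output produced"

Even something this shallow, run automatically inside the container, catches a surprising number of "it doesn't even start" breakages.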
I write code that gets used fairly often, so I test every change, but even so regression bugs get introduced all the time - that's the nature of things. The field uses version control and virtual environments, and avoids Excel a lot more than it used to, but we're still not at the software-dev point where config files and markup documents rule everything. Things are still manual and hard-coded to an extent.
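For the regression side, the cheapest guard I know of is a pinned "golden" output: run the pipeline once on a small fixed input, commit the output you believe is correct, and have every subsequent change diff against it. A rough sketch (the file paths and two-column TSV format here are hypothetical):

    # test_regression.py - compare today's output against a pinned "golden" copy.
    import csv
    from pathlib import Path

    GOLDEN = Path("tests/golden/summary_counts.tsv")   # committed once, reviewed by a human
    CURRENT = Path("results/summary_counts.tsv")       # written by the current run

    def load_counts(path):
        """Read a feature<TAB>count file into a dict."""
        with open(path, newline="") as fh:
            return {row[0]: int(row[1]) for row in csv.reader(fh, delimiter="\t") if row}

    def test_counts_match_golden():
        golden, current = load_counts(GOLDEN), load_counts(CURRENT)
        diff = {
            k: (golden.get(k), current.get(k))
            for k in set(golden) | set(current)
            if golden.get(k) != current.get(k)
        }
        # Any drift shows up as a readable (golden, current) diff, not a silent change.
        assert not diff, diff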
Do you think more attention to software engineering best practices would be beneficial from a cost-benefit standpoint? In some cases, we are dealing with correctly diagnosing potentially terminal diseases and illnesses. Scientific code averages 15-50 bugs per 1,000 lines written (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4629271/#ref-4).
That's surprisingly low.
Attention to software dev best practices will definitely help, and so will data management best practices, but it takes a LOT of effort to change one thing in one team, let alone make a persistent change across the board. We need bioinformaticians who are exposed to software dev skills, see the utility in them, and have the wherewithal to keep those practices in mind while developing software. Software best practices are last on the priority list for a bioinformatician from a wet-bench background - why do you think most newbies prefer Python over, say, Java, where you need a freaking static method in a class just to print "Hello World"? Abstract software practices are hard to tackle for people with no programming experience.
People are more interested in solving the problems they face today than in accounting for the problems (however challenging) that their short-term solutions might create later. That sort of long-term vision is only now becoming prevalent in the research community.
Is the solution to this issue simply more software development, computer science, and computer engineering coursework in schooling?
I agree with you on the short-term goals of current developers. People who preach the long-term implications often forget that many people's funding and degree completion rely on the results obtained during a relatively short period of research time. When I was in grad school, I needed to submit results a year in to remain in good standing for the grant the following year, even though my research was going to need another two years to finish.
You nailed it on the timeline issue. I think coursework can be a good introduction to computer science and software development concepts, but practical software development experience is the only thing that reliably makes people see why those concepts matter.
I think one way forward is to have independently funded software development teams in institutes. These teams would collaborate with PI-led scientific teams, with bioinformaticians acting as liaisons, so we get the best of all worlds. The bioinformaticians would be exposed to software dev practices, and the software team would be exposed to the real-world constraints the scientific team operates under.
Yes, that would be a great idea. But sometimes you run into collaboration issues across disciplines at academic institutions (egos...). What if there were some service that could be built and adopted to help with this? I know many developers use some form of git, which was a marvelous advance for version tracking in scientific programming. But what else could we do to drive adoption of software best practices? How can we make the most immediate impact so that researchers can be confident in the testing of the code they have written without sacrificing too much of their time?
I can't really say. I think the automated testing on build (continuous integration, or whatever it's called) used by Bioconda, Bioconductor, etc. is excellent, but even with my software development background I find it challenging to get into. Most of us do not write tests as we develop. A majority don't even test with an intent to break - that is, most people test software as "this is how it's supposed to be used", and if that works, they leave it at that.
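To illustrate the difference, here is a hedged sketch of a happy-path test next to an intent-to-break test; parse_fastq is a hypothetical stand-in for whatever parser your tool actually uses:

    # Happy-path test vs. intent-to-break test; parse_fastq is a hypothetical parser.
    import pytest
    from mytool.io import parse_fastq   # hypothetical module, not a real package

    def test_parses_wellformed_record(tmp_path):
        fq = tmp_path / "ok.fastq"
        fq.write_text("@read1\nACGT\n+\nIIII\n")
        assert len(list(parse_fastq(fq))) == 1

    def test_rejects_truncated_record(tmp_path):
        # Intent to break: a record missing its quality line should raise a
        # clear error, not return garbage or dump a raw stack trace.
        fq = tmp_path / "broken.fastq"
        fq.write_text("@read1\nACGT\n+\n")
        with pytest.raises(ValueError):
            list(parse_fastq(fq))

Most people stop after the first kind of test; it's the second kind that is rarely written.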
But wait - Bioconda and Bioconductor already automate testing, so maybe it's up to these popular code repositories to enforce some sort of testing standard. This puts the onus only on the people who contribute to those repositories, and assumes they know more about software than other bioinformaticians, which is largely true. I honestly think that while there is definitely room for improvement, the current unenforced ecosystem is good enough - well-tested tools bubble up anyway.
I've also found that even well-tested code does not account for most user problems. It's like this: the more I get into testing and anticipating possible problems, the less I'm actually around real users and their requirements. My users are used to the way they already operate applications, so if I model my applications on those patterns, there usually isn't much room for error. I can then use my limited time either to plan new features or to address possible but rarely-or-never encountered scenarios.
I'd say the best balance for developers is to perform basic input validation - file content sanity checks, domain integrity checks, etc. Don't spend too much time accounting for user behavior. Instead, address content-level issues in your inputs and try to nail down why each error occurs and how to fix it. An error message saying "FASTQ files seem to be compressed. This tool is not equipped to work with compressed files. Run
gunzip fastq.gz
to uncompress them" is better than "Unable to read file" or, even worse, letting the parser spit a stack trace to the console (unless asked for with, say, a debug option). </rant>
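In code, that kind of check is only a few lines. A rough sketch of what I mean (the tool around it is hypothetical, but the gzip magic-byte check is standard):

    # Cheap content-level sanity check before parsing, with an actionable message.
    import sys
    from pathlib import Path

    def check_fastq_input(path: str) -> None:
        p = Path(path)
        with open(p, "rb") as fh:
            first_bytes = fh.read(2)
        if first_bytes == b"\x1f\x8b":   # gzip magic number
            sys.exit(
                f"{p} seems to be compressed. This tool is not equipped to work with "
                f"compressed files. Run\n\n    gunzip {p}\n\nto uncompress it."
            )
        if not first_bytes.startswith(b"@"):
            sys.exit(f"{p} does not look like FASTQ (records should start with '@').")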
Sorry, this fell off of my radar a bit. I really appreciate all the thoughtful replies!
How do developers plan and program for common use cases? Is it by interacting with users and listening to feedback? Is it by planning ahead for how the developers expect users to use the tool? Oftentimes use cases differ widely from one another, both in process and in result.
It's both planning ahead and receiving feedback after - it's an iterative process.
bioinformaticians spending 60% of their time testing the software?
That sounds a bit excessive to me. My guess would have been that the median amount of testing is probably 0 percent... :-)
What does testing really mean in this context, I wonder. Testing that the results are correct? Or that the pipeline completes?
Testing could be done in a number of ways. Running to completion is definitely one form of it, but if that just confirms the "happy path" produces results - which many in academia (myself included) have settled for in the past - it can miss desired outcomes like improved speed, accuracy, and reliability. Also, pipelines can miss reads: if the goal of a developer update is to improve read accuracy across pipeline versions (just to use a basic example), how do you go about evaluating that more reads are handled correctly in version 2 of your pipeline than in version 1?
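For concreteness, the kind of comparison I have in mind would look something like this sketch: keep a small truth set of reads with known placements, run both versions on it, and compare a metric such as recall. All names and data structures here are hypothetical stand-ins, not a real tool:

    # Compare two pipeline versions on a small truth set of reads with known placements.
    # truth, v1_observed and v2_observed are hypothetical {read_id: locus} dicts built
    # from however you load expected results and each version's output.

    def recall_against_truth(observed: dict, truth: dict) -> float:
        """Fraction of truth-set reads the pipeline placed correctly."""
        correct = sum(1 for read, locus in truth.items() if observed.get(read) == locus)
        return correct / len(truth)

    def compare_versions(truth: dict, v1_observed: dict, v2_observed: dict) -> None:
        r1 = recall_against_truth(v1_observed, truth)
        r2 = recall_against_truth(v2_observed, truth)
        print(f"v1 recall: {r1:.3f}  v2 recall: {r2:.3f}  delta: {r2 - r1:+.3f}")
        # An update should not silently lose reads the previous version handled.
        assert r2 >= r1, "version 2 recovers fewer truth-set reads than version 1"

On a truth set of a few thousand reads this runs in seconds, which keeps it practical to repeat for every release.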