From the Uses-this series, I like the question in the title. What other bioinformatics software doesn't get enough use/recognition?
I think simple tools that let one perform very basic bioinformatics tasks may be the least appreciated from a sustainability point of view. Take bioawk or seqtk: both are very useful, but it is not clear how one could support them or publish them.
Yet we need simple tasks all the time. How do I easily reverse complement a sequence, translate it from the second reading frame, or find out how long a fasta file is, without having to write a program in another programming language and then worry about how to install it, which platforms it will run on, etc.?
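To make the point concrete, here is a rough sketch of what "writing a program in another language" looks like for those three tasks, using only the Python standard library. The function names (`reverse_complement`, `translate`, `fasta_lengths`) are made up for illustration and are not taken from bioawk or seqtk. The sketch assumes plain DNA input and the standard genetic code, and it does not handle IUPAC ambiguity codes.

```python
from itertools import product

# Complement table for unambiguous DNA bases (IUPAC codes are not handled).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate DNA starting at reading frame 1, 2 or 3 (forward strand)."""
    seq = seq.upper()[frame - 1:]
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Codons containing anything other than A, C, G, T become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def fasta_lengths(handle):
    """Map each record name in a FASTA file-like object to its length."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif line and name is not None:
            lengths[name] += len(line)
    return lengths
```

Even this small amount of code has to be saved, installed and maintained somewhere, which is exactly the overhead a ready-made command-line tool avoids.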
I've been using the Python package statsmodels extensively lately. It has a very nice interface using R-like formulas via patsy. It has OLS, GLMs, mixed-effects models, GEEs, robust linear regression, clustered errors, methods for calculating power/sample size and much more.
It has good documentation and a very helpful mailing list.
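For anyone who has not seen the formula interface, a minimal sketch is below. It assumes numpy, pandas and statsmodels are installed. The data frame and its column names (`x`, `y`) are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data: y is roughly 1 + 2*x with a small deterministic wobble.
x = np.arange(10, dtype=float)
wobble = np.where(np.arange(10) % 2 == 0, 0.1, -0.1)
df = pd.DataFrame({"x": x, "y": 1.0 + 2.0 * x + wobble})

# R-style formula: response on the left, predictors on the right.
# An intercept term is included by default.
fit = smf.ols("y ~ x", data=df).fit()

# Estimated coefficients, indexed by term name ("Intercept", "x").
print(fit.params)
```

Other model families are exposed through the same module in a similar way, for example `smf.glm` and `smf.mixedlm`.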
Vmatch is definitely underrated. One problem is the manual doesn't really describe the Vmatch tools as they apply to common use cases. You'd really have to use a tool for one task before you would even consider using the rest of the toolset.
Don't get me started on overrated tools.
+1 for Vmatch. This tool can be used for alignment, clustering, or any number of sequence comparison tasks. The main issue is that it is difficult to obtain and has a restrictive license. I recently tried to use Vmatch and it produced a message (something like "vmatch is out of date, obtain new license") and exited. I really want to recommend this tool, but I'm conflicted because I don't agree with practices that make it hard to use.
Agreed. But given the difficulty of supporting even larger, widely used tools, I wonder what avenues exist to support small (but well-built) ones.
I believe the root of the problem is that there is no mechanism in place to support small-scale efforts - the tool may be too simple to write a multi-year grant for, but why shouldn't one be able to request a month's salary to clean up a tool, add more documentation, write detailed use cases, add a few new features, etc.? Seemingly all of that is taken for granted.
What we need is a framework like Google Summer of Code where a candidate or even the developers themselves can apply without too much effort for say $5000 and improve on existing infrastructure. I think 100K-200K spent that way would radically improve bioinformatics.
Can we make this happen? Maybe I'm naive, but I feel like it would be feasible to raise $5,000 from personal and corporate contributions for a widely loved, underfunded tool.
It is a matter of credibility and scale - raising $5000 for one tool would still require quite a bit of effort to organize and oversee - and one tool may not be visible enough. Doing it for 10 to 100 tools would be credible and quite impactful - but that of course calls for a different organizational framework. This Google Summer of Code idea could be something to be written up as a grant. Hmm. I'll think about it. No, seriously.
How would that work? You write a grant to get money to give money to other people? Or you set up the infrastructure to make micro-granting possible?
I was thinking in terms of the traditional grant formats. In those, too, one gives most of the money to other people - although typically these people are directly associated with the organization that gets the funds. It could also be set up to serve as a mechanism for recruiting, training and outreach, bringing more people into the field.
It might also work by setting up what is essentially a small consortium and getting a large grant from one (or more) relevant granting agencies. Individual users/labs can then apply to the consortium for smaller grants to support their particular tool as described (in the 5-10k range, for instance). We have similar setups here in Canada for supporting genomics in human disease studies. The larger group controls a pool of funds from one of our national research councils and acts as a research network/consortium. Investigators apply for small grants, around 20k or so, to do exome sequencing through the consortium for individual projects.
Since you mentioned bioawk and seqtk: I am planning to publish them, along with tabtk, in a 2-page 3-in-1 paper. For small tools, the developers should set a goal of "develop-and-forget". That goal is very hard to achieve, but aiming for it at least helps to reduce the burden of future support.