Forum: What bioinformatics tools/software do not get enough recognition?
10.1 years ago
brentp 24k

From the Uses-this series, I like the question in the title. What other bioinformatics software doesn't get enough use/recognition?

software • 3.8k views
10.1 years ago

I think the simple tools that perform very basic bioinformatics tasks may be the least appreciated from a sustainability point of view. Take bioawk or seqtk: they are very useful, but it is not clear how one could support them or publish them.

Yet we need simple tasks done all the time. How do I easily reverse complement a sequence, translate it in the second reading frame, or find out how long a FASTA file is, without having to write a program in another programming language and then worry about how to install it, which platform it will run on, etc.?

Agreed. But given how difficult it is to support even larger, widely used tools, I wonder what avenues there are to support small (but well-built) ones.

I believe the root of the problem is that there is no mechanism in place to support small-scale efforts. A tool may be too simple to justify a multi-year grant, but why shouldn't one be able to request a month's salary to clean up a tool, add documentation, write detailed use cases, add a few new features, etc.? All of that seemingly is taken for granted.

What we need is a framework like Google Summer of Code, where a candidate, or even the developers themselves, could apply without too much effort for, say, $5,000 to improve existing infrastructure. I think $100K to $200K spent that way would radically improve bioinformatics.

Can we make this happen? Maybe I'm naive, but I feel it would be feasible to raise $5,000 from personal and corporate contributions for a widely loved, underfunded tool.

It is a matter of credibility and scale. Raising $5,000 for one tool would still take quite a bit of effort to organize and oversee, and one tool may not be visible enough. Doing it for 10 to 100 tools would be credible and quite impactful, but that, of course, calls for a different organizational framework. This Google Summer of Code idea could be written up as a grant. Hmm. I'll think about it. No, seriously.

How would that work? Would you write a grant to get money to give to other people? Or would you set up the infrastructure to make micro-granting possible?

I was thinking in terms of traditional grant formats; in those, too, most of the money goes to other people, although typically they are directly associated with the organization that receives the funds. It could also be set up to serve as a mechanism for recruiting, training, and outreach, bringing more people into the field.

It might also work by setting up what is essentially a small consortium and getting a large grant from one (or more) relevant granting agencies. Individual users or labs could then apply to the consortium for smaller grants (in the 5-10K range, for instance) to support their particular tool as described. We have similar setups here in Canada for supporting genomics in human disease studies: the larger group controls a pool of funds from one of our national research councils and acts as a research network/consortium, and investigators apply for small grants of around 20K to do exome sequencing through the consortium for individual projects.

Since you mentioned bioawk and seqtk: I am planning to publish them, together with tabtk, in a two-page, three-in-one paper. For small tools, developers should aim for "develop-and-forget". That goal is very hard to achieve, but aiming for it at least helps reduce the burden of future support.

10.1 years ago
brentp 24k

I've been using the Python package statsmodels extensively lately. It has a very nice interface that accepts R-like formulas via patsy, and it offers OLS, GLMs, mixed-effects models, GEEs, robust linear regression, clustered standard errors, power/sample-size calculations, and much more.

It has good documentation and a very helpful mailing list.

Nice, I hadn't heard of this package before. It looks interesting, and I think it will definitely be added to my repertoire. Much as I love R, I like to stay in Python as much as possible.

10.1 years ago

Vmatch is definitely underrated. One problem is that the manual doesn't really describe how the Vmatch tools apply to common use cases; you'd have to use one tool for a task before you would even consider using the rest of the toolset.

Don't get me started on overrated tools.

+1 for Vmatch; this tool can be used for alignment, clustering, or any number of sequence comparison tasks. The main issue is that it is difficult to obtain and has a restrictive license. I recently tried to use Vmatch, and it printed a message (something like "vmatch is out of date, obtain new license") and exited. I really want to recommend this tool, but I'm conflicted because I don't agree with practices that make it hard to use.

I agree, although in my experience the developers have been very quick to answer emails about the license (i.e., within 24 hours).

10.1 years ago

Lately I've been playing with SymPy, a Python library for symbolic mathematics. It's still in development, but it looks very nicely done, and the mailing list is very responsive. I think it would be great if the bioinformatics community took it up.

It's a cool project. What would be some uses in bioinformatics?

Mathematical modeling? (For some loose definition of bioinformatics/computational biology)
