Question

Your Daily Bioinformatics Pipeline/Library

4

13.9 years ago

Fabian Bull ★ 1.3k

In my work at an institute for crop plants, some jobs have to be done on a regular basis (monthly, weekly or even daily). In the beginning I had a Perl script for every task and everything was fine. But as time went on I found myself using the same Perl snippets over and over again and started to think about a library.

Now I am developing a Scala bioinformatics library for my daily work (parsing BLAST, filtering files, alignments, statistical models ...).

I wonder what solutions you have for fighting the daily struggle of data handling and processing. Do you maintain a library of your own (if so, which language do you use, and why?), or do you use public libraries like BioJava? Or do you have a fixed set of small programs that can do everything you need (Unix tools, ...)?

pipeline library • 3.8k views

updated 8.0 years ago by Biostar 20 • written 13.9 years ago by Fabian Bull ★ 1.3k

0

+1 for scala ;-)

13.9 years ago by Ido Tamir 5.2k

score 3 · Answer 1 · 2011-10-18

I tend to have code in a few stages:

One-off scripts: While I've learned that even what you think is a one-off script is not really one-off (collaborators will inevitably come back with more things to try, params to change, etc), these sorts of scripts are robust enough to tweak, but are generally too specialized to factor out into a library.
"Personal" libraries: these are in development and typically have unstable APIs. This is probably where the most interesting stuff lives, but it tends to be messy while I work things out. For example, code here might be an evolved form of a one-off script from the above category, where the lab is now doing similar experiments that could benefit from a more structured code base.
Public libraries: If there are parts of those "personal" libs that I think would be of general use, they get cleaned up, tested, and put in public GitHub repos.

Pretty much everything is in Python, since that language is used across many scientific disciplines resulting in lots of high-quality libraries (scikits.learn as just one example). It can also be sped up substantially with a sprinkling of Cython. If a library is public (i.e., easy_install-able), it's fair game for any of those stages of development.

The interesting question is: at what point do you decide that what you have is useful to others, and put the time into building a public library? Still struggling with this one ... but I would say more eyes on your code will only make it better.
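As a rough illustration of the step from a one-off script to a "personal" library, here is a minimal sketch in which a parsing snippet is factored into reusable functions. It assumes BLAST tabular output (-outfmt 6) with the default 12 columns; the names BlastHit, parse_blast_tabular and filter_hits are hypothetical and not taken from any existing library.

```python
from typing import Iterable, Iterator, List, NamedTuple


class BlastHit(NamedTuple):
    """A subset of the default 12 columns of BLAST tabular output (hypothetical type)."""
    query: str
    subject: str
    percent_identity: float
    evalue: float
    bitscore: float


def parse_blast_tabular(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Yield one BlastHit per data line, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Default column order: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        yield BlastHit(
            query=fields[0],
            subject=fields[1],
            percent_identity=float(fields[2]),
            evalue=float(fields[10]),
            bitscore=float(fields[11]),
        )


def filter_hits(hits: Iterable[BlastHit], max_evalue: float = 1e-5) -> List[BlastHit]:
    """Keep hits whose e-value is at or below the threshold."""
    return [hit for hit in hits if hit.evalue <= max_evalue]


# Made-up example lines, for illustration only.
example_lines = [
    "# comment line",
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t370",
    "q1\ts2\t80.0\t150\t30\t0\t1\t150\t5\t154\t0.5\t40.1",
]
for hit in filter_hits(parse_blast_tabular(example_lines)):
    print(hit.query, hit.subject, hit.evalue)
```

Once the parsing lives in one place, each new one-off script only needs to import it and change the filtering parameters.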

score 2 · Answer 2 · 2011-10-18

2

13.9 years ago

Martin A Hansen 3.0k

I use Biopieces (www.biopieces.org). When I need something new, I write a new Biopiece for that task.

1

Wow. That looks great.

13.9 years ago by Fabian Bull ★ 1.3k

score 1 · Answer 3 · 2011-10-18

I've faced those issues, too. I started my own function library in Python, where I collect new methods I've written for my daily work in structural bioinformatics: mainly parsers, wrappers around programs, and machine-learning-related code. I also make use of external libraries such as Biopython, Biskit, mmLib and MMTK; the latter three mainly provide methods for structural bioinformatics. When colleagues find something useful, I usually make my code available through small programs that call those libraries and offer a convenient command-line interface.
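To illustrate the "small program around a library function" idea, here is a minimal sketch using Python's standard argparse module. The function count_fasta_records and the single positional argument are hypothetical stand-ins for whatever the real library provides.

```python
import argparse
import sys
from typing import Iterable, List, Optional


def count_fasta_records(lines: Iterable[str]) -> int:
    """Library function (hypothetical): count FASTA records by '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def build_parser() -> argparse.ArgumentParser:
    """Describe the command-line interface separately so it is easy to inspect."""
    parser = argparse.ArgumentParser(description="Count records in a FASTA file.")
    parser.add_argument("fasta", help="path to a FASTA file, or '-' for standard input")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Thin wrapper: parse arguments, call the library function, print the result."""
    args = build_parser().parse_args(argv)
    if args.fasta == "-":
        count = count_fasta_records(sys.stdin)
    else:
        with open(args.fasta) as handle:
            count = count_fasta_records(handle)
    print(count)
    return 0
```

In a real script, main() would typically be called from an `if __name__ == "__main__":` guard. Keeping the logic in the library function means colleagues can use either the command line or a plain import.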