I'm a big fan of Python and Paver. Paver facilitates the creation of REPEATABLE, DEFINABLE and RE-ENTRANT scripts. You can define each "module" of analysis as a separate function: normalization, analysis, figure generation, etc. You can also construct task-dependency trees: analysis depends on normalization, the second step of analysis depends on the first step, etc. Paver will traverse your dependency tree, execute the tasks in dependency order, and make sure it doesn't run any task twice.
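To make the dependency-tree idea concrete, here is a minimal sketch in plain Python. It is not Paver's implementation or API (in Paver you would express the same thing with the `@task` and `@needs` decorators from `paver.easy`); the `task` decorator, `run` function and `RUN_LOG` list below are hypothetical names made up for illustration.

```python
# Minimal stand-in for a task runner; NOT Paver's implementation.
TASKS = {}    # task name -> (function, list of prerequisite task names)
RUN_LOG = []  # order in which tasks actually executed


def task(*needs):
    """Register a function as a task with the named prerequisites."""
    def register(func):
        TASKS[func.__name__] = (func, list(needs))
        return func
    return register


def run(name, done=None):
    """Run prerequisites depth-first, then the task itself, each at most once.

    Note: this sketch does no cycle detection.
    """
    done = set() if done is None else done
    if name in done:
        return done
    func, needs = TASKS[name]
    for dep in needs:
        run(dep, done)
    func()
    done.add(name)
    return done


@task()
def normalization():
    RUN_LOG.append("normalization")


@task("normalization")
def analysis():
    RUN_LOG.append("analysis")


@task("normalization", "analysis")
def figures():
    RUN_LOG.append("figures")


run("figures")
print(RUN_LOG)  # ['normalization', 'analysis', 'figures']
```

Even though both analysis and figures depend on normalization, it only runs once, which is the behaviour described above.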
Even though my analysis code is in Python, I prefer to make sh calls and then use text files to pass info between different processes. This also facilitates re-entrant techniques, i.e. if the result text file is already present, then skip that step and move on to the rest of the code. I then use a -f flag to force re-analysis if I have code changes.
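A rough sketch of that re-entrant pattern, using only the standard library. The helper `run_step`, the `main` function and the file names are hypothetical placeholders, not code from my pavement.py; I'm also assuming each step writes its result to stdout.

```python
import argparse
import os
import subprocess


def run_step(command, result_path, force=False):
    """Run a shell command that produces result_path, unless it already exists.

    Returns True if the command was run, False if the step was skipped.
    """
    if os.path.exists(result_path) and not force:
        return False  # result already present: skip this step
    # Write to a temporary file first so a failed command never leaves
    # a half-written result that would be mistaken for a finished one.
    tmp_path = result_path + ".tmp"
    with open(tmp_path, "w") as handle:
        subprocess.check_call(command, shell=True, stdout=handle)
    os.replace(tmp_path, result_path)
    return True


def main(argv=None):
    parser = argparse.ArgumentParser(description="Re-entrant analysis step (sketch)")
    parser.add_argument("-f", "--force", action="store_true",
                        help="re-run even if the result file exists")
    args = parser.parse_args(argv)
    # Hypothetical step: the command and file names are placeholders.
    ran = run_step("sort raw_counts.txt", "sorted_counts.txt", force=args.force)
    print("ran step" if ran else "result present, skipped")
```

Calling `main([])` skips the step when sorted_counts.txt already exists, while `main(["-f"])` reruns it regardless.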
Here is a link to an example of one of my paver-files: http://github.com/JudoWill/flELM/blob/master/pavement.py
If you've got a lot of time, I also like to use the Galaxy framework. They have a wonderful framework for integrating your own analysis pipelines. I have a local fork of their Bitbucket repository. With a little work you can integrate any pipeline into this framework, and then you even have a nice GUI to interact with. Once you get the hang of their code base you can integrate one of your tools into the framework in an afternoon.
My general workflow is to build a tool using the Paver framework since it's much easier to debug. Once it's finished I'll integrate it into the Galaxy framework.
This sort of reproducibility has made my professional life soooo much easier. When researchers come to me with new projects I can return preliminary answers to them within a day based on tools I already have. Then based on their feedback I'll develop a custom set of tools for their project.
Hope that general rambling helps,
Will
I think Makefiles are just as suited to research as to production environments. Originally they were meant for production (translating source code).
What about Cyrille2? Are you still using/developing it?
Hey Egon. Stopped development on Cyrille2. It's nice for high-throughput pipelines that do not change too often, but too unwieldy for my current work. I use this now: http://mfiers.github.com/Moa/ (again, my own software)
Ah, and in Wageningen, the last instance of Cyrille2 is about to get scrapped...