From the Snakemake website:
Build systems like GNU Make are frequently used to create complicated workflows, e.g. in bioinformatics. This project aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain specific specification language (DSL) in python style:
    rule targets:
        input:
            'plots/dataset1.pdf',
            'plots/dataset2.pdf'

    rule plot:
        input:
            'raw/{dataset}.csv'
        output:
            'plots/{dataset}.pdf'
        shell:
            'somecommand {input} {output}'
Like with GNU Make, in Snakemake you first specify targets in terms of a pseudo rule, and then how they are created via one or more steps of subsequent rule applications. Rules can be generalized via wildcards (here {dataset}). Everything is propagated top-down, i.e. here Snakemake determines that for the file "plots/dataset1.pdf" the rule plot has to be applied with wildcard {dataset} = dataset1 to the file raw/dataset1.csv. How the files are created is specified either with a shell command or python code. Further, Snakemake can interface with R to specify R code inside rules. Also see the FAQ to get an impression of the basic idea behind Snakemake.
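To make the top-down propagation more concrete, here is a rough Python sketch of the idea. It is not Snakemake's actual code. Each output pattern is turned into a regular expression with one named group per wildcard (a greedy `.+` here, which is an assumption of the sketch), the requested file is matched against it to recover `{dataset}`, and the input pattern is then filled in and resolved recursively until a file that already exists is reached. `RULES`, `pattern_to_regex` and `resolve` are hypothetical names made up for this illustration.

```python
import re

# Hypothetical, simplified stand-in for the Snakefile above: one rule that
# maps an output pattern to an input pattern. This is NOT Snakemake's
# internal representation, just enough to illustrate wildcard matching.
RULES = {
    "plot": {"output": "plots/{dataset}.pdf", "input": "raw/{dataset}.csv"},
}


def pattern_to_regex(pattern):
    """Turn 'plots/{dataset}.pdf' into a regex with a named group per wildcard."""
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text before the wildcard
        parts.append("(?P<%s>.+)" % m.group(1))           # the wildcard itself
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))               # literal text after the last wildcard
    return "".join(parts)


def resolve(target, existing_files, plan=None):
    """Work top-down from a requested file to the rule applications it needs."""
    if plan is None:
        plan = []
    if target in existing_files:
        return plan  # already on disk, nothing to run
    for name, rule in RULES.items():
        m = re.fullmatch(pattern_to_regex(rule["output"]), target)
        if m:
            wildcards = m.groupdict()                   # e.g. {'dataset': 'dataset1'}
            needed = rule["input"].format(**wildcards)  # e.g. 'raw/dataset1.csv'
            resolve(needed, existing_files, plan)       # inputs are handled first
            plan.append((name, wildcards, needed, target))
            return plan
    raise ValueError("no rule can produce " + target)


existing = {"raw/dataset1.csv", "raw/dataset2.csv"}
for target in ["plots/dataset1.pdf", "plots/dataset2.pdf"]:
    print(resolve(target, existing))
# [('plot', {'dataset': 'dataset1'}, 'raw/dataset1.csv', 'plots/dataset1.pdf')]
# [('plot', {'dataset': 'dataset2'}, 'raw/dataset2.csv', 'plots/dataset2.pdf')]
```

The sketch only shows how `{dataset} = dataset1` is inferred from the target name; the real tool also builds a dependency graph across many rules and schedules the resulting jobs.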
A simple introduction, with an example comparing the same pipeline written in both Perl and Snakemake, is here: https://bitbucket.org/johanneskoester/snakemake/wiki/Getting%20Started%20with%20Snakemake%20and%20Qsub
This tool is excellent.
Too bad it only works with Python 3. Does anyone have a way to make it work without installing Python 3?
I don't have a solution to that, but a very easy way to install Python 3 locally, without root access, is the pyenv program.
After installing pyenv, you can install Python 3 with, e.g. (assuming 3.4.1 is the version you want):

    pyenv install 3.4.1
... and then activate that version in your current shell:

    pyenv shell 3.4.1
... or globally with:

    pyenv global 3.4.1
A Snakemake solution is available here: http://coderscrowd.com/app/codes/view/192
Unfortunately, Python 2 support is not possible because of missing functionality in its multiprocessing module. However, setting up Python 3 in your home directory is very easy with virtualenv, or even without it, by using ~/.local as the installation prefix.
An example of how to do this might be worth a FAQ entry, since I think Python 3 is one of the main stumbling blocks to wider uptake.
I have added the virtualenv setup to the documentation.
Cool. I would propose also documenting setup with pyenv, since it avoids much of the complexity of the virtualenv / virtualenvwrapper / virtualenv-burrito mess.
Anaconda is an excellent and completely free Python distribution. It installs Python 3 together with scientific packages that are sometimes hard to install, such as NumPy, SciPy, matplotlib and a few others.
* Installs cleanly into a single directory
* Doesn’t require root or local admin privileges
* Doesn’t affect other Python installs on your system, or interfere with OS X Frameworks
https://store.continuum.io/cshop/anaconda/
Great! I'll have to try that out.
What's wrong with good old, robust GNU Make? It looks like I have to type less there, while getting the same functionality. Are there any major advantages of Snakemake that I'm missing?
It is much easier to code: you can use Python (and all its libraries) directly in the workflow file (the Snakefile), as well as shell commands; each rule can have several input and output files (even if their exact number is not known in advance); and rules are easy to generalize so they work with many different datasets: https://bitbucket.org/johanneskoester/snakemake/wiki/Documentation#markdown-header-wildcards
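Because Snakefiles accept ordinary Python code, the target list of a pseudo rule does not have to be typed out by hand for every dataset. Snakemake ships an `expand()` function for this; the pure-Python sketch below is a hypothetical stand-in (`expand_pattern` is an invented name, not part of Snakemake) that shows the idea: every combination of wildcard values is filled into a pattern.

```python
import itertools


def expand_pattern(pattern, **wildcard_values):
    """Substitute every combination of wildcard values into a {name} pattern.

    Hypothetical helper loosely modelled on Snakemake's expand();
    not its actual implementation.
    """
    names = list(wildcard_values)  # keyword-argument order is preserved (Python 3.6+)
    combos = itertools.product(*(wildcard_values[n] for n in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# Instead of listing every target by hand in the pseudo rule:
datasets = ["dataset1", "dataset2", "dataset3"]
print(expand_pattern("plots/{dataset}.pdf", dataset=datasets))
# ['plots/dataset1.pdf', 'plots/dataset2.pdf', 'plots/dataset3.pdf']

# Several wildcards give the Cartesian product of their values:
print(expand_pattern("results/{sample}_{rep}.txt", sample=["a", "b"], rep=[1, 2]))
# ['results/a_1.txt', 'results/a_2.txt', 'results/b_1.txt', 'results/b_2.txt']
```

In an actual Snakefile you would call Snakemake's own `expand()` in the `input:` of the `targets` rule rather than a helper like this one.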
Snakemake is also designed to work in a cluster environment; I routinely run Snakemake across 3000 cores.
Interested in learning more about #SNAKEMAKE?
Register now for the first 2-day #SNAKEMAKE Workshop in Berlin with Johannes Köster https://johanneskoester.bitbucket.io/
You will learn how to create modern, reproducible #bioinformatics workflows.