Hi! I'm having some issues combining Docker with Snakemake workflows.
I have some rules that run an R script on the input(s) and produce some output(s). Until now, I've been able to handle the environments for these R scripts with conda environments. However, I've now reached a point where conda isn't cutting it: some libraries aren't available in conda repositories, or the dependencies between them are impossible to solve unless I install them myself. So I turned to Docker.
Suppose I have an R script that uses the plotly and dplyr libraries. I create a Docker image with r-base as the base image and install the libraries in it:
```dockerfile
# Use an official R runtime as a parent image
FROM r-base:latest

# Install necessary R packages or any other dependencies if needed
RUN R -e "install.packages('plotly')"
RUN R -e "install.packages('dplyr')"
```
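One thing worth ruling out here: `install.packages()` only emits a warning when a package fails to install (for example, because a system library it needs is missing), so `R -e "install.packages(...)"` can exit with status 0 and `docker build` still reports success. The following is a minimal sketch that generates the same Dockerfile but adds a check after each install to abort the build if the package cannot be loaded. The `render_dockerfile` helper is hypothetical, and it assumes the CRAN mirror configured in r-base is used, just as in the original Dockerfile.

```python
# Hypothetical helper: render an r-base Dockerfile in which every R package
# install is followed by a load check. Because install.packages() only warns
# on failure, the explicit quit(status = 1) is what makes `docker build` fail.

def render_dockerfile(packages, base="r-base:latest"):
    lines = [f"FROM {base}"]
    for pkg in packages:
        r_code = (
            f"install.packages('{pkg}'); "
            f"if (!requireNamespace('{pkg}', quietly = TRUE)) quit(status = 1)"
        )
        lines.append(f'RUN Rscript -e "{r_code}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the output to a file named Dockerfile and build as before.
    print(render_dockerfile(["plotly", "dplyr"]), end="")
```

If the build now fails on one of the `RUN` lines, the package never made it into the image, and the Snakemake error is a symptom of that rather than of how the container is started.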
Then I build the Docker image:

```bash
docker build -t my-r-image .
```

Tag it:

```bash
docker tag my-r-image hamarillo/my-r-image:dev
```

Push it:

```bash
docker push hamarillo/my-r-image:dev
```
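The build, tag and push steps above can also be scripted with a smoke test before pushing. This is a sketch that assumes Docker is installed and that arguments passed to `docker run` replace r-base's default command (i.e., the image has no ENTRYPOINT). Building with the full `hamarillo/my-r-image:dev` name makes the separate `docker tag` step unnecessary. The `docker_commands` and `run_all` names are hypothetical.

```python
import shlex
import subprocess

IMAGE = "hamarillo/my-r-image:dev"  # image name taken from the question


def docker_commands(image=IMAGE, packages=("plotly", "dplyr")):
    """Return argv lists for build, smoke test and push, in that order."""
    # Tagging with the full name at build time replaces `docker tag`.
    build = ["docker", "build", "-t", image, "."]
    # library() errors on a missing package, so Rscript exits non-zero.
    r_expr = "; ".join(f"library({p})" for p in packages)
    smoke = ["docker", "run", "--rm", image, "Rscript", "-e", r_expr]
    push = ["docker", "push", image]
    return [build, smoke, push]


def run_all(dry_run=True):
    for argv in docker_commands():
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step, so a broken
            # image is never pushed.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)  # set to False to actually run the commands
```

Running the smoke test with plain `docker run` separates "the packages are missing from the image" from "Snakemake starts the container differently".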
And finally, I use it in my rule:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
rule generate_plotly_visualizations:
    '''
    Rule for sophisticated R-based visualization using plotly
    '''
    input:
        processed_data="results/analysis/processed_data.csv"
    output:
        plotly_visualization="results/visualizations/plotly_visualization.html"
    container:
        "docker://hamarillo/my-r-image:dev"
    script:
        "../scripts/generate_plotly_visualization.R"
```
So far so good: Snakemake downloads the image and builds it (as a Singularity image) just fine.
However, my script doesn't run. The first error is that the libraries are not available, starting with the very first library call, `library(plotly)`, so it fails right from the beginning.
I think the problem is that Snakemake starts the container, and when you use the `script:` directive with an R script, the first thing it runs is `Rscript --vanilla the_script.R`. BUT my container is already inside R, because I built it from an r-base image, so running `Rscript --vanilla the_script.R` makes no sense.
Here is the error:

```
RuleException:
CalledProcessError in file containers_test/workflow/rules/plotly_visualizations.smk, line 16:
Command ' singularity exec --home 'containers_test' containers_test/.snakemake/singularity/411cdb23c5e82208fe7d71e579e251cb.simg bash -c 'set -euo pipefail; Rscript --vanilla containers_test/.snakemake/scripts/tmpitwfgta6.generate_plotly_visualization.R'' returned non-zero exit status 1.
```
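To see what the container actually contains when it is started the way Snakemake starts it, one option is to run the same kind of `singularity exec` command manually with a small R check instead of the script. This is a sketch that assumes `singularity` is on PATH and reuses the cached image path and `--home` value from the error message. Those values will differ between runs. `diagnostic_argv` and `R_CHECK` are hypothetical names. The R code prints the library search path and whether each package can be loaded.

```python
import shlex

# Cached image path and home directory copied from the error message;
# adjust them to match your own run.
SIMG = "containers_test/.snakemake/singularity/411cdb23c5e82208fe7d71e579e251cb.simg"
HOME = "containers_test"

# Print R's library paths, report which packages load, and exit with
# status 1 if any of them is missing.
R_CHECK = (
    "print(.libPaths()); "
    "ok <- sapply(c('plotly', 'dplyr'), requireNamespace, quietly = TRUE); "
    "print(ok); "
    "quit(status = as.integer(!all(ok)))"
)


def diagnostic_argv(image=SIMG, home=HOME):
    """Mirror Snakemake's command, swapping the script for R_CHECK."""
    inner = f"set -euo pipefail; Rscript --vanilla -e {shlex.quote(R_CHECK)}"
    return ["singularity", "exec", "--home", home, image, "bash", "-c", inner]


if __name__ == "__main__":
    # Prints a shell-ready command line to paste into a terminal.
    print(shlex.join(diagnostic_argv()))
```

If this reports `FALSE` for plotly, the package is missing from the image (or from the library paths R searches inside it), regardless of how the script is launched.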
Please help. Has anyone used Docker containers with Snakemake for one specific rule (rather than one container for the whole workflow) to run R scripts?
I could build my image from something other than r-base (e.g., Ubuntu or Debian), install R in it, and then install my libraries. I'm actually trying that next, but I was wondering whether it can be avoided, since I'd like to keep the image as small as possible.
Thanks for reading! Any help is appreciated.
Maybe try setting a different entry point for your image? That way, your image will expose a shell where you can call Rscript.
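Whether a different entrypoint helps depends on how the container is started. Below is a simplified model. It assumes exec-form ENTRYPOINT/CMD and, as the question implies, that r-base's default command is `R` with no ENTRYPOINT. `docker run` builds the process from the ENTRYPOINT plus either the run arguments or CMD. By contrast, `singularity exec`, which the error log shows Snakemake using, runs the given command directly. The function names are hypothetical.

```python
from typing import List, Sequence


def docker_run_argv(entrypoint: Sequence[str], cmd: Sequence[str],
                    run_args: Sequence[str]) -> List[str]:
    """Process started by `docker run IMAGE *run_args` (exec-form only)."""
    # Arguments given to `docker run` replace the image's CMD ...
    tail = list(run_args) if run_args else list(cmd)
    # ... and are appended to the ENTRYPOINT, if the image has one.
    return list(entrypoint) + tail


def singularity_exec_argv(entrypoint: Sequence[str], cmd: Sequence[str],
                          exec_args: Sequence[str]) -> List[str]:
    """Process started by `singularity exec IMAGE *exec_args`."""
    # entrypoint/cmd are accepted only for symmetry: exec ignores them
    # (they feed the runscript used by `singularity run`).
    return list(exec_args)


# Simplified form of the command Snakemake ran, per the error message.
SNAKEMAKE_ARGS = ["bash", "-c", "set -euo pipefail; Rscript --vanilla script.R"]

if __name__ == "__main__":
    entrypoint, cmd = [], ["R"]  # assumed r-base defaults
    print(docker_run_argv(entrypoint, cmd, []))
    print(singularity_exec_argv(entrypoint, cmd, SNAKEMAKE_ARGS))
    # An ENTRYPOINT of ["R"] would prefix `docker run` arguments, not exec ones.
    print(docker_run_argv(["R"], [], SNAKEMAKE_ARGS))
```

Under this model, the image's default `R` command never runs when Snakemake uses `singularity exec`. A custom entrypoint would therefore mainly affect `docker run` and `singularity run`.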
Please show the exact Snakefile, the command you use to run Snakemake, and the precise error message.