I have a project folder with the structure below, which is more than 50 GB in size.
I am creating a Dockerfile so that the Snakefile workflow, which uses data from these folders, runs inside Docker, with the snakemake command as the entrypoint.
├── adapters.fa
├── Dockerfile
├── genome
│ ├──
├── genomeIndex
│ ├──
├── RawReads
│ ├── sample_1.fastq.gz
│ └── sample_2.fastq.gz
├── RNAindex
│ ├──
└── Snakefile
How should I write the Dockerfile and build the image so that some folders, such as RawReads, genome and genomeIndex, are not copied into the image but can still be used by the Snakemake rules? When I run the container, it should run the whole Snakemake workflow and create the results folders.
A sample Snakefile and Dockerfile are shown below:
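A minimal sketch of one common approach, under stated assumptions: keep the data out of the image entirely and bind-mount the whole project directory at the image's WORKDIR (/app) when the container starts. Snakemake then finds the Snakefile and the relative paths it uses (RawReads/, genomeIndex/, ...) and writes its results back to the host. The Python helper only assembles the docker run argument list; the image name snakemake-rnaseq and the function name docker_run_argv are hypothetical, and --cores is placed after the image name because such arguments are appended to the ENTRYPOINT.

```python
import os
import shlex
from typing import List, Sequence


def docker_run_argv(project_dir: str, image: str, cores: int,
                    extra_snakemake_args: Sequence[str] = ()) -> List[str]:
    """Build a docker run command that bind-mounts the project at /app.

    Assumes the image's WORKDIR is /app and its ENTRYPOINT is ["snakemake"],
    as in the Dockerfile in the question, so every argument after the image
    name is handed to snakemake.
    """
    # Resolve to an absolute path: bind mounts are given by absolute host
    # path, and a plain name without a slash would be taken as a named volume.
    host_path = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:/app",  # Snakefile, data and results stay on the host
        image,
        "--cores", str(cores),      # becomes: snakemake --cores N
        *extra_snakemake_args,      # e.g. ["-n"] for a dry run
    ]


if __name__ == "__main__":
    argv = docker_run_argv(".", "snakemake-rnaseq", 20)
    print(shlex.join(argv))  # copy-paste form for a shell
    # To launch the workflow (needs Docker and the built image):
    # import subprocess; subprocess.run(argv, check=True)
```

If you prefer to keep the large folders read-only, each can additionally be mounted over its path with a :ro suffix (for example -v /abs/path/RawReads:/app/RawReads:ro). Independently of mounting, a .dockerignore file listing RawReads, genome, genomeIndex and RNAindex keeps those folders out of the docker build context, which matters most with the legacy builder that uploads the whole directory.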
Snakefile:
rule starAlignment:
    input:
        trimmed1="Trimmed/{id}_forward.fastq",
        trimmed2="Trimmed/{id}_reverse.fastq"
    output:
        "starOut/{id}Unmapped.out.mate1",
        "starOut/{id}Unmapped.out.mate2",
        "starOut/{id}Aligned.sortedByCoord.out.bam",
        "starOut/{id}Log.final.out"
    params:
        prefix="starOut/{id}"
    threads: 20
    shell:
        """
        STAR --runThreadN {threads} --genomeLoad LoadAndKeep --genomeDir genomeIndex --readFilesIn {input.trimmed1} {input.trimmed2} --outFilterIntronMotifs RemoveNoncanonical --outFileNamePrefix {params.prefix} --limitBAMsortRAM 15000000000 --quantMode GeneCounts --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx
        """
Dockerfile:
FROM condaforge/mambaforge:22.9.0-3
RUN mamba install -y -c conda-forge -c bioconda samtools bedtools fastqc multiqc trimmomatic bwa star picard rseqc subread snakemake
WORKDIR /app
ENTRYPOINT ["snakemake"]
Thank you for your response. I would like to know whether the ENTRYPOINT in the above Dockerfile is correct. If I mount the data externally, will the entrypoint find the data folders?
I ask because when I build and run the container, the output says the job is done, but there is no input and no output. Could you please give some guidance? Thank you.
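On the ENTRYPOINT question: with the exec form ENTRYPOINT ["snakemake"], Docker runs snakemake with the arguments that follow the image name in docker run, or with the image's CMD when none are given. Snakemake then runs in the WORKDIR /app, so it sees the data folders only if they are mounted there. The small Python model below illustrates the combination rule; it is a simplification that ignores shell-form instructions and the --entrypoint override, the function name is hypothetical, and the CMD ["--cores", "1"] default in the usage is an assumed example, not part of the original Dockerfile. Passing --cores also matters because many Snakemake releases require it for local execution.

```python
from typing import List, Optional


def effective_command(entrypoint: List[str], cmd: Optional[List[str]],
                      run_args: List[str]) -> List[str]:
    """Simplified model of how Docker combines exec-form ENTRYPOINT and CMD.

    Arguments given after the image name in docker run replace CMD;
    otherwise CMD (if any) supplies the default arguments.
    """
    args = run_args if run_args else (cmd or [])
    return entrypoint + args


if __name__ == "__main__":
    entry = ["snakemake"]
    # The Dockerfile above has no CMD, so a bare "docker run IMAGE" runs plain snakemake.
    print(effective_command(entry, None, []))  # -> ['snakemake']
    # Assumed CMD ["--cores", "1"], overridden by "docker run IMAGE --cores 20":
    print(effective_command(entry, ["--cores", "1"], ["--cores", "20"]))
    # -> ['snakemake', '--cores', '20']
```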
There are ample blog posts about the difference between ENTRYPOINT and CMD in a Dockerfile. However, I do not understand why you would want to write a Dockerfile manually when Snakemake can generate one for you. Since your workflow only uses tools available on conda, you can just run
snakemake --containerize > Dockerfile
and you are done. For performance reasons, however, I would always opt to use separate containers for each tool, which Snakemake fortunately makes just as easy.
Thank you for your response.