Docker container with multiple SNP calling tools
10 months ago

Hello,

I am trying to develop an automated SNP calling pipeline that goes from raw FASTQ files to a final VCF. I am working on a cluster managed by Kubernetes, so for each tool I need to create a .yaml file that deploys the tool's Docker image as a Kubernetes Pod. For GATK it looks like the following (this is not the complete .yaml file):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gatk
spec:
  selector:
    matchLabels:
      app: gatk
  template:
    metadata:
      labels:
        app: gatk
    spec:
      containers:
        - name: gatk
          image: "broadinstitute/gatk"
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: '24'
              memory: '120Gi'
            limits:
              cpu: '24'
              memory: '120Gi'

So, for each step that uses a tool such as bwa, samtools, or bcftools, I need to create a separate .yaml file with that tool's Docker image and start a separate pod.

Is there a Docker image that bundles several of these bioinformatics tools, so that I can run them all in one pod? If you know of one, please let me know.

Alternatively, is it possible to include multiple Docker images in one .yaml file, so that all of the tools above run in one pod?
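On the second question: yes, a single Pod spec can list several containers, each with its own image. Containers under `containers` run concurrently, while `initContainers` run one after another, each to completion, which suits sequential pipeline steps. The sketch below is hypothetical: the Pod name, the PersistentVolumeClaim `snp-data`, the file paths, and the Biocontainers image tags (left as `<tag>` placeholders) are all assumptions, as is a reference that was already indexed beforehand (bwa index, samtools faidx, GATK sequence dictionary) and images that ship `sh`.

```yaml
# Hypothetical sketch: one Pod, one image per tool, steps chained via initContainers.
# Assumes a PVC named "snp-data" that already holds an indexed reference and the FASTQs.
apiVersion: v1
kind: Pod
metadata:
  name: snp-calling
spec:
  restartPolicy: Never
  volumes:
    - name: work
      persistentVolumeClaim:
        claimName: snp-data
  # Init containers run one at a time, in order; each must exit successfully
  # before the next one starts.
  initContainers:
    - name: bwa-align
      image: "quay.io/biocontainers/bwa:<tag>"        # placeholder tag
      command:
        - sh
        - -c
        - >-
          bwa mem -t 24 -R '@RG\tID:sample1\tSM:sample1'
          /data/ref.fa /data/sample1_R1.fastq.gz /data/sample1_R2.fastq.gz
          > /data/sample1.sam
      volumeMounts:
        - name: work
          mountPath: /data
    - name: samtools-sort
      image: "quay.io/biocontainers/samtools:<tag>"   # placeholder tag
      command:
        - sh
        - -c
        - >-
          samtools sort -@ 24 -o /data/sample1.bam /data/sample1.sam
          && samtools index /data/sample1.bam
      volumeMounts:
        - name: work
          mountPath: /data
  # Regular containers start only after all init containers have finished.
  containers:
    - name: gatk
      image: "broadinstitute/gatk"
      command:
        - sh
        - -c
        - >-
          gatk HaplotypeCaller -R /data/ref.fa -I /data/sample1.bam
          -O /data/sample1.vcf.gz
      resources:
        requests:
          cpu: '24'
          memory: '120Gi'
        limits:
          cpu: '24'
          memory: '120Gi'
      volumeMounts:
        - name: work
          mountPath: /data
```

Because each step runs in its own container, the tools cannot be piped into each other (e.g. `bwa mem | samtools sort`), and every intermediate file has to pass through the shared volume. That is one practical reason to prefer a single image that contains all the tools.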

Tags: Docker, SNP, Genotyping
10 months ago

You can always build your own Docker image that suits exactly your needs. See e.g. Kogia for a collection of Dockerfiles that build tiny images for some bioinformatic tools. Maybe all your tools are already there, so you can just copy-paste the respective commands into one Dockerfile to create your combined container.

If all your desired tools are on Bioconda, you can build mulled containers with multiple tools easily.
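As a related option (not the mulled build itself), the same list of Bioconda packages can be written as a conda environment file and installed into one image. The file below is a hypothetical sketch: the environment name is made up and versions are left unpinned, although pinning them is advisable for a reproducible pipeline. `gatk4` is the Bioconda package name for GATK 4.

```yaml
# environment.yml: hypothetical combined environment for the SNP calling steps
name: snp-calling
channels:
  - conda-forge
  - bioconda
dependencies:
  - bwa
  - samtools
  - bcftools
  - gatk4
```

With such a file, `conda env create -f environment.yml` creates the environment, for example inside a Dockerfile, so that all four tools share one image and can be piped into each other.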

However, my actual recommendation would be not to reinvent the wheel, but rather to use an established workflow executor / pipeline framework that supports Kubernetes and will take care of all those nitty-gritty details for you. Nextflow has Kubernetes support, as do Snakemake, Prefect, Dagster, and Flyte (and probably a couple of hundred more). Many high-quality variant calling workflows have already been published, most of them using Snakemake or Nextflow, so you could either use one of those or at least draw significant inspiration from their implementation.


Thank you very much for the answer. I agree that it is better to stick to an established workflow. I will try to build my own Docker image then. Could you also suggest a GATK-based SNP calling workflow, since there are hundreds?


That is the whole point: you do not need to build your own Docker image if you use an established workflow (manager).

For Snakemake, Varlociraptor would be my choice, although it is not GATK-based. I don't know of any others, but simply googling "Snakemake + GATK" brought up five or six workflows, including this.

For Nextflow, Sarek is probably the best you can find. Also note the more than 1000 bioinformatics tools already available as easy-to-use modules in nf-core, should you ever decide to write your own Nextflow pipeline.
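To give an idea of how little configuration such a pipeline needs, here is a hypothetical parameters file for a germline GATK run with Sarek. The file names are placeholders, and the parameter values (`input`, `outdir`, `genome`, `tools`) should be checked against the Sarek documentation for the release you use; the samplesheet format is described there as well.

```yaml
# params.yaml: hypothetical values for a germline HaplotypeCaller run with nf-core/sarek
input: "samplesheet.csv"     # CSV listing samples and their FASTQ files
outdir: "results"
genome: "GATK.GRCh38"
tools: "haplotypecaller"
```

It would be passed with `nextflow run nf-core/sarek -params-file params.yaml -profile docker`. Running on Kubernetes instead of locally is configured separately, through Nextflow's `k8s` executor settings in a Nextflow config file, so each process gets its own pod and container without hand-written .yaml files.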


Thank you very much for all the suggestions. Much appreciated.
