Hello,
I am trying to develop an automated SNP-calling pipeline, from raw FASTQ files to a final VCF. I am working in a cluster environment operated by Kubernetes, so for each tool I need to create a .yaml file that deploys the tool's Docker image as a Kubernetes Pod. For GATK it looks like the following (this is not the complete .yaml file):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gatk
spec:                      # selector and template belong under the Deployment's spec
  selector:
    matchLabels:
      app: gatk
  template:
    metadata:
      labels:
        app: gatk
    spec:
      containers:
        - name: gatk
          image: "broadinstitute/gatk"
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: '24'
              memory: '120Gi'
            limits:
              cpu: '24'
              memory: '120Gi'
```
So for each step that uses tools such as bwa, samtools or bcftools, I need to create a separate .yaml file with the corresponding Docker image and start a separate Pod.
I was wondering whether there is a Docker image that contains several bioinformatics tools like these, so that I can run them all in one Pod. If you know of one, please let me know.
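One way to get several tools into a single image is to install them from Bioconda into one conda environment and build the image from that environment. A minimal sketch of such an environment file, assuming the Bioconda package names `bwa`, `samtools`, `bcftools` and `gatk4`; the environment name is hypothetical, and versions are left unpinned here although pinning them would be better for reproducibility.

```yaml
# environment.yml: one conda environment holding the main SNP-calling tools
name: snp-calling        # hypothetical environment name
channels:
  - conda-forge          # listed first, as Bioconda recommends
  - bioconda
dependencies:
  - bwa
  - samtools
  - bcftools
  - gatk4
```

With a base image such as `mambaorg/micromamba`, a Dockerfile `RUN micromamba install -y -n base -f environment.yml` step could install this into the image (an assumption about your build setup, not the only option).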
Alternatively, is it possible to include multiple Docker images in one .yaml file, so that I can run tools like those above in a single Pod?
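For reference, a Pod spec can list several containers, each with its own image. Regular containers in a Pod run at the same time, so for sequential pipeline steps one option is `initContainers`, which Kubernetes runs one after another before the main container starts. A minimal sketch, assuming a Job, a pre-indexed reference and paired FASTQ files on a hypothetical PersistentVolumeClaim `ngs-data`, and BioContainers images from quay.io; the Job name, paths, claim name and `<version-tag>` placeholders are illustrative and must be replaced with real values.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: align-sample                # hypothetical Job name
spec:
  template:
    spec:
      restartPolicy: Never          # Jobs require Never or OnFailure
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: ngs-data     # hypothetical PVC with reference, FASTQs and outputs
      # initContainers run in order; each starts only after the previous one exits successfully
      initContainers:
        - name: bwa-mem
          image: "quay.io/biocontainers/bwa:<version-tag>"        # placeholder tag
          command: ["sh", "-c", "mkdir -p /data/work && bwa mem -t 8 /data/ref/ref.fa /data/fastq/sample_R1.fastq.gz /data/fastq/sample_R2.fastq.gz > /data/work/sample.sam"]
          volumeMounts:
            - name: data
              mountPath: /data
      # the regular container starts only after all initContainers have completed
      containers:
        - name: samtools-sort
          image: "quay.io/biocontainers/samtools:<version-tag>"   # placeholder tag
          command: ["sh", "-c", "samtools sort -@ 4 -o /data/work/sample.sorted.bam /data/work/sample.sam && samtools index /data/work/sample.sorted.bam"]
          volumeMounts:
            - name: data
              mountPath: /data
```

Because every step writes to the shared volume, each tool can keep its own single-purpose image while the whole sequence is described in one file.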
Thank you very much for the answer. I agree that it is better to stick to an established workflow. I will try to build my own Docker image, then. Also, could you suggest a GATK-based SNP-calling workflow? There seem to be hundreds.
That is the whole point: you do not need to build your own Docker image if you use an established workflow and workflow manager.
For Snakemake, Varlociraptor would be my choice, although it is not GATK-based. I don't know of any others, but simply googling "Snakemake + GATK" brought up five or six workflows, including this.
For Nextflow, Sarek is probably the best you can find. Also note that more than 1000 bioinformatics tools are already available as easy-to-use modules in nf-core, should you ever decide to write your own Nextflow pipeline.
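As an illustration, Sarek can be pointed at GATK's HaplotypeCaller through a Nextflow params file passed with `-params-file`. A minimal sketch, assuming the Sarek parameter names `input`, `outdir`, `genome` and `tools` as described in its documentation; the sample-sheet path is hypothetical, and running on Kubernetes would still require configuring Nextflow's Kubernetes executor separately.

```yaml
# params.yaml: sketch of nf-core/sarek parameters for GATK-based germline SNP calling
input: "samplesheet.csv"     # hypothetical path to the Sarek sample sheet (FASTQ locations)
outdir: "results"            # where Sarek writes BAMs, VCFs and reports
genome: "GATK.GRCh38"        # iGenomes key for the GATK GRCh38 bundle
tools: "haplotypecaller"     # run GATK HaplotypeCaller as the variant caller
```

This would be used as `nextflow run nf-core/sarek -params-file params.yaml`, together with a profile or executor setting that matches the cluster.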
Thank you very much for all the suggestions. Much appreciated.