I am trying to run Clairvoyante using snakemake for multiple chromosome. the script looks like this:
chr_list = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY', 'chrM']
rule clairvoyante:
"""
Identify SNPs using Clairvoyante.
"""
input:
datain="{dataset}/{sample}.{ref}.{align}.{case}.bam"
output:
dataout= ["{{dataset}}/{{sample}}.{{ref}}.{{align}}.{{case}}.clarivoyante.{}.vcf".format(i) for i in chr_list]
params:
reference=my_ref,
training=TRAINING_DATA,
clairvoyante=CLARIVOYANTE,
minCoverage=1,
message:
"Running callvarbam from clairvoyante, sample is: {wildcards.sample}"
shell:
"""
clairvoyante.py callVarBam \
--chkpnt_fn "{params.training}" \
--bam_fn "{input.datain}" \
--ref_fn "{params.reference}" \
--call_fn "{output.dataout}" \
--minCoverage "{params.minCoverage}" \
--ctgName 1 \
"""
My issue is the tool calls variant on one chromosome only per time using --ctgName
parameter:
the easy way is to loop through the chr_list in shell and uses parallel or xargs to call snp on all of them ()I think this is not a convenient way) or maybe write a python function and use map
to run them in parallel.
but I think there should be a better way to do that using snakemake to pass the chromosome to --ctgName
for each output.
any idea?
Thanks, I will try your answer. Regarding the question I was doubting where to put it, but I thought it may fit here more.
Yeah I agree that it would make sense to use biostars, but see also the support section in the snakemake documentation.
Aha, now I understand, thanks that makes more sense for me now.