verifyBAMID in snakemake script error
12 weeks ago
Peter Chung

I am new to Snakemake. I have around 1000 samples, and I tried to use a config file and a Snakemake script to run VerifyBamID on them in parallel. The dry run fails with an error that I don't understand.

config.yaml

samples:
  120001:
    vcf: "/sample/120001.vcf.gz"
    bam: "/sample/120001/120001.recal.bam"
    bai: "/sample/120001/120001.recal.bam.bai"
 120002:
   vcf: "/sample/120002.vcf.gz"
   bam: "/sample/120002/120002recal.bam"
   bai: "/sample/120002/120002.recal.bam.bai"
120004:  
   vcf: "/sample/120004.vcf.gz"
   bam: "/sample/120004/120004.recal.bam"
   bai: "/sample/120004/120004.recal.bam.bai"

Snakemake script:

     import os
     import yaml

     # Load the configuration file
     config = yaml.safe_load(open("config.yaml"))

     OUTPUT_DIR = "/output/"

     # Rule to specify the final output needed for the workflow completion
     rule all:
         input:
             expand(os.path.join(OUTPUT_DIR, "{sample}"), sample=config["samples"].keys())

     # Rule to run VerifyBamID
     rule verifyBAMID:
         input:
             bam=lambda wildcards: config["samples"][wildcards.sample]['bam'],
             bai=lambda wildcards: config["samples"][wildcards.sample]['bai'],
             vcf=lambda wildcards: config["samples"][wildcards.sample]['vcf'],
         output:
             directory(os.path.join(OUTPUT_DIR, "{sample}"))
         shell:
             """
             # Create the output directory if it doesn't exist
             mkdir -p {output} && \
             VerifyBamID \
               --bam {input.bam} \
               --vcf {input.vcf} \
               --smID {wildcards.sample} \
               --out {output}/{wildcards.sample} \
               --best 2>/dev/null
             """

When I do a dry run, I get this error:

     Building DAG of jobs...
     InputFunctionException in rule verifyBAMID in file /bin/ConfigVerifyBamID.smk, line 15:
     Error:
       KeyError: 'sample/120004.vcf.gz'
     Wildcards:
       sample=sample/120004.vcf.gz
     Traceback:
       File "/bin/ConfigVerifyBamID.smk", line 17, in <lambda>

Can anyone advise? Thanks.

It looks like the input file paths are somehow being used as sample names by your all rule, though I can't quite see how given what you have posted. (Is that definitely the correct whitespace in your config.yaml file? I can't even get it to parse with the decreasing indentation for each sample.) But even once that's fixed, you'll run into another problem: config["samples"][wildcards.sample] looks the sample up with a string key, because wildcard values are always strings, while the YAML loader parses those sample IDs as integers. So you'll get errors like KeyError: '120001' instead.
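
To illustrate the key-type mismatch, here is a minimal Python sketch. It assumes the config has already been parsed into a plain dict: the literal below stands in for what yaml.safe_load returns when sample IDs such as 120001 are left unquoted (the loader resolves them as integers). The helper sample_path is hypothetical. Converting the keys to strings once, right after loading, is one way to make the lookups match.

```python
# Stand-in for yaml.safe_load(open("config.yaml")): the YAML loader resolves
# unquoted IDs such as 120001 as integers, so the dict keys are ints.
raw_config = {
    "samples": {
        120001: {
            "vcf": "/sample/120001.vcf.gz",
            "bam": "/sample/120001/120001.recal.bam",
            "bai": "/sample/120001/120001.recal.bam.bai",
        },
    }
}

# Snakemake wildcard values are always strings, e.g. wildcards.sample == "120001".
sample_wildcard = "120001"

try:
    raw_config["samples"][sample_wildcard]
except KeyError as err:
    print("lookup failed:", repr(err))  # lookup failed: KeyError('120001')

# Normalise the sample keys to strings once, right after loading the config.
config = {
    **raw_config,
    "samples": {str(key): value for key, value in raw_config["samples"].items()},
}


def sample_path(sample: str, field: str) -> str:
    """Hypothetical helper an input function could call: one path for one sample."""
    return config["samples"][sample][field]


print(sample_path(sample_wildcard, "bam"))  # /sample/120001/120001.recal.bam
```

In a Snakefile, an input function could then be written as lambda wildcards: sample_path(wildcards.sample, "bam"). Quoting the IDs in config.yaml (for example "120001":) is an alternative that makes the loader produce string keys directly.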

(Personally, I'd simplify the whole thing and infer the input paths directly in your verifyBAMID rule, so you wouldn't need input functions or lookups into the config structure at all. If you want a simple working example to build on, you could just put the sample names in a list, pass that to the expand call, and worry about adding a full configuration file, if you even need one, later on.)
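
The simpler approach suggested here can be sketched in plain Python. This is not a Snakefile; it only mimics the string handling Snakemake performs, under stated assumptions. The sample list, the path patterns (which assume every sample follows the same /sample/{sample}/{sample}.recal.bam layout) and the helper infer_inputs are hypothetical. The last lines show one relevant fact: Snakemake's default wildcard regex, .+, also matches slashes, so a value like sample/120004.vcf.gz is a valid {sample} match unless the wildcard is constrained.

```python
import re

# Hypothetical list of sample IDs (strings, so no int/str mismatch).
SAMPLES = ["120001", "120002", "120004"]

# Hypothetical path patterns, assuming every sample uses the same layout.
BAM_PATTERN = "/sample/{sample}/{sample}.recal.bam"
VCF_PATTERN = "/sample/{sample}.vcf.gz"


def infer_inputs(sample: str) -> dict:
    """Fill the {sample} placeholder, much as Snakemake does for input patterns."""
    bam = BAM_PATTERN.format(sample=sample)
    return {"bam": bam, "bai": bam + ".bai", "vcf": VCF_PATTERN.format(sample=sample)}


# Plain-Python equivalent of expand("/output/{sample}", sample=SAMPLES).
targets = ["/output/{sample}".format(sample=s) for s in SAMPLES]

# Snakemake's default wildcard regex is ".+", which also matches "/".
DEFAULT_WILDCARD = r".+"
# A constraint such as wildcard_constraints: sample=r"\d+" allows digits only.
SAMPLE_ONLY = r"\d+"

bad_value = "sample/120004.vcf.gz"
print(bool(re.fullmatch(DEFAULT_WILDCARD, bad_value)))  # True
print(bool(re.fullmatch(SAMPLE_ONLY, bad_value)))  # False
print(infer_inputs("120004")["bai"])  # /sample/120004/120004.recal.bam.bai
```

In a real Snakefile, the same idea would be input patterns containing {sample}, SAMPLES passed to expand in rule all, and a wildcard_constraints block. Whether an unconstrained wildcard is what produced sample=sample/120004.vcf.gz here depends on the real output paths.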
