snakemake error: MissingInputException
3.3 years ago

Hello

I am writing a pipeline with Snakemake, and the third step fails with this error:

Building DAG of jobs...
MissingInputException in line 5 of /storage/databases/snakemake:
Missing input files for rule all:
/storage/Ldec_out/align_hg38_SRR13510812.sam

My code looks like this:

configfile: "config.yaml"
SAMPLES, = glob_wildcards("/Ldec/samples/{sample}_1.fastq")
READS=["1", "2"]

rule all: input: expand(config["out"] + "/{sample}_1.fastq", sample=SAMPLES), expand(config["out"] + "/{sample}_2.fastq", sample=SAMPLES), \
                 expand(config["out"] + "/{sample}_1.paired.fastq", sample=SAMPLES), expand(config["out"] + "/{sample}_1.unpaired.fastq", sample=SAMPLES), \
                 expand(config["out"] + "/{sample}_2.paired.fastq", sample=SAMPLES), expand(config["out"] + "/{sample}_2.unpaired.fastq", sample=SAMPLES), \
                 expand(config["out"] + "/align_hg38_{sample}.sam", sample=SAMPLES)
rule rename:
    input: data1= config["samples"] + "{sample}_1.fastq", data2=config["samples"] + "{sample}_2.fastq"
    output: output1=config["out"] + "{sample}_1.fastq", output2=config["out"] + "{sample}_2.fastq"
    run: 
        shell(""" awk  '{{print (NR%4 == 1) ? "@" x "_" ++i "/1" : $0}}' {input.data1} > {output.output1} """)
        shell("""awk  '{{print (NR%4 == 1) ? "@" x "_" ++i "/2" : $0}}' {input.data2} > {output.output2}""")

rule trimmomatic:
    input: data1=config["out"]+"{sample}_1.fastq", data2=config["out"]+"{sample}_2.fastq"
    output: output1=config["out"] + "{sample}_1.paired.fastq", output2=config["out"] + "{sample}_1.unpaired.fastq", output3=config["out"] + "{sample}_2.paired.fastq", output4=config["out"] + "{sample}_2.unpaired.fastq"
    shell: "trimmomatic PE {input.data1} {input.data2} {output.output1} {output.output2} {output.output3} {output.output4} ILLUMINACLIP:ref/NexteraPE-PE.fa:2:30:10:1:true SLIDINGWINDOW:6:10 LEADING:13 TRAILING:13 MINLEN:36" 

rule bwa_hg38:
        input: data1=config["ref"]+"GCF_000001405.38_GRCh38.p12_genomic.fna", data2=config["out"]+"{sample}_1.paired.fastq", data3=config["out"]+"{sample}_2.paired.fastq"
        output: config["out"] + "align_hg38_{sample}.sam"
        shell: "bwa mem -P {input.data1} {input.data2} {input.data3} > {output}"

And here is my config file:

samples: /Ldec/samples
out: /storage/Ldec_out
ref: /storage/ref

Any help would be greatly appreciated.

3.3 years ago
Jianyu ▴ 580

It looks like your bwa_hg38 rule is missing the "/" in the output file name. It should be:

 config["out"] + "/align_hg38_{sample}.sam"

I also just noticed that every rule except the first one (rule all) is missing the "/" between the directory and the file name in its output.


Thanks a lot, it helped!

Nevertheless, I don’t understand why some rules need to specify '/', but some do not. Is there any explanation for this?


Without the '/', the output file name in the bwa_hg38 rule becomes "/storage/Ldec_outalign_hg38_SRR13510812.sam", which does not match the file you requested in rule all.
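
The mismatch can be reproduced in plain Python, since the Snakefile builds these paths by ordinary string concatenation. This is a minimal sketch using the `out` value from the config file in the question. The variable names are illustrative, and os.path.join is shown as one possible way to avoid the problem on a POSIX system, not the only one.

```python
import os.path

# Value of config["out"] from the question's config file (no trailing slash).
out = "/storage/Ldec_out"
sample = "SRR13510812"

# What rule all asks for: the "/" is written explicitly.
requested = out + "/align_hg38_" + sample + ".sam"

# What the bwa_hg38 output pattern expands to: no separator at all.
produced = out + "align_hg38_" + sample + ".sam"

# os.path.join inserts the separator, so it cannot be forgotten.
joined = os.path.join(out, "align_hg38_" + sample + ".sam")

print(requested)
print(produced)
print(requested == produced)  # the two strings differ
print(requested == joined)    # the joined path matches the request
```

Whichever style you pick, using it consistently in rule all and in every rule's input and output keeps the patterns aligned.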

PS: Snakemake requires the requested file name to match exactly the output file name specified in a rule. It won't look for a file that sits at the same location but is written in a different way (i.e., "a/b/c.txt" != "a/b//c.txt").
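
The "a/b/c.txt" versus "a/b//c.txt" point can be illustrated with a plain string comparison. This sketch only shows how the two spellings compare as Python strings and what os.path.normpath does to them. It is not a statement about how Snakemake matches files internally.

```python
import os.path

single = "a/b/c.txt"
double = "a/b//c.txt"

# As strings, the two spellings differ even though a filesystem
# would resolve both to the same file.
print(single == double)

# normpath collapses the repeated separator, after which they compare equal.
print(os.path.normpath(single) == os.path.normpath(double))
```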
