Hello
I am writing a pipeline using Snakemake, and at the third step it fails with this error:

Building DAG of jobs...
MissingInputException in line 5 of /storage/databases/snakemake:
Missing input files for rule all:
/storage/Ldec_out/align_hg38_SRR13510812.sam
My code looks like this:
configfile: "config.yaml"
SAMPLES, = glob_wildcards("/Ldec/samples/{sample}_1.fastq")
READS=["1", "2"]
rule all:
    input:
        expand(config["out"] + "/{sample}_1.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}_2.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}_1.paired.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}_1.unpaired.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}_2.paired.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}_2.unpaired.fastq", sample=SAMPLES),
        expand(config["out"] + "/align_hg38_{sample}.sam", sample=SAMPLES)

rule rename:
    input:
        data1=config["samples"] + "{sample}_1.fastq",
        data2=config["samples"] + "{sample}_2.fastq"
    output:
        output1=config["out"] + "{sample}_1.fastq",
        output2=config["out"] + "{sample}_2.fastq"
    run:
        shell(""" awk '{{print (NR%4 == 1) ? "@" x "_" ++i "/1" : $0}}' {input.data1} > {output.output1} """)
        shell("""awk '{{print (NR%4 == 1) ? "@" x "_" ++i "/2" : $0}}' {input.data2} > {output.output2}""")

rule trimmomatic:
    input:
        data1=config["out"] + "{sample}_1.fastq",
        data2=config["out"] + "{sample}_2.fastq"
    output:
        output1=config["out"] + "{sample}_1.paired.fastq",
        output2=config["out"] + "{sample}_1.unpaired.fastq",
        output3=config["out"] + "{sample}_2.paired.fastq",
        output4=config["out"] + "{sample}_2.unpaired.fastq"
    shell:
        "trimmomatic PE {input.data1} {input.data2} {output.output1} {output.output2} {output.output3} {output.output4} ILLUMINACLIP:ref/NexteraPE-PE.fa:2:30:10:1:true SLIDINGWINDOW:6:10 LEADING:13 TRAILING:13 MINLEN:36"

rule bwa_hg38:
    input:
        data1=config["ref"] + "GCF_000001405.38_GRCh38.p12_genomic.fna",
        data2=config["out"] + "{sample}_1.paired.fastq",
        data3=config["out"] + "{sample}_2.paired.fastq"
    output:
        config["out"] + "align_hg38_{sample}.sam"
    shell:
        "bwa mem -P {input.data1} {input.data2} {input.data3} > {output}"
And here is my config-file:
samples: /Ldec/samples
out: /storage/Ldec_out
ref: /storage/ref
Any help would be greatly appreciated.
Thanks a lot, it helped!
Nevertheless, I don’t understand why some rules need the '/' and others do not. Is there an explanation for this?
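Without the '/', the output file of the bwa_hg38 rule becomes "/storage/Ldec_outalign_hg38_SRR13510812.sam", which obviously doesn't match the file you requested in rule all ("/storage/Ldec_out/align_hg38_SRR13510812.sam"). The earlier rules only appear to work without the '/' because, by default, a wildcard matches the regular expression ".+", and that includes '/'. When rule all requests "/storage/Ldec_out/SRR13510812_1.fastq", the rename output pattern config["out"] + "{sample}_1.fastq" still matches, with {sample} = "/SRR13510812", and its input then resolves to "/Ldec/samples/SRR13510812_1.fastq", which exists. In bwa_hg38 the literal text "align_hg38_" sits between config["out"] and the wildcard, so the missing '/' cannot be absorbed by the wildcard and no rule produces the requested .sam file.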
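The difference between rename and bwa_hg38 can be reproduced outside Snakemake. The Python sketch below is a simplified emulation, not Snakemake's internal code: it assumes each {wildcard} in an output pattern becomes a regex group matching ".+" (Snakemake's default wildcard constraint) and that the whole requested path must match. The helpers pattern_to_regex and match are hypothetical names; the config values are the ones from the question.

```python
import re

# Values from the config.yaml in the question
config = {"samples": "/Ldec/samples", "out": "/storage/Ldec_out"}


def pattern_to_regex(pattern):
    """Hypothetical helper: turn an output pattern into a compiled regex.

    Simplifying assumption: literal text is escaped and every {name}
    becomes a named group matching '.+' (which can include '/').
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)


def match(pattern, target):
    """Return the wildcard values if target fully matches pattern, else None."""
    m = pattern_to_regex(pattern).fullmatch(target)
    return m.groupdict() if m else None


# Output patterns as written in the question (no '/' after config["out"])
rename_out = config["out"] + "{sample}_1.fastq"
bwa_out = config["out"] + "align_hg38_{sample}.sam"

# Files requested by rule all (with '/')
want_fastq = config["out"] + "/SRR13510812_1.fastq"
want_sam = config["out"] + "/align_hg38_SRR13510812.sam"

print(match(rename_out, want_fastq))  # {'sample': '/SRR13510812'}
print(match(bwa_out, want_sam))       # None

# The slash captured by the wildcard "repairs" the rename input path
sample = match(rename_out, want_fastq)["sample"]
print(config["samples"] + sample + "_1.fastq")  # /Ldec/samples/SRR13510812_1.fastq
```

The rename pattern matches only by accident (the wildcard swallows the leading '/'), while the bwa_hg38 pattern has no place for that '/' to go.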
PS: Snakemake requires the requested file name to exactly match the output file name specified in a rule; it won't look for an existing file at the same path written in a different way (e.g., "a/b/c.txt" != "a/b//c.txt").
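One way to stop worrying about whether a '/' is needed is to build paths with os.path.join, which adds a separator only when one is missing. The sketch below assumes POSIX paths and reuses the out value from the config above; out_slash and name are hypothetical variables for illustration. In a Snakefile the same idea would look like os.path.join(config["out"], "align_hg38_{sample}.sam"), with import os at the top.

```python
import os.path

out = "/storage/Ldec_out"  # value from config.yaml (no trailing '/')
out_slash = out + "/"      # hypothetical: same directory written with a trailing '/'
name = "align_hg38_{sample}.sam"

# Plain concatenation: the result depends on how the directory was written
print(out + name)              # /storage/Ldec_outalign_hg38_{sample}.sam
print(out_slash + "/" + name)  # /storage/Ldec_out//align_hg38_{sample}.sam

# os.path.join inserts exactly one separator, with or without a trailing '/'
print(os.path.join(out, name))        # /storage/Ldec_out/align_hg38_{sample}.sam
print(os.path.join(out_slash, name))  # /storage/Ldec_out/align_hg38_{sample}.sam

# Exact string comparison treats these as different files;
# os.path.normpath collapses the doubled slash
print("a/b/c.txt" == "a/b//c.txt")      # False
print(os.path.normpath("a/b//c.txt"))   # a/b/c.txt
```

Using the same join everywhere (rule all and every rule's input/output) keeps the requested and produced file names identical as strings, which is what the matching needs.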