Snakemake MissingRuleException error
2.1 years ago

I am trying to use Snakemake to calculate the MD5 checksum of several files. The files are:

$ ls -l|cut -d" " -f1,2,5-
-rw-r--r-- 1   0 Oct 21 19:46 A.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 B.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 C.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 D.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 E.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 F.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 G.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 H.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 I.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 J.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 K.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 L.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 M.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 N.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 O.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 P.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 Q.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 R.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 S.fastq.gz
-rw-r--r-- 1 131 Oct 21 19:46 test.py
-rw-r--r-- 1   0 Oct 21 19:46 T.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 U.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 V.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 W.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 X.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 Y.fastq.gz
-rw-r--r-- 1   0 Oct 21 19:46 Z.fastq.gz

So I wrote a Snakefile called test.py:

rule md5:
    input:
        "{sample}.fastq.gz"
    output:
        "{sample}.md5"
    shell:
        "md5sum {input} > {output}"

Then I ran it and got the following error:

$ snakemake -c -s test.py *gz
Building DAG of jobs...
MissingRuleException:
No rule to produce I.fastq.gz (if you use input functions make sure that they don't raise unexpected exceptions).

I don't understand why this error occurs or how to fix it.

I would appreciate any help figuring it out.

Thanks a lot.

bash linux MissingRuleException snakemake
2.1 years ago
hugo.avila ▴ 530

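Hello, Snakemake's command-line targets are the files you want it to create, not the files it reads. You passed the existing fastq files as targets, and no rule produces them, hence the MissingRuleException. Pass the output file instead and let Snakemake work out which fastq file it needs: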

# -n requests a dry run and -p prints the shell commands Snakemake would run
snakemake -c -s test.py A.md5 -np
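To see why A.md5 works as a target while A.fastq.gz does not, here is a simplified plain-Python sketch of the idea: the target is matched against the rule's output pattern, the matched text becomes the wildcard value, and that value is substituted into the input pattern. The helper names are hypothetical, and Snakemake's real matching also handles wildcard constraints and choosing among several rules.

```python
import re

def pattern_to_regex(pattern):
    """Turn a pattern like "{sample}.md5" into a compiled regex with one named
    group per wildcard (simplified: every wildcard matches ".+")."""
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text before the wildcard
        regex += "(?P<%s>.+)" % m.group(1)           # the wildcard itself
        pos = m.end()
    regex += re.escape(pattern[pos:])                # literal text after the last wildcard
    return re.compile(regex)

def resolve_input(target, output_pattern, input_pattern):
    """Return the input file needed to build `target`, or None when the
    output pattern cannot produce it (the MissingRuleException situation)."""
    m = pattern_to_regex(output_pattern).fullmatch(target)
    if m is None:
        return None
    return input_pattern.format(**m.groupdict())

print(resolve_input("A.md5", "{sample}.md5", "{sample}.fastq.gz"))       # A.fastq.gz
print(resolve_input("A.fastq.gz", "{sample}.md5", "{sample}.fastq.gz"))  # None
```

A.fastq.gz does not end in .md5, so the only rule's output pattern cannot produce it and there is nothing to resolve.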

This only builds A.md5. To build all of them, you could generate the target names with some shell string manipulation:

snakemake -c -s test.py $(ls *gz | sed -r 's:\..+::;s:$:.md5:')
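For reference, here is what that sed expression does, sketched in Python: s:\..+:: deletes everything from the first dot onward, and s:$:.md5: appends .md5. The function names are hypothetical. Because it cuts at the first dot, a sample name that itself contains dots would be truncated, so the sketch also shows stripping only the .fastq.gz suffix.

```python
import re

SUFFIX = ".fastq.gz"

def md5_target_first_dot(filename):
    # Same as sed -r 's:\..+::;s:$:.md5:' : cut at the FIRST dot, then append .md5
    return re.sub(r"\..+", "", filename, count=1) + ".md5"

def md5_target_strip_suffix(filename):
    # Remove only the known .fastq.gz suffix, so dots inside the sample name survive
    if filename.endswith(SUFFIX):
        filename = filename[: -len(SUFFIX)]
    return filename + ".md5"

for name in ["A.fastq.gz", "sample.R1.fastq.gz"]:
    print(name, md5_target_first_dot(name), md5_target_strip_suffix(name))
# A.fastq.gz A.md5 A.md5
# sample.R1.fastq.gz sample.md5 sample.R1.md5
```

With the first-dot version, sample.R1.fastq.gz becomes sample.md5, which rule md5 would map back to a nonexistent sample.fastq.gz. In bash, ${f%.fastq.gz}.md5 inside a for f in *.fastq.gz loop does the same suffix stripping.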

This works, but it is not very flexible. A better approach is to use glob_wildcards and a rule all. Edit test.py:

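from pathlib import Path

sample_dir = Path('.')  # directory holding the fastq files (here: the current directory)

# Extract the sample names (A, B, ...) from the existing files to use as wildcard values.
SAMPLES, = glob_wildcards(sample_dir / '{sample}.fastq.gz')

# rule all is the first rule, so it is the default target when none is given.
rule all:
    input:
        # Request one .md5 file per sample; this forces rule md5 to run for each of them.
        md5s_files=expand("{sample}.md5", sample=SAMPLES)


rule md5:
    input:
        "{sample}.fastq.gz"
    output:
        "{sample}.md5"
    shell:
        "md5sum {input} > {output}"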
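Roughly, glob_wildcards collects the values the {sample} wildcard takes in existing file names, and expand fills a pattern with each value. The plain-Python sketch below imitates both for a single directory; glob_samples and expand_pattern are hypothetical stand-ins, not Snakemake's implementation (the real glob_wildcards also descends into subdirectories, and its result order may differ).

```python
import itertools
import re
import tempfile
from pathlib import Path

def glob_samples(directory, suffix=".fastq.gz"):
    """Hypothetical stand-in for glob_wildcards("{sample}.fastq.gz"), limited to
    one directory: return what the {sample} wildcard captures in each file name."""
    pattern = re.compile(r"(?P<sample>.+)" + re.escape(suffix))
    samples = []
    for path in Path(directory).iterdir():
        m = pattern.fullmatch(path.name)
        if path.is_file() and m:
            samples.append(m.group("sample"))
    return sorted(samples)  # sorted only to make the output predictable

def expand_pattern(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill `pattern` with every combination
    of the given wildcard values."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, values)))
        for values in itertools.product(*(wildcards[k] for k in keys))
    ]

with tempfile.TemporaryDirectory() as d:
    for name in ["A.fastq.gz", "B.fastq.gz", "test.py"]:
        (Path(d) / name).touch()  # empty files, like in the question
    samples = glob_samples(d)
    print(samples)                                          # ['A', 'B']
    print(expand_pattern("{sample}.md5", sample=samples))   # ['A.md5', 'B.md5']
```

test.py is skipped because it does not match the pattern, which is why rule all never asks for a test.md5.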

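Then run it without listing any targets, so Snakemake builds rule all by default (remove -n to actually execute the jobs):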

snakemake -c -s test.py -np

You can make this more flexible by reading sample_dir from a config.yaml file or setting it on the command line (via --config).
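One way to read that suggestion, sketched in plain Python because Snakefile rules cannot run outside Snakemake: in a real Snakefile, config is the dictionary Snakemake fills from a configfile: directive or from --config. The key name sample_dir and the helper md5_jobs are hypothetical. The point the sketch shows is that once the fastq files live elsewhere, the input pattern of rule md5 needs the directory prefix too.

```python
import os

def md5_jobs(config, samples):
    """Hypothetical helper: map each .md5 target to the fastq file it needs.
    `config` plays the role of Snakemake's config dictionary; the key
    "sample_dir" is an assumption and defaults to the current directory."""
    sample_dir = config.get("sample_dir", ".")
    # The INPUT pattern needs the directory prefix ...
    input_pattern = os.path.join(sample_dir, "{sample}.fastq.gz")
    # ... while the OUTPUT pattern can stay in the working directory.
    output_pattern = "{sample}.md5"
    return {output_pattern.format(sample=s): input_pattern.format(sample=s)
            for s in samples}

print(md5_jobs({}, ["A"]))
print(md5_jobs({"sample_dir": "reads"}, ["A", "B"]))
```

The glob_wildcards pattern would need the same prefix, so that the sample names are collected from the directory the rule actually reads.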


Many thanks!!!

I understand why I was wrong.

Thanks very much!

