Question

Snakemake not executing any tasks

0

Entering edit mode

24 months ago

jaime alvarez ▴ 30

Hello, I am trying the following Snakemake file but I can't understand why it is not executing any tasks. I have the following code:

SAMPLES, = glob_wildcards("FASTQ/{samp}_R1.fastq.gz")

rule all:
    input:
        expand("trimmed/{sample}_R1.fastq.gz", sample=SAMPLES),
        expand("trimmed/{sample}_R2.fastq.gz", sample=SAMPLES)

rule fastqc:
    input:
        expand("FASTQ/{sample}_R1.fastq.gz", sample=SAMPLES),
        expand("FASTQ/{sample}_R2.fastq.gz", sample=SAMPLES)
    output:
        expand("FASTQ/{sample}_R1_fastqc.html", sample=SAMPLES),
        expand("FASTQ/{sample}_R2_fastqc.html", sample=SAMPLES)
    threads:
        3
    log:
        "FASTQ/{sample}_R1.log",
        "FASTQ/{sample}_R2.log"
    shell:
        "fastqc --extract --nogroup {input[0]} {input[1]} -o fastqc"

rule cutadapt:
    input:
        forward_reads="FASTQ/{sample}_R1.fastq.gz",
        reverse_reads="FASTQ/{sample}_R2.fastq.gz"
    output:
        forward_reads="trimmed/{sample}_R1.fastq.gz",
        reverse_reads="trimmed/{sample}_R2.fastq.gz"
    threads:
        3
    log:
        "trimmed/{sample}_R1.log",
        "trimmed/{sample}_R2.log"
    params:
        adapter_forward="TAGATCTTGAGACAAATGGCA",
        adapter_reverse="TGCCATTTGTCTCAAGATCTA"
    shell:
        "cutadapt -a {params.adapter_forward} -A {params.adapter_reverse} -o {output.forward_reads} -p {output.reverse_reads} {input.forward_reads} {input.reverse_reads}"

The dry run is not giving me much information:

snakemake -n
Building DAG of jobs...
Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
all          1              1              1
total        1              1              1


[Thu Jan  5 18:22:13 2023]
localrule all:
    jobid: 0
    reason: Rules with neither input nor output files are always executed.
    resources: tmpdir=/tmp

Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
all          1              1              1
total        1              1              1

Reasons:
    (check individual jobs above for details)
    neither input nor output:
        all

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

When I try to run it, its completing without doing anything:

Finished job 0.
1 of 1 steps (100%) done

python readqc Snakemake cutadapt • 1.6k views

ADD COMMENT • link updated 23 months ago by Jesse ▴ 850 • written 24 months ago by jaime alvarez ▴ 30

1

Entering edit mode

First line says {samp}, is that intentional rather than {sample}?

ADD REPLY • link updated 24 months ago by Ram 44k • written 24 months ago by ATpoint 86k

2

Entering edit mode

Shouldn't actually matter (that {samp} is already gone by the time SAMPLES is defined) but my money's on SAMPLES being empty. How about a print(SAMPLES) on the second line, Jaime? Are you sure you have files matching that pattern?

ADD REPLY • link 24 months ago by Jesse ▴ 850

0

Entering edit mode

Thank you Jesse, indeed, I wasn't capturing any samples with the pattern matching, because I have samples with names of the type:

FASTQ/sample1_R1_001.fastq.gz
FASTQ/sample1_R2_001.fastq.gz
FASTQ/sample2_R1_001.fastq.gz
FASTQ/sample2_R2_001.fastq.gz

I changed the pattern matching to (and added the print statement):

SAMPLES, = glob_wildcards("FASTQ/{samp}_R1_001.fastq.gz")
print(SAMPLES)

It is not working but I get the following error:

SyntaxError:

Not all output, log and benchmark files of rule fastqc contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.

The error is referring to line 22, which is the execution of the command:

"fastqc --extract --nogroup {input[0]} {input[1]} -o fastqc"

My logic is that I am matching now everything prior to the _R1/_R2 (without the directory), so in the example above: sample1, sample2, etc...

I can't see why it is not working, all files are paired (R1 and R2).

ADD REPLY • link 24 months ago by jaime alvarez ▴ 30

0

Entering edit mode

Your fastqc rule is written to use all R1 and R2 files as inputs and outputs, with the wildcards already gone by the time the rule sees the input and output lists. The expand calls use a wildcard internally but what it produces is just a big list of file paths. That's why it complains about wildcards; only your log entry actually uses wildcards as far as the rule is concerned.

I'd rewrite it to look like your cutadapt rule, working with a single R1+R2 pair at a time. And then, if you want a rule to produce all of the fastqc outputs, you can have another rule like you already have as "all", but asking for the fastqc paths instead of the trimmed ones. (I typically end up with a few different "all_<something>" rules to make it easy to get different sets of outputs, and maybe an "all" at the very top with everything included. You can store the lists of paths in variables for that so you don't have to repeat yourself too much.)

ADD REPLY • link 23 months ago by Jesse ▴ 850