Entering edit mode
24 months ago
jaime alvarez
▴
30
Hello, I am trying the following Snakemake file but I can't understand why it is not executing any tasks. I have the following code:
SAMPLES, = glob_wildcards("FASTQ/{samp}_R1.fastq.gz")
rule all:
input:
expand("trimmed/{sample}_R1.fastq.gz", sample=SAMPLES),
expand("trimmed/{sample}_R2.fastq.gz", sample=SAMPLES)
rule fastqc:
input:
expand("FASTQ/{sample}_R1.fastq.gz", sample=SAMPLES),
expand("FASTQ/{sample}_R2.fastq.gz", sample=SAMPLES)
output:
expand("FASTQ/{sample}_R1_fastqc.html", sample=SAMPLES),
expand("FASTQ/{sample}_R2_fastqc.html", sample=SAMPLES)
threads:
3
log:
"FASTQ/{sample}_R1.log",
"FASTQ/{sample}_R2.log"
shell:
"fastqc --extract --nogroup {input[0]} {input[1]} -o fastqc"
rule cutadapt:
input:
forward_reads="FASTQ/{sample}_R1.fastq.gz",
reverse_reads="FASTQ/{sample}_R2.fastq.gz"
output:
forward_reads="trimmed/{sample}_R1.fastq.gz",
reverse_reads="trimmed/{sample}_R2.fastq.gz"
threads:
3
log:
"trimmed/{sample}_R1.log",
"trimmed/{sample}_R2.log"
params:
adapter_forward="TAGATCTTGAGACAAATGGCA",
adapter_reverse="TGCCATTTGTCTCAAGATCTA"
shell:
"cutadapt -a {params.adapter_forward} -A {params.adapter_reverse} -o {output.forward_reads} -p {output.reverse_reads} {input.forward_reads} {input.reverse_reads}"
The dry run is not giving me much information:
snakemake -n
Building DAG of jobs...
Job stats:
job count min threads max threads
----- ------- ------------- -------------
all 1 1 1
total 1 1 1
[Thu Jan 5 18:22:13 2023]
localrule all:
jobid: 0
reason: Rules with neither input nor output files are always executed.
resources: tmpdir=/tmp
Job stats:
job count min threads max threads
----- ------- ------------- -------------
all 1 1 1
total 1 1 1
Reasons:
(check individual jobs above for details)
neither input nor output:
all
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
When I try to run it, its completing without doing anything:
Finished job 0.
1 of 1 steps (100%) done
First line says
{samp}
, is that intentional rather than{sample}
?Shouldn't actually matter (that
{samp}
is already gone by the timeSAMPLES
is defined) but my money's onSAMPLES
being empty. How about aprint(SAMPLES)
on the second line, Jaime? Are you sure you have files matching that pattern?Thank you Jesse, indeed, I wasn't capturing any samples with the pattern matching, because I have samples with names of the type:
I changed the pattern matching to (and added the print statement):
It is not working but I get the following error:
The error is referring to line 22, which is the execution of the command:
My logic is that I am matching now everything prior to the _R1/_R2 (without the directory), so in the example above: sample1, sample2, etc...
I can't see why it is not working, all files are paired (R1 and R2).
Your fastqc rule is written to use all R1 and R2 files as inputs and outputs, with the wildcards already gone by the time the rule sees the input and output lists. The
expand
calls use a wildcard internally but what it produces is just a big list of file paths. That's why it complains about wildcards; only your log entry actually uses wildcards as far as the rule is concerned.I'd rewrite it to look like your cutadapt rule, working with a single R1+R2 pair at a time. And then, if you want a rule to produce all of the fastqc outputs, you can have another rule like you already have as "all", but asking for the fastqc paths instead of the trimmed ones. (I typically end up with a few different "all_<something>" rules to make it easy to get different sets of outputs, and maybe an "all" at the very top with everything included. You can store the lists of paths in variables for that so you don't have to repeat yourself too much.)