I have the following Snakefile:
# (some definitions)
wildcard_constraints:
    cell_name = "S[a-zA-Z0-9]*"
import glob
from pathlib import Path
def cell_data_fun(wildcards):
    cell_data_names = [Path(f).name.replace('.fastq.gz', '') for f in glob.glob('data/{wildcards.dataset_name}/raw/*.fastq.gz')]
    return expand("data/{wildcards.dataset_name}/raw/{cell_data}_trimmed.fastq.gz", cell_data = cell_data_names)
rule all:
    input:
        cell_data_fun
    output:
        "data/{dataset_name}/run.log"
    shell:
        """
        echo {input};
        """
rule quality_control:
    input:
        fastqc_path = "data/{dataset_name}/raw/{cell_name}.fastq.gz",
        bin_path = fastqc_bin_path,
        trimmed_fastqc_path = "data/{dataset_name}/raw/{cell_name}_trimmed.fastq.gz"
    output:
        report_path = "data/{dataset_name}/results/reports/{cell_name}_trimmed_fastqc.html"
    threads: workflow.cores * 0.75
    shell:
        "{input.bin_path} {input.trimmed_fastqc_path} -t {threads} --outdir=data/{wildcards.dataset_name}/results/reports/"
rule trim_validation:
    input:
        fastqc_path = "data/{dataset_name}/raw/{cell_name}.fastq.gz",
        bin_path = trim_galore_bin_path
    output:
        trimmed_fastqc_path = "data/{dataset_name}/raw/{cell_name}_trimmed.fastq.gz",
        report_path = "data/{dataset_name}/results/reports/{cell_name}_fastq.gz_trimmed_report.txt"
    threads: 4
    shell:
        """
        {input.bin_path} {input.fastqc_path} --cores {threads};
        touch {output.report_path}; mv {wildcards.cell_name}_trimmed.fastq.gz {output.trimmed_fastqc_path};
        mv {wildcards.cell_name}.fastq.gz_trimmed_report.txt {output.report_path}
        """
I want to generate all files of the form data/{dataset_name}/raw/{cell_name}_trimmed.fastq.gz by providing these files as input to the all rule.
However, when I try to run the all rule with
snakemake --cores 2 data/GSE77288/run.log
the logs don't show any inputs for it.
What is the problem here? How can I produce the _trimmed.fastq.gz files dynamically, depending on the wildcards, with an input function?
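
For reference, here is a plain-Python sketch of the list I expect the input function to return. It rests on an assumption I have not confirmed: that the dataset name has to be substituted into the glob pattern and the target paths explicitly, rather than left as a literal {wildcards.dataset_name} placeholder. The helper name trimmed_targets and the base_dir parameter are hypothetical and not part of my Snakefile; skipping files that already end in _trimmed.fastq.gz is also my own assumption.

```python
import glob
from pathlib import Path


def trimmed_targets(dataset_name, base_dir="data"):
    """Hypothetical helper: expected *_trimmed.fastq.gz paths for one dataset."""
    # Build the glob pattern with the dataset name already substituted in.
    pattern = str(Path(base_dir) / dataset_name / "raw" / "*.fastq.gz")
    names = sorted(
        Path(f).name.replace(".fastq.gz", "")
        for f in glob.glob(pattern)
        # Assumption: ignore outputs of a previous run so they are not trimmed again.
        if not f.endswith("_trimmed.fastq.gz")
    )
    # One trimmed target per raw file found on disk.
    return [f"{base_dir}/{dataset_name}/raw/{name}_trimmed.fastq.gz" for name in names]
```

Inside the Snakefile, the input function would presumably call this as trimmed_targets(wildcards.dataset_name).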