Snakemake input function results in a rule with no inputs
2.1 years ago
onurcanbkts ▴ 30

I have the following snakefile:

(some definitions)

wildcard_constraints:
    cell_name = "S[a-zA-Z0-9]*"

import glob
from pathlib import Path

def cell_data_fun(wildcards):
    cell_data_names = [Path(f).name.replace('.fastq.gz', '') for f in glob.glob('data/{wildcards.dataset_name}/raw/*.fastq.gz')]
    return expand("data/{wildcards.dataset_name}/raw/{cell_data}_trimmed.fastq.gz", cell_data = cell_data_names)

rule all:
    input:
        cell_data_fun
    output:
        "data/{dataset_name}/run.log"
    shell:
        """
        echo {input};
        """

rule quality_control:
    input:
        fastqc_path = "data/{dataset_name}/raw/{cell_name}.fastq.gz",
        bin_path = fastqc_bin_path,
        trimmed_fastqc_path = "data/{dataset_name}/raw/{cell_name}_trimmed.fastq.gz"
    output:
        report_path = "data/{dataset_name}/results/reports/{cell_name}_trimmed_fastqc.html"
    threads: workflow.cores * 0.75
    shell:
        "{input.bin_path} {input.trimmed_fastqc_path} -t {threads} --outdir=data/{wildcards.dataset_name}/results/reports/"

rule trim_validation:
    input:
        fastqc_path = "data/{dataset_name}/raw/{cell_name}.fastq.gz",
        bin_path = trim_galore_bin_path
    output:
        trimmed_fastqc_path = "data/{dataset_name}/raw/{cell_name}_trimmed.fastq.gz",
        report_path = "data/{dataset_name}/results/reports/{cell_name}_fastq.gz_trimmed_report.txt"
    threads: 4
    shell:
        """
        {input.bin_path} {input.fastqc_path} --cores {threads};
        touch {output.report_path}; mv {wildcards.cell_name}_trimmed.fastq.gz {output.trimmed_fastqc_path};
        mv {wildcards.cell_name}.fastq.gz_trimmed_report.txt {output.report_path}
        """

I want to generate all files of the form data/{dataset_name}/raw/{cell_name}_trimmed.fastq.gz by providing them as inputs to the rule all. However, when I try to run the rule all with

snakemake --cores 2 data/GSE77288/run.log

it doesn't show any inputs in the logs.

What is the problem here? How can I produce the _trimmed.fastq.gz files dynamically, depending on the wildcards, with an input function?

2.1 years ago
onurcanbkts ▴ 30

Found the answer.

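In the input function cell_data_fun, wildcards.dataset_name has to be used like a normal Python variable. Strings inside an input function are ordinary Python strings and are not formatted by Snakemake, so glob.glob was searching the literal path data/{wildcards.dataset_name}/raw/ and matched nothing; with an empty cell_data list, expand returned an empty list and the rule ended up with no inputs. The corrected function (which also requests the FastQC reports, so both trimming and quality control run), i.e.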

def cell_data_fun(wildcards):
    cell_data_names = [Path(f).name.replace('.fastq.gz', '') for f in glob.glob("data/" + wildcards.dataset_name + "/raw/*.fastq.gz")]
    return expand("data/" + wildcards.dataset_name + "/results/reports/{cell_data}_trimmed_fastqc.html", cell_data = cell_data_names)
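
For anyone who wants to see the difference outside of Snakemake, here is a minimal sketch in plain Python. It is not the code from the post: trimmed_targets, the root parameter, and the SimpleNamespace stand-in for the wildcards object are all made up for illustration, and it builds the list with an f-string and a list comprehension instead of expand. Skipping files that already end in _trimmed.fastq.gz is my own assumption, since the trimmed files are written into the same raw/ directory.

```python
import glob
import tempfile
from pathlib import Path
from types import SimpleNamespace

SUFFIX = ".fastq.gz"


def trimmed_targets(wildcards, root="data"):
    """Return one *_trimmed.fastq.gz path per raw FASTQ file of the dataset."""
    # wildcards.dataset_name is read as a normal attribute and put into an f-string.
    raw_dir = f"{root}/{wildcards.dataset_name}/raw"
    names = sorted(
        Path(f).name[: -len(SUFFIX)]
        for f in glob.glob(f"{raw_dir}/*{SUFFIX}")
        # Assumption: files that are already trimmed outputs are not listed again.
        if not f.endswith("_trimmed" + SUFFIX)
    )
    return [f"{raw_dir}/{name}_trimmed{SUFFIX}" for name in names]


def demo():
    """Compare the literal-brace glob with the f-string version on a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "GSE77288" / "raw"
        raw.mkdir(parents=True)
        for name in ("S1", "S2"):
            (raw / f"{name}{SUFFIX}").touch()

        # Hypothetical stand-in for the wildcards object Snakemake would pass in.
        wc = SimpleNamespace(dataset_name="GSE77288")

        # Plain string: the braces stay literal, so no file matches.
        literal = glob.glob(tmp + "/{wildcards.dataset_name}/raw/*.fastq.gz")
        return literal, trimmed_targets(wc, root=tmp)


if __name__ == "__main__":
    literal_matches, targets = demo()
    print(literal_matches)  # matches found with the literal-brace pattern
    print(targets)          # targets built from the real dataset name
```

The first list is what the original function was effectively working with, which is why the rule showed no inputs.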
2.1 years ago
bas1993 ▴ 60

I see you found the answer yourself, but below is how I collect the wildcard values in my Snakefiles (before defining the rules), in case you find it useful. This way you can also use the wildcard as {sample} in all of your rules.

import glob
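# glob_wildcards comes with Snakemake and is available inside a Snakefile
(SAMPLES, STRANDS) = glob_wildcards("samples/{sample}_{strand}.fastq.gz")
# overwrite the globbed strand values with the two expected ones
STRANDS = ["R1", "R2"]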
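
One thing to watch with this approach: as far as I know, glob_wildcards returns one entry per matching file, so with paired R1/R2 files every sample name shows up twice in SAMPLES. The sketch below imitates that behaviour in plain Python so it can be run without Snakemake. The regex and the collect function are hypothetical stand-ins, not Snakemake's implementation, and the final list comprehension plays the role that expand would play in a Snakefile.

```python
import re
import tempfile
from pathlib import Path

# Regex stand-in for the pattern "{sample}_{strand}.fastq.gz" (not Snakemake's own code).
PATTERN = re.compile(r"^(?P<sample>.+)_(?P<strand>[^_]+)\.fastq\.gz$")


def collect(directory):
    """Return parallel lists of sample and strand values, one entry per matching file."""
    samples, strands = [], []
    for path in sorted(Path(directory).iterdir()):
        match = PATTERN.match(path.name)
        if match:
            samples.append(match["sample"])
            strands.append(match["strand"])
    return samples, strands


def demo():
    """Build a small samples directory and derive the deduplicated target list."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("A_R1", "A_R2", "B_R1", "B_R2"):
            (Path(tmp) / f"{name}.fastq.gz").touch()
        samples, _ = collect(tmp)

    unique_samples = sorted(set(samples))  # keep each sample name once
    strands = ["R1", "R2"]
    # Every sample/strand combination, as expand would produce in a Snakefile.
    targets = [f"samples/{s}_{r}.fastq.gz" for s in unique_samples for r in strands]
    return samples, unique_samples, targets


if __name__ == "__main__":
    raw_samples, unique_samples, targets = demo()
    print(raw_samples)
    print(unique_samples)
    print(targets)
```

If the duplicates are left in, combining SAMPLES with STRANDS lists the same target file more than once, so deduplicating first keeps the target list clean.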