Question

Using 'expand' in snakemake target rule with constraints

3.5 years ago

cfos4698 ★ 1.1k

Dear all,

I'm working on a snakemake workflow where I need to use glob_wildcards (or something similar) to work on all samples in a directory.

SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample}_L001_R1_001.fastq.gz")).sample
wildcard_constraints:
    sample="(?!Undet).*"

rule all:
    input:
        expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R1.fq.gz"), sample = SAMPLES),
        expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R2.fq.gz"), sample = SAMPLES)

The input/output files for the normal/work rules (sorry, don't know the proper name for them) populate correctly based on the {sample} wildcard, and all rules finish as expected. However, I expect the directory to contain some samples that I don't want to process. I tried to get around this by adding a global wildcard constraint at the beginning, like so:

SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample}_L001_R1_001.fastq.gz")).sample
wildcard_constraints:
    sample="(?!Undet).*"

However, this then fails with a 'MissingInputException':

Missing input files for rule all:
results2/fastp/Undetermined_S0/trimmed/Undetermined_S0_trimmed_R1.fq.gz
results2/fastp/Undetermined_S0/trimmed/Undetermined_S0_trimmed_R2.fq.gz

How can I change the expand call in the target rule's input so that it behaves in the same way as wildcard_constraints (i.e., ignoring files beginning with 'Undet')?

Thanks!

snakemake glob_wildcards expand • 2.2k views

score 3 · Accepted Answer · 2021-07-21

3.5 years ago

Jianyu ▴ 580

SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample, (?!Undet).*}_L001_R1_001.fastq.gz")).sample
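
For context on why this works where the global constraint did not: wildcard_constraints only restricts which file names a rule's output pattern can match, while expand() simply substitutes every value in SAMPLES into the path, so Undetermined_S0 was still being requested in rule all. Putting the regex inside the glob_wildcards pattern keeps that sample out of SAMPLES in the first place. A minimal sketch of the filtering in plain Python follows; the file names are made up, and the named-group pattern is only an approximation of what glob_wildcards builds internally:

```python
import re

# Hypothetical file names, standing in for the contents of READSDIR.
filenames = [
    "A1_S1_L001_R1_001.fastq.gz",
    "B2_S2_L001_R1_001.fastq.gz",
    "Undetermined_S0_L001_R1_001.fastq.gz",
]

# The constraint from the answer: a negative lookahead that rejects
# any sample name starting with "Undet".
constraint = r"(?!Undet).*"

# Approximation of the constrained pattern: the wildcard becomes a named
# group whose body is the constraint, followed by the fixed suffix.
pattern = re.compile(rf"(?P<sample>{constraint})_L001_R1_001\.fastq\.gz")

# Keep only the sample names whose file name matches the whole pattern.
samples = [m.group("sample") for f in filenames if (m := pattern.fullmatch(f))]

# Plain substitution over the filtered list, similar in spirit to expand().
# "results2" is the result directory that appears in the error message.
targets = [f"results2/fastp/{s}/trimmed/{s}_trimmed_R1.fq.gz" for s in samples]

print(samples)
print(targets)
```

Because the Undetermined file never matches, no target path is generated for it, and rule all no longer asks for an output that no rule can produce.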

Thanks, works a charm. I'd gone down the rabbit hole of:

expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R1.fq.gz"), sample = [x for x in SAMPLES if 'Undet' not in x])

And it seemed to work, but I prefer the neatness of yours.
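
One difference worth noting between the two approaches: 'Undet' not in x drops any sample whose name contains 'Undet' anywhere, whereas the (?!Undet).* constraint only rejects names that start with it. A small sketch with hypothetical sample names, assuming the constraint is applied at the start of the name as in the glob_wildcards pattern above:

```python
import re

# Hypothetical sample names; the last one contains "Undet" but does not start with it.
samples = ["A1_S1", "Undetermined_S0", "Pool_Undet_S3"]

# Substring test, as in the list comprehension from the reply.
by_substring = [x for x in samples if "Undet" not in x]

# Prefix test, equivalent to the negative lookahead anchored at the start.
by_prefix = [x for x in samples if re.fullmatch(r"(?!Undet).*", x)]

print(by_substring)
print(by_prefix)
```

For sample sheets where only the demultiplexer's Undetermined file needs excluding, both give the same result; they diverge only if a wanted sample happens to contain 'Undet' elsewhere in its name.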

ADD REPLY • link 3.5 years ago by cfos4698 ★ 1.1k