Dear all,
I'm working on a snakemake workflow where I need to use glob_wildcards (or something similar) to work on all samples in a directory.
SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample}_L001_R1_001.fastq.gz")).sample
wildcard_constraints:
    sample="(?!Undet).*"
rule all:
    input:
        expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R1.fq.gz"), sample = SAMPLES),
        expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R2.fq.gz"), sample = SAMPLES)
The input/output files for normal/work rules (sorry, don't know the proper name for them) populate correctly based on the {sample} wildcard, and all rules finish as expected. However, I expect the directory to contain some samples that I don't want to process. I can exclude them from the work rules by adding a global wildcard constraint at the beginning, like so:
SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample}_L001_R1_001.fastq.gz")).sample
wildcard_constraints:
    sample="(?!Undet).*"
However, the issue then is a 'MissingInputException':
Missing input files for rule all:
results2/fastp/Undetermined_S0/trimmed/Undetermined_S0_trimmed_R1.fq.gz
results2/fastp/Undetermined_S0/trimmed/Undetermined_S0_trimmed_R2.fq.gz
How can I change the expand function for the target rule input so that it behaves in the same way as wildcard_constraints (i.e., ignoring files beginning with 'Undet')?
Thanks!
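For context: expand only formats strings from whatever is in SAMPLES, and a wildcard constraint restricts which files a rule can match, not what is in that list. One possible approach, as a minimal sketch, is to filter the list in plain Python before it reaches expand. The sketch below runs without Snakemake: all_samples is a hypothetical stand-in for glob_wildcards(...).sample, RESULT_DIR is set to an assumed value, and the list comprehension at the end stands in for the two expand calls.

```python
import os
import re

# Hypothetical values: RESULT_DIR would normally come from the workflow, and
# all_samples stands in for glob_wildcards(...).sample.
RESULT_DIR = "results2"
all_samples = ["SampleA_S1", "SampleB_S2", "Undetermined_S0"]

# Reuse the pattern from wildcard_constraints. fullmatch requires the whole
# sample name to match, so names starting with "Undet" are rejected.
SAMPLE_PATTERN = re.compile(r"(?!Undet).*")
SAMPLES = [s for s in all_samples if SAMPLE_PATTERN.fullmatch(s)]

# Plain-Python stand-in for the two expand() calls in rule all.
targets = [
    os.path.join(RESULT_DIR, f"fastp/{s}/trimmed/{s}_trimmed_{read}.fq.gz")
    for read in ("R1", "R2")
    for s in SAMPLES
]

for path in targets:
    print(path)
```

In the Snakefile itself, the same filter would be applied to the result of glob_wildcards, and the expand calls in rule all could stay as they are. A simpler variant, if the regex is not needed elsewhere, is `[s for s in all_samples if not s.startswith("Undet")]`.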
Thanks, works a charm. I'd gone down the rabbit hole of:
And it seemed to work, but I prefer the neatness of yours.