How to add a "wildcard-specific wildcard" via Snakemake checkpoints
2.4 years ago
TPOB ▴ 10

I can't find the right words to describe what I need, so please see the code below.
My ideal workflow would do the following:
I know in advance which categories will be created during the workflow (cates).
I don't know how many files, or which files, will be created in each category (files).
For each category, the rule create_file runs first.
Once the checkpoint has completed, I know which files were created for each category.
Then, for each file created by create_file, a mock rule append_to_file_name takes the file as input and performs its operation.

The files produced by create_file are specific to each value of the cate wildcard, so I call what I need a "wildcard-specific" wildcard.

cates=["A", "B"]
# pretend that you don't know about the files about to be created
files={
    "A": ["a.txt", "b.txt"],
    "B": ["c.txt", "d.txt"]
}

def get_append_to_file_output(wildcards):
    files = glob_wildcards(f"{wildcards.cate}/{{file}}.txt").sample
    appened = expand(f"{wildcards.cate}/{{file}}_append.txt", file = files)
    return appened


rule all:
    input:
        get_append_to_file_output,


checkpoint create_file:
    output: ddd=directory("{cate}"),
    run:
        from pathlib import Path
        Path(output.ddd).mkdir(parents=True, exist_ok=True)
        for file in files[wildcards.cate]:
            Path(file).touch()


rule append_to_file_name:
    input: ddd="{cate}",
    output: "{cate}/{file}_append.txt",
    run:
        from pathlib import Path
        Path(output[0]).touch()

I've asked the same question on Stack Overflow.

wdl snakemake workflow • 1.1k views

How is this related to bioinformatics?


Snakemake is a commonly used workflow language in bioinformatics.


True, but if this is a purely Snakemake-related question with no biological context, it does not belong on this forum.


OK, next time I will consider whether the question suits the community, or add more context. In fact, the real-world problem behind this simplified example is a workflow that converts IDs and then runs MAFFT.

Also, do I need to delete this post now?


No, it can remain. Please accept your answer to mark it as solved.

And yes, if there is any biological context, please add it in. The people (including future you) who will search Biostars for this problem will approach it from the biology end and your context will help a lot.

2.4 years ago
TPOB ▴ 10

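I've solved this by adding a fake aggregating rule. Its input function calls checkpoints.create_file.get(), so the files for a category are only globbed after the checkpoint has run for that category.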

cates=["A", "B"]
# pretend that you don't know about the files about to be created
files={
    "A": ["a", "b"],
    "B": ["c", "d"]
}

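def get_append_to_file_output(wildcards):
    # Asking for the checkpoint output defers this function until
    # create_file has run for this category.
    cate_dir = checkpoints.create_file.get(**wildcards).output.ddd
    # Discover which files the checkpoint actually produced.
    files = glob_wildcards(f"{cate_dir}/{{file}}.txt").file
    appended = expand(f"results/append_filename/{wildcards.cate}/{{file}}_append.txt", file = files)
    return appended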


rule all:
    input:
        expand("results/flags/{cate}_aggregate.flag", cate=cates)


checkpoint create_file:
    output: 
        ddd=directory("results/create_file/{cate}"),
    run:
        from pathlib import Path
        Path(output.ddd).mkdir(parents=True, exist_ok=True)
        for file in files[wildcards.cate]:
            Path(f"{output.ddd}/{file}.txt").touch()


rule append_to_filename:
    input: 
        ddd=rules.create_file.output.ddd
    output: 
        appended="results/append_filename/{cate}/{file}_append.txt"
    shell:
        "cp {input.ddd}/{wildcards.file}.txt {output.appended}"

rule fake_aggregate:
    input: 
        get_append_to_file_output
    output: 
        touch("results/flags/{cate}_aggregate.flag")
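
To make it easier to see what the input function hands to fake_aggregate, here is a rough plain-Python sketch of the same idea with no Snakemake involved. The helper names (simulate_checkpoint, append_targets, demo) are made up for illustration, and Path.glob is only a simplified stand-in for glob_wildcards, so treat it as an approximation of the logic rather than what Snakemake does internally.

```python
from pathlib import Path
import tempfile


def simulate_checkpoint(root, cate, names):
    """Hypothetical stand-in for the create_file checkpoint:
    write files that are not known before the step has run."""
    out_dir = root / "results" / "create_file" / cate
    out_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        (out_dir / f"{name}.txt").touch()


def append_targets(root, cate):
    """Hypothetical stand-in for get_append_to_file_output:
    list what the checkpoint wrote, then map each name to a target path."""
    cate_dir = root / "results" / "create_file" / cate
    # Simplified replacement for glob_wildcards(f"{cate_dir}/{{file}}.txt").file
    stems = sorted(p.stem for p in cate_dir.glob("*.txt"))
    # Simplified replacement for expand(".../{file}_append.txt", file=stems)
    return [f"results/append_filename/{cate}/{stem}_append.txt" for stem in stems]


def demo():
    """Run the fake checkpoint for both categories and collect the targets."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        simulate_checkpoint(root, "A", ["a", "b"])
        simulate_checkpoint(root, "B", ["c", "d"])
        return {cate: append_targets(root, cate) for cate in ["A", "B"]}


if __name__ == "__main__":
    for cate, targets in demo().items():
        print(cate, targets)
```

Each category ends up with its own list of targets, which is the "wildcard-specific" part: the values of file depend on the value of cate, and they are only known after the directory has been written.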