I can't find appropriate words to describe my need, please see the code.
My ideal workflow would do this:
I have known the categories that will be created during the workflow (cates)
I don't know how many files and what files would be created in each category (files)
For each category, rule create_file will be run first.
And then the checkpoint is triggered, I will know what files have been created for each category.
Then, for each file created in the create_file rule, a mock rule append_to_file_name take the file as input, and do the operation.
Files produced in create_file is wildcard specific so I call my need as "wildcard-specific" wildcard
cates=["A", "B"]
# pretend that you don't know about the files about to be created
files={
"A": ["a.txt", "b.txt"],
"B": ["c.txt", "d.txt"]
}
def get_append_to_file_output(wildcards):
files = glob_wildcards(f"{wildcards.cate}/{{file}}.txt").sample
appened = expand(f"{wildcards.cate}/{{file}}_append.txt", file = files)
return appened
rule all:
input:
get_append_to_file_output,
checkpoint create_file:
output: ddd=directory("{cate}"),
run:
from pathlib import Path
Path(output.ddd).mkdir(parents=True, exist_ok=True)
for file in files[wildcards.cate]:
Path(file).touch()
rule append_to_file_name:
input: ddd="{cate}",
output: "{cate}/{file}_append.txt",
run:
from pathlib import Path
Path(output[0]).touch()
I've asked the same question on stackoverflow
How is this related to bioinformatics?
snakemake is a commonly used workflow language in bioinformatics
True, but if this is a pure snakemake related question with no biological context, it does not belong on this forum.
Ok, next time I will consider if the question suits the community or add some more context. In fact the real world problem related to this simplified version of example is a id convert and then mafft workflow.
And, do I need to delete this post now?
No, it can remain. Please accept your answer to mark it as solved.
And yes, if there is any biological context, please add it in. The people (including future you) who will search Biostars for this problem will approach it from the biology end and your context will help a lot.