Basically, I have three Snakemake rules (other than rule all) and cannot figure this problem out, despite reading up on checkpoints.
Rule 1 takes the first and only file I start with. It produces x outputs (the number varies depending on the input file). Each of those x outputs needs to be processed separately in rule 2, so rule 2 will run x jobs. However, only some subset, y, of these jobs will produce outputs (the software only writes out files for inputs that pass a certain threshold). I want each of those outputs to run as a separate job in rule 3, so rule 3 will run y jobs, one for each successful output from rule 2. The problem is that I don't know in advance how many files will come out of rule 2.

I have two questions. First, how do I write the input for rule 3 when I don't know how many files will come out of rule 2? Second, how can I "tell" rule 2 that it is done when the number of output files does not match the number of input files? If I add a fourth rule, I imagine it would try to re-run rule 2 on the jobs that didn't produce an output file, and those jobs would never produce one. Maybe I am missing something about how to set up the checkpoints?
Something like this, where rules a, b and c are rules 1, 2 and 3:
# this rule has one job
rule a:
    input: "file.vcf"
    output: # some unknown number (x) of files: x_1, x_2, ..., x_n
    shell: """
        # ... make an unknown number (x) of output files: x_1, x_2, ..., x_n
        """

# run a separate job for each output of rule a
rule b:
    input: "x_1"  # not sure how many inputs there will be here
    output: "y_1"  # not sure how many output files there will be here
    shell: """
        # some of the x inputs will produce their corresponding y, but others will produce no output
        """

# run a separate job in rule c for each output of rule b
rule c:
    input: "y_1"  # not sure how many input files there will be here
    output: "z_1"
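For the second question, one common workaround (a sketch, not the only option) is to make every rule-2 job always leave its declared output behind, as an empty placeholder when the tool writes nothing. Snakemake then sees all x jobs finish, so nothing downstream tries to re-run them. The helper below is hypothetical and could be called from a `run:` block in rule b; it assumes the tool exits with status 0 when it simply skips an input below the threshold. In a `shell:` block, appending `touch {output}` after the tool call has a similar effect, because `touch` creates the file only if it is missing.

```python
import os
import subprocess
from typing import Sequence


def run_with_placeholder(cmd: Sequence[str], expected_output: str) -> bool:
    """Run one rule-2 job and guarantee that its declared output exists afterwards.

    Returns True if the tool wrote the file itself, and False if an empty
    placeholder had to be created because the input was below the threshold.
    """
    # check=True makes a genuine failure (non-zero exit status) still fail the job.
    subprocess.run(list(cmd), check=True)
    if os.path.exists(expected_output):
        return True
    # Nothing written means "did not pass": leave an empty file so Snakemake
    # sees the declared output and treats this job as finished.
    open(expected_output, "w").close()
    return False
```

Inside rule b's `run:` block this might look like `run_with_placeholder(["tool", input[0], "-o", output[0]], output[0])`, where `tool` stands in for the real software. Downstream, the empty placeholders still have to be ignored, and rule b would need to be declared as a `checkpoint` so that Snakemake re-evaluates the DAG once those jobs have run.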
Input functions might be what you are looking for. There is a lot out there if you google "dynamic snakemake input" and similar terms. In my experience, Snakemake was actually pretty limited until I learned how to write input functions.
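Applied to the first question, the input function would sit on a final aggregating rule rather than on rule 3 itself: rule 3 can use a plain wildcard (input y_{i}, output z_{i}), and the aggregating rule asks only for the z files whose y file actually contains data. The sketch below shows that selection logic in plain Python so it can run outside Snakemake. The split/, filtered/ and final/ directories and the x_/y_/z_ file names are hypothetical, and the regex scan stands in for Snakemake's `glob_wildcards`. In a real Snakefile, rules a and b would be declared as `checkpoint`, and the input function would first call `checkpoints.a.get()` and `checkpoints.b.get(i=i)` so that it is only evaluated after those jobs have finished.

```python
import os
import re
from typing import List

# Hypothetical naming scheme: rule a writes split/x_{i}.txt, rule b writes
# filtered/y_{i}.txt, and rule c writes final/z_{i}.txt.
X_NAME = re.compile(r"^x_(?P<i>.+)\.txt$")


def split_ids(split_dir: str) -> List[str]:
    """Recover the {i} values from rule a's outputs (glob_wildcards does this in a Snakefile)."""
    ids = []
    for name in os.listdir(split_dir):
        match = X_NAME.match(name)
        if match:
            ids.append(match.group("i"))
    return sorted(ids)


def rule_c_targets(split_dir: str, filtered_dir: str, final_dir: str) -> List[str]:
    """Request one rule-c output per rule-b output that actually contains data."""
    targets = []
    for i in split_ids(split_dir):
        y_path = os.path.join(filtered_dir, f"y_{i}.txt")
        # A missing or empty y file means input i did not pass the threshold,
        # so rule c is never asked to run on it.
        if os.path.isfile(y_path) and os.path.getsize(y_path) > 0:
            targets.append(os.path.join(final_dir, f"z_{i}.txt"))
    return targets
```

Because a missing y file and an empty one are treated alike, the same selection works whether rule b leaves nothing behind or writes empty placeholders.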