Although this isn't a code review site, do you mind if I rewrite your workflow? I'll add some comments to cover any confusing bits that might be the source of your problem.
I can't tell from your code whether `datapath` looks like "some/path/" or "some/path/some_prefix" (I've assumed the former), and as such I'd strongly recommend you use `os.path.join(...)` or `expand("{prefix}/R_RESULT/{barcode}_list_{genotype}.txt", ...)` to build your result paths more explicitly.
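To illustrate why the trailing slash matters, here is a small sketch in plain Python. The paths, the barcode name and the two helper functions are made up for this example, and it assumes a POSIX-style path separator. Plain string concatenation gives different results for the two forms of `datapath`, whereas `os.path.join` only inserts a separator where one is missing:

```python
import os


def naive_result_path(datapath, barcode, genotype):
    # Plain concatenation: only correct when datapath already ends with "/"
    return datapath + "R_RESULT/" + barcode + "_list_" + genotype + ".txt"


def joined_result_path(datapath, barcode, genotype):
    # os.path.join adds a separator only where one is missing
    return os.path.join(datapath, "R_RESULT", f"{barcode}_list_{genotype}.txt")


# Hypothetical values, for illustration only
with_slash = "some/path/"
without_slash = "some/path"

print(naive_result_path(with_slash, "bc01", "GTA"))
print(naive_result_path(without_slash, "bc01", "GTA"))
print(joined_result_path(with_slash, "bc01", "GTA"))
print(joined_result_path(without_slash, "bc01", "GTA"))
```

The two naive results differ (the second one silently turns `R_RESULT` into part of a filename prefix), while the two joined results point at the same file.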
# Imported explicitly so that os.path.join is available below
import os

# You didn't define `datapath` and `barcodes`, so I'm assuming you've defined them in some
# config file so that your pipeline is reusable:
configfile: "my/config.yaml"
datapath = config["datapath"]
BARCODES = config["barcodes"]
# please put these into a config as well ...
GENOTYPES = ["GTA","GTB","GTC","GTD","GTE","GTF","GTG","GTH","GTI","GTJ"]
result_path = os.path.join(datapath, "R_RESULT")
blastn_path = os.path.join(datapath, "BLASTN")
# what does this rule tell the computer to do? For each combination of barcode and genotype
# that the user specifies, generate the file "<datapath>/R_RESULT/<barcode>_list_<genotype>.txt"
rule all:
    input:
        expand(
            "{result_path}/{barcode}_list_{genotype}.txt",
            result_path=result_path,
            barcode=BARCODES,
            genotype=GENOTYPES
        )
# This rule tells the computer:
# If you want the file (singular) "<datapath>/R_RESULT/<barcode>_list_<genotype>.txt",
# then run this R-script and it will generate that file.
#
# Since you have specified that you're interested in 10 genotypes for each barcode (see your
# `rule all`), this rule will be run 10 * |number of barcodes| times.
#
# But, for a given barcode, the input to this rule would be the same for each of those 10 runs.
# Should you have a {genotype} stub in your input filepath?
#
# Or, if this rule should run once for each barcode, but during that run it generates a file for each of the
# 10 genotypes, then you should change the output to
# `expand(os.path.join(datapath, "R_RESULT", "{{barcode}}_list_{genotype}.txt"), genotype = GENOTYPES)`
#
rule R_HBV_analysis:
    input:
        R_data = os.path.join(blastn_path, "{barcode}_fmt.txt")
    output:
        R_result = os.path.join(result_path, "{barcode}_list_{genotype}.txt"),
    wildcard_constraints:
        # wildcard_constraints are regular expressions, so I build the regex with
        # `"|".join(GENOTYPES)` rather than passing the list itself
        genotype = "|".join(GENOTYPES)
    params:
        path = datapath,
        result_path = result_path,
        # I don't quite get why this path is `mkdir`ed; if its contents are made by your script, you could
        # add it as a `directory()` output of this rule.
        #
        # Nonetheless, I moved their definition to here, to reduce the duplication
        # in your shell command
        data_path = os.path.join(datapath, "RDATA")
    shell:
        """
        # mkdir -p does nothing if the directory already exists, so it is also
        # safe when several of these jobs run at the same time
        mkdir -p "{params.data_path}" "{params.result_path}"
        Rscript script/HBV_analysis.R {input} {params.path}
        """
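If it helps to see what the genotype constraint and the `rule all` targets amount to, here is a rough sketch in plain Python (no Snakemake needed). The barcode names are invented for the example and `is_valid_genotype` is just a helper I made up; the regex is built the same way as the constraint in the rule:

```python
import re
from itertools import product

GENOTYPES = ["GTA", "GTB", "GTC", "GTD", "GTE", "GTF", "GTG", "GTH", "GTI", "GTJ"]
BARCODES = ["barcode01", "barcode02"]  # hypothetical barcodes, for illustration

# Alternation regex of the kind used for the wildcard constraint
genotype_regex = "|".join(GENOTYPES)


def is_valid_genotype(value):
    # True only if the whole string is one of the allowed genotypes
    return re.fullmatch(genotype_regex, value) is not None


# One target per (barcode, genotype) pair, mirroring what `rule all` asks for
targets = [
    f"R_RESULT/{barcode}_list_{genotype}.txt"
    for barcode, genotype in product(BARCODES, GENOTYPES)
]

print(genotype_regex)
print(is_valid_genotype("GTA"), is_valid_genotype("GTZ"))
print(len(targets))
```

With two barcodes and ten genotypes this lists 20 target files, which is why the analysis rule gets scheduled 10 times per barcode.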
Sorry, how does your R-script know which of the genotypes are to be present in the output file?
Actually, I'm counting the most represented genotype in my dataset. So, depending on the dataset, it could be A, or B, etc.
Then why not have a single output file name used by each run of your Rscript, and encode the mode-genotype within the file, rather than within the filename?
Well, you're totally right! I didn't think about it at all. It should be easier to handle this way with snakemake. Thanks for this advice and all of your tips on the second answer.
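For what it's worth, the idea from the comments (one fixed output name per barcode, with the dominant genotype stored inside the file) could look roughly like the sketch below. In the real pipeline the counting happens in the R script, so this is only a Python stand-in; the function names, the `barcode01_top_genotype.txt` filename, the tab-separated layout and the example calls are all assumptions made for illustration:

```python
import os
import tempfile
from collections import Counter


def write_top_genotype(genotype_calls, out_path):
    """Write the most frequent genotype and its count to a fixed-name file."""
    genotype, count = Counter(genotype_calls).most_common(1)[0]
    with open(out_path, "w") as handle:
        handle.write(f"{genotype}\t{count}\n")


def read_top_genotype(path):
    """Recover the genotype from the file content rather than from the filename."""
    with open(path) as handle:
        genotype, count = handle.read().strip().split("\t")
    return genotype, int(count)


# Hypothetical genotype calls for one barcode
calls = ["GTA", "GTC", "GTA", "GTB", "GTA"]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "barcode01_top_genotype.txt")
    write_top_genotype(calls, out_path)
    result = read_top_genotype(out_path)

print(result)
```

Because the output filename no longer depends on the genotype, the rule would only need a `{barcode}` wildcard and would run once per barcode.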