Within my snakemake file I have a function called SNAKEMAKE_OUTPUT. Calling this function gives me three variables: INPUT, OUTPUT, and COMMANDS. The first variable is a list; the latter two are dictionaries whose keys are within the first variable. For example:
SAMPLES = ["Mortimer"]
RESULTS = {"Mortimer": ["Mort_out1.x", "Mort_out2.x", ...]}
COMMANDS = {"Mortimer": ["bash_command_1", "bash_command_2", ...]}
all_outputs = [x for xs in RESULTS.values() for x in xs]
(The last line is here to flatten a list of lists into a list)
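As a small illustration of that flattening step, here is a minimal sketch using itertools.chain from the standard library. The second sample name and all file names are made up for the example; only the shape of RESULTS follows the description above.

```python
from itertools import chain

# Hypothetical example data with the same shape as RESULTS:
# one list of output files per sample, of varying length.
RESULTS = {
    "Mortimer": ["Mort_out1.x", "Mort_out2.x"],
    "Ysabell": ["Ysa_out1.x", "Ysa_out2.x", "Ysa_out3.x"],
}

# Flatten the per-sample lists into one flat list of file names.
all_outputs = list(chain.from_iterable(RESULTS.values()))
print(all_outputs)
```

This gives the same result as the nested list comprehension; which one to use is a matter of taste.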
The idea behind this arrangement is that I can use it within a snakemake rule, in the following example:
rule all:
    input:
        list(RESULTS.values())

rule randomname:
    run:
        sample = wildcards.sample
        commands = COMMANDS[sample]
        for command in commands: os.system(command)
    output:
        RESULTS[wildcards.sample]
The reason I want to do it this way is that the commands needed to produce a particular sample's outputs, and the outputs themselves, might not share the same structure as those of the other samples. For example, one sample might yield 2 outputs, while another yields 3 or 4.
What am I doing wrong?
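A side note on the os.system loop in the rule above: the loop ignores the return value of os.system, so a failing command does not stop the rule by itself. A minimal sketch of an alternative, assuming the commands are plain shell strings; run_commands is a hypothetical helper name, not something from the original snakefile:

```python
import subprocess


def run_commands(commands):
    """Run each shell command in order and stop at the first failure."""
    for command in commands:
        # check=True raises subprocess.CalledProcessError when the
        # command exits with a non-zero status.
        subprocess.run(command, shell=True, check=True)
```

Inside a run block this could be called as run_commands(COMMANDS[wildcards.sample]), so that an error in any command surfaces as a Python exception.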
EDIT 1:
I'm including a full working version of what I want to do here.
In a nutshell, download files from the SRA archive, and then subject them to an aligner like bwa or bowtie2.
The thing that irks me is that I need to specify "raw_samples/{sample}_1.fastq" when I have this information in a Python dictionary.
I'd really really like it if I didn't have to manually tweak that.
SAMPLES, DOWNLOAD_OUTPUTS, DOWNLOAD_COMMANDS = SNAKEMAKE_OUTPUT("linker.csv", 0, 2)

#MAIN RULE
rule all:
    input:
        #expand("raw_samples/{sample}_{i}.fastq",sample=SAMPLES,i = range(1,2)),
        expand("processed_files/{sample}.bam", sample=SAMPLES)

rule download_samples:
    output:
        "raw_samples/{sample}_1.fastq",
        "raw_samples/{sample}_2.fastq"
    run:
        sample_name = wildcards.sample
        commands = DOWNLOAD_COMMANDS[sample_name]
        for command in commands:
            os.system(command)
rule align_stuff:
    "generic_alignment"
Hello, what error message do you get? Also, since you do not assign any value to wildcards.sample, your snakefile will not run...

As a general remark, Snakemake is meant to run pipelines. It therefore expects a list of well-defined operations, with clearly specified inputs and outputs, that follow one another. Is there something following your rule randomname? From your code, it looks like you are using it to run different operations in parallel. In that case, using the Parallel python module would be more appropriate (and easier, I think).

Yes, I have something following the rule randomname - I just posted a code snippet here in order to not clutter the message. The error message is

I know that this is sort of unorthodox for a snakemake file, but I actually have a list of well-defined outputs, inputs, and operations; it's just that they're in the form of Python variables. I'm editing the OP for more clarity.

Thanks, it is a lot clearer :) Just using expand("raw_samples/{sample}_1.fastq",sample=SAMPLES) as an input to your rule all doesn't work?

No, the second example is the example that works. I'm just including {sample}.bam to show that I'm using snakemake for downstream analysis. What irks me is that I need to "manually" input the output for my rule download_samples, when I want to just read it from a Python variable.