Question

Problem with snakefile.

3.9 years ago

Jimpix ▴ 10

Hello! This is a fragment of my Snakefile:

SAMPLES=["sample_R1", "sample_R2"]

rule all:
    input:
        expand("path/to_file/{sample}.tsv"", sample=SAMPLES)

rule one:
    input:
        R1 = "path/to_file/{sample}.bam",
        R2 = "path/to_file/{sample}.bam"
    output:
        "path/to_file/{sample}.tsv"
    shell:
        "path/to_file/src.py {input.R1} {input.R2}"

Snakemake executes rule one like this:

   path/to_file/sample_R1.bam, path/to_file/sample_R1.bam
   path/to_file/sample_R2.bam, path/to_file/sample_R2.bam

What I need is:

path/to_file/sample_R1.bam, path/to_file/sample_R2.bam

and to store the output in one .tsv file. I have already tried several approaches, but none of them worked. Can someone give me some advice? Thanks in advance.

snakemake snakefile • 2.0k views

ADD COMMENT • link updated 3.9 years ago by cpad0112 21k • written 3.9 years ago by Jimpix ▴ 10

Your outputs (and therefore the inputs of rule all) are the problem. If you want the final .tsv to carry the sample name, you need to separate the sample name from the read tag (_R1/_R2). The issue becomes clear if you run the following code.

Run this code in the directory where the BAM files (sample_R1.bam, sample_R2.bam) are located:

SAMPLES = ["sample"]

rule all:
    input:
        expand("{sample}.tsv", sample=SAMPLES)

rule fastqc:
    input:
        R1= "{sample}_R1.bam",
        R2= "{sample}_R2.bam"
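    output:
        "{sample}.tsv"
    shell:
        # Plain command string: wrapping it in an extra pair of quotes would
        # make the shell treat the whole line as a single command name.
        "path/to_file/src.py {input.R1} {input.R2}"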

To get the sample name into the output, either change the output name within the rule or use separate wildcards for samples and reads. Run this code as a dry run with snakemake -nps file.smk

If you prefer the OP's approach, try running the following file with snakemake -nps file.smk:

SAMPLES=["sample_R1", "sample_R2"]
RES = sorted({s.rsplit("_", 1)[0] for s in SAMPLES})  # strip the trailing _R1/_R2 tag

rule all:
    input:
        expand("{res}.tsv", res=RES)

rule one:
    input:
        expand("{sample}.bam",sample=SAMPLES)

    output:
        "{res}.tsv"
    shell:
        "path/to_file/src.py {input}"
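
How RES is derived from SAMPLES can be checked in plain Python, outside Snakemake. This is a minimal sketch that assumes every name ends in a read tag such as _R1 or _R2; the helper name sample_prefix is hypothetical. Splitting once from the right keeps prefixes that themselves contain underscores, whereas an unlimited split keeps only the text before the first underscore.

```python
def sample_prefix(name: str) -> str:
    """Strip the trailing read tag, e.g. '_R1' (hypothetical helper)."""
    return name.rsplit("_", 1)[0]

SAMPLES = ["sample_R1", "sample_R2"]

# A set removes the duplicate prefix; sorting makes the order reproducible.
RES = sorted({sample_prefix(s) for s in SAMPLES})
print(RES)  # ['sample']

# Difference between an unlimited split and a single split from the right:
print("my_sample_R1".rsplit("_")[0])   # my
print(sample_prefix("my_sample_R1"))   # my_sample
```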

ADD REPLY • link 3.9 years ago by cpad0112 21k

I updated my code following your second suggestion, and I got an error:

WildcardError in line 6 of /path/Snakefile:
No values given for wildcard 'res'

I have other rules before rule one, so I would rather do it this way. Please help.

ADD REPLY • link 3.9 years ago by Jimpix ▴ 10

Please post the code; without it, troubleshooting is difficult. If '{res}' appears inside an expand() call somewhere in your code, that call needs to be given values for res.
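
The error message is consistent with a pattern containing {res} being expanded without any values for res. The plain-Python sketch below imitates that situation; expand_pattern is a hypothetical stand-in written for illustration and is not Snakemake's own implementation.

```python
from itertools import product

def expand_pattern(pattern: str, **wildcards) -> list:
    """Fill every combination of wildcard values into pattern (illustrative stand-in)."""
    names = list(wildcards)
    combos = product(*(wildcards[n] for n in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]

RES = ["sample"]

# Values supplied under the matching name: the pattern is filled in.
print(expand_pattern("{res}.tsv", res=RES))  # ['sample.tsv']

# Values supplied under a different name: {res} has nothing to be filled with.
try:
    expand_pattern("{res}.tsv", sample=RES)
except KeyError as err:
    print(f"No values given for wildcard {err}")
```

If the real Snakefile has a line of this second kind, passing res=... to that expand() call (or removing expand() so that {res} stays a rule wildcard) would be the first thing to try.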

ADD REPLY • link 3.9 years ago by cpad0112 21k

score 1 · Answer 1 · 2021-05-20

3.9 years ago

camerond ▴ 190

You need to take the _R1 and _R2 out of the sample IDs and hard-code them into the paths. Something like this should do it ... (not tested).

SAMPLES=["sample"]

rule all:
    input: 
        expand("path/to_file/{sample}.tsv", sample=SAMPLES)

rule one:
    input:
        R1 = "path/to_file/{sample}_R1.bam", 
        R2 = "path/to_file/{sample}_R2.bam"
    output:
        "path/to_file/{sample}.tsv"
    shell:
        "path/to_file/src.py {input.R1} {input.R2}"

There is also a double " after .tsv in your expand statement, but this is probably a typo.
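
To see why the original rule received the same file twice, the wildcard substitution can be imitated with plain str.format. This only sketches how paths are filled in, not how Snakemake resolves rules; fill is a hypothetical helper.

```python
def fill(patterns: dict, sample: str) -> dict:
    """Substitute one sample value into each named input pattern (hypothetical helper)."""
    return {name: p.format(sample=sample) for name, p in patterns.items()}

# Original rule: both inputs share one pattern, so they can never differ.
original = {
    "R1": "path/to_file/{sample}.bam",
    "R2": "path/to_file/{sample}.bam",
}
for s in ["sample_R1", "sample_R2"]:
    print(fill(original, s))

# Corrected rule: the read tag is part of the pattern and the wildcard
# holds only the sample ID, so one job gets both files.
corrected = {
    "R1": "path/to_file/{sample}_R1.bam",
    "R2": "path/to_file/{sample}_R2.bam",
}
print(fill(corrected, "sample"))
```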

ADD COMMENT • link 3.9 years ago by camerond ▴ 190

In OP's faulty code, executing rule all would execute rule one. In your code, that won't happen as one's output is not linked to all's input.

ADD REPLY • link 3.9 years ago by Ram 45k

Ah yes, wrote that in a bit of a rush last night just before I finished. Updated now and tested.

ADD REPLY • link 3.9 years ago by camerond ▴ 190

Did you do a dry run on this code, @camerond?

ADD REPLY • link 3.9 years ago by cpad0112 21k

Updated the code and dry run was successful, many thanks.

ADD REPLY • link 3.9 years ago by camerond ▴ 190