Problem with snakefile.
3.5 years ago
Jimpix ▴ 10

Hello! This is a fragment of my Snakefile:

SAMPLES=["sample_R1", "sample_R2"]

rule all:
    input:
        expand("path/to_file/{sample}.tsv"", sample=SAMPLES)

rule one:
    input:
        R1 = "path/to_file/{sample}.bam",
        R2 = "path/to_file/{sample}.bam"
    output:
        "path/to_file/{sample}.tsv"
    shell:
        "path/to_file/src.py {input.R1} {input.R2}"

Snakemake executes rule one like this:

   path/to_file/sample_R1.bam, path/to_file/sample_R1.bam
   path/to_file/sample_R2.bam, path/to_file/sample_R2.bam

What I need is:

path/to_file/sample_R1.bam, path/to_file/sample_R2.bam

and to store the output in a single .tsv file. I have already tried several approaches, but they all failed. Can someone give me some advice? Thanks in advance.

Your outputs (and therefore the input of rule all) are the problem. If you want your final .tsv to carry the sample name, you need to separate the sample name from the read (R1/R2). You will understand the issue if you run the following code:

In the directory where the BAM files (sample_R1.bam, sample_R2.bam) are located, run this code:

SAMPLES = ["sample"]

rule all:
    input:
        expand("{sample}.tsv", sample=SAMPLES)

rule fastqc:
    input:
        R1= "{sample}_R1.bam",
        R2= "{sample}_R2.bam"
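    output:
        "{sample}.tsv"
    shell:
        # no extra quotes inside the command string, otherwise the shell
        # treats the whole line as a single program name
        "path/to_file/src.py {input.R1} {input.R2}"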

To get the sample name into your output, you can either change the output name within the rule or use separate wildcards for samples and reads. Run this code with snakemake -nps file.smk (-n: dry run, -p: print the shell commands, -s: the Snakefile to use).
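A rough way to see why the original rule pairs sample_R1.bam with itself: once Snakemake has matched a requested output file against the output pattern, it substitutes that same wildcard value into every input pattern. The plain-Python sketch below imitates only that substitution step; resolve_inputs is a hypothetical helper, not part of Snakemake's API.

```python
def resolve_inputs(input_patterns, wildcards):
    """Hypothetical helper: fill the wildcard values into each input pattern,
    roughly as Snakemake does after matching an output file."""
    return {name: pattern.format(**wildcards) for name, pattern in input_patterns.items()}

# OP's rule: both inputs use the same {sample} wildcard
op_inputs = {"R1": "{sample}.bam", "R2": "{sample}.bam"}
# Split version: the read suffix is hard-coded, only the sample name is a wildcard
fixed_inputs = {"R1": "{sample}_R1.bam", "R2": "{sample}_R2.bam"}

# Target sample_R1.tsv matches "{sample}.tsv" with sample="sample_R1"
print(resolve_inputs(op_inputs, {"sample": "sample_R1"}))
# {'R1': 'sample_R1.bam', 'R2': 'sample_R1.bam'}

# Target sample.tsv matches "{sample}.tsv" with sample="sample"
print(resolve_inputs(fixed_inputs, {"sample": "sample"}))
# {'R1': 'sample_R1.bam', 'R2': 'sample_R2.bam'}
```

Because a single wildcard value feeds both inputs, the only way to get R1 and R2 into the same job is to keep the read suffix out of the wildcard.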

If you prefer the OP's approach, try running the following .smk file with snakemake -nps file.smk:

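SAMPLES=["sample_R1", "sample_R2"]
# drop only the trailing _R1/_R2 to get the sample name(s), here ["sample"]
RES = sorted(set(i.rsplit('_', 1)[0] for i in SAMPLES))

rule all:
    input:
        expand("{res}.tsv", res=RES)

rule one:
    input:
        # only the BAM files of this sample, e.g. sample_R1.bam and sample_R2.bam
        lambda wildcards: [s + ".bam" for s in SAMPLES if s.rsplit('_', 1)[0] == wildcards.res]
    output:
        "{res}.tsv"
    shell:
        "path/to_file/src.py {input}"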
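The RES line derives the sample names by cutting the read suffix off each ID. Here is a quick plain-Python check (the IDs that contain an extra underscore are hypothetical) of what rsplit returns with and without a split limit:

```python
# Hypothetical IDs: two with a plain sample name, two whose name has an underscore
SAMPLES = ["sample_R1", "sample_R2", "my_sample_R1", "my_sample_R2"]

# No limit: split at every underscore, so [0] is the text before the FIRST one
naive = sorted(set(s.rsplit("_")[0] for s in SAMPLES))

# Limit of 1: split only at the LAST underscore, dropping just the _R1/_R2 suffix
safe = sorted(set(s.rsplit("_", 1)[0] for s in SAMPLES))

print(naive)  # ['my', 'sample']
print(safe)   # ['my_sample', 'sample']
```

With the limit of 1, only the trailing read suffix is removed, which is what the {res} wildcard needs if real sample names contain underscores.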

I updated my code following your second suggestion, and I got an error:

WildcardError in line 6 of /path/Snakefile:
No values given for wildcard 'res'

I have other rules before rule one, so I need to do it this way. Please help.

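Please post the code; without it, it is difficult to troubleshoot. If you call expand() on a pattern containing '{res}' somewhere in your code, you have to pass values for res in that call (e.g. res=RES). If '{res}' is meant to stay a wildcard inside expand(), escape it with double braces: '{{res}}'.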
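The WildcardError comes from expand(): it fills every {placeholder} in the pattern from the keyword arguments it receives, and a placeholder without a value cannot be filled. A simplified stand-in written in plain Python (mini_expand is hypothetical and ignores expand's other features) shows three cases: a value is given, a placeholder escaped as {{res}} survives as a wildcard for the rule, and a value is missing:

```python
import itertools

def mini_expand(pattern, **values):
    """Hypothetical, simplified stand-in for Snakemake's expand(): format the
    pattern once for every combination of the given values."""
    keys = list(values)
    # treat a single string as a one-element list instead of iterating its characters
    lists = [[values[k]] if isinstance(values[k], str) else list(values[k]) for k in keys]
    results = []
    for combo in itertools.product(*lists):
        # str.format raises KeyError for a {placeholder} with no value;
        # Snakemake's expand() reports that case as a WildcardError
        results.append(pattern.format(**dict(zip(keys, combo))))
    return results

print(mini_expand("{res}.tsv", res=["sample"]))             # ['sample.tsv']
print(mini_expand("{{res}}_{read}.bam", read=["R1", "R2"]))  # ['{res}_R1.bam', '{res}_R2.bam']
try:
    mini_expand("{res}.tsv")  # no value supplied for res
except KeyError as err:
    print("no value given for wildcard", err)  # no value given for wildcard 'res'
```

The double-brace form is what lets an input like expand("{{res}}_{read}.bam", read=["R1", "R2"]) keep {res} as a rule wildcard while expanding only the reads.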

3.5 years ago
camerond ▴ 190

You need to take the _R1 and _R2 suffixes out of the sample IDs and hard-code them into the paths. Something like this should do it (not tested):

SAMPLES=["sample"]

rule all:
    input: 
        expand("path/to_file/{sample}.tsv", sample=SAMPLES)

rule one:
    input:
        R1 = "path/to_file/{sample}_R1.bam", 
        R2 = "path/to_file/{sample}_R2.bam"
    output:
        "path/to_file/{sample}.tsv"
    shell:
        "path/to_file/src.py {input.R1} {input.R2}"
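To see what this Snakefile asks for, the sketch below lists, for each sample, the output that rule all requests and the command rule one would run to produce it, roughly what snakemake -np prints. planned_jobs and the second sample name are hypothetical. Note that the command as written does not pass {output} to src.py, so the script itself must create path/to_file/{sample}.tsv; if it does not, Snakemake will report the output as missing after the job finishes.

```python
SAMPLES = ["sample", "other"]  # hypothetical; the thread only uses ["sample"]

def planned_jobs(samples):
    """Hypothetical helper: for each output requested by rule all, return the
    output path and the shell command rule one would run for it."""
    jobs = []
    for sample in samples:
        output = f"path/to_file/{sample}.tsv"
        r1 = f"path/to_file/{sample}_R1.bam"
        r2 = f"path/to_file/{sample}_R2.bam"
        jobs.append((output, f"path/to_file/src.py {r1} {r2}"))
    return jobs

for output, command in planned_jobs(SAMPLES):
    print(output, "<-", command)
```

Each sample now yields exactly one job that receives its own R1 and R2 files.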

There is also a doubled " after .tsv in your expand statement, but this is probably a typo.

In the OP's faulty code, running rule all would trigger rule one. In your code, that won't happen, because the output of rule one is not linked to the input of rule all.
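Whether rule all pulls in rule one depends on pattern matching: Snakemake checks each file requested by rule all against the output patterns of the other rules, and by default a wildcard matches any non-empty text. The sketch below is a simplified, hypothetical imitation of that matching (pattern_to_regex and match_output are illustrative names; wildcard constraints and repeated wildcard names are ignored):

```python
import re

def pattern_to_regex(pattern):
    """Hypothetical helper: turn a wildcard pattern into an anchored regex,
    with each {name} becoming a named group that matches .+"""
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex + "$")

def match_output(target, output_pattern):
    """Return the wildcard values if target fits the output pattern, else None."""
    m = pattern_to_regex(output_pattern).match(target)
    return m.groupdict() if m else None

# Linked: the requested file fits the output pattern, so the rule is scheduled
print(match_output("path/to_file/sample.tsv", "path/to_file/{sample}.tsv"))  # {'sample': 'sample'}
# Not linked: the requested path differs, so this rule is never considered
print(match_output("sample.tsv", "path/to_file/{sample}.tsv"))  # None
```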

Ah yes, I wrote that in a bit of a rush last night just before finishing. Updated now and tested.

@camerond, did you do a dry run on this code?

I updated the code and the dry run was successful. Many thanks.
