Snakemake wildcards in the path input/output
3.6 years ago
wanaga3166 ▴ 10

Hi everyone,

I wrote a Snakefile to check the quality of several fastq files and put the reports in a results directory. I tried to use wildcards as explained in the Snakemake documentation on Read the Docs, but Snakemake returns an invalid syntax error at the first line of the code below. Where did I make a mistake? Maybe I should use the aggregation function.

TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)

rule all:
    input:
        ["../Data/{tumor}/{sample}.fastq.gz".format(tumor=tumor) for tumor in TUMORS]

rule fastqc_before_trim:
    input:
        "../Data/{tumor}/{sample}.fastq.gz"
    output:
        "../Results/QC/Before_trimming/{tumor}/{sample}_fastqc.html"
    threads:
        4
    shell:
        "fastqc -t {threads} {input} -o {output}"

My working directory is organized as below.

------ Script 
|           |
|           |------Script_01.smk
|           |------Script_02.smk
|
|------Data
|           |
|           |------Tumor_01
|           |           |
|           |           |------Lane01.fastq.gz
|           |           |------Lane02.fastq.gz
|           |------Tumor_02
|                       |
|                       |------Lane01.fastq.gz
|                       |------Lane02.fastq.gz
|
|------Results
|           |
|           |--------QC
|           |        |
|           |        |------Before_Trimming 
|           |        |             |
|           |        |             |------Tumor_01
|           |        |             |             |
|           |        |             |             |------Tumor_01_Lane01_fastqc.html
|           |        |             |             |------Tumor_01_Lane02_fastqc.html
|           |        |             |
|           |        |             |------Tumor_02
|           |        |                           |
|           |        |                           |------Tumor_02_Lane01_fastqc.html
|           |        |                           |------Tumor_02_Lane02_fastqc.html
|           |        |                                        
|           |        |------After_Trimming 
|           |                      |
|           |                      |------Tumor_01
|           |                      |             |
|           |                      |             |------Tumor_01_Lane01_cleaned_fastqc.html
|           |                      |             |------Tumor_01_Lane02_cleaned_fastqc.html
|           |                      |
|           |                      |------Tumor_02
|           |                                    |
|           |                                    |------Tumor_02_Lane01_cleaned_fastqc.html
|           |                                    |------Tumor_02_Lane02_cleaned_fastqc.html
|           |                                                             
|           |------Mapping
|
|

Thank you for your help.

Snakemake • 2.2k views

The input of rule all does not match the output of the FastQC rule, assuming fastqc_before_trim is the only rule. If it is not, please post the entire Snakefile. Even if the paths were correct, you have not expanded the sample wildcard in the input of rule all.


Below you will find the entire Snakefile. When I execute it, I get the same invalid syntax message on this line: TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz).

TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)

rule all:
    input:
        expand("../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html", tumor = TUMORS, sample = SAMPLES)
        expand("../Results/QC/After_Trimming/{tumor}/{sample}_cleaned_fastqc.html", tumor = TUMORS, sample = SAMPLES)

rule fastqc_before_trim:
    input:
        "../Data/{tumor}/{sample}.fastq.gz"
    output:
        "../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html"
    threads:
        4
    shell:
        "fastqc -t {threads} {input} -o {output}"

rule trim:
    input:
        "../Data/{tumor}/{sample}.fastq.gz"
    params:
        "../Data/{tumor}/"
    output:
        "../Data/{tumor}/{sample}_cleaned.fastq"
    conda:
        "trim.yaml"
    shell:
        "cutadapt -a AAGCAGTGGTATCAACGCAGAGTACATGGGGTCAGATGTGTATAAGAGAC -o {output} {input}"

rule fastqc_after_trim:
    input:
        "../Data/{tumor}/{sample}_cleaned.fastq"
    output:
        "../Results/QC/After_Trimming/{tumor}/{sample}_cleaned_fastqc.html"
    threads:
        4
    shell:
        "fastqc -t {threads} {input} -o {output}"
3.6 years ago

The way Snakemake works is that you write a series of recipes and then you ask it to cook you dinner. Your explicit rule all needs to list files that you want produced by your implicit rule fastqc_before_trim, not the files it needs as input.

The syntax error is likely just a comma issue.
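
To make the recipe analogy concrete, here is a minimal Python sketch (not Snakemake itself) of the mapping that rule all has to express: for each input FASTQ file, the target to request is the report path that fastqc_before_trim declares as its output. The helper name report_for is hypothetical, and the paths assume the ../Data/<tumor>/<sample>.fastq.gz layout shown in the question.

```python
from pathlib import PurePosixPath

# Output pattern declared by rule fastqc_before_trim in the question.
REPORT_PATTERN = "../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html"


def report_for(fastq_path):
    """Return the report path to request for one input FASTQ file.

    Hypothetical helper for illustration; assumes the layout
    ../Data/<tumor>/<sample>.fastq.gz from the question.
    """
    path = PurePosixPath(fastq_path)
    tumor = path.parent.name
    sample = path.name[: -len(".fastq.gz")]
    return REPORT_PATTERN.format(tumor=tumor, sample=sample)


inputs = [
    "../Data/Tumor_01/Lane01.fastq.gz",
    "../Data/Tumor_02/Lane02.fastq.gz",
]
# These are the files to list under "rule all: input:", not the FASTQ files.
targets = [report_for(p) for p in inputs]
print(targets)
```

Snakemake then works backwards from each requested report to the rule whose output pattern matches it, and from there to the FASTQ file that rule needs.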


Thanks, Jeremy.

I modified my Snakefile. However, I still get the same syntax error with this line: (TUMORS, SAMPLES) = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz). I also tested TUMORS, SAMPLES = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz") and it doesn't work either.

(TUMORS, SAMPLES) = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)

rule all:
    input:
        expand("../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html", tumor = TUMORS, sample = SAMPLES),
        expand("../Results/QC/After_Trimming/{tumor}/{sample}_cleaned_fastqc.html", tumor = TUMORS, sample = SAMPLES)

(TUMORS, SAMPLES) = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz")
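
The difference is the quoting: the pattern has to reach glob_wildcards as a complete string literal, with both an opening and a closing quotation mark. Plain assignments in a Snakefile are ordinary Python, so the variants from this thread can be checked with Python's own parser. A small sketch, assuming that check is a fair stand-in for what Snakemake does with such a line (the helper name parses is hypothetical, and it only checks syntax; nothing is executed):

```python
import ast


def parses(line):
    """Return True if `line` is syntactically valid Python (nothing is run)."""
    try:
        ast.parse(line)
        return True
    except SyntaxError:
        return False


# Variants taken from the thread.
unquoted = 'TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)'
half_quoted = '(TUMORS, SAMPLES) = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz)'
quoted = '(TUMORS, SAMPLES) = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz")'
# Original left-hand side (with trailing comma), but with the pattern quoted.
quoted_trailing_comma = 'TUMORS, SAMPLES, = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz")'

for label, line in [
    ("unquoted", unquoted),
    ("half quoted", half_quoted),
    ("quoted", quoted),
    ("quoted, trailing comma", quoted_trailing_comma),
]:
    print(label, parses(line))
```

Only the fully quoted forms parse. The trailing comma on the left-hand side is legal Python once the pattern is quoted, which suggests the missing quotation marks were the cause of the error.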


Thanks, Jeremy. I corrected the file. Now I have another problem:

Snakemake returns this error message:

Missing input files for rule fastqc_before_trim:
../Data/Gon_M1/PB4_S13_L001_R2_001.fastq.gz

I was surprised by this message because the fastq.gz file (PB4_S13_L001_R2_001.fastq.gz) is not stored in the Gon_M1 folder but in the Gon_M3 folder. How can I avoid this problem?


Expand is producing every possible combination of TUMOR/SAMPLE. Maybe you should be building your list of desired files in a more controlled fashion. I don't use glob_wildcards because I don't like to have my filesystem determine what gets analyzed.
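
The effect is easy to reproduce outside Snakemake. glob_wildcards returns one entry per matched file in each list, so TUMORS and SAMPLES are aligned pairwise; combining them as a full product invents tumor/sample pairs that do not exist on disk. A minimal sketch in plain Python, where the lists are illustrative values (only PB4_S13_L001_R2_001 in Gon_M3 comes from this thread; the other sample name is made up):

```python
from itertools import product

# Illustrative values, aligned pairwise as glob_wildcards would return them.
# "PB1_S1_L001_R1_001" is a made-up sample name for the Gon_M1 folder.
TUMORS = ["Gon_M1", "Gon_M3"]
SAMPLES = ["PB1_S1_L001_R1_001", "PB4_S13_L001_R2_001"]

pattern = "../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html"

# What expand(pattern, tumor=TUMORS, sample=SAMPLES) does: every combination.
all_combinations = [
    pattern.format(tumor=t, sample=s) for t, s in product(TUMORS, SAMPLES)
]

# Pairwise combination: only the tumor/sample pairs that were actually found.
paired = [pattern.format(tumor=t, sample=s) for t, s in zip(TUMORS, SAMPLES)]

print(len(all_combinations), "targets from the full product")
print(len(paired), "targets from pairwise zipping")
```

In a Snakefile, the pairwise behaviour corresponds to passing zip as the second argument: expand(pattern, zip, tumor=TUMORS, sample=SAMPLES). Listing the desired tumor/sample pairs explicitly, for example in a sample sheet, is the more controlled alternative mentioned above.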
