Snakemake error
1
1
Entering edit mode
2.7 years ago
kamanovae ▴ 100

Hi! I have a snakemake pipeline with an error that I can't find. I successfully run all the rules in turn, but if I run the pipeline in one click, then the sort rules for some reason are launched repeatedly by strange output folders

Job stats:
job                       count    min threads    max threads
----------------------  -------  -------------  -------------
all                           1              1              1
bedtools_genomecov            1              1              1
bwamem                        2              1              1
gatk_ApplyBQSR               64              1              1
gatk_BaseRecalibrator        64              1              1
gatk_GatherBQSRReports        1              1              1
repair                        2              1              1
samtools_index                1              1              1
sort                          1              1              1
total                       137              1              1

Select jobs to execute...


rule sort:
    output: out///SRR2962669/SRR2962669//SRR2962669/SRR2962669_R1.fastq, out///SRR2962669/SRR2962669//SRR2962669/SRR2962669_R2.fastq
    jobid: 8
    wildcards: sample=/SRR2962669/SRR2962669
    resources: tmpdir=/tmp

This is error:

Error in rule sort:
    jobid: 8
    output: out///SRR2962669/SRR2962669//SRR2962669/SRR2962669_R1.fastq, out///SRR2962669/SRR2962669//SRR2962669/SRR2962669_R2.fastq
    shell:
         cat /storage1/GatkBwaTest/exom//SRR2962669/SRR2962669*1*.fastq | paste - - - - |  sort -k1,1 -S 30G | tr ' ' '
' > out///SRR2962669/SRR2962669//SRR2962669/SRR2962669_R1.fastq ; cat /storage1/GatkBwaTest/exom//SRR2962669/SRR2962669*2*.fastq | paste - - - - |  sort -k1,1 -S 30G | tr '    ' '
' > out///SRR2962669/SRR2962669//SRR2962669/SRR2962669_R2.fastq
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

But if you run one rule sort, then there are no problems

Pipeline looks like this:

import os

(SAMPLES,) = glob_wildcards("/storage1/GatkBwaTest/exom/SRR2962669/{sample}_1.fastq")
INTERVALS = glob_wildcards("/storage1/GatkBwaTest/SnakemakeInput/intervals/{interval}.bed")

SAMPLE_PATH=os.path.dirname(config["sample"])
print(SAMPLE_PATH)
gatk="/storage1/GatkBwaTest/bin/gatk-4.1.3.0/gatk"
bwa_mem="/storage1/GatkBwaTest/bin/bwa-mem2-2.2.1_x64-linux/bwa-mem2"
repair="/storage1/GatkBwaTest/bin/repair.sh"
bcftools="/storage1/GatkBwaTest/bin/bcftools-1.14/bcftools"
filterbyname= "/storage1/GatkBwaTest/bin/bbmap/filterbyname.sh"
samtools="/storage1/GatkBwaTest/bin/samtools-1.11/samtools"



rule all:
        input:
                expand(config["out"] + "/{sample}/{sample}_R1.fastq", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}_R2.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}/{sample}_fastrepair_R1.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}/{sample}_fastrepair_R2.fastq",  sample=SAMPLES),
        expand(config["out"] + "/{sample}/{sample}_fastrepair_outs.fastq", sample=SAMPLES),
        expand(config["out"] + "/{sample}/{sample}.sorted.mkdup.bam", sample=SAMPLES),
        expand(config["out"] + "/{sample}/{sample}.sorted.mkdup.bam.bai", sample=SAMPLES),
        expand(config["out"] + "/{sample}/{sample}.bedtools_genomecov.genome.covered.bed", sample=SAMPLES),
        expand(config["out_BaseRecalibrator"] + "/{sample}/{sample}.BaseRecalibrator.{interval}.txt", sample=SAMPLES, interval=INTERVALS.interval),
        expand(config["out_BaseRecalibrator"] + "/{sample}/{sample}.GatherBQSRReports.txt",  sample=SAMPLES),
        expand(config["out_ApplyBQSR"] + "/{sample}/{sample}.out_ApplyBQSR.{interval}.bam", sample=SAMPLES, interval=INTERVALS.interval),

rule sort:
        output:
                forward_reads=config["out"] +"/{sample}/{sample}_R1.fastq", reverse_reads=config["out"] +"/{sample}/{sample}_R2.fastq"
        shell:
                " cat {SAMPLE_PATH}/{wildcards.sample}*1*.fastq | paste - - - - |  sort -k1,1 -S 30G | tr '\t' '\n' > {output.forward_reads} ;"
                " cat {SAMPLE_PATH}/{wildcards.sample}*2*.fastq | paste - - - - |  sort -k1,1 -S 30G | tr '\t' '\n' > {output.reverse_reads}"
rule repair:
    input:
        forward_reads=rules.sort.output.forward_reads, reverse_reads=rules.sort.output.reverse_reads
    output:
        forward_fastrepair=config["out"] + "/{sample}/{sample}_fastrepair_R1.fastq",
        reverse_fastrepair=config["out"] + "/{sample}/{sample}_fastrepair_R2.fastq", 
        outs=config["out"] + "/{sample}/{sample}_fastrepair_outs.fastq"
    log:
        fastrepair=config["out_logs"]+"/{sample}/{sample}.fastrepair.log"
    shell:
        " {repair} -Xmx30g in={input.forward_reads} in2={input.reverse_reads} out={output.forward_fastrepair} out2={output.reverse_fastrepair} "
        " outs={output.outs} sorted=true "      
rule bwamem:
    input:
        ref=config["ref"], forward_reads=rules.repair.output.forward_fastrepair,
        reverse_reads=rules.repair.output.reverse_fastrepair
    output:
        bam=config["out"] + "/{sample}/{sample}.sorted.mkdup.bam"
    params:
        bwa_threads=config["bwamem_threads"], fixmate_threads=config["fixmate_threads"], sort_threads=["sort_threads"],
        per_thread_sort_mem=config["per_thread_sort_mem"], markdup_threads=config["markdup_threads"]
    log:
        bwa=config["out_logs"]+"/{sample}/{sample}.bwa.log", fixmate=config["out_logs"]+"/{sample}/{sample}.fixmate.log",
        sort=config["out_logs"]+"/{sample}/{sample}.sort.log", markdup=config["out_logs"]+"/{sample}/{sample}.markdup.log"
    shell:
        " {bwa_mem} mem -t {params.bwa_threads} {input.ref} {input.forward_reads} {input.reverse_reads} "
        " -R \'@RG\\tID:{wildcards.sample}\\tPU:unit1\\tSM:{wildcards.sample}\\tPL:Illumina\\tLB:lib1\' 2>{log.bwa} | "
        " {samtools} fixmate  -m - -  2>{log.fixmate} | "
        " {samtools} sort -T 'sort_temorary' -@ {params.sort_threads}  2>{log.sort} | "
        " {samtools} markdup -@ {params.markdup_threads} - {output.bam} 2>{log.markdup} "

rule samtools_index:
    input:
        bam=config["out"] + "/{sample}/{sample}.sorted.mkdup.bam"
    output:
        index=config["out"] + "/{sample}/{sample}.sorted.mkdup.bam.bai"         
    log:
        index=config["out_logs"]+"/{sample}/{sample}.samtools_index.log"
    shell:
        " {samtools} index {input.bam} {output.index}"

rule bedtools_genomecov:
    input:
        bam=rules.bwamem.output.bam
    output:
        bed=config["out"]+"{sample}.bedtools_genomecov.genome.covered.bed"
    shell:
        "bedtools genomecov -ibam  {input.bam} -bg > {output.bed}"

rule gatk_BaseRecalibrator:
    input:
        interval=config["intervals"]+"{interval}.bed", bam=rules.bwamem.output.bam,
        ref=config["ref"], dbsnp=config["dbsnp"]
    output:
        txt=config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt"
    params:
        gatk_memory=config["gatk_memory"]
    shell:
        "{gatk} --java-options {params.gatk_memory}  BaseRecalibrator -L {input.interval} -I {input.bam} -O {output.txt} -R {input.ref} "
        " --known-sites {input.dbsnp} "

rule gatk_GatherBQSRReports:
        input:
                txt=expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt", interval=INTERVALS.interval, allow_missing=True)
        params:
                gatk_memory=config["gatk_memory"]
        log:
                gatherBQSRReports=config["out_logs"]+"/{sample}/{sample}.gatherBQSRReports.log"
        output:
                all_txt=config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt"
        run:
                params_data = '-I ' + ' -I '.join(input.txt)
                cmd = "{gatk} --java-options {params.gatk_memory} GatherBQSRReports %s -O {output.all_txt} 2> {log.gatherBQSRReports}" % params_data
                shell(cmd)

rule gatk_ApplyBQSR:
    input:
        intervals=config["intervals"]+"{interval}.bed", ref=config["ref"],
        txt=rules.gatk_GatherBQSRReports.output.all_txt, bam=rules.bwamem.output.bam
    output:
        ApplyBQSR_bam=config["out_ApplyBQSR"]+"/{sample}/{sample}.out_ApplyBQSR.{interval}.bam"
    params:
        gatk_memory=config["gatk_memory"]
    shell:
        " {gatk} --java-options {params.gatk_memory}  ApplyBQSR -L {input.intervals} -I {input.bam} -O {output.ApplyBQSR_bam} "
        " -R {input.ref} -bqsr {input.txt} --create-output-bam-index false "
snakemake bwa GATK • 1.1k views
ADD COMMENT
1
Entering edit mode

My guess is that this line might the problem: $ cat /storage1/GatkBwaTest/exom//SRR2962669/SRR2962669*1*.fastq esp this: SRR2962669/SRR2962669*1*.fastq. Create a rule to run only this line/ function and check if this is the offending line/function. There is also a way to tell snakemake not to run bash in strict mode.

What is the purpose of this line:

 cat {SAMPLE_PATH}/{wildcards.sample}*1*.fastq | paste - - - - |  sort -k1,1 -S 30G | tr '\t' '\n' > {output.forward_reads} ;
ADD REPLY
0
Entering edit mode

what is the value of {SAMPLE_PATH}?

ADD REPLY
0
Entering edit mode
2.7 years ago
JamesBioInf ▴ 10

Hi :)

Perhaps you could try adding an input to the sort rule to have more control over what is going in (and therefore coming out)?

I'm not totally sure from the example but I feel it may be due to some unintended wildcard expansion or glob_wildcards picking up things you did not mean to. If you try printing SAMPLES it may give some clues. I hope this helps!

Something like:

rule sort:
    input:
            fwd_reads =  os.path.join(SAMPLE_PATH, "{sample}*1*.fastq"),
            rev_reads = os.path.join(SAMPLE_PATH, "{sample}*2*.fastq")
    output:
            forward_reads=config["out"] +"/{sample}/{sample}_R1.fastq", reverse_reads=config["out"] +"/{sample}/{sample}_R2.fastq"
    shell:
            " cat {input.fwd_reads} | paste - - - - |  sort -k1,1 -S 30G | tr '\t' '\n' > {output.forward_reads} ;"
            " cat {input.rev_reads} | paste - - - - |  sort -k1,1 -S 30G | tr '\t' '\n' > {output.reverse_reads}"
ADD COMMENT

Login before adding your answer.

Traffic: 2789 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6