Missing Input files Error with snakemake pipeline
3.1 years ago
skbrimer ▴ 740

Hello hive mind,

I am having issues with a snakemake script.

What I want it to do is run abricate on each of my samples to look for AMR genes, virulence factors, and plasmids, and then make summary files of the results.

What is happening is the script sees that I am missing the needed input files for the summary call and then stops, instead of making them.

EDIT - Here is the error message as well

sean@LEN943:~/Desktop/salmonella/LS21-4590_Sal$ snakemake -s abricate_AMR_VF_snakefile -j1
Building DAG of jobs...
MissingInputException in line 106 of /home/sean/Desktop/salmonella/LS21-4590_Sal/abricate_AMR_VF_snakefile:
Missing input files for rule SummaryAMR:
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_ncbi.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_argannot.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_resfinder.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_bacmet2.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_megares.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_card.tab

The script is below; however, I do not understand why snakemake is not making the needed files. I know this is a user error, but I cannot see where I have gone astray. Any help is greatly appreciated.

configfile: "config.yaml"

rule all:
    input:
        expand("SummaryAMR_{sample}.tab", sample = config["names"]),
        expand("SummaryVF_{sample}.tab", sample = config["names"]),
        expand("plasmidfinder_{sample}.tab", sample = config["names"])

# Finding AMR Genes

rule bacmet2_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="bacmet2"
    output:
        directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"

rule card_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="card"
    output:
        directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"

rule megares_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="megares"
    output:
        directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"

rule ncbi_AMRFinderPlus:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="ncbi"
    output:
        directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"

rule resfinder_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="resfinder"
    output:
        directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"

rule argannot:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="argannot"
    output:
        directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"


# Finding virulence factors

rule vfdb:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="vfdb"
    output:
        directory("abricate/{sample}/{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"

rule victors:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="victors"
    output:
        directory("abricate/{sample}/{sample}_{params.db}.tab")
    shell:
        "abricate --db {params.db} {input} > {output}"

# Finding Plasmids

rule plasmidfinder:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="plasmidfinder"
    output:
        "{params.db}_{sample}.tab"
    shell:
        "abricate --db {params.db} {input} > {output}"

rule SummaryAMR:
    input:
        argannot="abricate/{sample}/{sample}_argannot.tab",
        bacmet2="abricate/{sample}/{sample}_bacmet2.tab",
        card="abricate/{sample}/{sample}_card.tab",
        megares="abricate/{sample}/{sample}_megares.tab",
        ncbi="abricate/{sample}/{sample}_ncbi.tab",
        resfinder="abricate/{sample}/{sample}_resfinder.tab",
    output:
        "SummaryAMR_{sample}.tab"
    shell:
        "abricate summary {input.argannot} {input.bacmet2} {input.card} {input.megares} \
        {input.ncbi} {input.resfinder} > {output}"

rule SummaryVF:
    input:
        vfdb="abricate/{sample}/{sample}_vfdb.tab",
        victors="abricate/{sample}/{sample}_victors.tab"
    output:
        "SummaryVF_{sample}.tab"
    shell:
        "abricate summary {input.vfdb} {input.victors} > {output}"
abricate snakemake • 2.4k views

Please post the error.

Sorry about that. The error message is now in the main post above, for whatever help it is.

Try running snakemake in dry-run mode and see if it throws any errors. Also check whether the previous steps generate the expected output. If there is no syntax error, try setting priorities. In SummaryAMR (resfinder="abricate/{sample}/{sample}_resfinder.tab",) I see an extra comma at the end; see whether removing it works.

I ran the pipeline in dry-run mode (-n) both before and after deleting that extra comma and received the same error, with no extra information. It also does not make the new directories when it is run in live mode.

I will go and read the docs to find out how to set priorities; I didn't know that was a thing.

My understanding of snakemake was that you target the final outputs you want and it works backwards to get there, so I'm not sure why it sees the missing files and just stops.

FYI, here is the error from the dry runs:

sean@LEN943:~/Desktop/salmonella/LS21-4590_Sal$ snakemake -s abricate_AMR_VF_snakefile -n
Building DAG of jobs...
MissingInputException in line 106 of /home/sean/Desktop/salmonella/LS21-4590_Sal/abricate_AMR_VF_snakefile:
Missing input files for rule SummaryAMR:
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_card.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_bacmet2.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_ncbi.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_resfinder.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_argannot.tab
abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_megares.tab
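
The "work backwards from the target" idea can be sketched in a few lines of Python. This is only a simplified model under stated assumptions, not Snakemake's own code: the function resolve and the two cut-down rule tables are hypothetical, and wildcards are filled in with plain string formatting. The point it illustrates is that a requested file is only built if some rule's output path matches it exactly; otherwise it has to exist on disk already, and the run stops with a missing-input error.

```python
def resolve(target, sample, rules, existing, plan=None):
    """Work backwards from `target`; return the files to build, in order."""
    plan = [] if plan is None else plan
    if target in existing or target in plan:
        return plan  # already on disk or already scheduled
    for out_tpl, in_tpls in rules.items():
        if out_tpl.format(sample=sample) == target:
            # This rule can make the target, so resolve its inputs first.
            for tpl in in_tpls:
                resolve(tpl.format(sample=sample), sample, rules, existing, plan)
            plan.append(target)
            return plan
    # No rule produces the target and it is not on disk.
    raise FileNotFoundError(f"Missing input file, no rule produces: {target}")


# Hypothetical, cut-down rule tables: output template -> input templates.
consistent = {
    "SummaryAMR_{sample}.tab": ["abricate/{sample}/{sample}_card.tab"],
    "abricate/{sample}/{sample}_card.tab": ["{sample}_de_novo/contigs.fasta"],
}
mismatched = {
    "SummaryAMR_{sample}.tab": ["abricate/{sample}/{sample}_card.tab"],
    "abricate/{sample}/AMR_{sample}_card.tab": ["{sample}_de_novo/contigs.fasta"],
}
on_disk = {"test_de_novo/contigs.fasta"}

print(resolve("SummaryAMR_test.tab", "test", consistent, on_disk))

try:
    resolve("SummaryAMR_test.tab", "test", mismatched, on_disk)
except FileNotFoundError as err:
    print(err)
```

With the consistent table the plan contains the card table followed by the summary. With the mismatched table the same request fails, because the only rule that could make the card table writes it under a different name.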
3.1 years ago
Eric Lim ★ 2.2k

Your file paths are inconsistent: the outputs of the AMR rules contain an AMR_ prefix, but the inputs of SummaryAMR do not. I don't think you can use params in output. Also, I don't think directory is going to work the way you expected. I'm not very familiar with the function, so you'd need to read the docs.
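
To make the path mismatch concrete, here is a minimal Python sketch of wildcard matching. It is a simplified model, not Snakemake's actual implementation: the helpers pattern_to_regex and can_produce are made up for illustration, and it assumes that a wildcard used twice in one pattern must take the same value both times.

```python
import re


def pattern_to_regex(pattern):
    """Turn a '{name}' style path pattern into a compiled regex (simplified)."""
    seen = set()
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            # A repeated wildcard must match the same text again.
            parts.append(f"(?P={name})")
        else:
            seen.add(name)
            parts.append(f"(?P<{name}>.+)")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")


def can_produce(pattern, target):
    """True if a rule with this output pattern could create `target`."""
    return pattern_to_regex(pattern).match(target) is not None


requested = "abricate/LS21-4590-1-Salmonella/LS21-4590-1-Salmonella_card.tab"
with_prefix = "abricate/{sample}/AMR_{sample}_card.tab"
without_prefix = "abricate/{sample}/{sample}_card.tab"

print(can_produce(with_prefix, requested))     # False
print(can_produce(without_prefix, requested))  # True
```

The file requested by SummaryAMR has no AMR_ in its name, so no value of {sample} lets the prefixed pattern match it; once the prefix is dropped, the pattern matches with sample set to LS21-4590-1-Salmonella.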

(base) [~/tmp/biostars]$ snakemake --dryrun --summary
Building DAG of jobs...
output_file date    rule    version log-file(s) status  plan
SummaryAMR_test.tab -   -   -   -   missing update pending
abricate/test/test_argannot.tab -   -   -   -   missing update pending
abricate/test/test_bacmet2.tab  -   -   -   -   missing update pending
abricate/test/test_card.tab -   -   -   -   missing update pending
abricate/test/test_megares.tab  -   -   -   -   missing update pending
abricate/test/test_ncbi.tab -   -   -   -   missing update pending
abricate/test/test_resfinder.tab    -   -   -   -   missing update pending
SummaryVF_test.tab  -   -   -   -   missing update pending
abricate/test/test_vfdb.tab -   -   -   -   missing update pending
abricate/test/test_victors.tab  -   -   -   -   missing update pending
plasmidfinder_test.tab  -   -   -   -   missing update pending

I commented out the lines you should fix and put a corrected version below each one.

#configfile: "config.yaml"
config['names'] = 'test'
shell('mkdir -p test_de_novo; touch test_de_novo/contigs.fasta')

rule all:
    input:
        expand("SummaryAMR_{sample}.tab", sample = config["names"]),
        expand("SummaryVF_{sample}.tab", sample = config["names"]),
        expand("plasmidfinder_{sample}.tab", sample = config["names"])

# Finding AMR Genes

rule bacmet2_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="bacmet2"
    #output:
    #    directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_bacmet2.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"

rule card_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="card"
    #output:
    #    directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_card.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"

rule megares_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="megares"
    #output:
    #    directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_megares.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"

rule ncbi_AMRFinderPlus:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="ncbi"
    #output:
    #    directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_ncbi.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"

rule resfinder_db:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="resfinder"
    #output:
    #    directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_resfinder.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"

rule argannot:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="argannot"
    #output:
    #    directory("abricate/{sample}/AMR_{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_argannot.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"


# Finding virulence factors

rule vfdb:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="vfdb"
    #output:
    #    directory("abricate/{sample}/{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_vfdb.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"

rule victors:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="victors"
    #output:
    #    directory("abricate/{sample}/{sample}_{params.db}.tab")
    output:
        'abricate/{sample}/{sample}_victors.tab'  # plain file, so no directory() wrapper
    shell:
        "abricate --db {params.db} {input} > {output}"

# Finding Plasmids

rule plasmidfinder:
    input:
        "{sample}_de_novo/contigs.fasta"
    params:
        db="plasmidfinder"
    #output:
    #    "{params.db}_{sample}.tab"
    output:
        'plasmidfinder_{sample}.tab'
    shell:
        "abricate --db {params.db} {input} > {output}"

rule SummaryAMR:
    input:
        argannot="abricate/{sample}/{sample}_argannot.tab",
        bacmet2="abricate/{sample}/{sample}_bacmet2.tab",
        card="abricate/{sample}/{sample}_card.tab",
        megares="abricate/{sample}/{sample}_megares.tab",
        ncbi="abricate/{sample}/{sample}_ncbi.tab",
        resfinder="abricate/{sample}/{sample}_resfinder.tab",
    output:
        "SummaryAMR_{sample}.tab"
    shell:
        "abricate summary {input.argannot} {input.bacmet2} {input.card} {input.megares} \
        {input.ncbi} {input.resfinder} > {output}"

rule SummaryVF:
    input:
        vfdb="abricate/{sample}/{sample}_vfdb.tab",
        victors="abricate/{sample}/{sample}_victors.tab"
    output:
        "SummaryVF_{sample}.tab"
    shell:
        "abricate summary {input.vfdb} {input.victors} > {output}"

I tried removing params.db from the output paths and removing the directory() wrapper as well; however, I received the same missing-input error.

Did you also remove the AMR_ prefix from the output paths?

Thank you! It was the inconsistent use of file paths with and without AMR, and I didn't see it until you asked the second time.

Cool! Yeah, as scripts get longer and use more wildcards, it can take a frustratingly long time to spot these things. Glad a pair of fresh eyes was helpful!
