Snakemake: MissingInputException
3.0 years ago

Hello,

I am trying to create a simple Snakemake workflow and I am having some issues. My file looks like this:

ARCHIVE_FILE = 'output.tar.gz'

# a single output file
OUTPUT_FILE = 'output/{species}.out'

# a single input file
INPUT_FILE = 'proteins/{species}.fasta'

# Build the list of input files.
INP = glob_wildcards(INPUT_FILE).species

# The list of all output files
OUT = expand(OUTPUT_FILE, species=INP)

# pseudo-rule that tries to build everything.
# Just add all the final outputs that you want built.
rule all:
    input: ARCHIVE_FILE

# hmmsearch
rule hmm:
    input:
        cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
        species=INPUT_FILE ,
        hmm='hmm/EGF.hmm'
    output: OUTPUT_FILE
    shell: '{input.cmd} {input.hmm} {input.species} {output}'

# create an archive with all results
rule create_archive:
    input: OUT
    output: ARCHIVE_FILE
    shell: 'tar -czvf {output} {input}'

This file produces the two following errors:

MissingInputException in line 29 of /home/agalvez/data/workflow-workshop/test/Snakefile:
Missing input files for rule hmm:
hmmsearch --tblout output_tblout_egf --noali -E 99

MissingInputException in line 49 of /home/agalvez/data/workflow-workshop/test/Snakefile:
Missing input files for rule create_archive:
output/EP00771_Trimastix_marina.out
output/EP00759_Prokinetoplastina_sp_PhF-6.out

This is the first time I have tried to use Snakemake or anything related to Python, so I do not understand why this is failing. Any help would be really appreciated. Thanks in advance!

Snakefile Snakemake Python Input • 2.0k views

input is for files that should exist, not for a command.

Could you try to fix the formatting of the post?


Your code should be formatted like this:

ARCHIVE_FILE = 'output.tar.gz'
OUTPUT_FILE = 'output/{species}.out'
INPUT_FILE = 'proteins/{species}.fasta'
INP = glob_wildcards(INPUT_FILE).species
OUT = expand(OUTPUT_FILE, species=INP)

rule all:
    input: ARCHIVE_FILE

rule hmm:
    input:
        cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
        species=INPUT_FILE,
        hmm='hmm/EGF.hmm'
    output: OUTPUT_FILE
    shell: '{input.cmd} {input.hmm} {input.species} {output}'


rule create_archive:
    input: OUT
    output: ARCHIVE_FILE
    shell: 'tar -czvf {output} {input}' 

The formatted file looks like this:

ARCHIVE_FILE = 'output.tar.gz'

# a single output file
OUTPUT_FILE = 'output/{species}.out'

# a single input file
INPUT_FILE = 'proteins/{species}.fasta'

# Build the list of input files.
INP = glob_wildcards(INPUT_FILE).species

# The list of all output files
OUT = expand(OUTPUT_FILE, species=INP)

# pseudo-rule that tries to build everything.
# Just add all the final outputs that you want built.
rule all:
    input: ARCHIVE_FILE

# hmmsearch
rule hmm:
    input:
        cmd='hmmsearch --tblout output_tblout_egf --noali -E 99',
        species=INPUT_FILE ,
        hmm='hmm/EGF.hmm'
    output: OUTPUT_FILE
    shell: '{input.cmd} {input.hmm} {input.species} {output}'

# create an archive with all results
rule create_archive:
    input: OUT
    output: ARCHIVE_FILE
    shell: 'tar -czvf {output} {input}'

I think @WouterDeCoster was steering you toward deleting the line cmd='hmmsearch --tblout output_tblout_egf --noali -E 99', and then writing the shell command for rule hmm as:

shell: 'hmmsearch --tblout output_tblout_egf --noali -E 99 {input.hmm} {input.species} {output}'

input is for telling Snakemake which files must exist before the rule runs. The hmmsearch call is a shell command, so you just write it out as part of the rule's shell line.
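To make the first error concrete, here is a rough sketch in plain Python of the kind of check involved. It is not Snakemake's actual implementation; the helper name missing_inputs and the temporary file names are made up for illustration. The point is only that every value listed under input is treated as a path, so a command string ends up reported as a missing file.

```python
import os
import tempfile


def missing_inputs(inputs):
    """Return the input values that do not exist as files on disk.

    Hypothetical helper: a simplified stand-in for the existence check,
    not Snakemake's real code.
    """
    return [value for value in inputs.values() if not os.path.exists(value)]


with tempfile.TemporaryDirectory() as workdir:
    # Create the two genuine input files for this toy example.
    hmm_path = os.path.join(workdir, "EGF.hmm")
    fasta_path = os.path.join(workdir, "example.fasta")
    for path in (hmm_path, fasta_path):
        with open(path, "w") as handle:
            handle.write("")

    inputs = {
        # A command line, not a file: this is the entry that gets flagged.
        "cmd": "hmmsearch --tblout output_tblout_egf --noali -E 99",
        "species": fasta_path,
        "hmm": hmm_path,
    }
    missing = missing_inputs(inputs)
    print(missing)
```

Only the cmd entry is flagged, which matches the first MissingInputException in the question. The second error, about the output/*.out files, likely follows from the first: if rule hmm cannot run, nothing can produce the files that create_archive asks for.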

I use input.cmd in this case because there the script is a file the rule needs in order to work. This way Snakemake makes sure that the script file that python runs is available alongside the other input files and, most importantly, reruns the rule if that script changes. (See about wordcount.py under 'Handling dependencies differently' here.)

Your example is more like how fastqc and trimmomatic are used here, or how python is used in the case I linked to, or how you use tar in your archiving rule. Those are programs installed on the system path (or in an environment, in some cases) and run by calling them directly. They are not expected to be edited as often as a Python script kept with your data might be.
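As a small illustration of writing the call directly into the shell line, the sketch below renders such a command template with ordinary Python string formatting, which resolves placeholders like {input.hmm} in the same style as the shell string. Two things here are assumptions rather than anything stated in the thread: the helper name render_command is invented, and the template passes {output} to --tblout instead of appending it as a third positional argument. As far as I know, hmmsearch takes two positional arguments (the HMM file and the sequence file) and --tblout names the table file, but check hmmsearch -h for your version.

```python
from types import SimpleNamespace

# Assumed variant of the command: the table is written to {output} so that
# the file the rule declares as its output is the one hmmsearch creates.
COMMAND_TEMPLATE = (
    "hmmsearch --tblout {output} --noali -E 99 {input.hmm} {input.species}"
)


def render_command(species):
    """Fill the template for one species name (hypothetical helper)."""
    inputs = SimpleNamespace(
        hmm="hmm/EGF.hmm",
        species=f"proteins/{species}.fasta",
    )
    output = f"output/{species}.out"
    # str.format supports attribute lookups such as {input.hmm}.
    return COMMAND_TEMPLATE.format(input=inputs, output=output)


command = render_command("EP00771_Trimastix_marina")
print(command)
```

With the original fixed name output_tblout_egf, every species would write to the same table file, and the declared output file would not obviously be created, so tying --tblout to {output} seems the safer choice.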
