How to write the output in snakefile (snakemake) for bowtie2-build
4
2
Entering edit mode
6.2 years ago
c.clarido ▴ 110

Hello community,

I am using snakemake to make a pipeline. I want to add the bowtie2-build from bowtie2 to my current snakefile as follow:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

So I should be expecting the following files: reference.1.bt2 reference.2.bt2 reference.3.bt2 reference.4.bt2 reference.rev.1.bt2 reference.rev.2.bt2

But it seems the problem lies in the output. How can I write the output?

bowtie2-build bowtie2 snakemake snakefile • 6.6k views
ADD COMMENT
7
Entering edit mode
6.2 years ago
IP ▴ 770

You wrote this:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

What snakemake is going to do is to check if the file output ( in this case "output/reference" ) exist after executing the rule. Which doesn't since it is the basename for bowtie

What you can do is to pass the basename for the index as a parameter to snakemake function. Something like the following:

 rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    params:
        basename="output/reference"
   output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"
ADD COMMENT
0
Entering edit mode

Hello,

Using the following code:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2"
        output2="output/reference.2.bt2"
        output3="output/reference.3.bt2"
        output4="output/reference.4.bt2"
        outputrev1="output/reference.rev1.bt2"
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

I get :

SyntaxError in line 31 of /home/s1104230/scripts/Snakefile:
Unexpected keyword output1 in rule definition (Snakefile, line 31)
ADD REPLY
1
Entering edit mode

Hello,

after each defined output there must be a ,.

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

fin swimmer

ADD REPLY
0
Entering edit mode

I've updated this in the original (fixed the colon after params too). Good catch.

ADD REPLY
0
Entering edit mode

Hello So I tried again the suggested code but it seems that the error occurs at the rev. files:

MissingOutputException in line 26 of /home/s1104230/scripts/Snakefile:
Missing files after 5 seconds:
/home/s1104230/mapping2/reference.rev2.bt2
/home/s1104230/mapping2/reference.rev1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
ADD REPLY
0
Entering edit mode

See if those files show up eventually. We usually use --latency-wait 300 do get around this.

ADD REPLY
0
Entering edit mode

This error is due to a simple typo. There should be a dot after "rev":

outputrev1="output/reference.rev.1.bt2"
ADD REPLY
0
Entering edit mode

thanks for the corrections :)

ADD REPLY
1
Entering edit mode
6.2 years ago
Eric Lim ★ 2.2k

IP's solution is best as expected outputs are listed explicitly which would take advantage of timestamp.

Since v5.2.0, you can also accomplish this by using directory, which shorten the code by quite a bit while maintaining timestamp, but snakemake won't check if all expected outputs exist.

rule bowtie_build:
  input: 'chrM.fa'
  output: directory('reference/bowtie/')
  shell: 'bowtie2-build {input} {output}'

In this particular example, you'd need to ls -a to see the outputs.

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#directories-as-outputs

directory has been super helpful in eliminating a bunch of if-else on rules when we don't know what outputs to expect at code time.

ADD COMMENT
0
Entering edit mode
6.2 years ago

IP's answer is what I would do. An alternative is to use the touch function to create a marker file that signals that a task has successfully completed:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        touch('index.done'),
    shell: "bowtie2-build {input} {params.basename}"

Then, rules that depend on the bowtie index will take in input the file index.done.

ADD COMMENT
0
Entering edit mode

Hello, I tried your suggestion But I get the following error:

Error in rule bowtie2Build: jobid: 0 output: index.done

RuleException:
CalledProcessError in line 49 of /home/s1104230/scripts/Snakefile:
Command ' set -euo pipefail;  bowtie2-build  /home/s1104230/mapping2/reference ' returned non-zero exit status 1
  File "/home/s1104230/scripts/Snakefile", line 49, in __rule_bowtie2Build
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
ADD REPLY
0
Entering edit mode

The shell command is executed but it is throwing an error. In fact, bowtie2-build should have two positional arguments but there's only one:

bowtie2-build  /home/s1104230/mapping2/reference

It should be something like:

bowtie2-build /home/bnextgen/refgenome/infected_consensus.fasta /home/s1104230/mapping2/reference
ADD REPLY
0
Entering edit mode
6.0 years ago
zorrilla • 0

sort of off topic, but shouldt you also expect to get a .fai file as the output of the bowtie2 build? Can someone clarify this for me?

ADD COMMENT
0
Entering edit mode

-A fai file is a index on the fasta file with the format specified here -Bowtie2 output is the Burrows-Wheeler transformation of the reference genome. You can read about the BW transform here.

They are two different indexing techniques, so bowtie2-build does create a fai file because it does not need it for the alignment. It is just searching the BWT

ADD REPLY

Login before adding your answer.

Traffic: 4613 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6