Hello community,
I made a pipeline using snakemake to count SNPs and INDELs. There were no problems when I ran the pipeline with smaller data, which is a fragment of the original data. However, when I ran the pipeline on the original data, a rule was skipped for some reason, which caused the next rule to fail. The error is shown below:
[Thu Oct 18 22:57:16 2018]
Finished job 3.
2 of 16 steps (12%) done
[Thu Oct 18 22:57:17 2018]
rule invenT:
input: /home/s1104230/output/tr1.fastq, /home/s1104230/output/tr2.fastq
output: /home/s1104230/output/itr1.fastq, /home/s1104230/output/itr2k.fastq
jobid: 8
[Thu Oct 18 23:05:56 2018]
Finished job 8.
3 of 16 steps (19%) done
[Thu Oct 18 23:05:56 2018]
rule bowtie2Aln:
input: /home/s1104230/output/tr1.fastq, /home/s1104230/output/tr2.fastq
output: /home/s1104230/mapping/aln.sam
jobid: 2
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Could not locate a Bowtie index corresponding to basename "/home/s1104230/mapping/reference"
Error: Encountered internal Bowtie 2 exception (#1)
Command: /usr/bin/bowtie2-align-s --wrapper basic-0 -x /home/s1104230/mapping/reference -1 /home/s1104230/output/tr1.fastq -2 /home/s1104230/output/tr2.fastq
(ERR): bowtie2-align exited with value 1
[Thu Oct 18 23:05:57 2018]
Error in rule bowtie2Aln:
jobid: 2
output: /home/s1104230/mapping/aln.sam
RuleException:
CalledProcessError in line 36 of /home/s1104230/scripts/Snakefile:
Command ' set -euo pipefail; bowtie2 -x /home/s1104230/mapping/reference -1 /home/s1104230/output/tr1.fastq -2 /home/s1104230/output/tr2.fastq > /home/s1104230/mapping/aln.sam ' returned non-zero exit status 1
File "/home/s1104230/scripts/Snakefile", line 36, in __rule_bowtie2Aln
File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job bowtie2Aln since they might be corrupted:
/home/s1104230/mapping/aln.sam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/s1104230/scripts/.snakemake/log/2018-10-18T212758.414196.snakemake.log
real 97m59.058s
user 97m35.924s
sys 0m16.792s
As you can see, the rule bowtie2Build has been skipped. This is the rule that builds an index from the reference fasta. Its output is needed for the next rule, "bowtie2Aln", to execute. The Snakefile is as follows:
rule invenU:
    # input: ["/home/bnextgen/reads/bngsa_nietinfected_1.fastq","/home/bnextgen/reads/bngsa_nietinfected_2.fastq"]
    # output: ["/home/s1104230/output/iutr1.fastq", "/home/s1104230/output/iutr2.fastq"]
    # script: "/home/s1104230/scripts/inven.py"
    input: ["/home/s1104230/data/bngsa1_24M.txt","/home/s1104230/data/bngsa2_24M.txt"]
    output: ["/home/s1104230/output/iutr1.fastq", "/home/s1104230/output/iutr2.fastq"]
    script: "/home/s1104230/scripts/inven.py"

rule trimmen:
    input: ["/home/s1104230/data/bngsa1_24M.txt","/home/s1104230/data/bngsa2_24M.txt"]
    output: ["/home/s1104230/output/tr1.fastq", "/home/s1104230/output/tr2.fastq"]
    script: "/home/s1104230/scripts/trimming.py"
    # input: ["/home/bnextgen/reads/bngsa_nietinfected_1.fastq","/home/bnextgen/reads/bngsa_nietinfected_2.fastq"]
    # output: ["/home/s1104230/output/tr1.fastq", "/home/s1104230/output/tr2.fastq"]
    # script: "/home/s1104230/scripts/trimming.py"

rule invenT:
    input: rules.trimmen.output
    output: ["/home/s1104230/output/itr1.fastq", "/home/s1104230/output/itr2.fastq"]
    script: "/home/s1104230/scripts/inven.py"

rule bowtie2Build:
    input:
        "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename="/home/s1104230/mapping/reference"
    output:
        output1="/home/s1104230/mapping/reference.1.bt2",
        output2="/home/s1104230/mapping/reference.2.bt2",
        output3="/home/s1104230/mapping/reference.3.bt2",
        output4="/home/s1104230/mapping/reference.4.bt2",
        outputrev1="/home/s1104230/mapping/reference.rev.1.bt2",
        outputrev2="/home/s1104230/mapping/reference.rev.2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

rule bowtie2Aln:
    input: rules.trimmen.output
    params:
        basename="/home/s1104230/mapping/reference"
    output: "/home/s1104230/mapping/aln.sam"
    shell:
        "bowtie2 -x {params.basename} -1 {input[0]} -2 {input[1]} > {output}"

rule sam2bam:
    input: rules.bowtie2Aln.output
    output: "/home/s1104230/mapping/aln.bam"
    shell: "samtools view -Sb {input} > {output}"

rule sortbam:
    input: rules.sam2bam.output
    params:
        basename="/home/s1104230/mapping/sorted"
    output:
        output1="/home/s1104230/mapping/sorted.bam"
    shell: "samtools sort {input} {params.basename}"

rule samIndex:
    input: rules.sortbam.output
    output: "/home/s1104230/mapping/sorted.bam.bai"
    shell: "samtools index {input} {output}"

rule copyRef:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    output: "/home/s1104230/mapping/infected_consensus.fasta"
    shell: "cp {input} {output}"

rule samIndex2:
    input: rules.copyRef.output
    output: "/home/s1104230/mapping/infected_consensus.fasta.fai"
    shell: "samtools faidx {input}"

rule bam2Pileup:
    input:
        rules.copyRef.output,
        rules.sortbam.output
    output: "/home/s1104230/mapping/aln.mpileup"
    shell: "samtools mpileup -f {input[0]} {input[1]} > {output}"

rule pileup2Bcf:
    input:
        rules.copyRef.output,
        rules.sortbam.output
    output: "/home/s1104230/mapping/varcalls.bcf"
    shell: "samtools mpileup -uf {input[0]} {input[1]} > {output}"

rule bcf2Vcf:
    input: rules.pileup2Bcf.output
    output: "/home/s1104230/mapping/varcalls.vcf"
    shell: "bcftools view -cg {input} > {output}"

rule Vcf2fq:
    input: rules.bcf2Vcf.output
    output: "/home/s1104230/mapping/consensus.fq"
    shell: "/usr/share/samtools/vcfutils.pl vcf2fq {input} > {output}"

rule Vcf2txt:
    input: rules.bcf2Vcf.output
    output: "/home/s1104230/mapping/varcalls.txt"
    shell: "cat {input} > {output}"

rule countvar:
    input: rules.Vcf2txt.output
    output:
        "/home/s1104230/mapping/INDELs.txt",
        "/home/s1104230/mapping/SNPs.txt"
    script: "/home/s1104230/scripts/countvar.py"

rule all:
    input:
        rules.invenU.output,
        rules.trimmen.output,
        rules.invenT.output,
        rules.bowtie2Build.output,  # 6mins
        rules.bowtie2Aln.output,
        rules.sam2bam.output,
        rules.sortbam.output,
        rules.samIndex.output,
        rules.copyRef.output,
        rules.samIndex2.output,
        rules.bam2Pileup.output,
        rules.pileup2Bcf.output,
        rules.bcf2Vcf.output,
        rules.Vcf2fq.output,
        rules.Vcf2txt.output,
        rules.countvar.output
I run the workflow with "time snakemake all". Before doing so, I made sure to rm all outputs from the previous test.
Does anyone know where the problem lies?
Thank you in advance.
Thank you for your reply. That may indeed be the case. How can I prevent this from happening?
In addition, I found the following information in the snakemake documentation:
Snakemake allows rules to specify numeric priorities:
Maybe I should try this?
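For reference, a minimal sketch of how the priority directive could be applied to the index-building rule from the Snakefile above. The value 50 is an arbitrary illustration (rules default to priority 0). As far as I understand, priorities only influence the order among jobs that are already ready to run and do not create a dependency, so with several cores the alignment could still start before the index is finished.

```snakemake
# Sketch only: same rule as in the question, with an added priority.
# The value 50 is an illustrative assumption.
rule bowtie2Build:
    input:
        "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename="/home/s1104230/mapping/reference"
    output:
        output1="/home/s1104230/mapping/reference.1.bt2",
        output2="/home/s1104230/mapping/reference.2.bt2",
        output3="/home/s1104230/mapping/reference.3.bt2",
        output4="/home/s1104230/mapping/reference.4.bt2",
        outputrev1="/home/s1104230/mapping/reference.rev.1.bt2",
        outputrev2="/home/s1104230/mapping/reference.rev.2.bt2"
    priority: 50
    shell: "bowtie2-build {input} {params.basename}"
```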
Changing the input of bowtie2Aln so that it also lists the index files produced by bowtie2Build should solve this error for you. In this case it is better to state an explicit dependency than to specify a priority.
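A minimal sketch of what such an explicit dependency could look like, assuming the rule names and paths from the Snakefile in the question. The input names reads and index are illustrative choices and are not taken from the original answer:

```snakemake
# Sketch only: bowtie2Aln now declares the index files as input,
# so snakemake has to run bowtie2Build first.
rule bowtie2Aln:
    input:
        reads=rules.trimmen.output,        # trimmed paired-end reads
        index=rules.bowtie2Build.output    # the six .bt2 index files
    params:
        basename="/home/s1104230/mapping/reference"
    output: "/home/s1104230/mapping/aln.sam"
    shell:
        "bowtie2 -x {params.basename} -1 {input.reads[0]} -2 {input.reads[1]} > {output}"
```

The index files are not used in the shell command itself; listing them as input is only there to tell snakemake that the alignment cannot start until they exist.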
You can use the --dryrun/-n option of snakemake to test out configurations and only launch when it prints the expected commands.
Thank you rizoic, I will definitely try this. In the meantime I used the priority method and it seems to be working (still processing).