Hello everyone!
A few days ago I started using Snakemake for the first time, and I am having an issue running the megahit rule in my pipeline.
It gives me the following error: "Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). ......"
The rule starts running and then crashes with the above error. I tried the directory() solution in my pipeline, but I think it's not good practice since, for various reasons, you can lose files without even knowing it.
Is there a way to run the rule without using directory()?
I would appreciate any help on the issue!
Thank you in advance.
sra = []
with open("run_ids") as f:
    for line in f:
        sra.append(line.strip())

rule all:
    input:
        expand("raw_reads/{sample}/{sample}.fastq", sample=sra),
        expand("trimmo/{sample}/{sample}.trimmed.fastq", sample=sra),
        expand("megahit/{sample}/final.contigs.fa", sample=sra)

rule download:
    output:
        "raw_reads/{sample}/{sample}.fastq"
    params:
        "--split-spot --skip-technical"
    log:
        "logs/fasterq-dump/{sample}.log"
    benchmark:
        "benchmarks/fastqdump/{sample}.fasterq-dump.benchmark.txt"
    threads: 8
    shell:
        """
        fasterq-dump {params} --outdir /home/raw_reads/{wildcards.sample} {wildcards.sample} -e {threads}
        """

rule trim:
    input:
        "raw_reads/{sample}/{sample}.fastq"
    output:
        "trimmo/{sample}/{sample}.trimmed.fastq"
    params:
        "HEADCROP:15 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"
    log:
        "logs/trimmo/{sample}.log"
    benchmark:
        "benchmarks/trimmo/{sample}.trimmo.benchmark.txt"
    threads: 6
    shell:
        """
        trimmomatic SE -phred33 -threads {threads} {input} trimmo/{wildcards.sample}/{wildcards.sample}.trimmed.fastq {params}
        """

rule megahit:
    input:
        "trimmo/{sample}/{sample}.trimmed.fastq"
    output:
        "megahit/{sample}/final.contigs.fa"
    params:
        "-m 0.7 -t"
    log:
        "logs/megahit/{sample}.log"
    benchmark:
        "benchmarks/megahit/{sample}.megahit.benchmark.txt"
    threads: 10
    shell:
        """
        megahit -r {input} -o {output} -t {threads}
        """
Instead of shell, you can use run.
I don't know anything about megahit. For your desired output, reading the help file suggests you'd also need --out-prefix final.
Does this help?
megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>]
Output is a directory. You are supplying megahit a file as its output.
Yes, you are right, but I think that in this case it does not make a big difference.
What do you mean by that? @cfos4698 said the same thing. Copying it here for you: the error makes sense because megahit is trying to make a directory based on your {output}, but your {output} is actually a fasta file.
I understand that megahit outputs a directory and I am trying to output a file. But I don't want to output a directory because I think it's bad practice. I am looking for a way to make it work without using Snakemake's directory() function.
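One way to keep a plain file as the declared output is to pass megahit the directory part of that path, not the file itself. Below is a minimal Python sketch of that split. It assumes megahit names its contig file <out_dir>/<prefix>.contigs.fa, which is what the help text mentioned in this thread suggests; the helper name megahit_out_args is made up for illustration and is not part of Snakemake or megahit.

```python
import os


def megahit_out_args(contigs_path):
    """Split a declared contigs file path into (out_dir, prefix).

    Hypothetical helper: assumes the assembler writes
    <out_dir>/<prefix>.contigs.fa (an assumption, not verified here).
    """
    out_dir, filename = os.path.split(contigs_path)
    suffix = ".contigs.fa"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a path ending in {suffix!r}: {contigs_path!r}")
    prefix = filename[: -len(suffix)]
    return out_dir, prefix
```

For the path in the question, megahit_out_args("megahit/SRR11192682/final.contigs.fa") gives the directory megahit/SRR11192682 for -o and the prefix final for --out-prefix, while the rule's output stays a file.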
Can you try the following, keeping everything else the same:
Btw, megahit can't overwrite. Delete any existing folders inside the megahit folder, or create a new directory for megahit to store its output.
When I run this I am getting:
And when I change the -o option to -o megahit/{wildcards.sample} I am getting:
Can you rename /home/megahit/SRR11192682 to /home/megahit/SRR11192682_bak and rerun the code?
I tried the following:
For both cases I got the following:
I tried also adding the --latency-wait but nothing changed.
Don't change the rule. Rename the existing directory.
As I indicated below, --out-prefix is needed.
I think this is the issue OP is facing:
Solution (either remove or rename the existing directory):
Instead, the user seems to have changed the rule but didn't change the other relevant lines. Megahit can't overwrite by default; instead it suggests giving it another directory to write to.
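A possible way around the "can't overwrite" behaviour, without renaming directories by hand, is to let the assembler write into a fresh scratch directory and then move only the contigs file to the declared output path. The Python sketch below illustrates the pattern under stated assumptions: fake_assembler is a hypothetical stand-in for megahit that refuses to reuse an existing directory, and assemble_to_file is a made-up helper name. Note that this keeps only final.contigs.fa and discards everything else the assembler writes.

```python
import shutil
import tempfile
from pathlib import Path


def fake_assembler(out_dir):
    """Hypothetical stand-in for megahit: fails if out_dir already exists."""
    out_dir = Path(out_dir)
    if out_dir.exists():
        raise FileExistsError(f"{out_dir} already exists")
    out_dir.mkdir(parents=True)
    (out_dir / "final.contigs.fa").write_text(">contig_1\nACGT\n")


def assemble_to_file(run_assembler, contigs_path):
    """Run the assembler in a new scratch directory, then keep only the contigs file."""
    contigs_path = Path(contigs_path)
    # The final directory may already exist; that is fine because the
    # assembler never writes into it directly.
    contigs_path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch:
        out_dir = Path(scratch) / "assembly"  # does not exist yet
        run_assembler(out_dir)
        shutil.move(str(out_dir / "final.contigs.fa"), str(contigs_path))
    return contigs_path
```

Calling assemble_to_file(fake_assembler, "megahit/SRR11192682/final.contigs.fa") twice in a row succeeds both times, because the directory the assembler sees is always new.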
Snakemake is expecting megahit/{sample}_bak/final.contigs.fa. The MissingOutputException comes from not using --out-prefix final.
OP was troubleshooting this problem. I suggested that he rename the existing output SRR11192682. Instead OP changed the rule, but I guess OP hasn't edited the rest of the Snakefile to match the new change.
Hello both and thank you for your help! With your input I managed to make a sloppy solution!
Indeed the --out-prefix helped!
So, now the rule runs. Only the {wildcards.sample}.contigs.fa is being sent to the /megahit/{sample} directory, and the rest of the megahit output is being sent to separate directories for each run ID.
I think it's not ideal but it works for now.
The error makes sense because megahit is trying to make a directory based on your {output}, but your {output} is actually a fasta file. Can you try the following?
Hello! Thank you for the input. If I do what you suggest I am getting the following error: