5.1 years ago
Ming
Dear All,
I am trying to run BBMap with snakemake, and I am pretty new to this.
# rule all: Specifies the files that you would like to create during your snakemake workflow
import os
import snakemake.io
import glob
(SAMPLES,READS,) = glob_wildcards("/home/tanshiming/Downloads/{sample}_{read}_001.fastq.gz")
READS=["R1","R2"]
rule all:
    input: expand("/home/tanshiming/Downloads/{sample}_{read}_001.fastq.gz",sample=SAMPLES, read=READS)
rule clumpify:
    input:
        r1="/home/tanshiming/Downloads/{sample}_R1_001.fastq.gz",
        r2="/home/tanshiming/Downloads/{sample}_R2_001.fastq.gz"
    output:
        o1="/home/tanshiming/Downloads/Clumpify/{sample}_R1.fastq.gz",
        o2="/home/tanshiming/Downloads/Clumpify/{sample}_R2.fastq.gz"
    shell:
        "clumpify.sh -Xmx50g in1={input.r1} in2=${input.r2} out1=Clumpify/{output.o1} out2=Clumpify/${output.o2} reorder ziplevel=9 dedupe=t optical=t"
When I try to run snakemake, I get the following error:
(bbmap) tanshiming@S620100019205:~/Scripts$ snakemake -n
SyntaxError in line 15 of /home/tanshiming/Scripts/Snakefile:
invalid syntax
This is the code that I would like to run:
Remove duplicates
for x in *_R1_001.fastq.gz
do clumpify.sh -Xmx250g in1=$x in2=${x%_R1_001*}_R2_001.fastq.gz out1=Clumpify/$x out2=Clumpify/${x%_R1_001*}_R2_001.fastq.gz reorder ziplevel=9 dedupe=t optical=t
done
Appreciate any advice that I can get!
Thank you.
Thanks @gb, but I am getting the following error now:
This is at the clumpify.sh line.
You are missing quotation marks ("") around the command.
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
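As a small illustration of that point, here is a minimal sketch of a rule whose shell command is a quoted string. The rule name and the two file names are hypothetical and only there to make the fragment complete; the point is just that everything after shell: has to be a Python string.

```snakemake
# Hypothetical rule and file names, for illustration only.
# The command after "shell:" is an ordinary quoted Python string.
# Adjacent string literals are concatenated, so a long command can be
# split over several lines as long as each piece is quoted.
rule clumpify_example:
    input:
        "reads.fastq.gz"
    output:
        "clumped.fastq.gz"
    shell:
        "clumpify.sh in={input} "
        "out={output} reorder"
```

Note the trailing space inside the first string; without it the two pieces would be glued together into one argument.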
Dear @gb,
When I try to run snakemake, this script does not seem to run:
But I do see that the job has not run.......
The input specification of rule all is exactly the set of files that you already have at the beginning. Therefore snakemake doesn't do anything: you already have what you need.
Dear WouterDeCoster,
Does that mean I have to delete the rule all to run the script?
I believe the input of rule all needs to be the output of rule clumpify. Snakemake first checks which files it needs to produce (the input of rule all). Next, it checks how it can get those files. So if you put the output files of rule clumpify in rule all, there is a "connection".
Snakemake checks the files requested by rule all and sees that they are not there yet. It then checks how it can get them and finds that executing rule clumpify produces the output it needs, so it executes that rule first before it can finish.
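To make that connection concrete, here is a minimal sketch of how the Snakefile from the question could be rearranged so that rule all asks for the outputs of rule clumpify. It is an untested sketch under assumptions, not a definitive fix: the variable names IN_DIR and OUT_DIR are made up for illustration and the paths are copied from the question. It also drops the `$` in front of `{input.r2}` and `{output.o2}` (a leftover from the bash loop) and the extra `Clumpify/` prefix, since the output paths already contain that directory.

```snakemake
# Sketch only. IN_DIR and OUT_DIR are illustrative names; the paths
# themselves come from the question.
IN_DIR = "/home/tanshiming/Downloads"
OUT_DIR = IN_DIR + "/Clumpify"

# Collect the sample names from the R1 files only; R2 is assumed to
# exist for every sample.
SAMPLES, = glob_wildcards(IN_DIR + "/{sample}_R1_001.fastq.gz")

# rule all requests the final files, i.e. the outputs of rule clumpify,
# not the raw reads that already exist.
rule all:
    input:
        expand(OUT_DIR + "/{sample}_{read}.fastq.gz", sample=SAMPLES, read=["R1", "R2"])

# Snakemake placeholders are written {input.r1}, without a leading "$".
rule clumpify:
    input:
        r1=IN_DIR + "/{sample}_R1_001.fastq.gz",
        r2=IN_DIR + "/{sample}_R2_001.fastq.gz"
    output:
        o1=OUT_DIR + "/{sample}_R1.fastq.gz",
        o2=OUT_DIR + "/{sample}_R2.fastq.gz"
    shell:
        "clumpify.sh -Xmx50g in1={input.r1} in2={input.r2} "
        "out1={output.o1} out2={output.o2} "
        "reorder ziplevel=9 dedupe=t optical=t"
```

With this layout, a dry run with `snakemake -n` should list one clumpify job per sample instead of reporting that there is nothing to do.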
Have you tried following the tutorial?
The input of rule all should be the file you aim to obtain out of this workflow. It should be the final output file.
This worked for me!
Thank you very much for your help!
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.