Hello all,
I'm rather new to Snakemake and Python. I'm trying to build a pipeline, but the jobs requested by rule all in the main script seem to run in the wrong order, and I can't change that no matter what I do. Can anybody show me what I'm doing wrong?
Here is my first script:
configfile: "/home/PycharmProjects/Pipeline/config.yaml"

rule first:
    input:
        expand("Trimmed_reads/{srr}_trimmed.fastq", srr=config['srr'])

rule prefetch:
    output:
        "prefetch_files/sra/{srr}.sra"
    params:
        "{srr} --max-size 250GB -O sra_files"
    log:
        "prefetch_files/sra/{srr}.log"
    message:
        "Downloading files"
    shell:
        """
        /Tools/sra_toolkit/sratoolkit.3.0.0-ubuntu64/bin/prefetch {params} > {log} 2>&1 && touch {output}
        """
And within the same file is:
rule fastqdump:
    input:
        "prefetch_files/sra/{srr}.sra"
    output:
        touch("prefetch_files/done__{srr}_dump")
    params:
        args = "-S -O fastq_files/ -t fastq_files/ ",
        id_srr = "{srr}"
    log:
        "prefetch_files/{srr}.log"
    shell:
        """
        /Tools/sra_toolkit/sratoolkit.3.0.0-ubuntu64/bin/fasterq-dump {params.args} {params.id_srr} > {log} 2>&1
        """
If I run this first script manually, nothing goes wrong and it produces the files I need for the following script (I think; I could be wrong here, but it does produce files).
Then, in a second script, I try to run Trimmomatic:
configfile: "/PycharmProjects/Pipeline/config.yaml"

rule now:
    input:
        expand("Trimmed_reads/{srr}_trimmed.fastq", srr=config['srr'])

rule trimmomatic:
    input:
        unused = "prefetch_files/done__{srr}_dump",
        raw=config['FileDir']+"/{srr}.fastq",
        anno=config["trimmomatic"]["adapter"]
    output:
        touch("Trimmed_reads/{srr}_trimmed.fastq")
    threads: config["trimmomatic"]["treads"]
    params:
        jar=config["trimmomatic"]["jar"],
        phred=config["trimmomatic"]["phred"],
        minlen=config["trimmomatic"]["minlen"],
        trailing=config["trimmomatic"]["trailing"],
        leading=config["trimmomatic"]["leading"],
        slidwindow=config["trimmomatic"]["slidwindow"]
    message: "Started read trimming!"
    log:
        "logs/trimmomatic/{srr}_trimmed.log"
    shell:
        "(java -jar {params.jar} SE {params.phred} {input.raw} {output} ILLUMINACLIP:{input.anno}:2:30:10{params.leading}{params.trailing}{params.slidwindow} {params.minlen}) > {log} 2>&1"
And my main.smk is this:
configfile: "/PycharmProjects/Pipeline/config.yaml"

include: "download_sample.smk"
include: "trimming.smk"
include: "dagfile.smk"

rule all:
    input:
        expand("Trimmed_reads/{srr}_trimmed.fastq", srr=config['srr']),
        expand("prefetch_files/done__{srr}_dump", srr=config['srr'])
And in case it's important, here is my config.yaml:
FileDir: "/PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing/fastq_files"
srr:
  - SRR5327856
  - SRR5327984
  - SRR5327985
trimmomatic:
  adapter: /PycharmProjects/Pipeline/all_adapters.fa
  jar: /Documents/Lisan/Tools/Trimmomatic-0.39/trimmomatic-0.39.jar
  phred: -phred33
  minlen: 45
  trailing: 3
  leading: 3
  slidwindow: 4:15
  treads: 35
I tried adding the output files from the prefetch step to the trimmomatic input, but this doesn't seem to help. Every time I run main.smk, it starts with the trimmomatic rule and fails because the input files don't exist.
(base) Workstation:~/PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing$ snakemake --snakefile main.smk -c4
Building DAG of jobs...
MissingInputException in rule trimmomatic in line 9 of /PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing/trimming.smk:
Missing input files for rule trimmomatic:
    output: Trimmed_reads/SRR5327856_trimmed.fastq
    wildcards: srr=SRR5327856
    affected files:
        /PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing/fastq_files/SRR5327856.fastq
I tried googling the error once I got stuck, but I didn't find anything useful, which makes me think it's probably something very simple that I'm not seeing. I don't have anybody around me who can help with Python or Snakemake, so I hope somebody here can. Thanks!
The error says "Missing input files for rule trimmomatic", so we start debugging from there. Your trimmomatic rule asks for input files matching config['FileDir']+"/{srr}.fastq", but none of your rules declares these files as outputs. Even though we know these FASTQ files are generated by fasterq-dump, Snakemake doesn't: it only knows about files that appear in some rule's output. So you need to explicitly declare those files as outputs of your fastqdump rule. Otherwise, Snakemake doesn't know how to build the DAG. Hope this helps.