To make it easy I have to modify the post. The situation is that at the beginning I ran the pipeline well on a local machine but failed when submitted to cluster. After posting the question, I found the version of snakemake was 3.13.3, so I updated to v5.7.3, and then I found it failed on both of the local machine and cluster. Thus, I am now struggling to figure what’ wrong with my Snakefile or anything else.
The error message:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 24 of /work/path/rna_seq_pipeline/Snakefile:
Missing files after 5 seconds:
bam/A2_Aligned.toTranscriptome.out.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /work/path/rna_seq_pipeline/.snakemake/log/2019-11-07T153434.327966.snakemake.log
Here is my Snakefile
:
# config file
configfile: "config.yaml"
shell.prefix("source ~/.bash_profile")
# determine which genome reference you would like to use
# here we are using GRCm38
# depending on the freeze, the appropriate references and data files will be chosen from the config
freeze = config['freeze']
# read list of samples, one per line
with open(config['samples']) as f:
SAMPLES = f.read().splitlines()
rule all:
input:
starindex = config['reference']['stargenomedir'][freeze] + "/" + "SAindex",
rsemindex = config['reference']['rsemgenomedir'][freeze] + ".n2g.idx.fa",
fastqs = expand("fastq/{file}_{rep}_paired.fq.gz", file = SAMPLES, rep = ['1','2']),
bams = expand("bam/{file}_Aligned.toTranscriptome.out.bam", file = SAMPLES),
quant = expand("quant/{file}.genes.results", file = SAMPLES)
# align using STAR
rule star_align:
input:
f1 = "fastq/" + "{file}_1_paired.fq.gz",
f2 = "fastq/" + "{file}_2_paired.fq.gz"
output:
out = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
params:
star = config['tools']['star'],
genomedir = config['reference']['stargenomedir'][freeze],
prefix = "bam/" + "{file}_"
threads: 12
shell:
"""
{params.star} \
--runThreadN {threads} \
--genomeDir {params.genomedir} \
--readFilesIn {input.f1} {input.f2} \
--readFilesCommand zcat \
--outFileNamePrefix {params.prefix} \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--quantMode TranscriptomeSAM \
--outSAMattributes NH HI AS NM MD \
--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.04 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--sjdbScore 1 \
--limitBAMsortRAM 50000000000
"""
# quantify expression using RSEM
rule rsem_quant:
input:
bam = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
output:
quant = "quant/" + "{file}.genes.results"
params:
calcexp = config['tools']['rsem']['calcexp'],
genomedir = config['reference']['rsemgenomedir'][freeze],
prefix = "quant/" + "{file}"
threads: 12
shell:
"""
{params.calcexp} \
--paired-end \
--no-bam-output \
--quiet \
--no-qualities \
-p {threads} \
--forward-prob 0.5 \
--seed-length 21 \
--fragment-length-mean -1.0 \
--bam {input.bam} {params.genomedir} {params.prefix}
"""
And my config.yaml
:
freeze: grcm38
# samples file
samples:
samples.txt
# software, binaries or tools
tools:
fastqdump: fastq-dump
star: STAR
rsem:
calcexp: rsem-calculate-expression
prepref: rsem-prepare-reference
# reference files, genome indices and data
reference:
stargenomedir:
grch38: /work/path/reference/STAR/GRCh38
grcm38: /work/path/reference/STAR/GRCm38
rsemgenomedir:
grch38: /work/path/reference/RSEM/GRCh38/GRCh38
grcm38: /work/path/reference/RSEM/GRCm38/GRCm38
fasta:
grch38: /work/path/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.dna.primary_assembly.fa
gtf:
grch38: /work/path/reference/GRCh38/Homo_sapiens.GRCh38.98.gtf
grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.98.gtf
And finally, samples.txt
:
A1
A2
ps: adapted from the pipeline https://github.com/komalsrathi/rnaseq-star-rsem-pipeline/blob/master/Snakefile
The way to debug anything is to start with something that works and then make small changes until it doesn’t. I am not totally convinced your worker nodes have access to /work. Maybe drop the absolute directories and try some helloworld Snakefiles with qsub and work your way up to the problem.
What version of snakemake are you running?
3.13.3... I didn't realized it is such an old version. Follow the instructions of snakemake tutorial and do not know why it is not the latest. I will try the latest version to see whether the problem is solved.
Yeah that's like.... 4 year old version!
I think if you update the issue will be fixed. Checkout this issue from 2015 with similar symptoms: https://bitbucket.org/snakemake/snakemake/issues/273/cannot-touch-temporary-snakemake-file
Can you link to the snakemake tutorial you are following?
The official one here: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html (Seems forgot to use
-c conda-forge
at that time, anyway it works now)However, after upgrading to v5.7.4, still surfer the problem. As I have two pairs of raw paired-end fastq file, the error file 1 returned by cluster:
and another one:
Sorry out of my depth. I hope you get an answer. If you do find one, please come back andpost it.
Did you read this part of the error message and did you try the suggestion?
The problem could also be in the way you've written your pipeline so showing your snakemake file could be useful.
I saw that and tried to add parameters `--latency-wait 180', but just spent more time to see the error message. The weird thing is that after upgrading snakemake, running it on a local machine also failed
I will add my snakemake file in the original post
Most of the time that error means that a rule cannot find an certain output file. If it is not no to much work for you it always helps the execute all the steps manually from the command line and double check the filenames of the output files.
In your case it looks like it can not find the file
bam/A2_Aligned.toTranscriptome.out.bam
is it created? Is it the right path?Yeah snakemake created the
bam/
directory but not the filebam/A2_Aligned.toTranscriptome.out.bam
. I usedsnakemake -np
to create shell command and ran it manually. The command ran successfully except I needed to create directorybam/
andquant/
manually, but I think snakemake can automatically mkdir, isn't it?not if it is an input. Your
bam/
has to be there because it is in the input part of one rule, since it is the output of another, I would start there.Are you sure
SortedByCoordinate
andTranscriptomeSAM
generate*_Aligned.toTranscriptome.out.bam
?Never mind. It apparently generates both
Aligned.sortedByCoord.out.bam
andAligned.toTranscriptome.out.bam
.What versions of STAR/RSEM are you using?
Also, can you do a
tree
(or show the directory structure) on your working directory? I see you have hardcoded the paths ofbam/, fastq/ and quant/
. Just want to make sure these directories exist.