Entering edit mode
11 months ago
Javier
•
0
Hello, I am learning bioinformatics and my professor assigned me the task of creating a pipeline for RNA-seq in Snakemake using the following software: fastqc, cutadapt. fastqc, STAR and rna-seqc.
My problem is that when I try to run my code the following warning appears:
MissingOutputException on line 39 of /home/vdi/Trancryptomica/pipeline/snakefile:
Working Files missing after 5 seconds:
for each output of the first rule, and when I try to fix it I make a new error:
Incomplete Files Exception:
The following files appear to be incomplete. If you are sure that certain files are not incomplete
again, for the outputs of the first rule. Here is my code:
work_dir="/home/vdi/Transcriptomica/pipeline"
GENOME=work_dir + "/data/references/chr10.fa"
TRANSCRIPTOME=work_dir + "/data/references/gencode.v38.annotation.chr10.gtf"
samples=["test-sample1", "test-sample2"]
ids=["_1", "_2"]
rule all:
input:
#fastqc_raw output:
expand(work_dir + "/results/QC/reads/{sample}{id}_fastqc.html", sample=samples, id=ids),
#cutadapt output:
expand(work_dir + "/data/reads/{sample}_1_trimm.fastq", sample=samples),
expand(work_dir + "/data/reads/{sample}_2_trimm.fastq", sample=samples),
#fastqc_trimm output:
expand(work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.html", sample=samples, id=ids),
#star index output:
directory(work_dir + "/data/references/index_star/"),
#star alig output:
expand(work_dir + "/results/alignment/{sample}Unmapped.out.mate1", sample=samples),
expand(work_dir + "/results/alignment/{sample}Unmapped.out.mate2", sample=samples),
expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples),
expand(work_dir + "/results/alignment/{sample}Log.final.out", sample=samples),
expand(work_dir + "/results/cuantification/{sample}read_counts.out", sample=samples),
expand(work_dir + "/results/cuantification/{sample}read_counts.txt", sample=samples),
#rna-seqc output:
directory(work_dir + "/results/QC/alignment/")
ruleorder: fastqc_raw > fastqc_trimm
rule fastqc_raw:
input:
work_dir + "/data/reads/{sample}{id}.fastq"
output:
zip=work_dir + "/results/QC/reads/{sample}{id}_fastqc.zip",
html=work_dir + "/results/QC/reads/{sample}{id}_fastqc.html"
params:
path="results/QC/reads/"
shell:
"""
fastqc {input} -o {params}
"""
rule cutadapt:
input:
work_dir + "/data/reads/{sample}_1.fastq",
work_dir + "/data/reads/{sample}_2.fastq"
output:
work_dir + "/data/reads/{sample}_1_trimm.fastq",
work_dir + "/data/reads/{sample}_2_trimm.fastq"
shell:
"""
cutadapt -a CGCCTTGGCCGT -A CGCCTTGGCCGT -o {output[0]} -p {output[1]} {input[0]} {input[1]}
"""
rule fastqc_trimm:
input:
work_dir + "/data/reads/{sample}{id}_trimm.fastq"
output:
zip=work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.zip",
html=work_dir + "/results/QC/reads/{sample}{id}_trimm_fastqc.html"
params:
path="results/QC/reads/"
shell:
"""
fastqc {input} -o {params}
"""
rule STAR_index:
input:
GENOME,
TRANSCRIPTOME
output:
directory(work_dir + "/data/references/index_star")
shell:
"""
mkdir -p data/references/index_star
STAR --runThreadN 3 --runMode genomeGenerate --genomeDir {output} --genomeFastaFiles {input[0]} --sjdbGTFfile {input[1]} -- sjdbOverhang 100
"""
rule STAR_alig:
input:
ref=work_dir + "/data/references/index_star/",
fw=work_dir + "/data/reads/{sample}_1_trimm.fastq",
rv=work_dir + "/data/reads/{sample}_2_trimm.fastq"
output:
work_dir + "/results/alignment/{sample}Unmapped.out.mate1",
work_dir + "/results/alignment/{sample}Unmapped.out.mate2",
work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam",
work_dir + "/results/alignment/{sample}Log.final.out"
params:
path=work_dir + "/results/alignment/"
shell:
"""
mkdir {params}
STAR --genomeDir {input.ref} --quantMode TranscriptomeSAM --readFilesIn {input.fw} {input.rv} --outFileNamePrefix {params} --readFilesCommand zcat --outFilterIntronMotifs RemoveNoncanonical --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx
"""
rule FeatureCounts:
input:
TRANSCRIPTOME,
BAMS=expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples)
output:
TPG=work_dir + "/results/cuantification/{sample}read_counts.out",
PG=work_dir + "/results/cuantification/{sample}read_counts.txt"
shell:
"""
featureCounts -Q 20 -p -O -T 5 -a {input[0]} -o {output.TPG} {input[1]}
cut -f1,7- {output.TPG} | sed 1d > {output.PG}
"""
rule RNASEQC:
input:
TRANSCRIPTOME,
BAMS=expand(work_dir + "/results/alignment/{sample}Aligned.sortedByCoord.out.bam", sample=samples)
output:
directory(work_dir + "/results/QC/alignment/")
shell:
"""
rnaseqc {input[0]} {input[1]} {output}
"""
I would like to know how to exit this bucle were i go from a error to other error
My guess would be that your
work_dir
is not your actual working directory. Did you launched your workflow from/home/vdi/Transcriptomica/pipeline
?yes theres were my snakefile is
So, it wasn't exactly the problem, but your repetition kept me thinking and I changed the directions in the snakefile "adding / to the end of work_dir and removing one at the beginning of the others" and it works. thank you
How are you executing the pipeline? Please show us all your commands.
snakemake --cores all
Thats the only comand, initially I used :
snakemake --cores all -np
to make dry runs and fix the errors, once everything was solved I moved on to the real run