I'm trying to convert the SJ.tab.out
file from STAR into a bed file within a snakemake script.
the rule for that is this:
rule map_star_SE:
input:
R1='data/samples/single-end/{sample}_R1.fastq',
index=directory("data/starIndex/")
output:
bam='Analysis/star/bamFiles/{sample}.bam',
params:
prefix = 'Analysis/star/bamFiles/{sample}',
starlogs = 'Analysis/starlogs',
gz_support=gz_command
threads: 16
shell:
r'''
mkdir -p {params.prefix} && \
STAR --runThreadN {threads} \
--genomeDir {input.index} \
--outFileNamePrefix {params.prefix} --readFilesIn {input.R1} {params.gz_support} \
--outSAMtype BAM SortedByCoordinate \
--limitBAMsortRAM {config[RAM]} \
--quantMode GeneCounts \
--outReadsUnmapped Fastx &&\
mv {params.prefix}Aligned.sortedByCoord.out.bam {output.bam} &&\
mkdir -p {params.starlogs} &&\
mv {params.prefix}Log.final.out {params.prefix}Log.out {params.prefix}Log.progress.out {params.starlogs}
'''
rule index:
input:
'Analysis/star/bamFiles/{sample}.bam'
output:
'Analysis/star/bamFiles/{sample}.bam.bai'
message:
"Indexing {wildcards.sample}.sorted.bam"
benchmark:
"Analysis/benchmarks/{sample}.index_bam.txt"
shell:
'samtools index {input}'
rule align_SJ2Bed:
input:
"Analysis/star/bamFiles/{sample}SJ.out.tab"
output:
"Analysis/star/bamFiles/{sample}.junctions.bed"
shell:
"scripts/STAR_SJtab2JunctionsBed.py -f {input} > {output}"
The global rule_all
is
rule all:
input:
directory("data/starIndex/"),
expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES),
expand("Analysis/star/bamFiles/{sample}.bam.bai", sample=SAMPLES),
expand("Analysis/star/bamFiles/{sample}.junctions.bed", sample=SAMPLES)
when I run this analysis from scratch I get the error message with the last part of it.
$ snakemake -ps workflow.Snakefile -j 30 --force
Building DAG of jobs...
MissingInputException in line 46 of /local/Assa/projects/automation/01.Mapping/Star.Mapping_SE.Snakefile:
Missing input files for rule align_SJ2Bed:
Analysis/star/bamFiles/1_S1SJ.out.tab
But When I run the rule_all
with the last output command commented out, it runs all the way with no errors. When I than activate the last row of the rule_all
part and re-run the script it works for this part as well.
Why can't snakemake see that the input file I would like to create in the rule_align_SJ2Bed
is created in the a previous step, namely in rule_map_star_PE
.
thanks
Assa
This make sense. I'll try it. thanks
yes, this worked fine. Thanks for that. I still don't understand this kind of error though.
I have added another rule now for the quantification using featureCounts.
When I'm running the rule by its own with
snakemake -nps Quantification.Snakefile -j 30
it runs perfectly well.But when I try to add it to my
rule all
it throws back the same error as above
But I do have the input file for this using the
bam
parameters. What is the difference, if it is asked for on therule all
or within therule quant
itself?I am not sure why you need
rule all
for in this case as you are saving the output from eachbam
file to a single file.I'm not sure how else I can do that.
How can I call for a rule with wildcards in the output?
e.g. How can I run the
rule index
by itself?