problem with missing input file in snakemake rule
1
1
Entering edit mode
5.1 years ago
Assa Yeroslaviz ★ 1.9k

I'm trying to convert the SJ.tab.out file from STAR into a bed file within a snakemake script.

the rule for that is this:

 rule map_star_SE:
    input:
        R1='data/samples/single-end/{sample}_R1.fastq',
        index=directory("data/starIndex/")
    output:
        bam='Analysis/star/bamFiles/{sample}.bam',
    params:
        prefix = 'Analysis/star/bamFiles/{sample}',
        starlogs = 'Analysis/starlogs',
        gz_support=gz_command
    threads: 16
    shell:
        r'''
        mkdir -p {params.prefix} && \
        STAR --runThreadN {threads} \
             --genomeDir {input.index} \
             --outFileNamePrefix {params.prefix} --readFilesIn {input.R1} {params.gz_support} \
             --outSAMtype BAM SortedByCoordinate \
             --limitBAMsortRAM {config[RAM]} \
             --quantMode GeneCounts \
             --outReadsUnmapped Fastx &&\
        mv {params.prefix}Aligned.sortedByCoord.out.bam {output.bam} &&\
        mkdir -p {params.starlogs} &&\
        mv {params.prefix}Log.final.out {params.prefix}Log.out {params.prefix}Log.progress.out {params.starlogs}
        '''

rule index:
    input:
        'Analysis/star/bamFiles/{sample}.bam'
    output:
        'Analysis/star/bamFiles/{sample}.bam.bai'
    message:
        "Indexing {wildcards.sample}.sorted.bam"
    benchmark:
        "Analysis/benchmarks/{sample}.index_bam.txt"
    shell:
        'samtools index {input}'

   rule align_SJ2Bed:
        input:
            "Analysis/star/bamFiles/{sample}SJ.out.tab"
        output:
            "Analysis/star/bamFiles/{sample}.junctions.bed"
         shell:
            "scripts/STAR_SJtab2JunctionsBed.py -f {input} > {output}"

The global rule_all is

rule all:
   input:
      directory("data/starIndex/"),
      expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.bam.bai", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.junctions.bed", sample=SAMPLES)

when I run this analysis from scratch I get the error message with the last part of it.

$ snakemake -ps workflow.Snakefile -j 30 --force

Building DAG of jobs...

MissingInputException in line 46 of /local/Assa/projects/automation/01.Mapping/Star.Mapping_SE.Snakefile:

Missing input files for rule align_SJ2Bed:

Analysis/star/bamFiles/1_S1SJ.out.tab

But When I run the rule_all with the last output command commented out, it runs all the way with no errors. When I than activate the last row of the rule_all part and re-run the script it works for this part as well. Why can't snakemake see that the input file I would like to create in the rule_align_SJ2Bed is created in the a previous step, namely in rule_map_star_PE.

thanks

Assa

snakemake rule mapping star • 9.1k views
ADD COMMENT
1
Entering edit mode
5.1 years ago
Sej Modha 5.3k

My guess is that the only output that can be found from rule map_star_SE is a bam file as you have specified. If you are saving a *.out.tab as part of that rule then you'd have to explicitly specify that in the rule map_star_SE output section.

ADD COMMENT
0
Entering edit mode

This make sense. I'll try it. thanks

ADD REPLY
0
Entering edit mode

yes, this worked fine. Thanks for that. I still don't understand this kind of error though.

I have added another rule now for the quantification using featureCounts.

rule quant:
    input:
        gtf = "data/genome.gtf",
        bam = expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES)
    output:
        Gene = "Analysis/quantification/GeneCounts.txt",
    message:
        "running featureCounts"
    threads: 15
    log:
        log  = "Analysis/quantification/featureCounts.stat"
    shell:
        " featureCounts -T {threads} -a {input.gtf} -t exon -g gene_id -M\
        -o {output.Gene} {input.bam} >>  {log.log} 2>&1"

When I'm running the rule by its own with snakemake -nps Quantification.Snakefile -j 30 it runs perfectly well.

But when I try to add it to my rule all

rule all:
   input:
      directory("data/starIndex/"),
      expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.counts.tab", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.bam.bai", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.junctions.bed", sample=SAMPLES),
      "Analysis/quantification/GeneCounts.txt"

it throws back the same error as above

$ snakemake -nps workflow.Snakefile -j 30 --forceall
Building DAG of jobs...
MissingInputException in line 12 of /local/Assa/projects/automation/01.Mapping/workflow.Snakefile:
Missing input files for rule all:
Analysis/quantification/GeneCounts.txt

But I do have the input file for this using the bam parameters. What is the difference, if it is asked for on the rule all or within the rule quant itself?

ADD REPLY
0
Entering edit mode

I am not sure why you need rule all for in this case as you are saving the output from each bam file to a single file.

ADD REPLY
1
Entering edit mode

I'm not sure how else I can do that.

How can I call for a rule with wildcards in the output?

e.g. How can I run the rule index by itself?

ADD REPLY

Login before adding your answer.

Traffic: 1728 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6