Question

snakemake different multiple input paths not in output or rule all

0

Entering edit mode

5.0 years ago

StartR ▴ 30

Hi

I am new to snakemake - here is my questions:

I have different bam files from multiple folders : folder structure is:

Folder1/sample1/libx/folder1/alginment.bam
Folder1/sample2/liby/folder1/alginment.bam
Folder1/sample3/libsome/foldersome/alginment.bam

lots of bam files in different folders

I have made a path.txt to alignment.bam files, where each line contains:

sample1/libx/folder1/
sample2/liby/folder1/

and so on..

now my problem is the {input} files will be very different from {output} files

I want my output files to be like this:

Sample1/Sample1.sorted.bam
Sample2/Samle2.sorted.bam

How can I achieve this? So far I have done like this:

dir: /path/to/workdir

with open("path_to_bams.txt")
    PATH = infile.read()

rule all:
    input:
        expand("{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.bam", dir= DIR,sample=SAMPLES)

rule sort_bam:
    input:
        bam = expand("{dir}/{path}/alignment.bam", dir= DIR, path=PATH)     
    output:
        temp("{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.bam")
    log:
       "{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.tmp"
    params: mem="5G"
    conda:"env.yaml"
    shell:
        "samtools sort -T {log} -m {params.mem} {input.bam} {output} "

when I print PATH: I get exact path to the folders that I want ..

I get this error, because I am not able to give path as wildcard:

Building DAG of jobs...
InputFunctionException in line 22 of /path/to/Snakefile:
AttributeError: 'Wildcards' object has no attribute 'path'
Wildcards:
dir=/path/to/bamfiles
sample=Sample1

How can I give {path} in the wildcard if it is not in rule all?

OR the different approach can be I replace the BAM file names as:

sample1.libx.folder1.alginment.bam
sample2.liby.folder1.alginment.bam
sample3.libsome.foldersome.alginment.bam

and then use {sample} as input: How can I do it?

But I think this will create two sets of bam files which will use more memory?

Thanks

Please help

BAM samtools snakemake • 4.1k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 5.0 years ago by StartR ▴ 30

score 2 · Answer 1 · 2020-07-27

2

Entering edit mode

5.0 years ago

dariober 15k

If I understand correctly, I would make path_to_bams.txt a table that links input files to sample names, like this:

sample   infile
s1       path/to/a.bam
s2       path/to/b.bam
s3       path/to/c.bam

Then read this file in a pandas dataframe and use a lambda function to assign the {sample} wildcard to its input file. Like:

import pandas

ss = pandas.read_csv('path_to_bams.txt', sep= '\t')

rule all:
    input:
        expand("sorted_bams/{sample}/{sample}.sortedByCoord.bam", sample= ss.sample)

rule sort_bam:
    input:
        lambda wc: ss[ss.sample == wc.sample].infile,
    output:
        "sorted_bams/{sample}/{sample}.sortedByCoord.bam",
    shell:
        r"""
        samtools sort {input} > {output}
        """

Alternatively, instead of a table you could parse the input filenames to create a dictionary with key=sample and value=/path/to/file but I would prefer to avoid parsing paths and file names as you have to make assumptions on how they are called - better to be explicit and use a table.

ADD COMMENT • link 5.0 years ago by dariober 15k

0

Entering edit mode

yes, great! thanks a lot! Just a comment that the script is not working with ss.sample, instead I used ss['sample'] Thanks dariober!

ADD REPLY • link 5.0 years ago by StartR ▴ 30

0

Entering edit mode

I don't understand why one would use pandas for a simple operation such as reading a small table/map. This only adds another dependency to take care of, with no extra value whatsoever (from my point of view). Why not just use a csv.reader?

ADD REPLY • link 5.0 years ago by cschu181 ★ 2.8k

0

Entering edit mode

Hi, are you suggesting the way that I have done before, like: open("path_to_bams.txt") PATH = infile.read()

ADD REPLY • link 5.0 years ago by StartR ▴ 30

2

Entering edit mode

No, I am criticising using pandas for this. Other than that I think dariober's method of using a tabular file mapping sample to bamfile is the right thing to do. Just with something like:

ss = dict((r[0], r[1]) for i, r in enumerate(csv.reader(open("paths_to_bams.txt"), delimiter="\t")) if i > 0)

rule sort_bam:
    input:
        lambda wc: ss[wc.sample]

ADD REPLY • link 5.0 years ago by cschu181 ★ 2.8k