snakemake different multiple input paths not in output or rule all
1
0
Entering edit mode
4.3 years ago
StartR ▴ 30

Hi

I am new to snakemake - here is my questions:

I have different bam files from multiple folders : folder structure is:

Folder1/sample1/libx/folder1/alginment.bam
Folder1/sample2/liby/folder1/alginment.bam
Folder1/sample3/libsome/foldersome/alginment.bam

lots of bam files in different folders

I have made a path.txt to alignment.bam files, where each line contains:

sample1/libx/folder1/
sample2/liby/folder1/

and so on..

now my problem is the {input} files will be very different from {output} files

I want my output files to be like this:

Sample1/Sample1.sorted.bam
Sample2/Samle2.sorted.bam

How can I achieve this? So far I have done like this:

dir: /path/to/workdir

with open("path_to_bams.txt")
    PATH = infile.read()

rule all:
    input:
        expand("{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.bam", dir= DIR,sample=SAMPLES)

rule sort_bam:
    input:
        bam = expand("{dir}/{path}/alignment.bam", dir= DIR, path=PATH)     
    output:
        temp("{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.bam")
    log:
       "{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.tmp"
    params: mem="5G"
    conda:"env.yaml"
    shell:
        "samtools sort -T {log} -m {params.mem} {input.bam} {output} "

when I print PATH: I get exact path to the folders that I want ..

I get this error, because I am not able to give path as wildcard:

Building DAG of jobs...
InputFunctionException in line 22 of /path/to/Snakefile:
AttributeError: 'Wildcards' object has no attribute 'path'
Wildcards:
dir=/path/to/bamfiles
sample=Sample1

How can I give {path} in the wildcard if it is not in rule all?

OR the different approach can be I replace the BAM file names as:

sample1.libx.folder1.alginment.bam
sample2.liby.folder1.alginment.bam
sample3.libsome.foldersome.alginment.bam

and then use {sample} as input: How can I do it?

But I think this will create two sets of bam files which will use more memory?

Thanks

Please help

BAM samtools snakemake • 3.5k views
ADD COMMENT
2
Entering edit mode
4.3 years ago

If I understand correctly, I would make path_to_bams.txt a table that links input files to sample names, like this:

sample   infile
s1       path/to/a.bam
s2       path/to/b.bam
s3       path/to/c.bam

Then read this file in a pandas dataframe and use a lambda function to assign the {sample} wildcard to its input file. Like:

import pandas

ss = pandas.read_csv('path_to_bams.txt', sep= '\t')

rule all:
    input:
        expand("sorted_bams/{sample}/{sample}.sortedByCoord.bam", sample= ss.sample)

rule sort_bam:
    input:
        lambda wc: ss[ss.sample == wc.sample].infile,
    output:
        "sorted_bams/{sample}/{sample}.sortedByCoord.bam",
    shell:
        r"""
        samtools sort {input} > {output}
        """

Alternatively, instead of a table you could parse the input filenames to create a dictionary with key=sample and value=/path/to/file but I would prefer to avoid parsing paths and file names as you have to make assumptions on how they are called - better to be explicit and use a table.

ADD COMMENT
0
Entering edit mode

yes, great! thanks a lot! Just a comment that the script is not working with ss.sample, instead I used ss['sample'] Thanks dariober!

ADD REPLY
0
Entering edit mode

I don't understand why one would use pandas for a simple operation such as reading a small table/map. This only adds another dependency to take care of, with no extra value whatsoever (from my point of view). Why not just use a csv.reader?

ADD REPLY
0
Entering edit mode

Hi, are you suggesting the way that I have done before, like: open("path_to_bams.txt") PATH = infile.read()

ADD REPLY
2
Entering edit mode

No, I am criticising using pandas for this. Other than that I think dariober's method of using a tabular file mapping sample to bamfile is the right thing to do. Just with something like:

ss = dict((r[0], r[1]) for i, r in enumerate(csv.reader(open("paths_to_bams.txt"), delimiter="\t")) if i > 0)

rule sort_bam:
    input:
        lambda wc: ss[wc.sample]
ADD REPLY

Login before adding your answer.

Traffic: 2353 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6