Hi
I am new to Snakemake; here is my question.
I have BAM files spread across multiple folders. The folder structure is:
Folder1/sample1/libx/folder1/alignment.bam
Folder1/sample2/liby/folder1/alignment.bam
Folder1/sample3/libsome/foldersome/alignment.bam
There are lots of BAM files in different folders like these.
I have made a path.txt file listing the location of each alignment.bam file; each line contains:
sample1/libx/folder1/
sample2/liby/folder1/
and so on.
Now my problem is that the {input} files will be named very differently from the {output} files.
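One way to make such a file usable in a Snakefile is to turn it into a dictionary from sample name to BAM path up front. A minimal sketch, assuming the sample name is always the first component of each line (as in `sample1/libx/folder1/`) and that the lines are relative to a common base directory; `read_bam_paths` and `base_dir` are hypothetical names:

```python
import os


def read_bam_paths(path_file, base_dir):
    """Map each sample name to its alignment.bam (hypothetical helper).

    Assumes one relative directory per line, e.g. 'sample1/libx/folder1/',
    with the sample name as the first path component.
    """
    bams = {}
    with open(path_file) as infile:
        for line in infile:
            rel = line.strip()
            if not rel:
                continue  # skip blank lines
            sample = rel.strip("/").split("/")[0]
            bams[sample] = os.path.join(base_dir, rel, "alignment.bam")
    return bams
```

With `base_dir="/data/Folder1"`, the line `sample1/libx/folder1/` becomes the entry `"sample1": "/data/Folder1/sample1/libx/folder1/alignment.bam"`.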
I want my output files to be like this:
Sample1/Sample1.sorted.bam
Sample2/Sample2.sorted.bam
How can I achieve this? So far I have done like this:
DIR = "/path/to/workdir"

with open("path_to_bams.txt") as infile:
    PATH = infile.read()

rule all:
    input:
        expand("{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.bam", dir=DIR, sample=SAMPLES)

rule sort_bam:
    input:
        bam = expand("{dir}/{path}/alignment.bam", dir=DIR, path=PATH)
    output:
        temp("{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.bam")
    log:
        "{dir}/sorted_bams/{sample}/{sample}.sortedByCoord.tmp"
    params: mem="5G"
    conda: "env.yaml"
    shell:
        # samtools sort writes to stdout unless -o is given
        "samtools sort -T {log} -m {params.mem} -o {output} {input.bam}"
When I print PATH, I get the exact paths to the folders that I want.
But I get this error, because I am not able to give path as a wildcard:
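Printing PATH can look right while hiding a problem: `infile.read()` returns the whole file as a single string with embedded newlines, not one entry per folder. A small sketch, using `io.StringIO` as a stand-in for the real file:

```python
import io

# Stand-in for open("path_to_bams.txt"); the content mirrors the question.
infile = io.StringIO("sample1/libx/folder1/\nsample2/liby/folder1/\n")
text = infile.read()
print(repr(text))  # one string: 'sample1/libx/folder1/\nsample2/liby/folder1/\n'

infile.seek(0)
paths = infile.read().splitlines()
print(paths)  # one entry per line: ['sample1/libx/folder1/', 'sample2/liby/folder1/']
```

Splitting into a list is the usual first step before pairing each path with a sample name.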
Building DAG of jobs...
InputFunctionException in line 22 of /path/to/Snakefile:
AttributeError: 'Wildcards' object has no attribute 'path'
Wildcards:
dir=/path/to/bamfiles
sample=Sample1
How can I give {path} as a wildcard if it is not in rule all?
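In Snakemake, a job's wildcards are only those appearing in its output pattern (here `dir` and `sample`), so `{path}` can never be filled in. The usual way around this is an input function: Snakemake calls it with the job's wildcards, and it returns the matching input file. A sketch written as plain Python so it can be tried outside Snakemake; `BAMS` is a hypothetical sample-to-path dictionary (it could be built from path_to_bams.txt), and `SimpleNamespace` only stands in for Snakemake's Wildcards object:

```python
from types import SimpleNamespace

# Hypothetical mapping from sample name to its unsorted BAM.
BAMS = {
    "Sample1": "/path/to/bamfiles/sample1/libx/folder1/alignment.bam",
    "Sample2": "/path/to/bamfiles/sample2/liby/folder1/alignment.bam",
}


def get_bam(wildcards):
    """Input function: look up the BAM for the sample being produced."""
    return BAMS[wildcards.sample]


# Outside Snakemake, a SimpleNamespace mimics the Wildcards object.
wc = SimpleNamespace(dir="/path/to/bamfiles", sample="Sample1")
print(get_bam(wc))  # /path/to/bamfiles/sample1/libx/folder1/alignment.bam
```

In the Snakefile, `rule sort_bam` would then use `input: bam = get_bam`, and `rule all` could use `sample=list(BAMS)`, so that targets and inputs come from the same dictionary.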
Alternatively, I could rename the BAM files like this:
sample1.libx.folder1.alignment.bam
sample2.liby.folder1.alignment.bam
sample3.libsome.foldersome.alignment.bam
and then use {sample} as the input. How can I do that?
But I think this would create two sets of BAM files, which would use more disk space?
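Copying is not needed to get a `{sample}`-only input pattern: a symbolic link stores only the target path, so no BAM data is duplicated. A minimal sketch, assuming a dictionary from sample name to BAM path and a hypothetical `links/` directory; the input pattern could then be something like `links/{sample}.bam`:

```python
import os


def link_bams(bams, link_dir):
    """Create link_dir/<sample>.bam -> original BAM for each sample (hypothetical helper).

    Symlinks hold only the target path, so the BAM data is not copied.
    Existing links are left untouched, so the function can be rerun safely.
    """
    os.makedirs(link_dir, exist_ok=True)
    links = {}
    for sample, bam in bams.items():
        link = os.path.join(link_dir, f"{sample}.bam")
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(bam), link)
        links[sample] = link
    return links
```

Absolute targets are used so the links stay valid regardless of the directory Snakemake runs from.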
Thanks
Please help
Yes, great! Thanks a lot! Just one comment: the script does not work with ss.sample, so I used ss['sample'] instead. Thanks, dariober!
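The reason `ss.sample` fails is that pandas DataFrames already have a `sample()` method (random sampling of rows), and attribute access returns that method rather than the column; bracket indexing always selects the column. A small illustration with a made-up two-row table:

```python
import pandas as pd

ss = pd.DataFrame({
    "sample": ["Sample1", "Sample2"],
    "bam": [
        "sample1/libx/folder1/alignment.bam",
        "sample2/liby/folder1/alignment.bam",
    ],
})

print(callable(ss.sample))  # True: this is DataFrame.sample, not the column
print(list(ss["sample"]))   # ['Sample1', 'Sample2']
```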
I don't understand why one would use pandas for a simple operation such as reading a small table/map. This only adds another dependency to take care of, with no extra value whatsoever (from my point of view). Why not just use a csv.reader?
Hi, are you suggesting the way I did it before, like:
    with open("path_to_bams.txt") as infile:
        PATH = infile.read()
No, I am criticising the use of pandas for this. Other than that, I think dariober's method of using a tabular file that maps each sample to its BAM file is the right thing to do, just with something like:
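A minimal standard-library sketch of such a mapping with `csv.reader`, assuming a two-column, tab-separated file without a header (sample name, then BAM path); the file layout and the `read_sample_sheet` name are assumptions:

```python
import csv


def read_sample_sheet(path):
    """Return {sample: bam_path} from a two-column, tab-separated file (no header)."""
    samples = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            sample, bam = row[0], row[1]
            samples[sample] = bam
    return samples
```

In a Snakefile this could be used as `SAMPLES = read_sample_sheet("samples.tsv")`, with `input: bam = lambda wildcards: SAMPLES[wildcards.sample]` in the sorting rule and `sample=list(SAMPLES)` in `rule all`.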