I would like to list my sample names in a table, together with the corresponding forward and reverse read files, and use them in a Snakemake workflow:
sample fq1 fq2
1 ../reads/110627_0240_AC0254ABXX_2_SA-PE-001.1.fq.gz ../reads/110627_0240_AC0254ABXX_2_SA-PE-001.2.fq.gz
2 ../reads/110627_0240_AC0254ABXX_2_SA-PE-002.1.fq.gz ../reads/110627_0240_AC0254ABXX_2_SA-PE-002.2.fq.gz
22 ../reads/110802_0249_AD0CM0ABXX_3_SA-PE-022.1.fq.gz ../reads/110802_0249_AD0CM0ABXX_3_SA-PE-022.2.fq.gz
Unfortunately, handling the wildcards turns out to be more complex, and the error "No values given for wildcard 'samples'." appears with this workflow. Would anyone know how to manage this kind of input in Snakemake?
import os
import snakemake.io
import glob
import pandas as pd

# Config file:
configfile: "config/config.yaml"

# Load the samples table:
table = pd.read_csv("config/samples_reads.tsv", delim_whitespace=True)

# Use the samples table to make lists of sample names/reads:
samples = table['sample']
read1 = table['fq1']
read2 = table['fq2']

# The first rule:
rule all:
    input:
        expand("mapped/{samples}.bam")

# Alignment:
rule bwa_mem:
    input:
        reads=[expand("reads/{read1}"), expand("reads/{read2}")],
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        expand("mapped/{samples}.bam"),
    log:
        "logs/bwa_mem/{samples}.log",
    params:
        extra=r"-R '@RG\tID:{samples}\tSM:{samples}'",
        sorting="none",  # Can be 'none', 'samtools' or 'picard'.
        sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
        sort_extra="",  # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "v2.6.0/bio/bwa/mem"
Thanks for the help, I have modified the script and this specific problem has been solved.
However, this way of inputting samples and reads with different names does not work; it seems Snakemake does not like it when sample and read names differ.
If anyone has a way around this, I would be interested.
You should review some example Snakefiles to understand how wildcards work in rules - the idea is that the wildcards match between input and output. Using your current table, you would need to write a lookup lambda function for Snakemake to map the sample name (e.g. 1) to the reads (e.g. ../reads/110627_0240_AC0254ABXX_2_SA-PE-001.1.fq.gz). An easier solution would be for you to create a bam column in your table, like 110627_0240_AC0254ABXX_2_SA-PE-001.bam, so Snakemake could derive the read names from an implicit pattern.

Thank you Jeremy Leipzig, actually I have already spent some time trying to understand how the dna-seq-gatk-variant-calling workflow works, as it also uses a .tsv file to handle different sample and read names. https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/workflow/rules/common.smk
Unfortunately it is a bit too complex for me, which is why I came to this forum for an answer. Now I know the best way is to use a lambda function, and I will work on it, thanks to you!
A more general question: why don't other Snakemake users want to do the same thing as me? Why would you keep the unreadable read names throughout your Snakemake workflows?
I would say bioinformatics has an uneasy relationship with filenames in general. The tendency is to name things logically, but as a general rule you shouldn't put metadata in filenames. I think it's a good idea to use a lookup table that stores a variety of metadata, as you have started; it will just take some time to learn to use it in Snakemake.
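For what it's worth, here is a minimal sketch of that lookup-table idea, assuming the samples_reads.tsv layout from the question; the function name get_fastqs and the index-by-sample step are illustrative, not from the original post:

import pandas as pd

# Read the table and index it by the sample column; dtype=str keeps the
# numeric sample names ("1", "2", "22") as strings so they match wildcards.
table = pd.read_csv("config/samples_reads.tsv", delim_whitespace=True, dtype=str).set_index("sample")

def get_fastqs(wildcards):
    # Map the {sample} wildcard to its forward/reverse read files.
    row = table.loc[wildcards.sample]
    return [row["fq1"], row["fq2"]]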
I think the expand() function is not necessary in the input, but I am not sure how to manage the {read1} variable and read1=R1 without this function. sample should remain a wildcard in the bwa_mem rule. For instance, see the sketch at the end of this thread.

Thanks for the answer @ericlim. The purpose of using a .tsv file was to input read names and output sample names that are different (see the .tsv file example above). Is this not possible in Snakemake?
It is possible; it's just not a common use case, as wildcards generally match between inputs and outputs. As Jeremy suggested above, you'd need a lambda/function to provide the mapping yourself.
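To make that concrete, here is a minimal sketch (not the poster's final workflow) that keeps {sample} as the only wildcard in bwa_mem and resolves the reads through the table, assuming the get_fastqs function from the sketch above and the same genome index and wrapper as in the question:

rule all:
    input:
        expand("mapped/{sample}.bam", sample=table.index)

rule bwa_mem:
    input:
        # Input function: Snakemake calls it with the rule's wildcards,
        # so the read paths are looked up in the table rather than being
        # derived from the output filename.
        reads=get_fastqs,
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        "mapped/{sample}.bam",
    log:
        "logs/bwa_mem/{sample}.log",
    params:
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sorting="none",
        sort_order="queryname",
        sort_extra="",
    threads: 8
    wrapper:
        "v2.6.0/bio/bwa/mem"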