Hi Hello,
I want to do a specific trimming for each sample I have in my pipeline.
I have a list of samples in my config below:
#######################
# Config data #
#######################
samples:
3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004:
3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004:
3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004:
3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004:
....
And another list in my config with the samples and primers as a dictionary for trim5 and trim3.
trim5:
3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGCGGTTATCTCGTATGCCGTCTTCTGCTTG
3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTATAACCATCTCGTATGCCGTCTTCTGCTTG
3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGACTTGGATCTCGTATGCCGTCTTCTGCTTG
3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACAAGTCCAAATCTCGTATGCCGTCTTCTGCTTG
.....
So I made a function that allows to select the primer linked to this sample.
# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim5():
for trim5 in config['trim5']:
if {wildcards.sample} == trim5:
print (config['trim5'])
print (config['trim5'][trim5])
return ( config['trim5'][trim5] )
else:
continue
# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim3():
for trim3 in config['trim3']:
if {wildcards.sample} == trim3:
print (config['trim3'])
print (config['trim3'][trim3])
return ( config['trim3'][trim3] )
else:
continue
rule cutadapt_remove_adaptater_trimm:
priority: 0
input:
reads=["../resources/sequences/{sample}_R1.fastq.gz", "../resources/sequences/{sample}_R2.fastq.gz"]
output:
fastq1= "../results/trimmed/{sample}_R1_trimmed.fastq.gz",
fastq2= "../results/trimmed/{sample}_R2_trimmed.fastq.gz",
qc= "../results/trimmed/{sample}_qc.txt"
params:
adapters="-g %s -a %s -a %s -a %s -G %s -A %s -A %s -A %s -A %s "%(get_index_trim5(), config['adaptaters']['adap_R1'], config['adaptaters']['PolyAG'],get_index_trim3(), get_index_trim5(), config['adaptaters']['adap_R2'], get_index_trim3(), config['adaptaters']['PolyG']),
extra="--minimum-length 100 -q 20"
log:
"../results/logs/trimmed/{sample}_trimmed.log"
benchmark :
"../results/benchmarks/trimmed/{sample}_trimmed_benchmark.txt"
message:
"""
--- Trimming on {wildcards.sample} {params.adapters} in process ---
"""
threads: 4
resources:
mem_mb=25000
shell:
"cutadapt {params.adapters} {params.extra} -o {output.fastq1} -p {output.fastq2} -j {threads} {input.reads} > {output.qc} 2> {log}"
The problem is I need to check in my function that the wildcards correspond to the primer but I can't pass the wildcards.sample as an argument to my function. I have some difficulties to use wildcards
Can you help me?
Thanks in advance and have a nice day, aka
Hi kanika,
Thank you for your help! Yes it's a partial snakemake file because I use multiple files for my snakemake pipeline.
What do you mean by "how you are looking for your samples?"
I have already a rull all :
I think the problem is that I can't use the wildcards, or the sample name which is trim in my function.
What if you pass your samples in a function like this?
Got the idea from here
I also think
{wildcards.sample}
only works if you add lambda wildcards:dict[wildcards.sample]
.It's not working like this: Maybe its the SAMPLES I don't define correctly?
where I add the
lambda wildcards: dict[wildcards.sample]
?I think we have complicated an already complicated situation. LOL.
First, I am sure you cannot use
{wildcards.sample}
in a function in snakemake. Let me look for a solution and get back to you. If you want to use{wildcards.sample}
then you can only use it withlambda
function in python. That's what I wanted to say before.I think too ! XD
I will try to find a solution too but I will wait for you reply also because I am a little lost when it's question of wildcards to be honest.
I will research about lamba too.
Thank you for your help.
I hope this helps.
How to access a dict from config_file in snakemake
Hello,
Thanks for your help, it helped me a little and finally I found the solution. The first problem was that I used the function
get_index_trim5()
but we don't need()
.Then the second problem was that I tried to call a function with the %s and it returned an object of the function not the key I want.
So, it's not perfect but it solves my problem, I decided to split my adapter variable into different variables in my
params
part: R1 and R2 for my classic adapters without the functionsget_index_trim5
andget_index_trim3
and two other variables with my functions directly.Like that I called my params.<something> in my bash command for cutadapt.
Okay! Awesome that you figured it out.