Using wildcards in function in rules
1
0
Entering edit mode
3.5 years ago
aka ▴ 10

Hi Hello,

I want to do a specific trimming for each sample I have in my pipeline.

I have a list of samples in my config below:

#######################
#      Config data    # 
#######################

samples:
  3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004:
  3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004:
  3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004:
  3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004: 
  ....

And another list in my config with the samples and primers as a dictionary for trim5 and trim3.

trim5:
  3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGCGGTTATCTCGTATGCCGTCTTCTGCTTG 
  3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTATAACCATCTCGTATGCCGTCTTCTGCTTG
  3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGACTTGGATCTCGTATGCCGTCTTCTGCTTG
  3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACAAGTCCAAATCTCGTATGCCGTCTTCTGCTTG
 .....

So I made a function that allows to select the primer linked to this sample.

# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim5():
         for trim5 in config['trim5']:
                if {wildcards.sample} == trim5:
                     print (config['trim5'])
                     print (config['trim5'][trim5])
                         return ( config['trim5'][trim5] )
                 else: 
                    continue

# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim3():
            for trim3 in config['trim3']:
                if {wildcards.sample} == trim3:
                     print (config['trim3'])
                     print (config['trim3'][trim3])
                         return ( config['trim3'][trim3] )
                 else: 
                    continue



rule cutadapt_remove_adaptater_trimm:
    priority: 0
    input:
        reads=["../resources/sequences/{sample}_R1.fastq.gz", "../resources/sequences/{sample}_R2.fastq.gz"]
    output:
        fastq1= "../results/trimmed/{sample}_R1_trimmed.fastq.gz",
        fastq2= "../results/trimmed/{sample}_R2_trimmed.fastq.gz",
        qc= "../results/trimmed/{sample}_qc.txt"
    params:
        adapters="-g %s -a %s -a %s -a %s  -G %s -A %s -A %s -A %s -A %s  "%(get_index_trim5(), config['adaptaters']['adap_R1'], config['adaptaters']['PolyAG'],get_index_trim3(), get_index_trim5(), config['adaptaters']['adap_R2'], get_index_trim3(), config['adaptaters']['PolyG']), 
        extra="--minimum-length 100 -q 20"
    log:
        "../results/logs/trimmed/{sample}_trimmed.log"
    benchmark : 
        "../results/benchmarks/trimmed/{sample}_trimmed_benchmark.txt"
    message:
        """
        --- Trimming on {wildcards.sample} {params.adapters} in process ---
        """
    threads: 4
    resources:
        mem_mb=25000
    shell:
      "cutadapt {params.adapters} {params.extra} -o {output.fastq1} -p {output.fastq2} -j {threads} {input.reads} > {output.qc} 2> {log}"

The problem is I need to check in my function that the wildcards correspond to the primer but I can't pass the wildcards.sample as an argument to my function. I have some difficulties to use wildcards

Can you help me?

Thanks in advance and have a nice day, aka

snakemake rules wildcards • 2.9k views
ADD COMMENT
0
Entering edit mode
3.5 years ago
kanika.151 ▴ 160

Hi Aka,

This is a partial snakemake file. Can you also add how you are looking for your samples?

I would add a 'rule all' so that my snakefile knows what it needs to create. For example:

rule all:
      input:
            expand("../results/trimmed/{sample}_R1_trimmed.fastq.gz",config["samples"])
ADD COMMENT
0
Entering edit mode

Hi kanika,

Thank you for your help! Yes it's a partial snakemake file because I use multiple files for my snakemake pipeline.

What do you mean by "how you are looking for your samples?"

I have already a rull all :

#!/bin/python

#include specifics rules files for the pipeline
include: "rules/common.smk"
include: "rules/fastqc_rule.smk"
include: "rules/trimmo_rule.smk"
include: "rules/fastqc_second_rule.smk"


rule all:
    input:  
           ##### First Fastqc
        expand('../results/qc/before_trim/{sample}_{read}.html',sample = config['samples'], read= config['reads']),
        expand('../results/qc/before_trim/{sample}_{read}_fastqc.zip',sample = config['samples'],read= config['reads']),

            #### Trimommatic
        expand('../results/trimmed/{sample}_R1_trimmed.fastq.gz',sample=config['samples']),
        expand('../results/trimmed/{sample}_R2_trimmed.fastq.gz',sample=config['samples']),
        expand('../results/trimmed/{sample}_qc.txt',sample=config['samples']),


             #### Second fastqc after trim (optional)
        expand('../results/qc/after_trim/{sample}_{read}_trimmed.html',sample = config['samples'], read= config['reads'],trim5=config['trim5'],trim3=config['trim3']),
        expand('../results/qc/after_trim/{sample}_{read}_fastqc_trimmed.zip',sample = config['samples'],read= config['reads'],trim5=config['trim5'],trim3=config['trim3']),

I think the problem is that I can't use the wildcards, or the sample name which is trim in my function.

ADD REPLY
0
Entering edit mode

What if you pass your samples in a function like this?
Got the idea from here

for i in SAMPLES:   
 def get_index_trim5(i):
         for trim5 in config['trim5']:
                if {i} == trim5:
                     print (config['trim5'])
                     print (config['trim5'][trim5])
                         return ( config['trim5'][trim5] )
                 else: 
                    continue

I also think {wildcards.sample} only works if you add lambda wildcards: dict[wildcards.sample] .

ADD REPLY
0
Entering edit mode

It's not working like this: Maybe its the SAMPLES I don't define correctly?

SAMPLES, = glob_wildcards(config['samples'])
for i in SAMPLES:   
    def get_index_trim5(i):
            for trim5 in config['trim5']:
                   if {i} == trim5:
                        print (config['trim5'])
                        print (config['trim5'][trim5])
                            return ( config['trim5'][trim5] )
                    else: 
                       continue 

where I add the lambda wildcards: dict[wildcards.sample] ?

ADD REPLY
0
Entering edit mode

I think we have complicated an already complicated situation. LOL.

First, I am sure you cannot use {wildcards.sample} in a function in snakemake. Let me look for a solution and get back to you. If you want to use {wildcards.sample} then you can only use it with lambda function in python. That's what I wanted to say before.

ADD REPLY
0
Entering edit mode

I think too ! XD

I will try to find a solution too but I will wait for you reply also because I am a little lost when it's question of wildcards to be honest.

I will research about lamba too.

Thank you for your help.

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Hello,

Thanks for your help, it helped me a little and finally I found the solution. The first problem was that I used the function get_index_trim5() but we don't need ().

Then the second problem was that I tried to call a function with the %s and it returned an object of the function not the key I want.

So, it's not perfect but it solves my problem, I decided to split my adapter variable into different variables in my params part: R1 and R2 for my classic adapters without the functions get_index_trim5 and get_index_trim3 and two other variables with my functions directly.

Like that I called my params.<something> in my bash command for cutadapt.

def get_index_trim5(wildcards):
    trim5 =wildcards.sample
    return config['trim5'][trim5]


def get_index_trim3(wildcards):
    trim3 =wildcards.sample
    return config['trim3'][trim3]


rule cutadapt_remove_adaptater_trimm:
    priority: 0
    input:
        reads=["../resources/sequences/{sample}_R1.fastq.gz", "../resources/sequences/{sample}_R2.fastq.gz"]
    output:
        fastq1= "../results/trimmed/{sample}_R1_trimmed.fastq.gz",
        fastq2= "../results/trimmed/{sample}_R2_trimmed.fastq.gz",
        qc= "../results/trimmed/{sample}_qc.txt"
    params:
        adapters_r1=" -a %s -a %s %s"%(config['adaptaters']['adap_R1'], config['adaptaters']['PolyAG']),
        adapters_r2= "-A %s -A %s "%(config['adaptaters']['adap_R2'], config['adaptaters']['PolyG']),
        extra="--minimum-length 100 -q 20",
        trim5=get_index_trim5,
        trim3=get_index_trim3
    log:
        "../results/logs/trimmed/{sample}_trimmed.log"
    benchmark : 
        "../results/benchmarks/trimmed/{sample}_trimmed_benchmark.txt"
    message:
        """
        --- Trimming on {wildcards.sample} {params.adapters_r1}

        {params.adapters_r2}  in process ---

        {params.trim5} {params.trim3}]
        """
    threads: 4
    resources:
        mem_mb=25000
    shell:
      "cutadapt -g {params.trim5} {params.adapters_r1} -a {params.trim3} -G {params.trim5} {params.adapters_r2} -A {params.trim3} {params.extra} -o {output.fastq1} -p {output.fastq2} -j {threads} {input.reads} > {output.qc} 2> {log}"
ADD REPLY
0
Entering edit mode

Okay! Awesome that you figured it out.

ADD REPLY

Login before adding your answer.

Traffic: 1286 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6