concatenating FASTQ forward and reverse reads in Snakemake
1
0
Entering edit mode
5.2 years ago
Hansen_869 ▴ 80

I'm not sure this is the right place for this question, but please let me know if it's not.

I'm currently using Snakemake to annotate some genes. I have a config file, in which my fastq filenames are listed. It looks something like this:

Samples:
  Sample A:
    R1: 
      SampleA_R1.fq.gz
    R2: 
      SampleA_R2.fq.gz
  Sample B:
    R1: 
      SampleB_R1.fq.gz
    R2: 
      SampleB_R2.fq.gz

This is just a simplification of the config file, as it is much longer, but the pattern is the same. Now, I want to concatenate SampleA R1 and SampleA R2 into one fq.gz file. The same goes for Sample B and so forth, if the config was longer. I guess I should be using wildcards, but I'm unsure about how to access the config file and how to use it with wildcards.

Thanks in advance!!

FASTQ Concatenate bash forward reads • 2.7k views
ADD COMMENT
0
Entering edit mode
5.2 years ago
Eric Lim ★ 2.2k

What have you tried? Understanding how wildcards work can be simple for some and time-consuming for others. I hope this simple example will put you in the right direction.

config = {
    'samples': {
        'a': {
            'r1': 'a_r1.txt',
            'r2': 'a_r2.txt'
        }
    }
}

rule:
    input: expand('{samples}_merge.txt', samples=config['samples'].keys())

rule merge:
    input: r1 = '{sample}_r1.txt',
           r2 = '{sample}_r2.txt'
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'

In addition to the documentation, a good place to start learning about wildcards: https://github.com/leipzig/SandwichesWithSnakemake

ADD COMMENT
0
Entering edit mode

Thanks for your reponse! I haven't really tried anything worth mentioning, as I'm pretty ignorant on the topic. I'll definitely give this a go and look at the link. Will your proposed code also work, if my config is in a different file? I use the:

configfile: "config.yaml"

in the top of the file.

ADD REPLY
0
Entering edit mode

configfile will populate what's in the yaml into config. As to whether the code will work, there's only one way to find out.

ADD REPLY
0
Entering edit mode

So by this approach, the input doesn't access r1 and r2 of the config does it? It uses the key "a", but I specify in my input that it should end with _r1.txt og _r2.txt. Is there any way to not having to write this? Can I just fetch it directly from the config file? I failed to mention, that in my config file, the R1 and R2 key, also have paths like:

R1: FASTQ/SampleA_R1.fq.gz

ADD REPLY
0
Entering edit mode

I generally dislike specifying the entire file path in config, but you can use a function or lambda to read those in. These techniques are all clearly described in the documentation, and I'd seriously recommend you spending some time with it.

Function as input files: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#functions-as-input-files

lambda: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html (Step 3)

rule merge:
    input: r1 = lambda wildcards: config['samples'][wildcards.sample]['r1'],
             r2 = lambda wildcards: config['samples'][wildcards.sample]['r2']
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'

Unless you start sharing you efforts and code in this learning process, I'm afraid I won't be able to help you further.

ADD REPLY

Login before adding your answer.

Traffic: 1674 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6