How can I load numerous files from a config file in Snakemake? Is it worth it?
Asked 3.0 years ago by blackadder

Hello there!

A few days ago I started using Snakemake for the first time.

Mainly, I want to use fasterq-dump to download a large number of files from NCBI, and I do it like this:

# Read one SRA run accession per line, skipping blank lines
with open("run_ids") as f:
    sra = [line.strip() for line in f if line.strip()]

rule all:
    input:
        expand("raw_reads/{sample}.fastq", sample=sra)

# --outdir is relative so it matches the declared output path,
# and fasterq-dump's messages are captured in the declared log file
rule download:
    output:
        "raw_reads/{sample}.fastq"
    threads: 8
    params:
        "--split-spot --skip-technical"
    log:
        "logs/fasterq-dump/{sample}.log"
    shell:
        """
        fasterq-dump {params} --outdir raw_reads {wildcards.sample} -e {threads} > {log} 2>&1
        """

This is working, but:

  1. How can I load the samples from a configure.yaml file instead? Right now I have an external .txt file with a list of samples, and I read it with Python.
  2. Is it worth it? Will my script run faster if I load the samples from a configure.yaml?

Thanks in advance!

Answer by seidel, 3.0 years ago

Presumably you would put your sample names in the config.yaml file:

SAMPLES:
  - "sample1"
  - "sample2"
  - "sample3"

and then load it and reference it in your Snakefile:

configfile: "config.yaml"

rule all:
    input:
        expand("raw_reads/{sample}.fastq", sample=config["SAMPLES"])
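If your accessions already sit in the run_ids text file, you don't have to type out the YAML by hand. Below is a minimal one-off Python sketch that converts the text file into the SAMPLES list format shown above. It assumes one accession per line and sample names without double quotes or backslashes (true for SRR-style run accessions); the function names are hypothetical, and it writes the YAML as plain text rather than depending on a YAML library.

```python
from pathlib import Path


def read_accessions(path):
    """Return one accession per non-blank line, whitespace stripped."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def samples_to_yaml(samples, key="SAMPLES"):
    """Render sample names as a YAML block sequence under `key`.

    Assumes names contain no double quotes or backslashes (true for SRA
    run accessions such as SRR...), so simple double-quoting is safe.
    """
    lines = [f"{key}:"]
    lines.extend(f'  - "{s}"' for s in samples)
    return "\n".join(lines) + "\n"


def write_samples_yaml(run_ids_path, config_path):
    """Hypothetical one-off helper: convert a run_ids text file to a config file."""
    samples = read_accessions(run_ids_path)
    Path(config_path).write_text(samples_to_yaml(samples))
    return samples
```

Calling write_samples_yaml("run_ids", "config.yaml") once would produce a config file that configfile: can load, giving config["SAMPLES"] the same list the original text-file loop builds.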

But I can't imagine it would have any effect on the speed of your workflow: reading a list of sample names, whether from a text file or a config file, takes negligible time compared with downloading the reads, which is where nearly all of the runtime goes. If your text-file method works, it is arguably simpler than reformatting your sample names as YAML. On the other hand, a config.yaml file is the conventional way to parameterize a Snakefile (you can point Snakemake at a different one with the --configfile option without editing the workflow), so I suppose it comes down to how you like to organize things.
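To put a rough number on "not the slow part", here is a minimal Python sketch that writes 10,000 made-up SRR-style accessions to a temporary file and times reading them back with the same line-by-line approach the question uses. The IDs and the count are arbitrary, and timings will vary by machine, but parsing should land in the millisecond range, which is negligible next to downloading even a single FASTQ file.

```python
import os
import tempfile
import time


def time_sample_parsing(n_samples=10_000):
    """Write n_samples fake accession IDs to a temp file, then time reading them back.

    Returns (samples, seconds). The IDs are made up for illustration.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run_ids")
        with open(path, "w") as f:
            f.writelines(f"SRR{i:07d}\n" for i in range(n_samples))

        # Only the read-and-parse step is timed, not the file creation
        start = time.perf_counter()
        with open(path) as f:
            samples = [line.strip() for line in f if line.strip()]
        elapsed = time.perf_counter() - start
    return samples, elapsed


if __name__ == "__main__":
    samples, seconds = time_sample_parsing()
    print(f"Parsed {len(samples)} accessions in {seconds * 1000:.2f} ms")
```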
