Nextflow: Multiple jobs merged into one.
2.4 years ago
Alexis ▴ 40

Hi,

I am very new to Nextflow (I used Snakemake in the past). I am trying to build a dummy workflow to understand the basics of pipeline creation, as a first step towards designing my own.

For this, I want to unzip my FASTQ files and create 2 dummy reports from the read length of the R2 read. I have 2 scripts for now, main_qc.nf (containing the workflow) and modules_qc.nf (containing all the processes), shown below:

main_qc.nf:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Include in workflow
include {
    GUNZIP_FASTQ;
    GET_READ_LENGTH 
} from "./modules_qc.nf"

// Initial parameters
datadir="/data"
bashdir="/shs"

params.sample = "s1"
params.fastq = "$datadir/fastqs/${params.sample}_2.fastq.gz"

workflow {
    // Create
    Channel
        .fromFile(params.fastq)
        .set { chFastqFile }

    Channel 
        .of(params.sample)
        .set { samples }

    GUNZIP_FASTQ(chFastqFile)
    GET_READ_LENGTH(samples, GUNZIP_FASTQ.out)
}

modules_qc.nf:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Unzipping files 
process GUNZIP_FASTQ {

    input: 
        path target

    output: 
        path "${target.simpleName}.fastq"

    script:
        """
        gunzip -d -c ${target} > ${target.simpleName}.fastq 
        """
}

// Export read length to file
process GET_READ_LENGTH {
    input: 
        val  sample_id
        path fastq

    output: 
        path "${sample_id}.readLength.txt"

    script: 
        """
        bash ./shs/readLength.sh ${fastq} ${sample_id}.readLength.txt
        """
}

I want to first run the GUNZIP_FASTQ process on all FASTQ files for all samples, and then create one dummy report per sample. GUNZIP_FASTQ therefore has to run twice as many times as the other processes.

How should I proceed?

Thank you very much

nextflow pipeline fastq • 2.9k views

The channel that goes into GUNZIP_FASTQ is actually a queue channel: GUNZIP_FASTQ is executed once per item, for as long as there are items in that channel. See https://www.nextflow.io/docs/latest/channel.html#channels


Your channel chFastqFile receives only one file: $datadir/fastqs/${params.sample}_2.fastq.gz. You may want to include a glob (*) in the path, use fromPath (as far as I recall, fromFile is deprecated), or adopt Pierre's full revamp.
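
A minimal sketch of the glob suggestion, assuming the /data/fastqs layout from the question and mate files named like s1_1.fastq.gz and s1_2.fastq.gz (the file names are an assumption). Channel.fromPath emits one item per matching file, so a process fed by this channel would run once per file:

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

params.sample = "s1"
// The * is meant to match both mates, e.g. s1_1.fastq.gz and s1_2.fastq.gz (assumed names)
params.fastq  = "/data/fastqs/${params.sample}_*.fastq.gz"

workflow {
    Channel
        .fromPath(params.fastq)               // one channel item per matching file
        .view { fq -> "found: ${fq.name}" }   // print each emitted file name
}
```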


Thank you very much for all your insight. With all of this I was able to create what I hoped for!

2.4 years ago

I would write it the following way (not tested)

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// input is a tsv file with sample and path/to/fastq
params.sample_fastq = ""

workflow {
    // Each row of the TSV becomes a tuple: [ sample, fastq file ]
    sample_fastq_ch = Channel
        .fromPath(params.sample_fastq)
        .splitCsv(header: false, sep: '\t')
        .map { row -> tuple(row[0], file(row[1])) }
    gunzip_ch = GUNZIP_FASTQ(sample_fastq_ch)
    readlen_ch = GET_READ_LENGTH(gunzip_ch.out)
    zip_ch = ZIPALL(readlen_ch.out.collect())
}


process GUNZIP_FASTQ {
    input:
        tuple val(sample), path(fq)
    output:
        // Keep the sample name next to the file so the next process can use it
        tuple val(sample), path("${sample}.fastq"), emit: out
    script:
        """
        gunzip -c ${fq} > ${sample}.fastq
        """
}

process GET_READ_LENGTH {
    input:
        tuple val(sample), path(fq)
    output:
        path("${sample}.readLength.txt"), emit: out
    script:
        """
        bash /full/path/to/shs/readLength.sh ${fq} "${sample}.readLength.txt"
        """
}

process ZIPALL {
    input: 
        val(L)

    output: 
       path("output.zip")
    script: 
        """
       zip -j output.zip ${L.join(" ")}
        """
    }

A side note: whatever "readLength.sh" is, you shouldn't use software that requires you to gunzip a FASTQ file first...
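
To illustrate the side note, here is a sketch of a process that reads the gzipped FASTQ through a pipe instead of writing an uncompressed copy. The awk one-liner is only a stand-in for readLength.sh, whose contents are not shown in this thread, and it assumes standard four-line FASTQ records:

```nextflow
process GET_READ_LENGTH {
    input:
        tuple val(sample), path(fq)   // fq is the .fastq.gz file itself
    output:
        path("${sample}.readLength.txt"), emit: out
    script:
        // \$0 is escaped so that awk, not Nextflow, interprets it
        """
        gunzip -c ${fq} | awk 'NR % 4 == 2 { print length(\$0) }' > ${sample}.readLength.txt
        """
}
```

With a process like this, the separate GUNZIP_FASTQ step would no longer be needed.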

2.4 years ago
Maxime Garcia ▴ 350

The channel that goes into GUNZIP_FASTQ is actually a queue channel. Your process is executed once per item in that channel. See https://www.nextflow.io/docs/latest/channel.html#channels
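
A small sketch of that behaviour, using a hypothetical SAY_ITEM process that is not part of the question. The queue channel holds three items, so the process is launched three times:

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Hypothetical process, for illustration only
process SAY_ITEM {
    input:
        val item
    output:
        stdout
    script:
        """
        echo "task for ${item}"
        """
}

workflow {
    // A queue channel with three items: one SAY_ITEM task per item
    items_ch = Channel.of('a', 'b', 'c')
    SAY_ITEM(items_ch)
    SAY_ITEM.out.view()
}
```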


Sorry, the anti-spambot got triggered on this somehow, restored.


Hi, thank you very much for your quick answer. I did manage to run the GUNZIP_FASTQ process multiple times with a wildcard, e.g. params.fastq = "$datadir/fastqs/${params.sample}_*.fastq.gz", but in the end GET_READ_LENGTH still launches for only one sample (the last one).

I don't understand how the queue works from one task to another.


I'm guessing the issue is that your samples channel has only one item. Since it is a queue channel too, GET_READ_LENGTH stops being executed as soon as that channel has no items left, so it runs only once. I'm assuming you want your samples channel to hold different values, and that you actually want to combine your samples and FASTQ channels into a single channel of (sample, fastq) tuples, so I'd look into the combining operators (https://www.nextflow.io/docs/latest/operator.html#combining-operators).
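
As a rough sketch of that idea, reusing the channel names and parameters from the question (the /data/fastqs path and the mate file names are assumptions), the combine operator turns the two channels into a single channel of tuples:

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

params.sample = "s1"
params.fastq  = "/data/fastqs/${params.sample}_*.fastq.gz"

workflow {
    samples     = Channel.of(params.sample)
    chFastqFile = Channel.fromPath(params.fastq)

    // combine emits the Cartesian product: here, one [sample, fastq] tuple per file
    samples
        .combine(chFastqFile)
        .view()

    // A process consuming this channel would declare its input as:
    //     tuple val(sample_id), path(fastq)
}
```

With more than one sample, a plain combine would pair every sample with every file. Deriving the sample name from each file name with map, or using Channel.fromFilePairs, avoids that.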

I'd really recommend having a look at the tutorials, they've been updated recently, and you'll get tons of information and nice examples: https://seqera.io/nextflow/learn/
