Nextflow - Wait for process ending to call the next one without output
16 months ago
davidmaimoun ▴ 50

Hello,

In my Nextflow workflow, I have a process PREPARE_READS that runs before a trimming process. The first process simply renames the read files and combines them if they came from a NextSeq run. It has to complete before the second process starts.

For the TRIMMING process, I would like to take the genomes directory (the same directory, but with the files renamed) as input, using fromFilePairs:

params.input = "./genomes"
ch = Channel.fromPath(params.input)
workflow {
    PREPARE_READS(ch)
    TRIMMING(PREPARE_READS.out)
}

But I don't know which output I need to declare in PREPARE_READS to get that. Is there a way to wait until the first process has finished before starting the second one, without declaring an output for the first process, and do something like:

params.reads = "./genomes/*_{R1,R2}.fastq"
params.input = "./genomes"
ch1 = Channel.fromPath(params.input)
ch2 = Channel.fromFilePairs(params.reads, checkIfExists: true)

workflow {
    PREPARE_READS(ch1)
    // I want to wait for the first process to finish before
    // calling TRIMMING
    TRIMMING(ch2)
}

Thank you

nextflow • 2.8k views

I didn't fully understand what you're looking for. If PREPARE_READS generates output, call TRIMMING on that output. If it doesn't, or nothing depends on its output, why wait for it to finish?


Yes, Asaf is correct. As written, PREPARE_READS and TRIMMING will start at the same time, using the input from ch1 and ch2 respectively.

Normally, you'd feed the output of PREPARE_READS into TRIMMING, e.g. TRIMMING(PREPARE_READS.out).
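A minimal, runnable sketch of that data-driven dependency. The process bodies are placeholders (they write and read small text files instead of renaming or trimming reads); the point is that Nextflow only schedules a TRIMMING task once the PREPARE_READS output it consumes exists, so no explicit "wait" is needed.

```nextflow
// Placeholder commands: each TRIMMING task consumes a file emitted by
// PREPARE_READS, so it cannot start until that file has been produced.
process PREPARE_READS {
    input:
    val sample_id

    output:
    tuple val(sample_id), path("${sample_id}.txt")

    script:
    """
    echo "prepared ${sample_id}" > ${sample_id}.txt
    """
}

process TRIMMING {
    input:
    tuple val(sample_id), path(reads)

    output:
    stdout

    script:
    """
    cat ${reads}
    """
}

workflow {
    samples = Channel.of('s1', 's2')
    // The output channel of PREPARE_READS is the input channel of TRIMMING
    TRIMMING(PREPARE_READS(samples)).view()
}
```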


Each declaration in the process's output: section can end with an emit: label, which becomes the name of that output. For instance:

output:
path "comb_sequences.fastq.gz", emit: comb

Then you can use PREPARE_READS.out.comb to refer to this file (or pair of files, tuple, etc.).
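A small sketch with two named outputs, to show how each emit: label is accessed from the workflow. The sample name, file names and the touch/echo commands are placeholders, not the real PREPARE_READS logic.

```nextflow
// Placeholder process with two named outputs
process PREPARE_READS {
    input:
    val sample_id

    output:
    path "${sample_id}_R*.fastq", emit: reads
    path "${sample_id}.log", emit: log

    script:
    """
    touch ${sample_id}_R1.fastq ${sample_id}_R2.fastq
    echo "renamed ${sample_id}" > ${sample_id}.log
    """
}

workflow {
    PREPARE_READS(Channel.of('s1'))
    PREPARE_READS.out.reads.view()   // a list with the two FASTQ paths
    PREPARE_READS.out.log.view()     // the path of the log file
}
```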


Thanks for your help, and I apologize for not being clear. The first process takes the genomes directory as input to check and rename the read files.

process PREPARE_READS {

input:
path genomes_dir

output:
path '*'     // <- I don't know what to output


script:
"""
prepare_reads.sh ${genomes_dir}
"""

}

For all the processes after that, I need to take the reads as pairs (as from a fromFilePairs channel):

process TRIMMING {

tag "Running Trimmomatic on $sample_id"
publishDir "./samples/${sample_id}/trimmed", mode: 'copy'
cpus 12

input:
tuple val(sample_id), path(reads)

output:
tuple val(sample_id), path('*')

script:
"""
trimmomatic PE -phred33 -threads ${task.cpus} \
${reads} \
trim_paired_${reads[0]} trim_unpaired_${reads[0]} \
trim_paired_${reads[1]} trim_unpaired_${reads[1]} \
HEADCROP:20 SLIDINGWINDOW:4:20 LEADING:3 TRAILING:3 CROP:265 MINLEN:50
"""

}

params.input = "./genomes"
ch = Channel.fromPath(params.input)
workflow {
    PREPARE_READS(ch)
    TRIMMING(PREPARE_READS.out)
    QC_READS(TRIMMING.out)
    ASSEMBLY(TRIMMING.out)
    QC_ASSEMBLY(ASSEMBLY.out)
    TYPING(ASSEMBLY.out)
    POPULATE_REPORT(TYPING.out)
}

The problem is which output to declare in PREPARE_READS so that the rest of the processes receive tuples as input ( [ sample_id, [read_R1, read_R2] ] ).

I tried the output "${genomes_dir}/*_{R1,R2}.fastq", but it didn't work.

Again, thank you so much for your help.


Depending on where in the workflow the error occurs, you could go to the work directory for this process, see what outputs are created, identify which ones you want to return, and tailor the output path to match them. I sometimes struggle to wrap my head around the paths when directories are given as inputs, as you are doing here.

If you could give a little more info about the error, I might be able to help more too :)
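One way to apply that advice is sketched below, under a loudly stated assumption: prepare_reads.sh is changed (hypothetically) so that it writes the renamed <sample>_R1.fastq / <sample>_R2.fastq files into its current working directory, i.e. the task directory, instead of renaming them inside genomes_dir. Output globs are matched relative to the task work directory, so a plain "*_{R1,R2}.fastq" then picks them up, and the flat list of files can be regrouped into the [ sample_id, [R1, R2] ] shape that fromFilePairs produces.

```nextflow
// Assumes prepare_reads.sh (on PATH, e.g. in bin/) writes the renamed
// <sample>_R1.fastq / <sample>_R2.fastq into the current directory.
// That behaviour is a hypothetical change to the original script.
params.input = "./genomes"

process PREPARE_READS {
    input:
    path genomes_dir

    output:
    // Matched relative to the task work directory
    path "*_{R1,R2}.fastq", emit: reads

    script:
    """
    prepare_reads.sh ${genomes_dir}
    """
}

workflow {
    PREPARE_READS(Channel.fromPath(params.input))

    // Rebuild [ sample_id, [R1, R2] ] tuples from the flat list of files
    pairs = PREPARE_READS.out.reads
        .flatten()
        .map { f -> [ f.name.replaceFirst(/_R[12]\.fastq/, ''), f ] }
        .groupTuple(size: 2, sort: true)

    pairs.view()
    // TRIMMING(pairs)   // TRIMMING as defined in the question
}
```

Because TRIMMING would consume pairs derived from PREPARE_READS's own output, it waits for the renaming automatically, with no extra signal channel.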


Hi Jack, thank you for the answer. I don't get a specific error, since I have tried every combination of inputs and outputs I could think of. I would like to execute the first process with ch1, wait until it ends, and then start the second process with ch2, without getting any output from the first process. collect() doesn't work in this case.

12 months ago
crhis ▴ 10

You can declare a val output that emits any string, then call collect() on that output and pass it as an extra input to the next process. This forces process 2 to wait until every task of process 1 has completed.

Ex:

process PREPARE_READS {

 input:
   path(file_1)

 output:
   val("process_complete"), emit: control_1

 script:
   """
   echo "any command line you want to use"
   """

 }

process TRIMMING {

 input:
   tuple val(sample_id), path(reads)   // matches the tuples emitted by fromFilePairs
   val(control_1)

 script:
   """
   echo "any command line you want to use"
   """

 }

part of the workflow:

workflow {
      PREPARE_READS(ch1)
      TRIMMING(ch2,  PREPARE_READS.out.control_1.collect())
  }
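A self-contained version of this signal pattern that runs without real data. The contents of ch1 and ch2 are placeholders, and the reads are passed as val() only so that no FASTQ files need to exist. One caveat worth checking in the real pipeline: as far as I know, Channel.fromFilePairs resolves its glob when the channel is created at pipeline start, before PREPARE_READS has renamed anything, so the signal controls when TRIMMING runs but not which files ch2 contains.

```nextflow
// Placeholder inputs; reads are plain strings here, not staged files
process PREPARE_READS {
    input:
    val genomes_dir

    output:
    val "process_complete", emit: control_1

    script:
    """
    echo "preparing ${genomes_dir}"
    """
}

process TRIMMING {
    input:
    tuple val(sample_id), val(reads)
    val control_1

    output:
    stdout

    script:
    """
    echo "trimming ${sample_id} after ${control_1}"
    """
}

workflow {
    ch1 = Channel.of('genomes')
    ch2 = Channel.of(
        ['s1', ['s1_R1.fastq', 's1_R2.fastq']],
        ['s2', ['s2_R1.fastq', 's2_R2.fastq']]
    )
    PREPARE_READS(ch1)
    // collect() emits only after all PREPARE_READS tasks are done, and the
    // result is a value channel, so every TRIMMING task can reuse it
    TRIMMING(ch2, PREPARE_READS.out.control_1.collect()).view()
}
```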
16 months ago

The collect operator can be used to wait until all tasks of a previous process have completed:

https://www.nextflow.io/docs/latest/operator.html#collect

https://training.seqera.io/#_multiqc_report
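A tiny illustration of what collect does: it gathers every item of a channel and emits them as a single list, only once the source channel has completed. This sketch should print [1, 2, 3].

```nextflow
// collect() turns three separate emissions into one list
workflow {
    Channel.of(1, 2, 3)
        .collect()
        .view()
}
```

That single, delayed emission is what makes collect usable as a barrier in front of a downstream process.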


Thank you, but it didn't work in my case.
