Hello,
In my Nextflow workflow, I have a process PREPARE_READS that runs before a trimming process. In the first process, I simply rename the read files and combine them if they came from NextSeq. This process has to be completed before the second process starts.
For the TRIMMING process, I would like to take the genomes path (it's the same directory, but with the files renamed) as input, using fromFilePairs:
params.input = "./genomes"
ch = Channel.fromPath(params.input)
workflow {
    PREPARE_READS(ch)
    TRIMMING(PREPARE_READS.out)
}
But I don't know which output I need to set in PREPARE_READS to get that. Is there a way to wait until the first process has finished before starting the second, without emitting an output from the first one, and do something like this:
params.reads = "./genomes/*_{R1,R2}.fastq"
params.input = "./genomes"
ch1 = Channel.fromPath(params.input)
ch2 = Channel.fromFilePairs(params.reads, checkIfExists: true)
workflow {
    PREPARE_READS(ch1)
    // I want to wait for the first process to end before
    // calling the trimming
    TRIMMING(ch2)
}
Thank you
I didn't fully understand what you're looking for. If PREPARE_READS generates output, then call TRIMMING on its output. If it doesn't, or nothing depends on this output, then why wait for it to finish?
Yes, Asaf is correct. PREPARE_READS and TRIMMING will start at the same time, and use the input from ch1 and ch2 respectively.
Normally, you'd feed the output from PREPARE_READS into TRIMMING, e.g. TRIMMING(PREPARE_READS.out).
Each line in the output: section of the process should have an emit: tag at the end; this will be the name of that output. For instance, if an output is tagged emit: comb, you can then use PREPARE_READS.out.comb to reference this file (or pair of files, a tuple, etc.).
Thanks for the concern. I apologize for not having been clear. The first process takes the genomes dir as input to check and rename the read files.
For all the processes after that, I need to take the reads in pairs (a fromFilePairs channel).
The problem is which output I need in PREPARE_READS so that I get tuples as input for the rest of the processes ( [ sample_id, [read_R1, read_R2] ] ).
I tried the output "${genomes_dir}/*_{R1,R2}.fastq", but it didn't work.
Again, thank you so much for your help.
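One way to get [ sample_id, [read_R1, read_R2] ] tuples out of PREPARE_READS might be to have it emit the renamed files and then rebuild the pairs with channel operators. The following is only a sketch under assumptions that are not in the posts above: the renamed files are assumed to end in _R1.fastq / _R2.fastq, the renamed/ output directory and the emit name reads are made up for illustration, and both script bodies are placeholders for the real renaming and trimming commands.

```nextflow
params.input = "./genomes"

process PREPARE_READS {
    input:
    path genomes_dir

    output:
    // every renamed FASTQ file, exposed under the (hypothetical) name "reads"
    path "renamed/*.fastq", emit: reads

    script:
    """
    mkdir renamed
    # placeholder: the real renaming/merging logic goes here; it should write
    # into renamed/ instead of modifying the input directory in place
    cp ${genomes_dir}/*.fastq renamed/
    """
}

process TRIMMING {
    input:
    tuple val(sample_id), path(reads)

    script:
    """
    # placeholder for the real trimming command
    echo "trimming ${sample_id}: ${reads}"
    """
}

workflow {
    ch1 = Channel.fromPath(params.input)
    PREPARE_READS(ch1)

    pairs_ch = PREPARE_READS.out.reads
        .flatten()                          // one file per channel item
        // key each file by its name without the _R1/_R2 suffix (assumed naming)
        .map { f -> tuple(f.name.replaceFirst(/_R[12]\.fastq$/, ''), f) }
        .groupTuple(size: 2, sort: true)    // [ sample_id, [R1, R2] ]

    TRIMMING(pairs_ch)
}
```

Because TRIMMING consumes a channel derived from PREPARE_READS.out.reads, it cannot start before PREPARE_READS has finished, so no extra waiting logic should be needed with this layout.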
Depending on where in the workflow the error occurs, you could go to the work directory for this process and see what outputs are created, then identify which ones you want to return and tailor the output path to match that. I sometimes struggle to wrap my head around the paths when directories are given as inputs, as you are doing here.
If you could give a little more info about the error I might be able to help more too :)
Hi Jack, thank you for the answer. I don't get a specific error, since I have tried all possible inputs and outputs. I would like to execute the first process with ch1, wait until it ends, and then start the second process with ch2, without getting any output from the first process. collect() doesn't work in this case.
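If PREPARE_READS really has to rename the files in place and has nothing useful to emit, one workaround that may fit is to let it emit a dummy value and treat that value as a "ready" signal. Note that a Channel.fromFilePairs declared at the top of the script lists the files when the run starts, possibly before the renaming has happened, so the sketch below delays the file lookup until the signal arrives. The emit name done, the _R1/_R2 naming, and both script bodies are assumptions for illustration, not something taken from the original pipeline.

```nextflow
params.input = "./genomes"
params.reads = "./genomes/*_{R1,R2}.fastq"

process PREPARE_READS {
    input:
    path genomes_dir

    output:
    // dummy value, only used as a completion signal (hypothetical name "done")
    val true, emit: done

    script:
    """
    # placeholder: the real in-place renaming/merging commands go here
    ls ${genomes_dir} > /dev/null
    """
}

process TRIMMING {
    input:
    tuple val(sample_id), path(reads)

    script:
    """
    # placeholder for the real trimming command
    echo "trimming ${sample_id}: ${reads}"
    """
}

workflow {
    ch1 = Channel.fromPath(params.input)
    PREPARE_READS(ch1)

    pairs_ch = PREPARE_READS.out.done
        .collect()                          // emits once, after all PREPARE_READS tasks
        .flatMap { file(params.reads) }     // glob the renamed files only now
        // key each file by its name without the _R1/_R2 suffix (assumed naming)
        .map { f -> tuple(f.name.replaceFirst(/_R[12]\.fastq$/, ''), f) }
        .groupTuple(size: 2, sort: true)    // [ sample_id, [R1, R2] ]

    TRIMMING(pairs_ch)
}
```

This keeps the "no real output" design, but renaming input files in place works against Nextflow's per-task work directories and its resume mechanism, so emitting the renamed files from the process is usually the easier route.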