(Nextflow) Having trouble with a generic process to stitch together files that were split and then processed

I'm new to Nextflow and I'm trying to figure out how to:

  1. Take a large .fasta file and split it into chunks of 1000 sequences each.
  2. Apply the same Python script to each chunk.
  3. Pass the list of resulting files to a process that concatenates them into one file.

I have something like this:

params.infile = "huge.fasta"
params.size = 1000

process filter_seq {
    input:
    file x

    output:
    path "filtered_${x}.fasta"

    script:
    """
    python $projectDir/filter.py $x filtered_${x}.fasta
    """
}

process collect_split_proc {
    publishDir 'results', model: 'copy'

    input:
    var out_name
    path x

    output:
    path out_name

    script:
    """
    < $x cat >> $out_name
    """
}


workflow {
    Channel.fromPath(params.infile) | splitFasta(by: params.size, file: true) | filter_seq

    collect_split_proc("filtered.fasta", filter_seq.out) | collectFile

}

However, this results in the error:

Process `collect_split_proc` declares 1 input channel but 2 were specified

I want to be able to pass out_name to the collect_split_proc process because later steps will also split a file and stitch the pieces back together, so I want to reuse this process. Since I'm new to this, I suspect there's a trivial way to cat the results of splitFasta | process into a specific file. Is there such a thing? If not, what is wrong with my process?

Note: the Python script takes two command-line arguments: input_file_name and output_file_name.

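Change `var out_name` to `val(out_name)`. `var` is not a Nextflow input qualifier, so that line is not counted as an input and the process ends up declaring only one input channel, which is what the error message reports.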
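
For reference, here is a minimal sketch of how the workflow from the question might look with that fix applied. It is untested and goes beyond the one-line fix in several ways, all of which are assumptions: `model: 'copy'` is read as a typo for `mode: 'copy'`, `file x` is written as `path x`, the chunk input is renamed to `chunks`, the filtered files are gathered with `collect()` so the concatenation runs once over all of them, and the trailing `| collectFile` is dropped.

```nextflow
params.infile = "huge.fasta"
params.size   = 1000

process filter_seq {
    input:
    path x

    output:
    path "filtered_${x}.fasta"

    script:
    """
    python $projectDir/filter.py $x filtered_${x}.fasta
    """
}

process collect_split_proc {
    // assumption: 'model' in the question was meant to be 'mode'
    publishDir 'results', mode: 'copy'

    input:
    val out_name    // 'val', not 'var'
    path chunks     // all filtered chunks, staged together in the task directory

    output:
    path "${out_name}"

    script:
    """
    cat $chunks > $out_name
    """
}

workflow {
    // split the input into files of params.size sequences each
    pieces = Channel.fromPath(params.infile)
                    .splitFasta(by: params.size, file: true)

    filter_seq(pieces)

    // collect() emits every filtered chunk as a single list, so the
    // concatenation process runs once instead of once per chunk
    collect_split_proc("filtered.fasta", filter_seq.out.collect())
}
```

One caveat: `collect()` gathers items in the order the tasks finish, so the chunks may not be concatenated in their original order unless they are sorted first. If a reusable process is not strictly required, applying the `collectFile` operator with a `name` option directly to `filter_seq.out` may be the "trivial way" asked about. Check the operator documentation for its ordering options.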